Data Science Projects at CU Boulder

Description

This course explores concepts and techniques for design, formulation and execution of practical, applied data science. Topics covered will include experimental design, statistical analysis and predictive modeling, machine learning, data visualization, scientific writing and presentation. This course will involve a semester-long project to acquire, analyze, and understand data in support of a research question. In addition to traditional lectures, students will read and learn from published papers on data science topics, guest lectures from expert data scientists in the field, and group discussion.

Each class will be one of (1) Lecture, (2) Guest Lecture, (3) Reading Discussion (Seminar), or (4) Project Meetings or Presentations. In the beginning of the course there is a greater emphasis on skills development through reading and lectures, while later in the course more time will be spent on project work. Students are expected to be prepared for discussion in seminar classes, and attendance is a required part of the grade. All students are solely responsible for their independent project and must complete all the work themselves. The end result will be a research paper, presentation and software repository summarizing the work done. A collected version of the resulting research papers will be published through the computer science department technical reports.

As this course includes both graduate and undergraduate students, the expectations are somewhat higher for graduate students. Graduate students are expected to choose a project with some novelty and perform a literature review in order to provide context for their work. The resulting project will have a greater depth of analysis. Graduate students are welcome to choose a project relevant to their own research, so long as it is on-topic for the course. Students attending for audit or as Pass/Fail will be held to the same expectations as students attending for an A-F grade.

Prerequisites

Competent programming and scientific reasoning are necessary. The prerequisite courses are CSCI 3104 and one of APPM 3310, CSCI 2820, MATH 2130, or MATH 2135 (all with a minimum grade of C-).

Beyond this, a working understanding of statistics and probability is essential to understanding much of the course material.

A strong practice of curiosity and a certain degree of stubbornness are invaluable in applied research.

Modern data science is polyglot, but the lingua franca is Python. Those with some working familiarity with Python (or a similar language like Ruby) or R will be most successful.

Computers

Access to a computer will be required for course assignments and projects. The topics covered will be most approachable on a Unix or Unix-similar computer. The CS department has ample computing resources, particularly in the CSEL, which you are encouraged to make use of. If you are concerned about access to computing resources capable of completing the assignments, please let us know.

Textbook

There is no required text for this class. The following recommended texts are available for free (the former through the CU library, the latter on github.com):

All other readings are provided for download via the course website.

Grades

Grading will be based upon attendance and participation, quality of execution on assignments, and final project outcomes. A detailed rubric will be developed throughout the course, but a general breakdown is as follows:

Students who need a particular grade for their degree requirements are encouraged to inform the instructor so that extra-credit assignments might be assessed to bring a grade above threshold. There is absolutely no reason to get a grade below the one you desire. Note that extra-credit work may be fairly complex or laborious compared to the required course work. Because of that, I don’t advise relying on extra-credit work to bolster your grade unless it is needed or you’re simply interested in doing additional work. Students who are auditing the course are similarly encouraged to discuss goals with the instructor.

Additional notes:

Policies

Academic Integrity

All students of the University of Colorado at Boulder are responsible for knowing and adhering to the academic integrity policy of this institution. Violations of this policy may include: cheating, plagiarism, aid of academic dishonesty, fabrication, lying, bribery, and threatening behavior. All incidents of academic misconduct shall be reported to the Honor Code Council (honor@colorado.edu; 303-735-2273). Students who are found to be in violation of the academic integrity policy will be subject to both academic sanctions from the faculty member and non-academic sanctions (including but not limited to university probation, suspension, or expulsion). Other information on the Honor Code can be found at http://www.colorado.edu/policies/honor.html and at http://www.colorado.edu/academics/honorcode/.

Classroom Behavior

Students and faculty each have responsibility for maintaining an appropriate learning environment. Those who fail to adhere to such behavioral standards may be subject to discipline. Professional courtesy and sensitivity are especially important with respect to individuals and topics dealing with differences of race, culture, religion, politics, sexual orientation, gender, gender variance, and nationalities. Class rosters are provided to the instructor with the student’s legal name. I will gladly honor your request to address you by an alternate name or gender pronoun. Please advise me of this preference early in the semester so that I may make appropriate changes to my records. See policies at http://www.colorado.edu/policies/classbehavior.html and at http://www.colorado.edu/studentaffairs/judicialaffairs/code.html#student_code

Disability

If you qualify for accommodations because of a disability, please submit to me a letter from Disability Services in a timely manner so that your needs can be addressed. Disability Services determines accommodations based on documented disabilities. Contact: 303-492-8671, Willard 322, and http://www.Colorado.EDU/disabilityservices

Discrimination and Harassment

The University of Colorado at Boulder policy on Discrimination and Harassment, the University of Colorado policy on Sexual Harassment and the University of Colorado policy on Amorous Relationships apply to all students, staff and faculty. Any student, staff or faculty member who believes s/he has been the subject of discrimination or harassment based upon race, color, national origin, sex, age, disability, religion, sexual orientation, or veteran status should contact the Office of Discrimination and Harassment (ODH) at 303-492-2127 or the Office of Judicial Affairs at 303-492-5550. Information about the ODH, the above referenced policies and the campus resources available to assist individuals regarding discrimination or harassment can be obtained at http://www.colorado.edu/odh

Religious Observance

Campus policy regarding religious observances requires that faculty make every effort to deal reasonably and fairly with all students who, because of religious obligations, have conflicts with scheduled exams, assignments or required attendance. See full details at http://www.colorado.edu/policies/fac_relig.html