Elements of Data Science and Statistical Learning with R

Harvard Summer School

CSCI S-63C

Section 1

CRN 34799

One of the broad goals of data science is examining raw data with the purpose of identifying their structure and trends, and deriving conclusions and hypotheses from the latter. In the modern world awash with data, data analytics is more important than ever to fields ranging from biomedical research, space and weather science, finance, business operations, and production to marketing and social media applications. This course provides an intensive introduction to various statistical learning methods. The R programming language, a very popular and powerful platform for scientific and statistical analysis and visualization, is also introduced and used throughout the course. We discuss the fundamentals of statistical testing and learning, and cover linear and non-linear regression, regularization, unsupervised methods (principal component analysis [PCA] and clustering), and supervised classification, including support vector machines, random forests, and neural nets, using datasets drawn from diverse domains. This course is geared less toward theory (although some is presented, mostly qualitatively) and more toward developing intuition and the right way of thinking about statistical problems, as well as building practical skills through multiple, incremental assignments and extensive experimentation.

Instructor Info

Andrey Sivachenko, PhD

Scientist IV, Head of Bioinformatics, Cystic Fibrosis Foundation Therapeutics Lab


Meeting Info

TTh 6:30pm - 9:30pm (6/24 - 8/9)

Participation Option: Online Asynchronous or Online Synchronous

In online asynchronous courses, you are not required to attend class at a particular time. Instead, you can complete the coursework on your own schedule each week.

Deadlines

Last day to register: June 20, 2024

Prerequisites

Good programming skills, preferably in R, or solid experience in other languages; a good understanding of probability and statistics at the level of CSCI E-106 or STAT E-109. See the syllabus for the recommended pretest.

Notes

This course meets via web conference. Students may attend at the scheduled meeting time or watch recorded sessions asynchronously. The recorded sessions are typically available within a few hours of the end of class and no later than the following business day. Not open to Secondary School Program students.


All Sections of this Course

CRN | Section # | Participation Option(s) | Instructor | Section Status | Meets | Term Dates
15123 | 1 | Online Asynchronous, Online Synchronous | Team Taught | Open | T 8:10pm - 10:10pm | Sep 3 to Dec 21
34799 | 1 | Online Asynchronous, Online Synchronous | Andrey Sivachenko | Open | TTh 6:30pm - 9:30pm | Jun 24 to Aug 9
24748 | 1 | Online Asynchronous, Online Synchronous | Team Taught | Open | Th 7:40pm - 9:40pm | Jan 27 to May 17