Elements of Data Science and Statistical Learning with R
Harvard Summer School
CSCI S-63C
Section 1
CRN 34799
One of the broad goals of data science is examining raw data to identify their structure and trends, and deriving conclusions and hypotheses from them. In a modern world awash with data, data analytics is more important than ever to fields ranging from biomedical research, space and weather science, finance, business operations, and production to marketing and social media applications. This course provides an intensive introduction to various statistical learning methods; the R programming language, a very popular and powerful platform for scientific and statistical analysis and visualization, is also introduced and used throughout the course. We discuss the fundamentals of statistical testing and learning, and cover linear and non-linear regression, regularization, unsupervised methods (principal component analysis [PCA] and clustering), and supervised classification, including support vector machines, random forests, and neural nets, using datasets drawn from diverse domains. This course is geared less toward theory (although some is presented, mostly qualitatively) and more toward developing intuition and the right way of thinking about statistical problems, as well as building practical skills through multiple, incremental assignments and extensive experimentation.
Registration Closes: June 20, 2024
Credits: 4
Term
Summer Term 2024
Part of Term
Full Term
Format
Flexible Attendance Web Conference
Credit Status
Graduate
Section Status
Open