Data Mining, Discovery, and Exploration

Harvard Summer School

CSCI S-108

Section 1

CRN 35576

Extracting actionable insights and relationships from massive, complex data sets is the domain of data mining. Data mining has wide-ranging applications in science and technology, including web search, interactions in social networks, recommender systems, signal processing in large internet-of-things (IoT) sensor networks, image search, genetic analysis, and discovery of interactions between drugs. This course surveys a range of unsupervised learning algorithms for data mining, with an emphasis on graph algorithms and scaling to massive data sets. The course comprises readings and lectures on theory, along with hands-on exercises and projects in which students apply the theory through Python coding. The hands-on component uses a variety of Python libraries, possibly including scikit-learn, NetworkX, Neo4j, scikit-learn-extra, mlxtend, and Surprise. Students enrolled for graduate credit are required to complete, present, and report on an independent project. This project must demonstrate mastery of the methods covered in the course as applied to a suitable real-world data set.

Instructor Info

Stephen Elston, PhD

Principal Consultant, Quantia Analytics LLC


Meeting Info

MW 6:30pm - 9:30pm (6/24 - 8/9)

Participation Option: Online Asynchronous or Online Synchronous

In online asynchronous courses, you are not required to attend class at a particular time. Instead, you can complete the course work on your own schedule each week.

Deadlines

Last day to register: June 20, 2024

Additional Time Commitments

Optional sections to be arranged.

Prerequisites

Students enrolling in this course are expected to have a background in Python programming equivalent to CSCI S-7 or CSCI S-29, and exposure to basic machine learning and data science methods equivalent to CSCI S-101. For those with limited Python experience, programming experience in another language, such as R, MATLAB, or C++, is essential. Knowledge of basic linear algebra, including eigenvalue-eigenvector decomposition, and of some differential and integral calculus, equivalent to MATH S-21a, is also essential.

Notes

This course meets via web conference. Students may attend at the scheduled meeting time or watch recorded sessions asynchronously. The recorded sessions are typically available within a few hours of the end of class and no later than the following business day. Open to admitted Secondary School Program students by petition.

All Sections of this Course

CRN | Section # | Participation Option(s) | Instructor | Section Status | Meets | Term Dates
35576 | 1 | Online Asynchronous, Online Synchronous | Stephen Elston | Not listed | MW 6:30pm - 9:30pm | Jun 24 to Aug 9
26492 | 1 | Online Asynchronous, Online Synchronous | Stephen Elston | Open | W 6:00pm - 8:00pm | Jan 27 to May 17