Data is the new gold of the modern age. It affects every aspect of business and everyday life: social media, communication, financial and health data, web and application logs, security, and threat mitigation all rely on the ability to collect, process, and analyze terabytes and petabytes of data from numerous sources. Modern cloud-based frameworks and infrastructure serve as the foundation and enabler for most of these services. In this course, students learn how to navigate this extraordinarily diverse and fast-changing field using popular tools and frameworks for processing and analyzing data, such as Spark 3 and its related application programming interfaces (APIs) and libraries (Spark Core, Spark SQL, Spark MLlib, and GraphX). We cover the basics of machine learning and deploying models to the cloud; how to design and organize data using modern distributed data storage options (such as Redshift and BigQuery); elements of data lake and data warehouse design and their evolution to data mesh architectures; trends in unified data analytics and modern data stack frameworks; and integration with business intelligence (BI) tools for data visualization (Looker or Amazon QuickSight). We work hands-on with many of these frameworks on Amazon Web Services (AWS) and Google Cloud Platform (GCP). We primarily use Python for assignments that require programming.
Registration Closes: January 23, 2025
Credits: 4
Term: Spring Term 2025
Part of Term: Full Term
Format: Flexible Attendance Web Conference
Credit Status: Graduate, Noncredit, Undergraduate
Section Status: Open