Data Engineering for Analytics to Solve Business Challenges

Harvard Extension School

CSCI E-103

Section 1

CRN 16694

In today's world, data is generated at an ever-increasing rate. Analytic platforms need to match this pace, digest the data, and generate useful insights. The best decisions are informed by data, and as the data changes, one needs to follow the signals and indicators embedded in it. The technology space is evolving rapidly, and choosing the right technology for the data at hand is an important decision. The next decision is to select the architecture that best solves the technical challenges and helps the business improve its growth, revenue, and time to market. Spark provides a Swiss Army knife for handling the entire data life cycle, from ingestion to consumption. Newer offerings from the open source community around Delta and MLflow help strengthen the data platform by making it performant, reliable, and repeatable. Often, innovation is left at the proof-of-concept stage and does not reach production because the foundational architectural components necessary for hardened, mature, enterprise-grade deployments are lacking. This lost innovation translates to lost revenue and missed opportunities. This course helps students appreciate the power of technology and skillfully apply it in practical, real-world situations. It leverages the Databricks platform on Amazon Web Services (AWS) to simplify cluster setup so that students can focus on the data engineering aspects of getting data ready for analytics.

Instructor Info

Eric Gieseke, ALM

Chief Executive Officer and Founder, Pago Capital


Anindita Mahapatra, ALM

Solutions Architect, Databricks


Meeting Info

T 5:30pm - 7:30pm (9/3 - 12/21)

Participation Option: Online Asynchronous or Online Synchronous

In online asynchronous courses, you are not required to attend class at a particular time. Instead, you can complete the coursework on your own schedule each week.

Deadlines

Last day to register: August 29, 2024

Additional Time Commitments

Required sections Thursdays, 6-7 pm.

Prerequisites

Familiarity with Amazon Web Services (AWS), structured query language (SQL), and Python. Some experience with big data, Spark, and data stores is helpful.

Notes

This course meets via web conference. Students may attend at the scheduled meeting time or watch recorded sessions asynchronously. Recorded sessions are typically available within a few hours of the end of class and no later than the following business day.

All Sections of this Course

CRN: 16694
Section #: 1
Participation Option(s): Online Asynchronous, Online Synchronous
Instructor: Team Taught
Section Status: Open
Meets: T 5:30pm - 7:30pm
Term Dates: Sep 3 to Dec 21