Introduction to Natural Language Processing

Harvard Extension School

CSCI E-89B

Section 1

CRN 17133

Students are introduced to modern techniques of natural language processing (NLP) and learn foundations of text classification, named entity recognition, parsing, language modeling including text generation, topic modeling, and machine translation. Methods for representing text as data studied in the course are tokenization, n-grams, bag of words, term frequency-inverse document frequency (TF-IDF) weighting, word embeddings like Word2Vec and GloVe, autoencoders, t-SNE, character embeddings, and topic modeling. The machine learning algorithms for NLP covered in the course are recurrent neural networks (RNNs) including long short-term memory (LSTM), conditional random fields (CRFs), bidirectional LSTM with a CRF (BiLSTM-CRF), generative adversarial networks (GANs), attention models, transformers, bidirectional encoder representations from transformers (BERT), latent Dirichlet allocation (LDA), non-negative matrix factorization (NMF), and structural topic modeling (STM). Students get hands-on experience using both Python and R.

Instructor Info

Dmitry V. Kurochkin, PhD

Senior Research Analyst, Faculty of Arts and Sciences Office for Faculty Affairs, Harvard University


Meeting Info

M 8:10pm - 10:10pm (9/3 - 12/21)

Participation Option: Online Asynchronous or Online Synchronous

In online asynchronous courses, you are not required to attend class at a particular time. Instead, you can complete the coursework on your own schedule each week.

Deadlines

Last day to register: August 29, 2024

Additional Time Commitments

Optional sections Fridays, time to be arranged.

Prerequisites

Students are expected to have taken a Python programming course equivalent to CSCI E-7. Most of the problems will be solved in Python. Structural topic modeling will be performed using the 'stm' R package. Prior programming experience in R is helpful but not required. In addition, basic knowledge of calculus, probability, and statistics is expected. Students need access to a computer with a 64-bit operating system and at least 8 GB of RAM. A GPU is highly recommended.

Notes

This course meets via web conference. Students may attend at the scheduled meeting time or watch recorded sessions asynchronously. Recorded sessions are typically available within a few hours of the end of class and no later than the following business day.

All Sections of this Course

CRN | Section # | Participation Option(s) | Instructor | Section Status | Meets | Term Dates
17133 | 1 | Online Asynchronous, Online Synchronous | Dmitry Kurochkin | Open | M 8:10pm - 10:10pm | Sep 3 to Dec 21