Introduction to Natural Language Processing
Harvard Extension School
CSCI E-89B
Section 1
CRN 17133
Students are introduced to modern techniques of natural language processing (NLP) and learn the foundations of text classification, named entity recognition, parsing, language modeling including text generation, topic modeling, and machine translation. Methods for representing text as data studied in the course are tokenization, n-grams, bag of words, term frequency-inverse document frequency (TF-IDF) weighting, word embeddings like Word2Vec and GloVe, autoencoders, t-SNE, character embeddings, and topic modeling. The machine learning algorithms for NLP covered in the course are recurrent neural networks (RNNs) including long short-term memory (LSTM), conditional random fields (CRFs), bidirectional LSTM with a CRF (BiLSTM-CRF), generative adversarial networks (GANs), attention models, transformers, bidirectional encoder representations from transformers (BERT), latent Dirichlet allocation (LDA), non-negative matrix factorization (NMF), and structural topic modeling (STM). Students get hands-on experience using both Python and R.
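To give a flavor of the text-representation methods listed above, here is a minimal sketch of TF-IDF weighting in plain Python. This is an illustrative example only, not course material; the `tfidf` helper and the sample documents are hypothetical, and it uses the textbook formula tf-idf(t, d) = tf(t, d) × log(N / df(t)) rather than any particular library's variant.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Uses the basic formula: (term count / doc length) * log(N / doc frequency).
    This is a teaching sketch; real libraries apply smoothing and normalization.
    """
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(t for doc in docs for t in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: (c / len(doc)) * math.log(n / df[t])
                        for t, c in tf.items()})
    return weights

# Hypothetical pre-tokenized corpus of three tiny documents.
docs = [["natural", "language", "processing"],
        ["language", "modeling"],
        ["topic", "modeling"]]
w = tfidf(docs)
```

Because "language" appears in two of the three documents while "processing" appears in only one, "processing" receives a higher weight in the first document, illustrating how TF-IDF downweights common terms.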
Credits: 4
Term
Fall Term 2025
Part of Term
Full Term
Format
Flexible Attendance Web Conference
Credit Status
Graduate, Noncredit, Undergraduate
Section Status
Open