Course materials for Georgia Tech CS 4650 and 7650, "Natural Language"

Language: TeX

CS 4650 and 7650

  • Course: Natural Language Understanding
  • Instructor: Jacob Eisenstein
  • Semester: Fall 2015
  • Time: Mondays and Wednesdays, 3:05-4:25pm
  • TAs: TBD
  • Schedule
  • Grading
  • Policies (including office hours)

This course gives an overview of modern statistical techniques for analyzing natural language. The rough organization is to move from shallow bag-of-words models to richer structural representations of how words interact to create meaning. At each level, we will discuss the salient linguistic phenomena and the most successful computational models. Along the way, we will cover machine learning techniques that are especially relevant to natural language processing.
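As a rough illustration of what a "shallow bag-of-words" representation means, here is a minimal sketch. It assumes naive lowercasing and whitespace tokenization, and the function name `bag_of_words` is hypothetical, not taken from the course code. Each document is reduced to word counts and word order is discarded, so two sentences with different meanings can receive identical representations. This is part of the motivation for moving toward richer structural models.

```python
from collections import Counter


def bag_of_words(text):
    """Map a document to word counts, discarding word order.

    Assumes naive lowercasing and whitespace tokenization; a real
    tokenizer would also handle punctuation, clitics, and so on.
    """
    return Counter(text.lower().split())


# Word order is lost: both sentences get the same representation.
a = bag_of_words("the dog bit the man")
b = bag_of_words("the man bit the dog")
print(a == b)     # True
print(a["the"])   # 2
```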

Learning goals

  • Acquire the fundamental linguistic concepts that are relevant to language technology. This goal will be assessed in the short homework assignments, midterm, and class participation.
  • Analyze and understand state-of-the-art algorithms and statistical techniques for reasoning about linguistic data. This goal will be assessed in the midterm, the assigned projects, and class participation.
  • Implement state-of-the-art algorithms and statistical techniques for reasoning about linguistic data. This goal will be assessed in the assigned and independent projects.
  • Adapt and apply state-of-the-art language technology to new problems and settings. This goal will be assessed in the independent project.
  • (7650 only) Read and understand current research on natural language processing. This goal will be assessed in the assigned projects and class participation.

The assignments, readings, and schedule are subject to change, but I will try to give as much advance notice as possible.


Readings will be drawn from my notes, from published papers and tutorials, and from the following two texts:

Supplemental textbooks

These are completely optional, but might deepen your understanding of the material.


The official prerequisite for CS 4650 is CS 3510/3511, "Design and Analysis of Algorithms." This prerequisite is essential because understanding natural language processing algorithms requires familiarity with dynamic programming, automata and formal language theory (finite-state and context-free languages), and computational complexity (NP-completeness), among other topics. While course prerequisites are not enforced for graduate students, prior exposure to the analysis of algorithms is very strongly recommended.
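To give a sense of the kind of dynamic programming referred to here, the following minimal sketch computes Levenshtein edit distance with unit costs. It is an illustrative choice rather than necessarily one of the algorithms covered in the course, and the name `edit_distance` is hypothetical. The same pattern of filling a table of subproblem solutions underlies dynamic programs such as Viterbi decoding and CKY parsing.

```python
def edit_distance(source, target):
    """Levenshtein distance with unit costs for insert, delete, substitute."""
    n, m = len(source), len(target)
    # table[i][j] = minimum cost of turning source[:i] into target[:j]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        table[i][0] = i  # delete all i characters
    for j in range(1, m + 1):
        table[0][j] = j  # insert all j characters
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if source[i - 1] == target[j - 1] else 1
            table[i][j] = min(
                table[i - 1][j] + 1,        # delete source[i-1]
                table[i][j - 1] + 1,        # insert target[j-1]
                table[i - 1][j - 1] + sub,  # match or substitute
            )
    return table[n][m]


print(edit_distance("kitten", "sitting"))  # 3
```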

Furthermore, this course assumes:

  • Good coding ability, corresponding to at least a third- or fourth-year undergraduate CS major. Assignments will be in Python.
  • Background in basic probability, linear algebra, and calculus.
  • Familiarity with machine learning is helpful but not assumed. Of particular relevance are linear classifiers: perceptron, naive Bayes, and logistic regression.
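For readers checking their background against the linear classifiers named above, here is a minimal sketch of a multiclass perceptron over sparse feature dictionaries such as bag-of-words counts. It is a sketch under stated assumptions: the function name `train_perceptron`, the `(feature, label)` weight keys, and the toy data are all hypothetical, and none of it is code from the course assignments. In practice the averaged perceptron is often preferred over this plain version.

```python
from collections import defaultdict


def train_perceptron(data, epochs=10):
    """Train a multiclass perceptron and return a prediction function.

    data: list of (features, label) pairs, where features is a dict mapping
    feature names to numeric values (e.g. bag-of-words counts).
    """
    labels = sorted({label for _, label in data})
    weights = defaultdict(float)  # one weight per (feature, label) pair

    def score(features, label):
        return sum(value * weights[(feat, label)] for feat, value in features.items())

    def predict(features):
        # max() keeps the first of several tied labels, so ties go to the
        # alphabetically first label.
        return max(labels, key=lambda label: score(features, label))

    for _ in range(epochs):
        for features, gold in data:
            guess = predict(features)
            if guess != gold:
                # Mistake-driven update: reward the gold label's weights,
                # penalize the weights of the wrongly predicted label.
                for feat, value in features.items():
                    weights[(feat, gold)] += value
                    weights[(feat, guess)] -= value
    return predict


# Toy sentiment data (hypothetical).
data = [
    ({"great": 1, "fun": 1}, "pos"),
    ({"boring": 1, "awful": 1}, "neg"),
    ({"great": 1, "acting": 1}, "pos"),
    ({"awful": 1, "plot": 1}, "neg"),
]
predict = train_perceptron(data)
print(predict({"great": 1}))  # pos
```

Because ties go to the alphabetically first label, an untrained model predicts that label for every input. Weights change only when the model makes a mistake.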

People sometimes want to take the course without having all of these prerequisites. Frequent cases are:

  • Junior CS students with strong programming skills but limited theoretical and mathematical background,
  • Non-CS students with strong mathematical background but limited programming experience.

Students in the first group struggle on the exams and have trouble following the lectures, and students in the second group struggle with the problem sets. My advice is to get the background material first, and then take this course.
