CS 4650 and 7650
- Course: Natural Language Understanding
- Instructor: Jacob Eisenstein
- Semester: Fall 2015
- Time: Mondays and Wednesdays, 3:05-4:25pm
- TAs: TBD
- Policies (including office hours)
This course gives an overview of modern statistical techniques for analyzing natural language. The rough organization is to move from shallow bag-of-words models to richer structural representations of how words interact to create meaning. At each level, we will discuss the salient linguistic phenomena and the most successful computational models. Along the way we will cover machine learning techniques that are especially relevant to natural language processing.
- Acquire the fundamental linguistic concepts that are relevant to language technology. This goal will be assessed in the short homework assignments, midterm, and class participation.
- Analyze and understand state-of-the-art algorithms and statistical techniques for reasoning about linguistic data. This goal will be assessed in the midterm, the assigned projects, and class participation.
- Implement state-of-the-art algorithms and statistical techniques for reasoning about linguistic data. This goal will be assessed in the assigned and independent projects.
- Adapt and apply state-of-the-art language technology to new problems and settings. This goal will be assessed in the independent project.
- (7650 only) Read and understand current research on natural language processing. This goal will be assessed in assigned projects and classroom participation.
The assignments, readings, and schedule are subject to change, but I will try to give as much advance notice as possible.
Readings will be drawn from my notes, from published papers and tutorials, and from the following two texts:
- Linguistic Fundamentals for NLP. You should be able to access this PDF for free from a Georgia Tech computer.
- Foundations of Statistical NLP. A PDF version is accessible through the GT library.
The following supplementary texts are completely optional, but might deepen your understanding of the material.
- Speech and Language Processing is the textbook most often used in NLP courses. It's a great reference for both the linguistics and algorithms we'll encounter in this course.
- Natural Language Processing with Python shows how to do hands-on work with Python's Natural Language Toolkit (NLTK), and also brings a strong linguistic perspective.
- Schaum's Outline of Probability and Statistics can help you review the probability and statistics that we use in this course.
The official prerequisite for CS 4650 is CS 3510/3511, "Design and Analysis of Algorithms." This prerequisite is essential because understanding natural language processing algorithms requires familiarity with dynamic programming, as well as automata and formal language theory: finite-state and context-free languages, NP-completeness, etc. While course prerequisites are not enforced for graduate students, prior exposure to analysis of algorithms is very strongly recommended.
Furthermore, this course assumes:
- Good coding ability, corresponding to at least that of a third- or fourth-year undergraduate CS major. Assignments will be in Python.
- Background in basic probability, linear algebra, and calculus.
- Familiarity with machine learning is helpful but not assumed. Of particular relevance are linear classifiers: perceptron, naive Bayes, and logistic regression.
People sometimes want to take the course without having all of these prerequisites. Frequent cases are:
- Junior CS students with strong programming skills but limited theoretical and mathematical background,
- Non-CS students with strong mathematical background but limited programming experience.
Students in the first group suffer on the exam and don't understand the lectures, and students in the second group suffer on the problem sets. My advice is to get the background material first, and then take this course.