1. Introduction to the course

1.1. Syllabus

LING 3820, 6820: Natural Language Processing

Fall 2016: MWF 3:00 - 3:50 in Newcomb 308

http://www.tulane.edu/~howard/NLP/

1.1.1. Objectives

This course teaches you how to make a computer perform various useful tasks with natural language.

Through it you’ll learn

  • some linguistics,
  • some algorithms,
  • some statistics,
  • and some computer programming in Python.

Hopefully you’ll finish the semester with an appreciation for the intricacies of modeling human languages, plus some practical knowledge about solving linguistic problems.

Our work will be a combination of learning new algorithms, discussing linguistics, and programming useful systems that operate on real data.

It is great training if you are interested in doing natural language processing work in industry, either in a research lab or in a startup.

1.1.1.1. Why should you care?

  1. An enormous amount of information is now available in machine-readable form as natural language text.
  2. Digital assistants such as Apple’s Siri are becoming an important form of human-computer communication.
  3. Much of human-human communication is now mediated by computers, such as on-line social networks.

1.1.1.2. Intended audience

Students of

  • linguistics,
  • computational science,
  • artificial intelligence,
  • mathematics,
  • and any other discipline with an interest in how to process natural language by computer.

1.1.2. Prerequisites

There aren’t any.

I do not take anything for granted and so will explain all background information, or at least suggest sources where you can find it on your own.

1.1.3. How it fits into the Linguistics Program

Todo

Still working on this.

1.1.4. Outcomes

For you to demonstrate how well you have attained the objectives, you will perform the following tasks:

  1. Take a quiz or turn in a project almost every week, usually on Monday. [(11 - 1) × 7.5% = 75%]

    • No quiz/project can be accepted late.
    • Even though these look like a lot of small grades, missing just one lowers your final grade almost an entire letter, as an unfortunate few of my students have found out the hard way.
    • If you know ahead of time that you will miss a quiz/project, send me an e-mail beforehand, and I will excuse you with no penalty.
  2. Present a final project to the class on the final exam day (Dec 19) and turn in a report of your project within two days. [25%]

    • This may be a group effort, but the entire group will receive the same grade.
  3. There is a possibility for extra credit by participating in an EEG experiment. (max +3%)

_images/0-HappyEEGsubject.png
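As a rough sketch of the arithmetic above, the following Python snippet combines the weights into a final score. It assumes that every score is on a 0-100 scale and that exactly ten quiz/project scores are counted; how the eleventh is set aside is not specified here, so the function simply expects the ten counted scores. The names final_score, QUIZ_WEIGHT, FINAL_WEIGHT and MAX_EXTRA are hypothetical, chosen only for this illustration.

```python
QUIZ_WEIGHT = 0.075   # each counted quiz/project is worth 7.5%
FINAL_WEIGHT = 0.25   # final project presentation and report
MAX_EXTRA = 3.0       # cap on extra credit from the EEG experiment


def final_score(quiz_scores, project_score, extra_credit=0.0):
    """Combine ten counted quiz/project scores, the final project,
    and optional extra credit (all assumed to be on a 0-100 scale)."""
    if len(quiz_scores) != 10:
        raise ValueError("expected exactly ten counted quiz/project scores")
    # Extra credit is clamped to the range 0..MAX_EXTRA.
    extra = min(max(extra_credit, 0.0), MAX_EXTRA)
    return QUIZ_WEIGHT * sum(quiz_scores) + FINAL_WEIGHT * project_score + extra


perfect = final_score([100] * 10, 100)
one_missed = final_score([100] * 9 + [0], 100)
print(round(perfect, 1), round(one_missed, 1))
```

Under these assumptions, a zero on a single counted quiz costs 7.5 points, which is most of a letter grade.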

1.1.4.1. Participation

Note that there is no credit for class participation, but I will change a Y+ into the higher X- if I notice you participating in class.

There is no grade for class participation because:

  1. I will record every class as an mp3 and post it to the course website.
  2. I will post my PowerPoint presentation to the course website after every class.

So you don’t have to come to class, but you will miss all the fun, plus there will be exercises in class.

1.1.4.2. Mapping from numerical to letter grades

89.5-91.4 A- 91.5-100 A
79.5-81.4 B- 81.5-87.4 B 87.5-89.4 B+
69.5-71.4 C- 71.5-77.4 C 77.5-79.4 C+
59.5-61.4 D- 61.5-67.4 D 67.5-69.4 D+
0-59.4 F
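The mapping can be sketched in Python as a lookup over lower cutoffs. This is only an illustration: it assumes that the lower bound of each band is the cutoff (so a score that falls between two printed bands, such as 91.45, lands in the lower band) and that the D band starts at 61.5, following the pattern of the B and C rows. The names CUTOFFS and letter_grade are hypothetical.

```python
# (lower cutoff, letter) pairs, highest band first
CUTOFFS = [
    (91.5, "A"), (89.5, "A-"),
    (87.5, "B+"), (81.5, "B"), (79.5, "B-"),
    (77.5, "C+"), (71.5, "C"), (69.5, "C-"),
    (67.5, "D+"), (61.5, "D"), (59.5, "D-"),
]


def letter_grade(score):
    """Return the letter for the first band whose cutoff the score reaches."""
    for cutoff, letter in CUTOFFS:
        if score >= cutoff:
            return letter
    return "F"  # anything below 59.5


for score in (92.5, 89.5, 87.4, 65, 59.4):
    print(score, letter_grade(score))
```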

1.1.5. Code of Academic Conduct

The Code of Academic Conduct begins as follows:

The integrity of the Newcomb-Tulane College is based on the absolute honesty of the entire community in all academic endeavors. As part of the Tulane University community, undergraduate students have certain responsibilities regarding work that forms the basis for the evaluation of their academic achievement. Students are expected to be familiar with these responsibilities at all times. No member of the university community should tolerate any form of academic dishonesty because the scholarly community of the university depends on the willingness of both instructors and students to uphold the Code of Academic Conduct. When a violation of the Code of Academic Conduct is observed it is the duty of every member of the academic community who has evidence of the violation to take action. Students should take steps to uphold the code by reporting any suspected offense to the instructor or the Associate Dean of Newcomb-Tulane College. Students should under no circumstances tolerate any form of academic dishonesty.

For the rest of the Code and further information, point your browser at http://tulane.edu/college/code.cfm.

1.1.6. Students with disabilities

Students with disabilities who need academic accommodation should:

  1. Contact the Goldman Office of Disability Services (ODS).
  2. Bring official notice to me from the ODS indicating that you need academic accommodation. This should be done before the first quiz.

1.1.7. Schedule of assignments

Text: there is none other than this website.
Blackboard/MyTulane: additional readings
Date   Wd  Topic                                 Assignment         Quiz
08/29  M   Introduction to the course
           Introduction to Python
           Introduction to NLP
08/31  W   Strings 1                             4.2.5. Practice 1
09/02  F   Strings 2                             4.3.4. Practice 2
09/05  M   LABOR DAY                             no class
09/07  W   Strings 3                             4.6.4. Practice 5
09/09  F   Strings 4                             finish chapter
09/12  M   Flat text 1                                              Q1
09/14  W   Flat text 2
09/16  F   Flat text 3
09/19  M   cancelled due to illness                                 Q2
09/21  W   Regular expressions 1
09/23  F   Regular expressions 2
09/26  M   Regular expressions 3                                    Q3
09/28  W   Lists and tokenization
09/30  F   Control 1
10/03  M   Control 2                                                Q4
10/05  W   Control 3
10/07  F   NLP 2
10/10  M   reviewed quiz                                            Q5
10/12  W   YOM KIPPUR                            no class
10/14  F   FALL BREAK                            no class
10/17  M   NLP 3
10/19  W   NLP 4
10/21  F   Text stats 1
10/24  M   Text stats 2                                             Q6
10/26  W   Text stats 3
10/28  F   Text stats 4
10/31  M   RSS, Twitter, Metadata, RESTful APIs                     Q7
11/02  W   Deep learning tutorials
11/04  F   Logistic regression
11/07  M   Multilayer perceptrons                                   Q8
11/09  W   Convolutional neural networks
11/11  F   Autoencoders
11/14  M   Restricted Boltzmann machines                            Q9
11/16  W   Deep belief networks
11/18  F   Hybrid Monte-Carlo sampling
11/21  M   Recurrent neural networks                                Q10
11/23  W   THANKSGIVING BREAK                    no class
11/25  F   THANKSGIVING BREAK                    no class
11/28  M   Long short-term memory
11/30  W   RNN-RBM
12/02  F   TBA
12/05  M   TBA                                                      Q11
12/07  W   TBA
12/09  F   Last day
12/19  Tu  Presentations, 1-5pm

1.1.7.1. Final exam day

Warning

You cannot leave town before the day of the final exam! (Tuesday, Dec 19, 1-5)

Tell your parents NOW! You are hereby warned. Do not tell me at the end of the semester that your parents bought you a ticket home without knowing.

1.2. About us

1.2.1. About me

  • Prof. Harry Howard

  • Office hours: MtW 4-5 & by appointment in Newcomb Hall 322-D

  • E-mail: _images/0-email.png

  • Phone: 862-3417 (voice mail 24 hours a day)

1.2.2. About y’all

Ask the person sitting next to you …

  1. what his/her name is,
  2. where he/she is from,
  3. what his/her major is,
  4. what he/she knows about computer programming or linguistics…
  5. and be ready to report what you learned back to the class.

1.3. Inductive vs. deductive learning

In most courses, you sit and listen to the professor lecture about something and then you are expected to apply what you have ‘learned’ in homework or on an exam. This method is often referred to as deductive learning. This course is going to be very different. For the most part, I will give you examples, and you will have to figure out how they work yourself, a method known as inductive learning. Recent research in pedagogy – the science of teaching – suggests that students learn better inductively than deductively. [note to self: need a reference] Be that as it may, I think that it is funner.

The difference between induction and deduction is so fundamental to all intellectual pursuits that I want you to absorb it well. This is a challenge, though, because there is one towering figure in pop culture who constantly uses the term “deduction” incorrectly.

1.3.1. Inductive vs. deductive reasoning

What is Sherlock Holmes known for? For amazing leaps of deduction, you might say. The Wikipedia entry on Holmesian deduction provides a helpful quote from “A Scandal in Bohemia”, in which Holmes takes one look at Watson and tells him that he had gotten very wet lately and that he had “a most clumsy and careless servant girl”. When Watson demands to know how Holmes could have made such a detailed and accurate guess, Holmes explains:

It is simplicity itself ... My eyes tell me that on the inside of your left shoe, just where the firelight strikes it, the leather is scored by six almost parallel cuts. Obviously they have been caused by someone who has very carelessly scraped round the edges of the sole in order to remove crusted mud from it. Hence, you see, my double deduction that you had been out in vile weather, and that you had a particularly malignant boot-slitting specimen of the London slavey.

You may have noticed that I characterize Holmes’ ratiocination as guessing, but he himself – as well as Wikipedia – calls it deduction. I was trying to be polite, as you can surmise by trying to categorize his explanation in terms of Table 1.1, Two types of reasoning:

Table 1.1 Two types of reasoning
                  induction                   deduction
premise 1         Sherlock is a grandfather.  All men are mortal.
premise 2         Sherlock is bald.           Sherlock is a man.
conclusion        All grandfathers are bald.  Sherlock is mortal.
characterization  specific > general          general > specific
process           bottom-up                   top-down

In the quote, Holmes starts with a very specific observation and works backwards to a general cause. Table 1.1 classifies this as induction, not deduction. So to the extent that the quote is representative of Holmesian reasoning, that reasoning is almost exclusively inductive. Still, you can appreciate why Holmes would want to call it deductive: if the premises of a deduction are true, then the conclusion is guaranteed to be true, too. Not so in induction, where even true premises can lead to a false conclusion, as the bald grandfathers of Table 1.1 show.

Returning to the main thread of how this book teaches you to program in Python, you will start from specific bits of code like 'B'+'s' and try to reason your way to a general conclusion, such as what + does. This method does not guarantee correct conclusions, which is why I am here.
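A minimal sketch of what such an inductive exercise might look like follows. The particular operands are illustrative choices, not taken from a later chapter: the snippet collects a few specific observations about + and then tests a tempting generalization against one more case.

```python
# Specific observations: apply + to pairs of operands of the same type.
observations = [
    ("B", "s"),    # two strings
    (2, 3),        # two integers
    ([1], [2]),    # two lists
]

results = [left + right for left, right in observations]
for (left, right), result in zip(observations, results):
    print(repr(left), "+", repr(right), "gives", repr(result))

# A tempting generalization is "+ glues any two things together".
# Mixing a string and an integer puts that to the test.
try:
    "B" + 2
    mixed_outcome = "no error"
except TypeError:
    mixed_outcome = "TypeError"
print("'B' + 2 gives", mixed_outcome)
```

The first three observations suggest a general rule, and the last one shows how a rule reached by induction can still turn out to be wrong.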

1.4. PowerPoint and podcast


Last edited October 28, 2016