SPAN 413.01, Natural Language Processing in Spanish

Time and place: MWF 2:00 - 2:50, Newcomb Hall 123
Prof. Harry Howard
howard at tulane dot edu
Office: Newcomb Hall 322-D
862-3417 (voice mail 24 hours a day)
Office hours: MW 4:30-5:30, T 4-5 and by appointment

Course language: As in almost all of the courses offered with the SPAN prefix, this course is taught in Spanish. However, this initial information is in English because one of the goals of the course is to translate it into Spanish.

Overview: Natural Language Processing in Spanish teaches you how to use a computer to do useful things with the Spanish language. Hopefully you'll finish the semester with some practical knowledge about solving linguistic problems, such as techniques for filtering junk email, automatically discovering the different meanings of a word, automatically translating from one language to another, and identifying the author of a text from the statistics of the words that he or she uses. You will also become familiar with the programming language Python, which is easy to learn and makes many tasks in natural language processing rather simple. The course is great training if you are interested in doing natural language processing work in industry, either in a research lab (Google, Microsoft, Powerset, Yahoo, etc.) or in a startup. As part of the course, we will use Google Translate to translate an English-language textbook on natural language processing into Spanish, with the help of a research group in Barcelona.

Objectives:

Outcomes: For you to demonstrate your attainment of these objectives, you will perform the following tasks:

Code of Academic Integrity

“The integrity of Newcomb-Tulane College is based on the absolute honesty of the entire community in all academic endeavors. As part of the Tulane University community, students have certain responsibilities regarding work that forms the basis for the evaluation of their academic achievement. Students are expected to be familiar with these responsibilities at all times. No member of the university community should tolerate any form of academic dishonesty, because the scholarly community of the university depends on the willingness of both instructors and students to uphold the Code of Academic Conduct. When a violation of the Code of Academic Conduct is observed it is the duty of every member of the academic community who has evidence of the violation to take action. Students should take steps to uphold the code by reporting any suspected offense to the instructor or the associate dean of the college. Students should under no circumstances tolerate any form of academic dishonesty.” For further information, point your browser at http://college.tulane.edu/honorcode.htm.

Violations of the Code of Academic Integrity will not be tolerated in this class. I will rigorously investigate and pursue any such transgression.

Students with disabilities who need academic accommodation should:

Schedule of assignments, Spring 2010
Textbook: Natural Language Processing with Python, 1st ed. (2009), by Steven Bird, Ewan Klein, and Edward Loper [NLPP]
There may be additional readings assigned from other sources.


Date | Day | Topic | Assignment | ppt / mp3 | P
Jan 11 (M) | 1 | Course introduction | NLPP Preface | Powerpoint mp3 |
Jan 13 (W) | 2 | Computing with language | NLPP 1.1 | Powerpoint mp3 |
Jan 15 (F) | 3 | Computing with language | NLPP 1.1 | Powerpoint mp3 |
Jan 18 (M) |  | MLK Birthday |  |  |
Jan 20 (W) | 4 | A closer look at Python, Statistics | NLPP 1.2 - 1.3 | Powerpoint mp3 |
Jan 22 (F) | 5 | Statistics, Making decisions and taking control | NLPP 1.3 - 1.4 | Powerpoint mp3 |
Jan 25 (M) | 6 | Making decisions and taking control, Automatic natural language understanding, Summary | NLPP 1.4 - 1.6 | Powerpoint mp3 | P1
Jan 27 (W) | 7 | Accessing text corpora | NLPP 2.1 | Powerpoint mp3 |
Jan 29 (F) | 8 | Unicode | NLPP 3.3 | Powerpoint mp3 |
Feb 1 (M) | 9 |  |  | -- -- | P2
Feb 3 (W) | 10 | Unicode 2 | NLPP 3.3 | Powerpoint mp3 |
Feb 5 (F) | 11 | Conditional frequency distributions | NLPP 2.2 | Powerpoint mp3 |
Feb 8 (M) | 12 | More Python: Reusing code | NLPP 2.3 | Powerpoint -- |
Feb 10 (W) | 13 | Accessing local and web texts | NLPP 3.1 | Powerpoint mp3 | P3
Feb 12 (F) | 14 | Accessing local and web texts | NLPP 3.1 - 3.2 | Powerpoint -- |
Feb 15 (M) |  | Lundi Gras |  |  |
Feb 17 (W) | 15 | More on strings, Regular expressions | NLPP 3.2, 3.4 | Powerpoint mp3 |
Feb 19 (F) | 16 | Applications of regular expressions | NLPP 3.5 | Powerpoint mp3 |
Feb 22 (M) | 17 | More applications of regular expressions | NLPP 3.6 | Powerpoint mp3 | P4
Feb 24 (W) | 18 | Normalizing & tokenizing text, Segmentation, Formatting, Summary | NLPP 3.7 - 3.10 | Powerpoint mp3 |
Feb 26 (F) | 19 | Using a tagger, Tagged corpora | NLPP 5.1 - 5.2 | Powerpoint -- |
Mar 1 (M) | 20 | Mapping words to properties with Python dictionaries | NLPP 5.3 | Powerpoint mp3 | P5
Mar 3 (W) | 21 | Automatic tagging | NLPP 5.3 - 5.4 | Powerpoint mp3 |
Mar 5 (F) | 22 | Automatic tagging, N-gram tagging, Transformation-based tagging, Word category, Summary | NLPP 5.4 - end | Powerpoint mp3 |
Mar 8 (M) | 23 | Supervised classification 1 | NLPP 6.1 | Powerpoint mp3 | P6
Mar 10 (W) | 24 | Supervised classification 2 | NLPP 6.1 | Powerpoint mp3 |
Mar 12 (F) | 25 | Supervised classification 3 | NLPP 6.1 | Powerpoint mp3 |
Mar 15 (M) | 26 | Supervised classification 4 | NLPP 6.1 | Powerpoint mp3 |
Mar 17 (W) | 27 | Supervised classification 5 | NLPP 6.1 | Powerpoint mp3 | P7
Mar 19 (F) | 28 | Supervised classification 6 | NLPP 6.1 | Powerpoint mp3 |
Mar 22 (M) | 29 | Supervised classification 7 | NLPP 6.1 | Powerpoint mp3 | P8
Mar 24 (W) | 30 | Evaluation | NLPP 6.3 | Powerpoint mp3 |
Mar 26 (F) | 31 | Information extraction | NLPP 7.1 | Powerpoint -- |
Mar 29 (M) |  | Spring Break |  |  |
Mar 31 (W) |  | Spring Break |  |  |
Apr 2 (F) |  | Spring Break |  |  |
Apr 5 (M) |  | Spring Break |  |  |
Apr 7 (W) | 32 | Chunking, Chunkers, Recursion, Names, Relations, Summary, Grammatical dilemmas, Syntax, CFG, Parsing, Dependencies, Grammar development, Summary, Grammatical features, Processing features, Extending the grammar, Summary | NLPP 7.2 - 9.4 |  |
Apr 9 (F) | 33 | NL understanding, Propositional logic | NLPP 10.1 - 10.2 |  |
Apr 12 (M) | 34 | First-order logic (FOL) | NLPP 10.3 |  | P9
Apr 14 (W) | 35 | FOL, Semantics of sentences | NLPP 10.3 - 10.4 |  |
Apr 16 (F) | 36 | Semantics of sentences, Discourse semantics, Summary | NLPP 10.4 - 10.6 |  |
Apr 19 (M) | 37 | Corpus structure, Life cycle | NLPP 11.1 - 11.2 |  | P10
Apr 21 (W) | 38 | Acquiring data, XML | NLPP 11.3 - 11.4 |  |
Apr 23 (F) | 39 | Toolbox data, OLAC, Summary | NLPP 11.5 - 11.7 |  |
Apr 26 (M) | 40 | The language challenge | Afterword |  | P11
May 6 (R) | -- | FINAL EXAM DAY, 8 - noon | Present projects to class |  |


Inception: 08/16/09. Last revision: March 26, 2010. HH