8. Control

Up to now, your snippets of code are entirely dependent on you for making decisions. This is fine for pieces of text that fit on a single line, but is clearly insufficient for texts that run to hundreds if not thousands of lines in length. You will want Python to make decisions for you. How to tell Python to do so is the topic of this chapter, and falls under the rubric of control.

Note

The code script for this chapter is nlp8.py, which you can download with codeDowner(), see Practice 1, question 2.

8.1. How to check the truth of a statement

8.1.1. How to check for the presence of an item with in

The first step in making a decision is to distinguish those cases in which the decision applies from those in which it does not. In computer science, this is usually known as a condition. Perhaps the simplest condition in text processing is whether an item is present or not. Python handles this in a way that looks a lot like English:

>>> dessert = 'watermelon'
>>> 'w' in dessert
>>> 'wa' in dessert
>>> 'mel' in dessert
>>> 'y' in dessert
>>> 'wt' in dessert
>>> 'W' in dessert
>>> '' in dessert

The first line assigns a string to a variable, and the rest ask whether a sub-string is in it. Python does not answer with “yes” or “no”, but rather with “True” or “False”. I refer to these as the evaluation of the question or condition. The evaluation of each condition should work out exactly as you would imagine, except in the last one. A quirk of the mathematics of strings is that the null string is part of every string. This can trip you up if you are not alert to it.

Lists behave exactly like strings, with the proviso that the string being asked about must match a string in the list exactly:

>>> fruit = ['apple', 'cherry', 'mango', 'pear', 'watermelon']
>>> 'apple' in fruit
>>> 'peach' in fruit
>>> 'app' in fruit
>>> '' in fruit
>>> [] in fruit

Note that the null string is not an element of the list.

Python can understand sequences of in conditions:

>>> 'app' in 'apple' in fruit
#  'app' in 'apple' > True
#  'apple' in fruit  > True
>>> 'water' in dessert in fruit
>>> 'aple' in 'apple' in fruit
>>> 'pea' in 'peach' in fruit

The compound condition in the first line evaluates to true, because Python reads it from left to right as in the two comments: Python evaluates any well-formed in condition that it runs across, left to right. The last two lines show that if either in condition comes out false, the whole thing is false.
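The left-to-right reading can be checked by spelling out the two tests by hand. The following is only a sketch: it is written as a plain script rather than as console input, it repeats the definitions of dessert and fruit so that it stands alone, and the names chained, spelled_out and grouped are mine.

```python
dessert = 'watermelon'
fruit = ['apple', 'cherry', 'mango', 'pear', 'watermelon']

# A chained test  a in b in c  is shorthand for  (a in b) and (b in c).
chained = 'water' in dessert in fruit
spelled_out = ('water' in dessert) and (dessert in fruit)

# It is not the same as putting the first test in parentheses, which would
# ask whether the value True is an element of fruit.
grouped = ('water' in dessert) in fruit

print(chained)      # True
print(spelled_out)  # True
print(grouped)      # False
```

The middle item, dessert, is shared by both tests, which is why the chain only makes sense when the middle item can be both a container and a member.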

8.1.2. How to check for the absence of an item with not in

Sometimes you also want to know whether an item is not present. Python effects this with two different orderings of not:

>>> not 'y' in dessert
>>> 'y' not in dessert
>>> 'W' not in dessert
>>> 'wt' not in dessert
>>> 'wat' not in dessert
>>> '' not in dessert

not 'y' in dessert follows the normal statement of negation in logic, in which the negator applies to the whole statement to be negated, like not('y' in dessert). 'y' not in dessert sounds much more like English.

Lists again work just like strings:

>>> 'apple' not in fruit
>>> 'peach' not in fruit
>>> 'app' not in fruit
>>> '' not in fruit

Compound negations work just as you would expect:

>>> 'pee' not in 'peach' not in fruit
>>> 'pea' not in 'peach' not in fruit
>>> 'pea' not in 'apple' not in fruit

8.1.3. How to check the properties of a string

You can verify the properties of a string with the methods below, whose meaning should be evident from their name:

>>> S = 'CoNfUsIoN'
>>> S.isalpha()
>>> S.isdigit()
>>> S.isalnum()
>>> S.isspace()
>>> S.islower()
>>> S.isupper()
>>> S.istitle()
>>> S.startswith('CoN')
>>> S.endswith('IoN')

Note that a titlecased string is one in which the first letter of each word is upper case and the rest are lower case.
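A few test strings make the definition of title case concrete. This is a small sketch with strings and variable names of my own choosing, written as a plain script:

```python
# A string is titlecased when every word starts with one upper-case letter
# and continues in lower case.
oneWord = 'Confusion'.istitle()            # True
mixedCase = 'CoNfUsIoN'.istitle()          # False
twoWords = 'Total Confusion'.istitle()     # True
secondLower = 'Total confusion'.istitle()  # False

print(oneWord)
print(mixedCase)
print(twoWords)
print(secondLower)
```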

8.1.5. How to compare magnitude with ==, !=, <, <=, >, >=

You can compare the magnitude of two objects with the operators of basic arithmetic:

>>> len(dessert) == len(dessert)
>>> len(dessert) != len(fruit)
>>> len(fruit) < len(dessert)
>>> len(fruit) <= len(dessert)
>>> len(dessert) > len(fruit)
>>> len(dessert) >= len(fruit)

These operators can be chained together:

>>> len(fruit) < len(S) < len(dessert) == 10
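A chained comparison is shorthand for joining the individual comparisons with and. The sketch below repeats the variables from earlier in the chapter so that it stands alone; chained and spelled_out are names I made up for the illustration.

```python
S = 'CoNfUsIoN'
dessert = 'watermelon'
fruit = ['apple', 'cherry', 'mango', 'pear', 'watermelon']

# The lengths are 5, 9 and 10 respectively.
chained = len(fruit) < len(S) < len(dessert) == 10

# Each neighbouring pair is compared, and all comparisons must hold.
spelled_out = (len(fruit) < len(S)) and (len(S) < len(dessert)) and (len(dessert) == 10)

print(chained)      # True
print(spelled_out)  # True
```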

8.1.6. How to check identity and type with is and isinstance()

Python also allows comparison of identity with is and its negation:

>>> dessert is dessert
>>> dessert is not fruit

This would be extremely useful for checking the type of an object, for instance, to check whether dessert is a string:

1
2
>>> dessert is str
>>> type(dessert) is str

The first line evaluates to False because dessert refers to a particular string object in memory, which is not the same object as the type str itself. It is the type of the thing it points to that is str, as in line 2. This task is so useful that the Python developers decided to create a function just for it:

1
2
3
4
5
6
7
>>> isinstance(dessert, str)
>>> isinstance(fruit, list)
>>> isinstance(1, int)
>>> isinstance(1.0, float)
>>> isinstance(1, float)
>>> isinstance(dessert, list)
>>> not isinstance(dessert, list)

Why does line 5 fail?

Note

The general syntax of isinstance() is:

isinstance(object, type)
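The second argument of isinstance() may also be a tuple of types, in which case the test succeeds if the object belongs to any one of them. A brief sketch, with variable names that are mine:

```python
dessert = 'watermelon'

# A tuple of types means "an instance of any of these".
intIsNumber = isinstance(1, (int, float))          # True: 1 is an int
floatIsNumber = isinstance(1.0, (int, float))      # True: 1.0 is a float
strIsNumber = isinstance(dessert, (int, float))    # False: dessert is a str

print(intIsNumber)
print(floatIsNumber)
print(strIsNumber)
```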

8.1.7. How to combine conditions with and

You can combine conditions with and:

>>> S.isalpha() and S.startswith('CoN')
>>> not S.isdigit() and not S.startswith('oN')
>>> not(S.isdigit() and S.startswith('oN'))

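Did you notice that the second and third lines both come out True? Be careful, though: the two are not equivalent in general. By De Morgan's laws, not(A and B) is equivalent to not A or not B, whereas not A and not B is equivalent to not(A or B).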
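Whether two ways of writing a negation really agree can be tested with a pair of truth values on which they might differ. In this sketch A and B are stand-ins for any two conditions, and the other names are mine:

```python
A = True
B = False

bothNegated = not A and not B    # False: A itself is true
wholeNegated = not (A and B)     # True: A and B are not both true
deMorgan = (not A) or (not B)    # True: always agrees with wholeNegated

print(bothNegated)
print(wholeNegated)
print(deMorgan)
```

With S from the examples above, both conditions happen to be false, which is the one case where all three expressions come out True.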

8.1.8. How to choose conditions with or

You can choose one condition or another with or:

>>> S.isalpha() or S.isdigit()
>>> S.isupper() or S.islower()

8.1.9. Practice 1

Todo

yes

8.2. How to make an action contingent on a condition with if

Having explored several ways to trigger a decision, we now turn to how to connect a trigger to an action. In English, the simplest way to do so is by saying something like “if some condition is true, then take some action”. Python tries hard to replicate the naturalness of English. Consider an initial example:

>>> if 'N' in S:
...     print 'yes' # this line must be indented, conventionally with four spaces
...

In the example, 'N' in S is the trigger or condition and print 'yes' is the action, and the two are coordinated by if. The action is taken only when the condition evaluates to true, whether the condition is stated positively or negatively:

>>> if 'n' in S:
...     print 'yes'
...
>>> if 'n' not in S:
...     print 'no'
...
>>> if 'N' not in S:
...     print 'no'
...

If the condition evaluates to false, the trigger fails and no action is taken.

Note

The general syntax for if is:

>>> if True:
...     do something
...

Here are some more examples, which finally show you how to use re.search(). You may need to import it:

>>> from re import search # needed for the search() examples below
>>> if S.isalpha():
...     print 'yes'
...
>>> if S.endswith('IoN'):
...     print 'yes'
...
>>> if not S.endswith('CoN'):
...     print 'yes'
...
>>> if search(r'N', S):
...     print 'yes'
...
>>> if search(r'IoN$', S):
...     print 'yes'
...
>>> if not search(r'CoN$', S):
...     print 'yes'
...
>>> if len('CoN') < len(S):
...     print 'yes'
...
>>> if isinstance(S, str):
...     print 'yes'
...
>>> if S.isalpha() or S.isdigit():
...     print 'yes'
...

8.2.1. How to chain together three or more conditions

Imagine that you wanted to check the case of a character. You would test for two conditions: whether it is lowercase or whether it is uppercase. But there are a lot of leftovers which are neither one, such as digits and punctuation. These three conditions are mutually exclusive, so stating them as three independent ifs would not capture the fact that only one of them can apply. A different syntax is necessary, that of the chained conditional expression:

>>> char = 'Y'
>>> if char.islower():
...     print 'yes'
... elif char.isupper():
...     print 'no'
... else:
...     print 'whoops!'
...

The chained conditional starts with if and follows with as many instances of elif as there are alternative conditions. If there is data left but it doesn’t fall under any condition – the remainder – it is captured with else, which lacks a condition.
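To see why independent if statements do not express mutually exclusive alternatives, compare what happens to a lowercase letter under each formulation. This is a sketch of my own, written as a plain script; it records the branches taken in a string rather than printing them.

```python
char = 'y'

# Separate statements: the else is attached only to the second if,
# so a lowercase letter triggers both 'yes' and 'whoops!'.
separate = ''
if char.islower():
    separate = separate + 'yes '
if char.isupper():
    separate = separate + 'no '
else:
    separate = separate + 'whoops! '

# Chained conditional: exactly one branch runs.
chained = ''
if char.islower():
    chained = chained + 'yes '
elif char.isupper():
    chained = chained + 'no '
else:
    chained = chained + 'whoops! '

print(separate)  # yes whoops!
print(chained)   # yes
```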

Note

The general syntax of a chained conditional expression is:

>>> if True:
...     do something
... elif True:
...     do something
... else:
...     do something
...

The upshot is that you can now endow your programs with the ability to take an action on your behalf. Yet this ability is limited to applying to one item at a time. The next section frees you from that limitation.

8.2.2. Practice 2

Todo

yes

8.3. How to iterate over the items of a container with a for loop

In computer science, the programming construct for examining every member of a container is called a loop. This could be every character in a string or every word in a list.

8.3.1. How to examine every item with for

As a simple example, consider printing every letter of a word:

>>> greeting = 'Yo!'
>>> letter # raises a NameError if letter has not been defined yet
>>> for letter in greeting:
...     print letter
...
Y
o
!
>>> letter # still holds the last character of greeting

The syntax of the for statement looks just like that of if. letter in greeting looks like a condition, but after for, it is really the range of items to be iterated over. The variable letter stands for each character of greeting in turn.
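One consequence worth seeing for yourself is that the loop variable is not private to the loop: once the loop finishes, it still holds the last item it was given. A minimal sketch, written as a plain script with print() in the parenthesized form so that it should run under Python 2 or 3:

```python
greeting = 'Yo!'

for letter in greeting:
    print(letter)

# The loop is over, but letter still refers to the last character.
print(letter)
```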

Note

The general syntax for a for loop is:

>>> for item in container:
...     do something, presumably to item
...

It is crucial to point out that, just like with if, the lines after the colon must be indented:

>>> for letter in greeting:
... print letter
File "<stdin>", line 2
print letter
^
IndentationError: expected an indented block

for applies to lists, too:

>>> fruit = ['apple', 'cherry', 'mango', 'pear', 'watermelon'] # if the list is no longer available
>>> for word in fruit:
...     print word
...
apple
cherry
mango
pear
watermelon

Tip

To save space in your Python console, adding a comma after the variable to be printed puts the output on the same line:

>>> for letter in greeting:
...     print letter,
...
>>> for word in fruit:
...     print word,
...

8.3.2. How to make a list during a loop with append()

Just printing the result of a loop to the Python console is rather dull. It would be much more useful to put it into a list for further processing. Python has a method called append() that adds an item to the end of an existing list. To use it, first make an empty list to hold the items, then use append() to add an item to it during each iteration of the loop:

>>> letterList = []
>>> for letter in greeting:
...     letterList.append(letter)
>>> letterList

The method can be read as “to letterList append letter”. You can do the same with lists, but this just turns one list into another:

>>> wordList = []
>>> for word in fruit:
...     wordList.append(word)
>>> wordList

8.3.3. How to pack a loop into a list with a list comprehension

Creating a list from a loop is such a frequent task that Python has a breathtakingly elegant idiom for accomplishing it, the list comprehension. It consists of putting the whole for statement within square brackets, with the appending signaled by the brackets themselves. By way of example, the previous loops are repeated with their corresponding list comprehension:

>>> letterList = []
>>> for letter in greeting:
...     letterList.append(letter)
>>> letterList

>>> [letter for letter in greeting]

>>> wordList = []
>>> for word in fruit:
...     wordList.append(word)
>>> wordList

>>> [word for word in fruit]

The first list comprehension converts the string to a list; the second one doesn’t change anything because fruit is already a list. These list comprehensions can be read in English along the lines of “the list of characters such that each one is in greeting” and “the list of words such that each one is in fruit”.

Note

The general syntax for a comprehension is:

[item for item in container]

8.3.4. How to check a condition in a loop

The ultimate step in making a decision about a collection of items is to make membership in the output contingent on a condition:

>>> lowLet = []
>>> for letter in greeting:
...     if letter.islower():
...         lowLet.append(letter)
>>> lowLet
>>> [letter for letter in greeting if letter.islower()]

>>> melonList = []
>>> for word in fruit:
...     if word.endswith('melon'):
...         melonList.append(word)
>>> melonList
>>> [word for word in fruit if word.endswith('melon')]

Again, in English the list comprehensions could be read as “the list of characters such that each one is in greeting and is lower case” and “the list of words such that each one is in fruit and ends with ‘melon’”.

Chained conditionals can also be put into a loop, which makes them much easier to use:

>>> caseList = []
>>> for letter in greeting:
...     if letter.islower():
...         caseList.append('yes')
...     elif letter.isupper():
...         caseList.append('no')
...     else:
...         caseList.append('whoops!')
>>> caseList

However, there is no list comprehension that is exactly analogous to a chained conditional, since elif is not allowed in them. A list comprehension only allows if -- else, so the elif has to be decomposed into else -- if. Here is what it looks like in a loop:

>>> caseList = []
>>> for letter in greeting:
...     if letter.islower():
...         caseList.append('yes')
...     else:
...         if letter.isupper():
...             caseList.append('no')
...         else:
...             caseList.append('whoops!')
>>> caseList

Here is what it looks like in a list comprehension:

>>> ['yes' if letter.islower() else 'no' if letter.isupper() else 'whoops!' for letter in greeting]

This can be read as “the list of yeses if the character is lowercase, noes if the character is uppercase and whoopses otherwise, for each character in greeting”.
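It may help to see where the invisible parentheses fall in such a comprehension: the second if -- else sits entirely inside the else part of the first. The sketch below, with list names of my own, builds the list both ways and shows that they agree.

```python
greeting = 'Yo!'

# As written in the text, without parentheses.
flat = ['yes' if letter.islower() else 'no' if letter.isupper() else 'whoops!'
        for letter in greeting]

# The same thing with the inner conditional made explicit.
nested = ['yes' if letter.islower() else ('no' if letter.isupper() else 'whoops!')
          for letter in greeting]

print(flat)    # ['no', 'yes', 'whoops!']
print(nested)  # ['no', 'yes', 'whoops!']
```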

The closest analogy to a three-part conditional is to populate a separate list with append() inside each condition:

>>> under5 = []; equal5 = []; over5 = []
>>> for word in fruit:
...     if len(word) < 5:
...         under5.append(word)
...     elif len(word) > 5:
...         over5.append(word)
...     else:
...         equal5.append(word)
>>> under5; equal5; over5
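For comparison, the same three lists can be produced with three list comprehensions, one per condition. This is just an alternative sketch, not a recommendation: it walks through fruit three times where the loop does so once.

```python
fruit = ['apple', 'cherry', 'mango', 'pear', 'watermelon']

# One comprehension per condition; each makes its own pass over fruit.
under5 = [word for word in fruit if len(word) < 5]
equal5 = [word for word in fruit if len(word) == 5]
over5 = [word for word in fruit if len(word) > 5]

print(under5)  # ['pear']
print(equal5)  # ['apple', 'mango']
print(over5)   # ['cherry', 'watermelon']
```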

8.3.5. How to transform items within a loop

The argument of append() takes any type that can be an element of a list, such as strings or integers, so it can hold the result of a method:

>>> upperList = []
>>> for letter in greeting:
...     upperList.append(letter.upper())
>>> upperList

>>> lenList = []
>>> for word in fruit:
...     lenList.append(len(word))
>>> lenList

A list comprehension can perform the same change by applying it to the first mention of the item:

>>> [letter.upper() for letter in greeting]
>>> [len(word) for word in fruit]

In English, we would say, “the list of characters converted to upper case such that each character is in greeting” and “the list of lengths of word such that each word is in fruit”. This might seem like magic, but it works as long as the initial expression is a transformation of the item variable.
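A transformation and a condition can be combined in the same comprehension: the condition decides which items get in, and the initial expression decides what they look like once they are there. A small sketch, with list names that are mine:

```python
fruit = ['apple', 'cherry', 'mango', 'pear', 'watermelon']

# Keep only the words ending in 'melon', converted to upper case.
upperMelons = [word.upper() for word in fruit if word.endswith('melon')]

# Keep only the words containing 'a', replaced by their lengths.
aLengths = [len(word) for word in fruit if 'a' in word]

print(upperMelons)  # ['WATERMELON']
print(aLengths)     # [5, 5, 4, 10]
```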

8.3.6. How to build strings

From what has been said so far, the way to build a string from a list should be with a for loop, such as turning the list fruit into a single string:

>>> S0 = ''
>>> for word in fruit:
...     S0 = S0 + word # add the current word, not the whole list
>>> S0

The problem is that strings are immutable, so each iteration of this loop creates a new string, which could become burdensome for a long list. The recommended alternative is to build a list of substrings and then join them:

>>> fruitList = []
>>> for word in fruit:
...     fruitList.append(word)
>>> ''.join(fruitList)

But this is just bizarre, because you could have done it directly with join(), without the intervening list:

>>> ''.join(fruit)
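As a check, the concatenating loop and join() should give the same string. The sketch below is a plain script that repeats the definition of fruit; the name joined is mine.

```python
fruit = ['apple', 'cherry', 'mango', 'pear', 'watermelon']

# Build the string one word at a time; each pass creates a new string.
S0 = ''
for word in fruit:
    S0 = S0 + word

# Build it in one step from the list.
joined = ''.join(fruit)

print(S0)
print(joined)
print(S0 == joined)  # True
```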

Since you will undoubtedly be working from some list, the only reason to do this would be to modify the strings before joining them:

>>> fruitList = []
>>> for word in fruit:
...     fruitList.append(word.title())
>>> ''.join(fruitList)

And of course, the list-construction code can be stuffed into a list comprehension, to conflate all of this to a single line:

>>> ''.join([word.title() for word in fruit])
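The string on which join() is called is the separator placed between the items, so it need not be empty. A short sketch, with variable names of my own:

```python
fruit = ['apple', 'cherry', 'mango', 'pear', 'watermelon']

# An empty separator runs the words together.
runTogether = ''.join([word.title() for word in fruit])

# A comma and a space give a readable list.
commaList = ', '.join([word.title() for word in fruit])

# A single space gives something like a sentence.
spaced = ' '.join(fruit)

print(runTogether)  # AppleCherryMangoPearWatermelon
print(commaList)    # Apple, Cherry, Mango, Pear, Watermelon
print(spaced)       # apple cherry mango pear watermelon
```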

8.3.7. How to loop over lines of a file with readlines()

So far, you have read a file directly into a string variable and then manipulated it. This could become cumbersome if the file is very large, or if you wanted to extract something from the file directly, for later processing. Python supplies the readlines() method to help you out of this difficulty. It reads every line – that is, every string that ends in a newline character – into one element of a list:

>>> with open('Wub.txt', 'r') as tempFile:
...     lines = tempFile.readlines()

Once a file has been tokenized into lines, you don’t need the return or new line characters any more, so they can be stripped out in a loop:

>>> cleanLines0 = []
>>> for line in lines:
...     cleanLines0.append(line.rstrip('\r\n'))

However, the tempFile itself can be iterated over, saving you the readlines() step:

>>> cleanLines1 = []
>>> with open('Wub.txt', 'r') as tempFile:
...     for line in tempFile:
...         cleanLines1.append(line.rstrip('\r\n'))

But then, a list comprehension can squeeze the list-handling code into a single line:

>>> with open('Wub.txt', 'r') as tempFile:
...     cleanLines2 = [line.rstrip('\r\n') for line in tempFile]

Two clarifications. For as long as processing depends on the tempFile, it must stay under the scope of with. Also, I used line three times, but that should not lead to any problems from code block to code block because it gets reassigned every time.
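Since the file can be iterated over inside a comprehension, a condition can be attached as well, for instance to skip blank lines. The sketch below is self-contained and makes some assumptions of its own: it writes a small throwaway file with the standard tempfile module instead of relying on Wub.txt, and the four lines of text in it are invented for the illustration.

```python
import os
import tempfile

# Write a small stand-in file so the sketch does not depend on Wub.txt.
handle, path = tempfile.mkstemp(suffix='.txt')
os.close(handle)
with open(path, 'w') as tempFile:
    tempFile.write('first line\nsecond line\n\nlast line\n')

# Strip the line endings and keep only the lines that are not blank.
with open(path, 'r') as tempFile:
    nonEmpty = [line.rstrip('\r\n') for line in tempFile if line.strip()]

os.remove(path)  # tidy up the throwaway file
print(nonEmpty)  # ['first line', 'second line', 'last line']
```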

8.3.8. How to loop over rows of a CSV file

Recall from How read CSV as a list that csv.reader() iterates over the rows of a CSV file to read their values into a list:

>>> from csv import reader
>>> help(reader)

You now know enough to be able to do that. You should have ISOlanguages.csv in pyScripts – if you don’t, you will have to follow the instructions from How read CSV as a list to download and save it. The approach is just what you have been reading about: create an empty list and then append each row of the file to it. Think about it … think some more … and here we go:

>>> ISOdata = []
>>> with open('ISOlanguages.csv', 'r') as csvfile:
...     fileReader = reader(csvfile, delimiter='|')
...     for row in fileReader:
...         ISOdata.append(row)

The file uses a non-standard separator, the pipe instead of the comma, so it has to be stipulated. Perhaps the only quirk of this code that deserves comment is that reader() reads from the open CSV file (i.e. csvfile), which is picked up by the for loop. Thus the loop has to be in the scope of with. If it is put outside of with, the CSV file is already closed by the time the loop runs, and Python raises an error instead of giving for anything to loop over.
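The importance of staying within the scope of with can be demonstrated on a throwaway file. Everything specific in this sketch is an assumption of mine: the file is created with the standard tempfile module, and its two pipe-separated rows are invented stand-ins rather than the actual contents of ISOlanguages.csv.

```python
import os
import tempfile
from csv import reader

# Create a tiny pipe-separated file to experiment on.
handle, path = tempfile.mkstemp(suffix='.csv')
os.close(handle)
with open(path, 'w') as csvfile:
    csvfile.write('en|English\nes|Spanish\n')

# Inside with: the file is open, so the loop has rows to read.
inside = []
with open(path, 'r') as csvfile:
    fileReader = reader(csvfile, delimiter='|')
    for row in fileReader:
        inside.append(row)

# Outside with: the file has already been closed when the loop starts.
with open(path, 'r') as csvfile:
    fileReader = reader(csvfile, delimiter='|')
try:
    outside = [row for row in fileReader]
except ValueError:
    outside = None  # reading from a closed file raises a ValueError

os.remove(path)  # tidy up the throwaway file
print(inside)    # [['en', 'English'], ['es', 'Spanish']]
print(outside)   # None
```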

This should work just as well as a list comprehension:

>>> with open('ISOlanguages.csv', 'r') as csvfile:
...     fileReader = reader(csvfile, delimiter='|')
...     ISOdata = [row for row in fileReader]

All done in a single line! This is so breathtakingly elegant that you can appreciate why Python is so awesome.

8.3.9. Practice 3

  1. Make an alphabetized list of the words from The Wub that:

    1. end in ly.
    2. begin with some or any.
    3. end in ion or ic and are more than 4 letters long.
    4. have 8 or more letters.
    5. have more than 5 letters but less than 9.
    6. have a dash.
    7. are digits.

For each one, give a version with and without a regular expression.

  2. Is there a root that occurs both with -ed and -ing?

8.4. More on exceptions

8.4.1. How a for loop really works

My introduction to iteration with for suggests that it works by magic, but a peek behind the curtains shows something a bit more mundane, though still interesting.[1] Let us return to a simple example:

>>> for word in fruit:
...     print word

Python turns this into:

1
2
3
4
5
6
7
8
iterator = iter(fruit)
while True:
    try:
        word = next(iterator)
    except StopIteration:
        break
    else:
        print(word)

The first line asks fruit for an iterator, which only works if fruit is the type of object that can be iterated over. Lists are one such object, so iter() returns an iterator. An iterator is something that can be called repeatedly to return zero or more values and then raise an exception called StopIteration. The second line exemplifies a second kind of loop called while, which keeps on going as long as some condition is true. We have not had the opportunity to see a usage of while in natural language processing yet. Line 3 starts a try block, which is mostly self-explanatory. It tries next() to get the next item in the list and breaks the loop if there is none, signaled by the StopIteration exception. Otherwise, it prints the current word.

The try block makes explicit a very reasonable assumption about lists and other iterable objects, namely that there is normally a next item in the iteration, until there is suddenly not one. try marks the normal or more frequent case, while except marks the less frequent, if not unique, case. Thus there should be no computational penalty for going ahead and trying the normal case – it’s expected to work – and applying a special clause for the infrequent case.

This approach to designing the for loop reflects a larger, philosophical issue in the design of computer programming languages, which Guido van Rossum, Python’s Benevolent Dictator for Life (BDFL; seriously, look it up), incorporated into the bones of the language. The BDFL’s guiding principles have been condensed into twenty aphorisms (only nineteen of which were ever written down), The Zen of Python by Tim Peters, of which for embodies number two (if not others):

Explicit is better than implicit.

That is to say, the try block of for manifests a statistical property of iterables, namely that they usually iterate. When one doesn’t, the situation should be clearly flagged as exceptional.

Python’s preference for explicitness leads us to another broad issue of programming style …

8.4.2. Look before you leap (LBYL) vs. it is easier to ask for forgiveness than permission (EAFP)

Imagine that you want to write a function that prevents you from dividing by zero.[2] Your first reaction would probably be to write something like this:

>>> def ifZero(x,y):
...     if y == 0:
...         print "Don't divide by zero!"
...         return None
...     else:
...         return x/y
>>> ifZero(2,1)
>>> ifZero(2,0)

However, you may recall that try acts like if in that it imposes a condition on processing. Moreover, Python has a ZeroDivisionError exception, so the task could be recoded as:

>>> def tryZero(x,y):
...     try:
...         return x/y
...     except ZeroDivisionError:
...         print "Don't divide by zero!"
...         return None
>>> tryZero(2,1)
>>> tryZero(2,0)

An if-else statement checks first and then does something, a programming style referred to as “look before you leap” (LBYL). On the other hand, a try-except statement goes ahead and does something, stopping in case of error, a programming style referred to as “it is easier to ask for forgiveness than permission” (EAFP).

Both appear to do the same thing, and let us assume for the sake of argument that neither is significantly more expensive computationally than the other – which is probably true, and you are welcome to check it with timeit. So I ask you, as a budding Python programmer, which is more pythonic?

Well, given aphorism number two from the preceding section, you should ask yourself, which is more explicit? As a weighty consideration in answering this question, ask yourself another: when dividing, what do you expect to divide by, zero or another number?

A mathematician would answer that there are an infinite number of numbers other than zero, so your odds of accidentally using zero are rather slim.

This observation allows you to answer the main question, namely that the second version with try successfully makes the statistical distribution of the problem explicit.

This is just the tip of a rather extensive debate on the virtues of LBYL versus EAFP, but the consensus within the Python community clearly falls to the latter. Without going deeper into arcane details of how Python works, I offer you the following rules of thumb:

  1. Use EAFP, i.e. try, when you know that there is a relevant exception.
  2. Use EAFP, i.e. try, when you can formulate the try clause to handle the most frequent or expected case.
  3. You have to use LBYL, i.e. if, when all you have to work with is a truth value.
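The two styles can be compared on a task closer to text processing, finding the position of a word in a list. The index() method of a list raises a ValueError when the word is absent, so there is a relevant exception to catch. The function names lbylIndex() and eafpIndex() are hypothetical ones that I made up for this sketch.

```python
fruit = ['apple', 'cherry', 'mango', 'pear', 'watermelon']

def lbylIndex(word, words):
    # Look before you leap: test for membership, then look the word up.
    if word in words:
        return words.index(word)
    else:
        return None

def eafpIndex(word, words):
    # Easier to ask forgiveness: look the word up and catch the failure.
    try:
        return words.index(word)
    except ValueError:
        return None

print(lbylIndex('mango', fruit))  # 2
print(eafpIndex('mango', fruit))  # 2
print(lbylIndex('peach', fruit))  # None
print(eafpIndex('peach', fruit))  # None
```

Note that the LBYL version searches the list twice when the word is present, once for in and once for index(), while the EAFP version searches it only once.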

8.5. How to decompose iteration with iter()

Python’s iter() function takes an iterable object and returns an iterator. This in turn allows fine-grained control over the iteration, for instance, with the next() function:

>>> fruitable = iter(fruit)
>>> next(fruitable)
>>> next(fruitable)
>>> next(fruitable)

A more flexible way of using next() is to define the iterator directly, as its argument, with a generator expression, which looks like a list comprehension without the square brackets:

>>> next(word for word in fruit)

As you would expect, it allows a condition:

>>> next(word for word in fruit if 'r' in word)

If I remind you that an exhausted iterator raises a StopIteration exception, you should be able to use the pattern from How a for loop really works to create a general findFirst block of code:

>>> target = 'r'
>>> try:
...     findFirst = next(word for word in fruit if target in word)
... except StopIteration:
...     print 'There is no {} in the list.'.format(target)
... else:
...     print 'The first {} is in {}.'.format(target, findFirst)
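When all you need is a fallback value rather than a message, next() offers a shortcut: it accepts an optional second argument that is returned instead of raising StopIteration. The sketch below uses None as the fallback; the names firstR and firstZ are mine.

```python
fruit = ['apple', 'cherry', 'mango', 'pear', 'watermelon']

# The generator expression needs its own parentheses once next()
# is given a second argument.
firstR = next((word for word in fruit if 'r' in word), None)
firstZ = next((word for word in fruit if 'z' in word), None)

print(firstR)  # cherry
print(firstZ)  # None
```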

But we are still limited to just the next element in the sequence. It would be helpful to have the actual index of each item in the iteration available, to be able to pick and choose what to see next.

8.5.1. How to add a counter to an iterable with enumerate()

This can be achieved through the enumerate() function, which pairs an element with its index. However, enumerate() returns an enumerate object, rather than the index-element pairs themselves, so its effect is easiest to see in a loop:

>>> for item in enumerate(fruit):
...     print item,
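The pairs can also be inspected without a loop by handing the enumerate object to list(). The sketch below does so, and also tries the optional second argument of enumerate(), which sets the number at which counting starts; the variable names are mine.

```python
fruit = ['apple', 'cherry', 'mango', 'pear', 'watermelon']

# list() consumes the enumerate object and collects its pairs.
pairs = list(enumerate(fruit))

# Counting can begin at a number other than zero.
fromOne = list(enumerate(fruit, 1))

print(pairs)
# [(0, 'apple'), (1, 'cherry'), (2, 'mango'), (3, 'pear'), (4, 'watermelon')]
print(fromOne)
# [(1, 'apple'), (2, 'cherry'), (3, 'mango'), (4, 'pear'), (5, 'watermelon')]
```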

As you can see, the pairs returned by enumerate() are in the order “index, element”. Try enumerate() out as before by retrieving the first hit, but by its index, abbreviated to i:

>>> next(i for (i, word) in enumerate(fruit) if 'a' in word)

Most programmers omit the parentheses, though I find leaving them makes the code easier to read. Now find the next word with ‘a’:

>>> next(i for (i, word) in enumerate(fruit) if 'a' in word and i > 0)
>>> fruit[2]

I leave it to the practice to fold this into the try block.

8.6. Summary

8.7. Further practice

  1. Make a function for the case-checking conditional. Call it caseChecker() and put it into textProc.py

8.8. Powerpoint and podcast

Endnotes

[1] The gloss of for is from If you don’t like exceptions, you don’t like Python.
[2] I am indebted to ryeguy’s question LBYL vs EAFP in Java? for this example.

Last edited: October 13, 2016