10. Recycling code

10.1. Functions

10.1.1. How to write a function

As your first task, consider how to find the ratio of all the words in a text to its unique words. Mathematically, this is just the number of tokens divided by the number of types or more pythonically len(tokens)/len(set(tokens)).

10.1.1.1. Detour: How to force floating-point or non-truncating division in Python 2

However, in Python 2 the division operator may not work the way you expect. Consider 3 divided by 5:

>>> 3/5
0
>>> 3/float(5)
0.6
>>> 3/5.0
0.6
>>> from __future__ import division
>>> 3/5
0.6

You may be surprised to see Python report that 3 divided by 5 is 0. This is because, when both operands are integers, / in Python 2 performs integer or truncating division, so that 0.6 is rounded down to 0. To force floating-point or non-truncating division, you can either make the denominator a floating-point number with float(5) or the literal 5.0, or import the non-truncating division operator of Python 3 from the __future__ module.
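As a minimal sketch of the same point applied to the token-to-type ratio, the snippet below contrasts the two kinds of division. The token list is a hypothetical toy example, not taken from any text; the explicit float() conversion and the // operator behave the same way in Python 2 and Python 3.

```python
# Hypothetical toy token list, used only for illustration.
tokens = ['la', 'gitanilla', 'la', 'de', 'la']

n_tokens = len(tokens)       # 5 tokens
n_types = len(set(tokens))   # 3 types: 'la', 'gitanilla', 'de'

# Floating-point division: converting the denominator avoids truncation.
ratio = n_tokens / float(n_types)

# Explicit truncating (floor) division, written with //.
truncated = n_tokens // n_types
```

Here ratio is about 1.67, whereas truncated is 1, which is what a bare / would also return for two integers in Python 2.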

10.1.1.2. The layout of a function

A function is a block of code that begins with def, followed by the name of the function, followed by parentheses enclosing any input information that it needs, and a colon. It contains indented lines of code that perform its work. Its output follows the reserved word return. Below is a succinct function to calculate lexical diversity, illustrated by using the word tokenization of La gitanilla:

>>> def diversidad_lexica(tex):
...     return len(tex) / float(len(set(tex)))
...
>>> diversidad_lexica(texto)
4.868782722513089

Here is a schematic layout of a function’s parts:

def your_function_name(parameters):
    body_of_function
    return return_value
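To make the schematic concrete, here is one possible way of filling it in. The name lexical_diversity, the docstring, the guard against an empty list and the sample sentence are all illustrative assumptions rather than part of the original function.

```python
def lexical_diversity(tokens):
    """Return the ratio of tokens to types (unique tokens)."""
    # Body: guard against an empty list, which would otherwise
    # cause a division by zero.
    if not tokens:
        return 0.0
    # Return value: float() forces non-truncating division in Python 2.
    return len(tokens) / float(len(set(tokens)))

# Hypothetical sample: 6 tokens, 4 types ('to', 'be', 'or', 'not').
sample = 'to be or not to be'.split()
result = lexical_diversity(sample)
```

With this sample, result is 1.5, since six tokens are spread over four types.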

10.2. Scripts and batch processing

When you have gotten tired of typing “msinairatnemhsilbatsesiditna”, you might wonder whether you could type it just once and have Python run it as many times as you want. This is indeed possible and is in fact the main way in which we will interact with Python. That is to say, we will write sequences of commands, sometimes long ones, and have Python run them all once we have finished typing, rather than one by one. This is ‘batch’ mode, as opposed to ‘interactive’ mode, and the sequence of commands is a script. You can write a script in the terminal, but you will quickly lose patience with the non-graphical user interface and think that we are trying to punish you.

10.2.1. How to write a script in Spyder

It is here that the graphical interface of Spyder shows its superiority, so fire it up. Under the File menu, choose New file…, which opens a window for a blank Python file.

10.2.1.1. Spyder starts a script for you

Or nearly blank, since Spyder fills in the top few lines with a header, such as:

"""
Created on Mon Jun  3 19:55:43 2013

@author: Harry Howard
"""

The first series of three quotes opens a string, which Python reads but does not execute, and the second series closes it, after which Python treats what follows as code again. Placed at the top of a file, such a string serves as the script’s documentation (its docstring). Between these two delimiters, you should enter information that helps to identify the script. Spyder thinks that you need at least your name and the date; some explanation of what the script does should be added, too.

You can now type in the lexical-diversity function. Indentation is still important:

"""
Created on Mon Jun  3 19:55:43 2013

@author: Harry Howard
"""
def diversidad_lexica(tex):
    return len(tex) / float(len(set(tex)))

Save your work by going to the File menu, choosing Save as… and giving the file the name “funciones”, keeping the .py suffix.

And now we need to stop to think once more.

10.2.2. Where to save a script

Spyder’s Save as… dialog window will open onto some default place on your computer. We cannot predict where it will be, but for the sake of specificity, we will assume that it is in your Documents folder. On the Mac, it will look like this:

_images/SpyderSave.png

We recommend that you create a folder for your Python scripts. On the Mac, you first click on the triangle at the right of the Save As … text entry box to expand the dialog window:

_images/SpyderNewFolder.png

In this expanded view, you can click on New Folder and name it something informative like “pyScripts”. Spaces in folder names make paths harder to type correctly, especially in the terminal, so be sure not to use one. You finish by clicking Create to create the new folder and then Save to save your new script into it.

10.2.3. Running your script

Now that you have named and saved your script, go to the Run menu of Spyder and select Run. The first time that you do this for a script, a dialog window will open like the one below that asks you to make some decisions:

_images/SpyderRun.png

In General settings, the Working directory should be checked and pointing at the folder that you just created. In Interpreter, tick Execute in a new dedicated Python interpreter, and in Dedicated Python interpreter, check Interact with Python interpreter after execution. Now you can hit the Run button. A new interpreter window should open in Spyder’s console window and spit out the three lines of results, followed by a Python prompt:

9312
False
True
>>>

The opening of the new interpreter window is triggered by the second setting that you just made. The appearance of the Python prompt (triggered by the third setting) means that you can interact with the script. For example, type word at the prompt and hit return. Python should respond with the string that was assigned to the variable, namely msinairatnemhsilbatsesiditna.

10.2.4. Where are my files?

Being careful about where files go seems simple enough, maybe even unnecessary, but there is a reason why we bring it up. Recent versions of the Mac and Windows operating systems tend to hide from the user the detail of where files are stored. Unfortunately, Python is not so understanding and needs to know exactly where your files are, which means that you need to know where they are, in order to tell Python. Keeping all your scripts in the same folder will help you in this endeavor and save you many a headache as your coding becomes more complex.

Even so, it is convenient to sketch how to deal with files.

10.2.4.1. Finding files in Python

File organization is a prerogative of the operating system, so Python has a special module named ‘os’ for asking it to perform actions on files. To invoke it, use the import command with the module name, import os. Your first task is always to figure out what Python is looking at in your computer’s file hierarchy. This is known as the current working directory, and the command to ask for it is os.getcwd(), which can be read as something like “use the os module to get the current working directory”. If the current working directory is indeed the pyScripts folder, the response would be something like '/Users/{your_user_name}/Documents/pyScripts'. Note that the convention is to state the path through your computer’s folder hierarchy by separating folder names by a forward slash.

If the current working directory is some other folder, you can change it to pyScripts by putting the path to it in single or double quotes inside the parentheses of os.chdir(), to be read as “use the os module to change to the directory in parentheses”. Since a folder path can be very long, it can be hard to read within the parentheses and you might make a mistake. It is more perspicuous to assign the path to a variable and put the variable name within the parentheses. Once you have done this, you could double-check the change by giving os.getcwd() again, but let us be more adventurous and list the files in this folder with os.listdir(path). The snippet of code below assembles all of these commands in one place for easy reference:

>>> import os
>>> os.getcwd()
>>> path = '/Users/{your_user_name}/Documents/pyScripts'
>>> os.chdir(path)
>>> os.listdir(path)

Note how defining the path for os.chdir() saved us the effort of retyping it for os.listdir(). Always let Python do as much work for you as possible.
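The same sequence of commands can be tried out safely with the sketch below. It assumes a temporary folder as a stand-in for pyScripts, so that it runs on any computer without relying on a particular user name; the tempfile and shutil modules and the empty funciones.py file are additions for the sake of illustration.

```python
import os
import shutil
import tempfile

start = os.getcwd()           # remember the original working directory

# Hypothetical stand-in for the pyScripts folder.
path = tempfile.mkdtemp()

# Create an empty script file so that there is something to list.
open(os.path.join(path, 'funciones.py'), 'w').close()

os.chdir(path)                # change to the new folder
files = os.listdir(path)      # list its contents

# realpath() resolves symbolic links, so the comparison is reliable.
moved = os.path.realpath(os.getcwd()) == os.path.realpath(path)

os.chdir(start)               # go back to where we started
shutil.rmtree(path)           # delete the temporary folder
```

After this runs, files holds the single name 'funciones.py' and moved is True, confirming that os.chdir() took effect.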

10.2.4.2. Finding files in the terminal

To find files in the terminal, we will just list the commands that correspond to the Python ones that we have just reviewed:

$ # just opening a terminal is equivalent to import os
$ pwd
$ path=/Users/{your_user_name}/Documents/pyScripts
$ cd "$path"
$ ls

Much more could be said, but this is enough to get us going.

10.2.5. The special comment for setting a default encoding of a script

Python has a special comment for declaring the default encoding of a file:

# -*- coding: utf-8 -*-

It must be the first or second line of a file, but we haven’t come to writing files yet.
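As a small sketch of why the declaration matters, the file below contains a string with a non-ASCII character. Python 2 refuses to read such a file unless an encoding is declared, while Python 3 assumes UTF-8 by default. The variable name titulo and its contents are illustrative assumptions, and the file itself is assumed to be saved as UTF-8.

```python
# -*- coding: utf-8 -*-
# The u prefix marks a Unicode string; it is accepted by Python 2
# and by Python 3.3 or later.
titulo = u'La española inglesa'

# Count the characters, including the non-ASCII ñ.
longitud = len(titulo)
```

Because titulo is a Unicode string, longitud counts ñ as a single character.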

10.2.6. leftovers

And now you need to stop to think.

When you type 3/5 in Spyder’s console and hit return, Python interprets it as a command to perform the division and then display the result to you in the console window. But a script does not interact with you line by line, so you have to tell it explicitly what to do.

Displaying the result of the evaluation of an expression can be accomplished with the print command, as in print 237 + 9075. Inserting print before all of the expressions whose evaluation we want to see and then typing these lines into the new file after the header, starting on line 6, produces a script like:

"""
Created on Mon Jun  3 19:55:43 2013
This script illustrates mathematical and textual computation in Spyder.
@author: Harry Howard
"""
print 237 + 9075
word = 'msinairatnemhsilbatsesiditna'
print 'anti' in word
print 'itna' in word
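For readers who may also run the script under Python 3, here is a sketch of the same script with print written with parentheses. With a single value between the parentheses, this form produces the same output in Python 2 and Python 3. The variable name total is introduced here only for illustration.

```python
"""
This script illustrates mathematical and textual computation in Spyder.
"""
total = 237 + 9075
word = 'msinairatnemhsilbatsesiditna'

print(total)            # the sum of the two numbers
print('anti' in word)   # is 'anti' a substring of word?
print('itna' in word)   # is 'itna' a substring of word?
```

The three lines of output are 9312, False and True.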
