2. An Introduction to Python

Python is a language in which you can write out a sequence of commands for your computer to carry out. It is also the name of the software that actually makes your computer do something with the sequence that you write. For example, if you type the sequence “237 + 9075” into the Python interactive interpreter and hit the return key, Python will add the two numbers and display “9312” on the next line, like so:

>>> 237 + 9075
9312

While working with numbers is important for what you are going to learn in this book, working with words is even more important. Textual computation in Python can be just as simple as numerical computation:

>>> word = 'msinairatnemhsilbatsesiditna'
>>> 'anti' in word
False
>>> 'itna' in word
True

The first command assigns the string msinairatnemhsilbatsesiditna to the variable word, the second asks whether the string anti is in it, and the third asks whether the string itna is in it. The responses False and True are Python’s answer to each question.
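
The reason for the two different answers becomes visible once the string is read backwards. The following is only a sketch: it relies on slice notation, word[::-1], which has not been introduced yet, and the name reversed_word is made up for illustration.

```python
word = 'msinairatnemhsilbatsesiditna'

# Slice with a step of -1 to read the string backwards
# (slicing has not been introduced yet; this is only a preview)
reversed_word = word[::-1]
print(reversed_word)

# 'anti' is absent from the original string but present in the reversed one
print('anti' in word)
print('anti' in reversed_word)
```

The in operator looks for the exact sequence of letters, in the order given, so the direction in which a string is written matters.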

By the way, the name python may make you think of snakes, but it was actually inspired by Monty Python’s Flying Circus [1]. Nevertheless, snake images figure prominently in pythonic icons.

Note

Do you see a snake in Python’s icon?

_images/2-PythonLogo.png

2.1. How to get Python

2.1.1. How to get Python on your computer

2.1.1.1. Don’t use your built-in Python

You may already have Python on your computer – see the box below – but your computer expects to have that exact version of Python; you can’t update it to a more recent one, and unpredictable things may happen if you add additional software packages to it. So I don’t recommend that you use the Python that you might already have.

Built-in Python

The Macintosh and Linux operating systems have Python installed on them by default; some Windows systems also have it, as explained in Why is Python installed on my machine?.

2.1.1.2. How to get a new installation of Python

Getting a new installation of Python onto your computer is unfortunately more of a challenge than I would like it to be. There are several considerations, which I will review step by step.

Python is a free, open-source, multi-platform project distributed by the Python Software Foundation. You could download an installer and go ahead and create your own installation, which would work fine. But I don’t recommend that you do that. Although the initial download and installation are easy enough, adding additional packages to it can be quite a challenge. I would rather you expend your energies on doing natural language processing, not combing through on-line forums trying to figure out why your software won’t compile.

Your best choice is to get one of the scientific installations. You may object that you don’t feel very scientific, but you don’t have to be a scientist to use it. And maybe by the end of this book, you will feel a little more scientific.

As of this writing, there are two multi-platform scientific distributions, Continuum Anaconda and Enthought Canopy. If you have a 32-bit computer – see the box “How many bits?” below – you can stop now and get Canopy’s distribution. Otherwise, I think that Anaconda’s is slightly superior. [2] One of the reasons for my preference is that it includes my favorite Python IDE, Spyder, as explained in Use Python through an integrated development environment or IDE. By the way, you should download the Python 2.7 version, because I fear that not all of the modules that we are going to use are ready for version 3.5 – see Table 2.1.

How many bits?

Most computers made after 2007 process data in 64-bit chunks. However, computers made before 2007 use smaller, 32-bit chunks. The last processor shipped by Apple that was 32-bit was the Intel Core Duo in 2006. How to check your Mac is explained by Apple at How to tell if your Intel-based Mac has a 32-bit or 64-bit processor. The reason that I bring this up is that the options for running Python on 32-bit machines nowadays are quite limited.

2.1.2. How to get Python on your tablet or smart phone

The two most popular apps for Python coding are Pythonista for iOS and DroidEdit for Android OS. Unfortunately, in this course you will eventually use Python modules that are not part of the standard distribution and – as far as I know – are difficult if not impossible to install in the computationally limited environment of a tablet or smart phone. If it is convenient, you are welcome to start the course coding on your hand-held device, but at some point, that will no longer be possible.

2.1.3. How to get Python on-line

There are several on-line Python environments. PythonAnywhere is the best one that focuses just on Python. There are others that include Python among the computer languages that they offer. PythonAnywhere lets you import a wide variety of modules that are not part of the standard distribution – what it calls “batteries included” – and others can be installed by hand, so it may be sufficient for the course.

2.2. How to interact with Python

2.2.1. Use Python through an integrated development environment or IDE

Even though Python will run plain text files – if they have the ‘.py’ suffix – your coding will be much less painful and more accurate if you use an integrated development environment or IDE that colors Python syntax in an informative way and takes care of indentation automatically, among many other tasks. Anaconda includes my favorite IDE, the Scientific PYthon Development EnviRonment or Spyder. You are also free to use IDLE, though I have had less success making it work on my Macs.

2.2.2. What Spyder looks like

If you have downloaded and installed Anaconda, you can start Spyder by opening the Navigator app and then launching spyder-app. At startup, Spyder opens a large window divided into three panes, each of which gives different insight into the workings of Python. The default set of visible windows is Editor, Object inspector, and Console, creating a layout that looks like this:

_images/2-SpyderDefault.png

In the Editor window you will write Python programs called scripts. In the image, it has opened a default script from the Spyder distribution. The Object inspector window gives a short explanation of Python commands that you type at the Object line. In the Console window you will interact with Python.

2.2.3. Type at the prompt in the Console

Python tells you that it is ready for you to give it a command by displaying three greater-than signs, >>>, followed by a blinking cursor. The three greater-than signs are called a prompt. As a point of reference, I include the prompt at the beginning of every line of code. To give it a try, type 237 + 9075 after the prompt, and hit return. Spyder should display the following:

>>> 237 + 9075
9312

Tip

To save time typing, you can copy text from here and paste it directly after the prompt. Try copying and pasting “237 + 9075”.

Be sure to try the other arithmetic operators, subtraction (-), multiplication (*), and division (/).

Question

Does division work the way you expect?

By the way, you don’t have to put spaces around arithmetic operators, but I think it makes them easier to read:

>>> 237+9075
9312
>>> 237-9075
-8838
>>> 237*9075
2150775
>>> 237/9075
0
>>> 237/9075.0
0.026115702479338844

Question

Have you figured out how division works yet?
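
A short sketch of what is going on, assuming the Python 2.7 recommended above: when both numbers are whole numbers (integers), / throws away the fractional part of the result, whereas a decimal point in either number produces a decimal result. The // operator and the float() function used below do not appear elsewhere in this chapter. They are included because they give the same answers in Python 2.7 and in Python 3, where / applied to two integers returns a decimal result instead.

```python
# Floor division: keeps only the whole-number part of the quotient
whole = 237 // 9075
print(whole)

# A decimal point in either number asks for a decimal (floating-point) result
fraction = 237 / 9075.0
print(fraction)

# float() converts a whole number to a decimal one before dividing
same_fraction = float(237) / 9075
print(same_fraction == fraction)
```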

After you have tired of playing with arithmetic, play with some text:

>>> word = 'msinairatnemhsilbatsesiditna'
>>> 'anti' in word
False
>>> 'itna' in word
True

Feel free to try your own inventions. You can’t break anything.

2.2.4. How to set the global working directory in Spyder

There is one Python house-keeping chore that needs to be done immediately. You are going to collect texts from the Internet to analyze, and you are going to write short programs to do the analysis. You need a place to store all this stuff so that Python can find it easily. I therefore urge you right now to create a folder in your Documents folder and call it “pyScripts”.

Once you have done that, make it the default folder that Spyder looks in to find files. In order to do so, open Spyder and then look for the Preferences menu. On a Mac, it is found under the python menu; in Windows, it is found under the Tools menu.

Opening the Preferences menu displays a window like the one below. In the Startup area, select the option labelled “the following directory:” and click on the folder icon to navigate to your pyScripts folder. Once it is selected, the path to it will be displayed in the box. A Mac will display something like /Users/harryhow/Documents/pyScripts; Windows will display something like \Users\harryhow\Documents\pyScripts. Set the next two selections to the global working directory. Leave the last one untouched and unchecked:

_images/2-SpyderPrefWindow.png

Fig. 2.1 Spyder’s preferences window

Author’s image

Click Apply and then OK. Spyder’s Python will now always look for files in this folder, without you having to tell it to do so.

You can always tell what Spyder thinks the global working directory is by looking in the box at the top left corner, where it is displayed:

_images/2-SpyderGWD.png

Fig. 2.2 Where the current working directory is displayed in Spyder

Author’s image

You can change it by clicking the folder icon, but the change only stays in effect until you close Spyder.

Being careful about where files go seems simple enough, maybe even unnecessary, but there is a reason why we bring it up. Recent versions of the Mac and Windows operating systems tend to hide from the user the detail of where files are stored. Unfortunately, Python is not so understanding and needs to know exactly where your files are, which means that you need to know where they are, in order to tell Python. Keeping all your scripts in the same folder will help you in this endeavor and save you many a headache as your coding becomes more complex.
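
You can also ask Python itself where it is looking. The sketch below uses the standard os module, which has not been introduced yet; the file name firstScript.py is only a hypothetical example, so the last line may well print False on your machine.

```python
import os

# Ask Python which folder it is currently working in
current_folder = os.getcwd()
print(current_folder)

# List the names of the files and folders it can see there
names = os.listdir(current_folder)
print(names)

# Build the full path to a file in that folder and check whether it exists
# (firstScript.py is a hypothetical file name)
script_path = os.path.join(current_folder, 'firstScript.py')
print(os.path.exists(script_path))
```

If the folder printed first is not your pyScripts folder, Python will not find your files by name alone.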

2.2.5. How to use a script in Spyder

2.2.5.1. Open a new script file in Spyder

It is here that the graphical interface of Spyder shines, so fire it up. You may already have a blank script open in the left pane, like so:

_images/2-SpyderAnacondaLayout.png

Fig. 2.3 My default layout of Spyder showing a blank script in the left pane.

Author’s image.

If not, under the File menu, choose New file…, which opens a window for a blank Python file.

2.2.5.2. Spyder adds some information for you

Or nearly blank, since Spyder fills in the top few lines with a header, such as:

"""
Created on Thu Sep  1 07:37:15 2016

@author: Harry Howard
"""

The first series of three quotes opens a block of text that Python does not execute as commands; the second series closes the block, and Python starts paying attention again. (Strictly speaking, Python reads the block as a string, called a docstring, rather than ignoring it outright.) Between these two delimiters, you should enter information about the script that helps to identify it. Spyder thinks that you need at least your name and the date; you should also add some explanation of what the script does.

2.2.5.3. Put something into your script

Just copy our sample code below and paste it directly into the script. I have taken out the prompts because Python would try to process them as part of your code and fail with a syntax error:

237 + 9075
word = 'msinairatnemhsilbatsesiditna'
'anti' in word
'itna' in word

2.2.5.4. Save your script

Now open the File menu and choose Save as …, which opens Spyder’s Save as… dialog window. If you have created a pyScripts folder in your Documents folder and set it to be the global working directory, the window will open on it, like this:

_images/2-SpyderSave.png

Fig. 2.4 Spyder’s `Save as…` dialog window opens to pyScripts

Author’s image

Give it an informative name like firstScript.py (always keep the .py suffix) and hit Save.

2.2.5.5. Run your script line by line

Now that you have named and saved your script, go to the Run menu of Spyder and select Run. The first time that you do this, a dialog window will open like the one below that asks you to make some decisions:

_images/2-SpyderRunSettings.png

Fig. 2.5 Initial settings for running a script.

Author’s image

In Interpreter, tick Execute in a new dedicated Python interpreter, if it is not already ticked. In General settings, the Working directory should be ticked and pointing at pyScripts. Click Apply to apply the changes and OK.

Now you can hit the Run button (the green triangle pointing to the right). Spyder’s console should reply like so:

>>> runfile('/Users/harryhow/Documents/pyScripts/firstScript.py', wdir=r'/Users/harryhow/Documents/pyScripts')
>>>

Nothing happens, because this script does not report anything to the Console, though it does create the string named word, as you can see in the Variable explorer pane.
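
One way to make a script report its results when it is run as a whole is to wrap each expression in print(). The following is a sketch of the sample script rewritten that way. The variable name total is introduced here for illustration, and print is written with parentheses around a single argument so that it behaves the same in Python 2.7 and Python 3.

```python
total = 237 + 9075
word = 'msinairatnemhsilbatsesiditna'

# print() sends each result to the Console when the whole script is run
print(total)
print('anti' in word)
print('itna' in word)
```

Run as a whole, this version should display 9312, False and True in the Console, one per line.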

But Spyder has a totally cool way to see what each line does individually. Select the first one (on my Mac, a line turns orange when it is selected) and then you can do one of two things:

  1. From the menus, Run —> Run selection or current line, or
  2. right click with mouse, Run selection or current line.

The console should print:

>>> 237 + 9075
9312

You can step through the other three lines to make sure that it works the way you expect it to and then close the script by clicking on the X inside a circle on the left side of the tab that has the script’s name.

2.2.5.6. Open a script in Spyder

Just to make sure that we are on the same page, do File —> Open … and choose the script. It should refill the Editor window on the left pane of Spyder.

2.3. Computer hygiene

2.3.1. Refresh your browser!

I am constantly improving these pages, so it is possible that you might be looking at out-of-date material. To make sure that you have the up-to-date material, click on your web browser’s refresh button, which is an icon that looks like this:

_images/2-RefreshButton.png

Fig. 2.6 The refresh button is a three-quarters circle with an arrow, like this one, from Firefox

Author’s image

It is located to the left or right of the address bar at the top of your web browser.

2.3.2. Close your applications every now and then!

2.3.3. Shut down your computer every now and then!

2.4. How Python compares to other programming languages

For the experienced programmer, Hoyt Koepke lauds Python for scientific computing, with particular reference to MATLAB, in 10 Reasons Python rocks for research (and a few reasons it doesn’t).

Todo

need some more

2.5. How to use a command-line interface

At some point you may have to do some maintenance on Anaconda or install a Python module by hand. To do so, you have to peek under the hood of your graphical user interface and poke around in what I will refer to as a command line interface. Since knowing how to use a command line interface will be useful for other things, I will make a brief digression to cover it.

2.5.1. How to get to a command line

On a Macintosh, the command line interface is the Terminal application, which can be found by typing terminal in Spotlight, the magnifying glass icon at the top right of the screen, and opening it. You should get a window that looks like this one:

_images/2-TerminalBlank.png

The first line tells you the last time you logged on. The second line is where the fun starts. It shows the name of my computer and my user name, followed by a dollar sign, $. There may also be a cursor blinking slowly. The dollar sign is the prompt.

In Windows, the command line interface tool that is the analog of the Terminal is called the Command Prompt. It can be found in two ways:

  • Click the Start button, click All Programs, click Accessories, and then click Command Prompt.
  • Click the Start button. In the Search box, type Command Prompt or just cmd, and then, in the list of results, double-click Command Prompt.

The window that opens up looks like this one from Wikipedia:

_images/2-Command_Prompt.png

Note that the prompt is a greater-than sign >.

2.5.2. How to use the command line

For the few times that I need to refer to the Terminal or Command Prompt, I will call it “terminal/command prompt” and combine the two prompts into the single hybrid $>. For instance, typing ls after the prompt as below lists the names of the files in whatever folder your terminal happens to be looking at (in the Windows Command Prompt, the equivalent command is dir):

$> ls

Note that you don’t have to type the prompt; I include it just to give you a point of reference.

More relevant to our immediate concern is to find out whether your computer has Python on it, by typing the command that asks for the path to Python’s executable file (in the Windows Command Prompt, type where python instead):

$> which python

After hitting return, the terminal should respond with a sequence of folder names separated by forward slashes like /Users/harryhow/anaconda/bin/python, which locates the basic Python file on your hard drive. It may also be helpful to know the version of your Python installation. Type this at the command line:

$> python -V

It should return something like “Python 2.7.10 :: Anaconda 2.3.0 (x86_64)”. This is also the first line that Spyder prints when it starts a new console.
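
The same information can be requested from inside Python, for example at Spyder’s console. This is a sketch using the standard sys module, which has not been introduced yet; the names major and minor are chosen here for illustration.

```python
import sys

# Path to the Python program that is running this code
print(sys.executable)

# Full version string, similar to what `python -V` reports
print(sys.version)

# The major and minor version numbers as integers
major, minor = sys.version_info[0], sys.version_info[1]
print('%d.%d' % (major, minor))
```

If the path printed first does not mention anaconda, the code is probably being run by a different Python installation from the one you installed.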

2.5.3. How to update Anaconda

In the terminal/command prompt type the following two lines:

$> conda update conda
$> conda update anaconda

Additional information is found at How do I update Anaconda?.

2.5.4. How to install a package that is not part of Anaconda’s distribution

2.5.4.1. How to install a package with pip

The easiest way to add a package to your Python installation is with the Python package installer, pip – assuming that the package has been made available for pip. If not, you have to install it ‘by hand’ with setuptools, as explained in the next section.

You are going to practice by downloading and installing a package called tweepy, which makes Twitter available to Python. [3] In the Mac Terminal or the Windows command prompt type this line, where tweepy is the name of the package that you want to install:

$> pip install tweepy

The response will be a torrent of progress messages that you can safely ignore. If the installation is successful, a blank prompt is returned without any error message. You can now go back to Spyder and import the package to see if it really did work:

>>> import tweepy
>>>

If all is well, Python just returns a blank prompt. If the installation failed, the import should produce some kind of response that includes the word “error”, such as ImportError. Then you have to install the package by hand, using the setuptools utility.
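
The same check can be written as a few lines of Python that answer True or False instead of stopping with an error. This is only a sketch: is_installed is a hypothetical helper name, not part of pip or of Python, and the standard importlib module and the try/except construction have not been introduced yet.

```python
import importlib


def is_installed(package_name):
    """Return True if the named package can be imported, False otherwise."""
    try:
        importlib.import_module(package_name)
    except ImportError:
        # Python raises ImportError when it cannot find the package
        return False
    return True


# True only if the pip installation described above succeeded
print(is_installed('tweepy'))
```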

2.5.4.2. How to install a package with setuptools

The first step is to download the package’s source code. It will come in a folder which has been compressed, so you need to have a decompression or extraction utility installed on your computer. The Mac has one; the PC sometimes does not. Decompress the folder, navigate to it in the terminal, and run the installation command. I illustrate this procedure with a package for managing PDF files in Python called PDFMiner. The following explanation is adapted from PDFMiner’s download instructions, but it extends to most other packages.

PDFMiner is available for download at the Python Package Index page for PDFMiner. Click on the green button to download the source files. Your web browser should download them to your Downloads folder as the file “pdfminer-20140328.tar.gz”, or whatever the most recent version may be. The .tar and .gz suffixes mean that the original files have been bundled into a single archive with tar and then compressed with gzip. Your Mac or PC may perform the decompression automatically, or you may have to double-click the file to get it started. Of course, this assumes that your computer has a decompression utility.

What gets extracted is a folder called “pdfminer-20140328”. Open up the terminal/command prompt and navigate to it. You do this by typing cd followed by a space, and then dragging the folder directly to where the cursor is blinking and dropping it. This pops up the path to the folder. For instance, if I drag the folder after the command, the following path appears:

$> cd /Users/harryhow/Downloads/pdfminer-20140328

Then press return and at the next prompt type the command:

$> python setup.py install

A bunch of gibberish should start flowing down the screen. If it works, the prompt will be returned to you with no error message.

PDFMiner happens to include a test utility, so check that it installed properly with:

$> pdf2txt.py samples/simple1.pdf

PDFMiner extracts the text from the sample PDF file called “simple1.pdf” and displays it on the screen. It is just the phrase “Hello world” repeated in several ways.

You can now open Spyder and type import pdfminer to ensure that the new package is indeed available.

For more, see setuptools.

2.6. What versions are used in this book?

Table 2.1 Software versions used in this book

name            version   source      works in v3.5?
Python          2.7.10    CAD
Anaconda        2.3       CAD
Spyder          2.3.5.2   CAD
pip             8.1.2     CAD
setuptools      23.0      CAD
Beautiful Soup  4.4.1     CAD
tweepy          3.6       pip
pdfminer        20140328  setuptools
textract        1.4       pip         ???
Graph API       2.2       Facebook

Note

CAD indicates those resources that are included with the Continuum Anaconda distribution and can be accessed directly through the import command. See Anaconda package list for other modules available with this distribution.
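
To compare your own installation with the table, you can ask a module for its version number. The sketch below relies on the common convention, followed by many packages but not guaranteed for all of them, that a module stores its version in an attribute called __version__. The helper name report_version is hypothetical.

```python
import importlib


def report_version(module_name):
    """Return a module's version, or a short note if it cannot be found."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return 'not installed'
    # __version__ is a convention, not a rule, so fall back to 'unknown'
    return getattr(module, '__version__', 'unknown')


# Module names as they would be typed after the import command
for name in ['setuptools', 'tweepy', 'pdfminer']:
    print(name + ': ' + str(report_version(name)))
```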

Todo

Add rest of them

2.7. Summary

Yes, I need to write a summary.

2.8. Further practice

No, there isn’t any, yet.

2.9. Further reading

Any attempt to learn more about Python should start at the Python Programming Language – Official Website.

Endnotes

[1] See Wikipedia’s history of Python at Python (programming language).
[2] Disclaimer: I have no relationship with Continuum Analytics, financial or otherwise. I just find their software easy to use.
[3] For more information on tweepy, see its site at GitHub, tweepy.

Last edited: November 27, 2016