2. An Introduction to Python

Python is a language in which you can write out a sequence of commands for your computer to do something. It is also the name for the software that actually makes your computer do something with the sequence that you write. For example, if you type the sequence “237 + 9075” into the Python interactive interpreter and hit the return key, Python will add the two numbers and display “9312” on the next line, like so:

>>> 237 + 9075
9312

While working with numbers is important for what you are going to learn in this book, working with words is even more important. Textual computation in Python can be just as simple as numerical computation:

>>> word = 'msinairatnemhsilbatsesiditna'
>>> 'anti' in word
False
>>> 'itna' in word
True

The first command assigns the string 'msinairatnemhsilbatsesiditna' to the variable word, the second asks whether the string 'anti' is in it, and the third asks whether the string 'itna' is in it. The responses False and True are Python’s answers to the two questions.
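As a small sketch of what is going on here, the in operator tests whether one string occurs anywhere inside another, and slicing with a step of -1 reads a string backwards. The variable name reversed_word is ours, introduced only for illustration:

```python
word = 'msinairatnemhsilbatsesiditna'

# 'in' asks whether the left-hand string occurs anywhere in the right-hand one.
print('anti' in word)   # False
print('itna' in word)   # True

# Slicing with a step of -1 reads the string from right to left.
# reversed_word is a name chosen for this example only.
reversed_word = word[::-1]
print(reversed_word)            # antidisestablishmentarianism
print('anti' in reversed_word)  # True
```

So the odd-looking string is just an English word spelled backwards, which is why 'itna' is found in it and 'anti' is not.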

2.1. You might already have Python on your computer

The Macintosh and Linux operating systems have Python installed on them by default; some Windows systems also have it, as explained in Why is Python installed on my machine?. To make sure, you will have to use a piece of software that you may be unfamiliar with, which we will refer to as a terminal. Since knowing how to use a terminal application will be useful for other things, we will make a brief digression to cover it.

2.1.1. The terminal

On a Macintosh, you can find the Terminal application by typing “terminal” in Spotlight, the magnifying glass icon at the top right of the screen, and opening it. You should get a window that looks like this one:

_images/TerminalBlank.png

The first line tells you the last time you logged on. The second line is where the fun starts. It shows the name of my computer and my user name, followed by a dollar sign, $. There may also be a cursor blinking slowly. The dollar sign is known as a prompt, and when you see one not followed by any text, the program is prompting you to type a command. For instance, typing ls after the prompt as below lists the names of the files in whatever folder your terminal happens to be looking at:

$ ls

Note that you don’t have to type the dollar sign; I include it just to give you a point of reference.
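Once you have Python running, you can ask it the same question from the inside. The following is a minimal sketch, using the standard os module, of roughly what ls does; the variable names are ours:

```python
import os

# os.getcwd() is the folder that Python is currently "looking at".
current_folder = os.getcwd()

# os.listdir() returns the names of the entries inside a folder;
# sorted() puts them in a predictable order.
names = sorted(os.listdir(current_folder))

print(current_folder)
for name in names:
    print(name)
```

One difference to expect: ls leaves out names that begin with a dot unless you ask for them, while os.listdir includes them.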

More relevant to our immediate concern is to find out whether your computer has Python on it, by typing the command that asks for the path to Python’s executable file:

$ which python

After hitting return, the terminal should respond with a sequence of folder names separated by forward slashes like /Library/Frameworks/EPD64.framework/Versions/Current/bin/python, which locates the basic Python file on your hard drive.
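To make the idea of "asking for the path" less mysterious, here is a rough sketch in Python of the kind of search that which performs. It is an illustration under simplifying assumptions, not the actual implementation of which, and find_on_path is a hypothetical helper name of our own:

```python
import os

def find_on_path(program):
    """Return the first executable file called `program` found in the
    folders listed in the PATH environment variable, or None."""
    for folder in os.environ.get('PATH', '').split(os.pathsep):
        candidate = os.path.join(folder, program)
        # Keep only regular files that the current user may execute.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None

print(find_on_path('python'))
```

If no file named python is found in any of the listed folders, the function returns None rather than a path.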

2.1.2. Don’t use your built-in Python

So you may have confirmed that you have Python already built into your computer’s operating system. That’s a good thing, isn’t it?

Well, it’s not bad, but your computer expects to have that exact version of Python – you can’t update it to a more recent one, and unpredictable things may happen if you add additional software packages to it. So we don’t recommend that you use the Python that you already have.

2.2. How to get a new installation of Python

Getting a new installation of Python onto your computer is unfortunately more of a challenge than we would like it to be. There are several considerations, which we will review step by step.

2.2.1. How many bits?

The very first thing that you have to ascertain is whether your computer processes data in 32-bit or 64-bit chunks. How to do this for a Mac is explained by Apple at How to tell if your Intel-based Mac has a 32-bit or 64-bit processor. The last processor shipped by Apple that was 32-bit was the Intel Core Duo in 2006, so if you have a Mac from 2007 or later, it should have a 64-bit processor.

The reason that we begin with the 32-/64-bit distinction is that the options for 32-bit machines nowadays are quite limited.
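If a Python interpreter is already available, it can report its own word size. Note the assumption in this sketch: it describes the Python that runs it, not the processor itself, since a 32-bit Python can run on a 64-bit machine. The variable names are ours:

```python
import platform
import struct
import sys

# The size of a C pointer in bytes, times 8, gives the interpreter's word size.
interpreter_bits = struct.calcsize('P') * 8

# sys.maxsize exceeds 2**32 only on a 64-bit build of Python.
is_64bit_python = sys.maxsize > 2 ** 32

print(interpreter_bits)
print(is_64bit_python)

# The machine type as reported by the operating system.
print(platform.machine())
```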

2.2.2. Don’t get a distribution from Python.org

Python is a free, open-source, multi-platform project distributed by the Python Software Foundation. You could download an installer and go ahead and create your own installation, which would work fine. But we don’t recommend that you do that.

Although the initial download and installation is easy enough, adding additional packages to it can be quite a challenge. We would rather you expend your energies on answering cultural questions, not combing through on-line forums trying to figure out why your software won’t compile.

2.2.3. Get a scientific distribution

Your best choice is to get one of the scientific installations. You may object that you don’t feel very scientific, but you don’t have to be a scientist to use it. And maybe by the end of this project, you will feel a little more scientific.

As of this writing, there are two multi-platform scientific distributions, Continuum Anaconda and Enthought Canopy. If you have a 32-bit computer, you can stop now and get Canopy’s distribution. Otherwise, we think that Anaconda’s is slightly superior. One of the reasons for our preference is the subject of the next section.

2.3. Integrated Development Environments or IDEs

Even though Python will run plain text files – if they have the ‘.py’ suffix – your coding will be much less painful and more accurate if you use an integrated development environment or IDE that colors Python syntax in an informative way and takes care of indentation automatically, among many other tasks. Anaconda includes our favorite IDE, the Scientific PYthon Development EnviRonment or Spyder. If you have downloaded and installed Anaconda, you can start Spyder by opening the Launcher app and then installing and launching spyder-app.
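To see that a script really is nothing more than a plain text file, the sketch below writes a one-line script to disk and then asks Python to run it. The file name sum.py and the variable names are hypothetical, chosen only for this illustration:

```python
import os
import subprocess
import sys
import tempfile

# The whole "program" is this one line of text.
script_text = "print(237 + 9075)\n"

# Save it in a temporary folder under the hypothetical name sum.py.
folder = tempfile.mkdtemp()
script_path = os.path.join(folder, 'sum.py')
with open(script_path, 'w') as handle:
    handle.write(script_text)

# Run the file with the same Python that is running this example
# and capture what it prints.
output = subprocess.check_output([sys.executable, script_path])
print(output.decode().strip())   # 9312
```

An IDE does the saving and running for you, but underneath it is doing something very much like this.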

2.3.1. A first look at Spyder

At startup, Spyder opens a variety of windows, each of which gives different insight into the workings of Python. The default set of visible windows is Editor, Object inspector, and Console, creating a layout that looks like this:

_images/SpyderDefault.png

In the Editor window you will write Python programs called scripts. In the image, it has opened a default script from the Spyder distribution. The Object inspector window gives a short explanation of Python commands that you type at the Object line. In the Console window you will interact with Python.

2.3.2. Interacting with Python in Spyder

Once you have Spyder up and running, you can play with it in interactive mode. Just like the terminal, Python has a prompt that tells you that it is ready for you to type a command. Python’s prompt is three greater-than signs, >>>. As a point of reference, I include it at the beginning of commands. To give it a try, type 237 + 9075 into the Console window, like so, and hit return:

>>> 237 + 9075
9312

Be sure to try the other arithmetic operators, subtraction (-), multiplication (*), and division (/). Does division work the way you expect?
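As a hint, division is the one operator whose behavior depends on which Python you are running. With two whole numbers, Python 2.7 throws away the fractional part, while Python 3 keeps it. The sketch below shows two spellings that behave the same in both versions; the variable names are ours:

```python
# Floor division always discards the fractional part.
floor_result = 7 // 2
print(floor_result)    # 3

# Dividing by a number written with a decimal point keeps the fraction.
float_result = 7 / 2.0
print(float_result)    # 3.5

# Plain 7 / 2 is the case that differs between versions, so no
# expected result is shown here: try it in your own Console.
print(7 / 2)
```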

After you have tired of playing with math, play with some text:

>>> word = 'msinairatnemhsilbatsesiditna'
>>> 'anti' in word
False
>>> 'itna' in word
True

Feel free to try your own inventions. You can’t break anything.

2.4. Python’s parts

2.4.1. Command ~ function ~ method

2.4.2. Package ~ library ~ module

TO DO

2.5. Summary

2.6. Further practice

TO DO

  • More mathematical computation
    • test division
  • More textual computation
    • string operations that look like math: +
    • logical not in
  • More file navigation
    • where is site-folders

The answers are found at Answers to further practice with Python, but try to do them all before looking at the answers.

2.7. Further reading

Any attempt to learn more about Python should start at the Python Programming Language – Official Website.

TO DO

  • Python vs other languages
  • Python vs MATLAB
  • books & tutorials

The page that you downloaded the setuptools package from, setuptools, is the best source for further information.

See 20.5. urllib — Open arbitrary resources by URL for the Python 2.7.5 documentation of the urllib module. The following section, 20.6. urllib2 — extensible library for opening URLs, documents the more general but slightly more complex urllib2 module. See also the third-party requests package.

See 15.1 os — Miscellaneous operating system interfaces for the Python 2.7.5 documentation of the os module and in particular 15.1.4. Files and Directories.

Spyder stands for the Scientific PYthon Development EnviRonment. The website that hosts the project, spyderlib, is very informative.

To learn more about the terminal, …

2.8. Appendix: Anaconda vs. Canopy

2.8.1. How much to spend?

For recent processors, there are many more options. So your next consideration will probably be how much you want to spend. If your answer is nothing, then Anaconda is perhaps your best choice. If you have an academic email address (usually one that ends in ‘.edu’), you can download the 64-bit academic distribution of Canopy for free, which is comparable to Anaconda. We will go over the differences below.

2.8.2. Breadth and ease of use

Besides price, there are two other considerations for choosing a Python scientific distribution, breadth and ease of use. Starting at the end, ease of use refers to how you interact with the Python distribution. Even though Python will run plain text files – with the ‘.py’ suffix – your coding will be much less painful and more accurate if you use an integrated development environment or IDE that colors Python syntax in an informative way and takes care of indentation automatically, among many other tasks. Anaconda includes our favorite IDE, Spyder, while Canopy includes IDLE, which is demonstrably inferior, as well as Mayavi, which is superb at 3D data visualization. Unfortunately, we have little to no call for such visualization in this project.

As for breadth, it refers to the number of packages that come with the distribution, how useful they are to you, and how you can get more. Anaconda and Canopy are roughly comparable in these respects, so it is a draw.

2.8.3. Getting Canopy

Navigate your browser to Canopy’s home page and click on the link to download Canopy. Do not close its home page in your browser yet. Once you have Canopy on your computer, open the Canopy app and sign up for an account. Be sure to use an email address that ends in .edu. Once the account has been set up, quit the app. Go back to Canopy’s home page in your web browser and log in with your new credentials. Return to Canopy’s home page if you are not automatically redirected there. Click on the Academic License link and then on the Request your Academic License button. That should do it for the time being.

2.8.4. Adding a package to Canopy

In the course of this project, we try again and again to get useful information out of digital text. The basic tool for processing text in Python is the Natural Language Toolkit, or NLTK. Before we can use it, however, we have to make it accessible to Python – it is not included by default. To test for its presence, in Spyder’s Console window, type the words import nltk after the three greater-than signs >>>. NOTE: hereafter we shall refer to these three characters as the “Python prompt” or simply “prompt”. They are equivalent to the dollar sign in the Terminal in the sense that they tell you that Python is ready for your input. The result of import nltk on a default version of Canopy should be:

>>> import nltk
Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
ImportError: No module named nltk
>>>

Python can’t find NLTK. But we know that it is there somewhere, because Enthought lists it in its Canopy Package Index.

All that needs to be done is to download the package from Enthought’s repository to your Python installation. Canopy automates this process for you, so start it up and click the Package Manager and then select Available Packages and enter “nltk” in the search window. It should find nltk 2.0.1-3. Press Install and wait for Canopy to download and install it. Then go back to Spyder and enter import nltk at the prompt again. Spyder should think about this for a few moments and then return the prompt without any error message. This tells you that the module has been imported successfully.
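If you would rather get a plain yes or no than read a traceback, the check can be wrapped in a few lines. This is a minimal sketch; is_importable is a hypothetical helper name of our own, not part of Canopy, Spyder, or NLTK:

```python
import importlib

def is_importable(module_name):
    """Return True if `module_name` can be imported, False otherwise."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        # This is the same error that appears in the traceback.
        return False
    return True

print(is_importable('os'))     # True: os ships with Python itself
print(is_importable('nltk'))   # depends on whether NLTK is installed
```

The function catches the same ImportError that the Console displays and turns it into False.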

2.8.5. Spyder and Canopy

The Console window also shows the version of Python that Spyder is using. Here it is Python 2.7.5, without any further elaboration. This is not Canopy’s Python! It is the Python distribution that comes with Spyder (for the Mac) and makes it so easy to use. Yet it has one huge drawback – it cannot be extended by adding packages to it [at least as far as I can tell from the answer to the question here].

To change to Canopy’s Python, go to the Spyder menu and select Preferences. From there, select Console and then Advanced settings, that is, Spyder ‣ Preferences ‣ Console ‣ Advanced settings. Click on the button labeled Use the following Python interpreter and type in the following path: /Users/your_user_name/Library/Enthought/Canopy_64bit/User/bin/python2.7, where “your_user_name” is the name of your home folder. The path should look like the following:

_images/SpyderCanopyPath.png

You must get the path just right, or Spyder complains.

If the path is not working, you may have to double check where Canopy has downloaded its Python distribution to. This is easy enough to do – or at least it would be if Apple didn’t try to protect you from yourself. Since your Library folder contains all the files necessary to keep your Mac and applications in working order, Apple thinks that you shouldn’t mess with it. The default behavior of OS 10.7 and later versions is to hide it from you. That is to say, you do not see a Library folder in your user folder. It is there all right, just hidden.
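Hiding a folder only affects what the Finder shows you; Python can still see it. As a sketch, and assuming that Canopy was installed in the location given in the path above, you can ask Python whether the folder and the interpreter are where you expect them. The variable names are ours:

```python
import os

# '~' is expanded to your home folder, so your_user_name is filled in for you.
library = os.path.expanduser('~/Library')

# Build the path to Canopy's Python piece by piece. The location is the
# one assumed in the text above; yours may differ.
canopy_python = os.path.join(
    library, 'Enthought', 'Canopy_64bit', 'User', 'bin', 'python2.7')

print(os.path.isdir(library))        # is the Library folder there?
print(canopy_python)                 # the full path to type into Spyder
print(os.path.exists(canopy_python)) # is Canopy's Python really there?
```

If the last line prints False, the path typed into Spyder will not work either, and you will need to look for where Canopy actually put its files.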

To reveal the hidden Library folder, type this command after the dollar sign and hit the return key:

$ chflags nohidden ~/Library

Assuming that you didn’t get asked for a password, the Library folder should magically appear in your user folder. You can now quit the Terminal and navigate to Canopy’s Python in Spyder, as was explained above. Select it, hit the OK button, quit Spyder and restart it. The Console window should look like the one below:

_images/SpyderCanopyConsole.png

Now the console informs you that Spyder is using Canopy’s Python distribution, which is a little bit older than Spyder’s.
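You can also confirm the switch from inside the Console rather than by reading its banner. This minimal sketch uses only the standard sys module; version_text is a name of our own:

```python
import sys

# Full path of the interpreter that is running this code.
print(sys.executable)

# The version as three numbers joined by dots: major, minor, micro.
version_text = '.'.join(str(part) for part in sys.version_info[:3])
print(version_text)
```

If the first line printed does not point into the Canopy folder, Spyder is still using its own Python.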


Last edited: April 06, 2014