Introduction to the interactive notebook

By Ariel Rokem, late of this university

The platform for this workshop will be files such as this one, which interleave explanation text, code and data visualizations.

To execute each cell of the notebook, hit shift-enter.

That will put you in the next cell. There are two types of input cells, controlled by the menu bar at the top of this page. One type is cells like this one, called "markdown cells". They are called that because they contain a simple, minimalistic form of markup as input to the browser, which then renders these cells in a useful way. You can find a pretty good markdown cheat-sheet here. See - for example - that was a hyper-link to a web page!

These cells can also be used to write math-y stuff, using $\LaTeX$. To write math that will be rendered in this way, we simply enter a piece of syntactically correct LaTeX between two $ signs. Like so: $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$

Another type of input cell in the notebook is a code cell. These cells contain code, and when they are executed, that code is sent to the Python interpreter. For example:

In [1]:
a = 1
print a
1

A whirlwind introduction to Python

Python is object-oriented

In contrast to some other commonly used scripting languages (notably Matlab), Python is an object-oriented language. That means that everything is an object. [And there's a big difference between having an object data-type and having almost everything be an object].

For example, if you type a. and hit the tab key, you will see all the attributes of this object:

In [2]:
print a.real
print a.imag
1
0

So far, that looks a lot like the matlab struct array. However, an important difference is that some of these attributes can be functions that can be called on/from the object. These are also referred to as 'methods' of the object. For example:

In [3]:
print a.bit_length
<built-in method bit_length of int object at 0x100311c88>

To tell what this method does, we can get IPython to give us a brief description of it. This is done by attaching a ? after it:

In [4]:
a.bit_length?

This should open a sub-window at the bottom of this window with the 'docstring' of this method. If the author of the function was kind enough to provide useful information in the docstring, we can use this information to learn what the method does and what its expected inputs are. This function has no inputs, so to call it, we simply provide the method with empty parentheses:

In [5]:
a.bit_length()
Out[5]:
1
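
As a small sketch of calling a method on a different integer (the variable names here are made up for illustration, and print is written with parentheses so the snippet also runs on Python 3):

```python
# bit_length() reports how many bits are needed to write an integer in binary,
# not counting the sign or leading zeros.
x = 255
n_bits = x.bit_length()

print(n_bits)   # 8
print(bin(x))   # 0b11111111, which is eight binary digits
```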

There is a lot more to say about objects and object-orientation (and what it is good for). For now, this will suffice.

Python uses a few simple data structures

Some of them look deceptively like structures in other languages. For example, here's a 'list':

In [6]:
b = [1,2,3,4]
print len(b)
4

Here's a classic gotcha. Python indexes start at 0:

In [7]:
print(b[0])
1

Lists can be changed:

In [8]:
b[0] = 2
print b
[2, 2, 3, 4]

They have a ton of interesting attributes and methods. Feel free to explore them as you just learned above. Importantly, though they look a lot like a Matlab vector, they are not the Python analogue of that. Wait just a few more cells for that one to show up.

In [9]:
print b * 2
[2, 2, 3, 4, 2, 2, 3, 4]
In [10]:
print b+1
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-10-5bd19596e089> in <module>()
----> 1 print b+1

TypeError: can only concatenate list (not "int") to list
In [11]:
print b + b 
[2, 2, 3, 4, 2, 2, 3, 4]
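
To summarize the three cells above in one place, here is a minimal sketch of what the operators do on a list, together with one plain-Python way of getting element-wise arithmetic (the names repeated, joined and plus_one are made up for illustration; print uses parentheses so this also runs on Python 3):

```python
b = [2, 2, 3, 4]

repeated = b * 2                      # repetition, not multiplication
joined = b + [1]                      # concatenation works only with another list
plus_one = [item + 1 for item in b]   # element-wise addition needs an explicit loop

print(repeated)   # [2, 2, 3, 4, 2, 2, 3, 4]
print(joined)     # [2, 2, 3, 4, 1]
print(plus_one)   # [3, 3, 4, 5]
```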

Let's look at one useful list method and notice some interesting (and possibly surprising) behaviors:

In [12]:
new_b = b.append(5)

What do you expect new_b to look like?

In [13]:
print new_b
None

This is because the append method has no output. It changes the object on which it is called and doesn't return anything. We'll see more about what that means when we examine functions below. In the meantime, here's what b now looks like:

In [14]:
print b
[2, 2, 3, 4, 5]
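
The same split between changing an object in place and returning a new one shows up elsewhere. A short sketch, using a hypothetical list called values, with the sort method and the built-in sorted function:

```python
values = [3, 1, 2]

result = values.sort()          # sorts the list in place and returns None
print(result)                   # None
print(values)                   # [1, 2, 3]

original = [3, 1, 2]
new_values = sorted(original)   # returns a new list and leaves the original alone
print(new_values)               # [1, 2, 3]
print(original)                 # [3, 1, 2]
```

A rough rule of thumb: if a list method changes the list itself, expect it to return None.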

'Tuples' are very similar to lists:

In [15]:
c = (1,2,3,4)
print c[0]
1

In contrast to lists, they are immutable:

In [16]:
c[0] = 2
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-16-5673658116c7> in <module>()
----> 1 c[0] = 2

TypeError: 'tuple' object does not support item assignment

And they are also not analogous to Matlab vectors:

In [17]:
print c + c
(1, 2, 3, 4, 1, 2, 3, 4)

So far, looks kinda useless. It'll become more interesting in just a minute, after we introduce 'dictionaries', or as they are affectionately known 'dicts':

In [18]:
d = {'key_the_first':1000.0, 'key_the_second':'woot!', 10:1 , (10,11):12}

This data structure holds key-value pairs, such that when you refer to one, you can get the other:

In [19]:
print d['key_the_first']
1000.0
In [20]:
print d[10]
1

Note that while lists cannot be keys to a dict, tuples can:

In [21]:
print d[(10,11)]
12
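
To see the other half of that claim, here is a small sketch of what happens when a list is used as a key. The dict is rebuilt here so the snippet is self-contained, and the error is caught only so that the message can be printed:

```python
d = {(10, 11): 12}     # a tuple is immutable, so it can be hashed and used as a key

error_message = None
try:
    d[[10, 11]] = 12   # a list is mutable, so Python refuses to hash it
except TypeError as err:
    error_message = str(err)

print(error_message)   # unhashable type: 'list'
```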

In CS-speak, a thing such as a dict is sometimes referred to as a hash-table and is considered a very useful thing to have. For example, think what you would have to do in order to count the frequency of appearance of certain words in a big text file.
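
A minimal sketch of that word-counting idea, run on a short made-up string rather than a big file (the text and variable names are assumptions for illustration):

```python
text = "the cat and the hat and the bat"

counts = {}
for word in text.split():
    # get() returns the current count, or 0 if the word has not been seen yet
    counts[word] = counts.get(word, 0) + 1

print(counts['the'])   # 3
print(counts['and'])   # 2
```

Each lookup and update goes straight to the entry for that word, so there is no need to search through everything seen so far.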

Python uses a few simple control structures

As in other languages you might know, there are if, else, elif (not elseif), for and while control statements. There are a few interesting idiosyncrasies. First of all, if you have something that has several components, you can probably just loop over it, using the for loop:

In [22]:
for x in b:
    print x
2
2
3
4
5

Second, you might notice that there's no explicit delimiter to that for loop. How does python know where it started and where it ended? The answer is that the indentation signals that. In contrast to other languages, indentation in Python is not only a stylistic preference, but a syntactical requirement. That is, if you do not indent the contents of a control block, you will get an error:

In [23]:
for x in b: 
print x
IndentationError: expected an indented block

This level of stickler-ish insistence on such things as where the white-space appears has made people call it a "bondage and discipline" language. But maybe that's your thing?

Obviously, we can nest several different control structures in each other:

In [24]:
for lookfor in range(10):
    if lookfor in b:
        print 'I think we have a %s in b'%lookfor
I think we have a 2 in b
I think we have a 3 in b
I think we have a 4 in b
I think we have a 5 in b

Also demonstrating:

  • Use of range
  • Different uses of the builtin in
  • String formatting (the % operator)
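
The elif statement mentioned earlier has not appeared in a cell yet, so here is a small sketch that combines it with range, list membership and % formatting. The list b is re-created so the snippet stands alone, and the messages list is a made-up name:

```python
b = [2, 2, 3, 4, 5]

messages = []
for candidate in range(1, 4):          # range(1, 4) yields 1, 2, 3
    n_found = b.count(candidate)       # how many times candidate appears in b
    if n_found == 0:
        messages.append('no %s in b' % candidate)
    elif n_found == 1:
        messages.append('one %s in b' % candidate)
    else:
        # several values are passed to % as a tuple
        messages.append('%s copies of %s in b' % (n_found, candidate))

for message in messages:
    print(message)
# no 1 in b
# 2 copies of 2 in b
# one 3 in b
```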

Python gives users control of their namespace

The idea here is that apart from very few built-in constructs and types, when you start a Python interpreter, it knows very little. To allow it to know more, you need to assign variables (which we did above), you need to load things from files (which we'll see later), or you need to import modules that contain more objects, functions, etc. For example, let's import a favorite module of mine, the os module:

In [25]:
import os

Importing this name into our name-space now makes a lot of other names available through it. Try typing os. and tab-completing on that. Some of the names under it have additional names under them. For example:

In [26]:
curdir = os.path.curdir
listdir = os.listdir(curdir)
print os.path.join(curdir, listdir[1])
./.gitignore
  • Example: count words in a file.
  • Read the paradigm file and plot it and what-not
  • Where do I get help?
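
As a rough sketch of how much control the import statement gives over the name-space, here are three common forms side by side. The alias osp and the file name some_file.txt are made up for illustration:

```python
import os                    # names are reached through the module: os.path.join
from os.path import join     # pull one name directly into our own name-space
import os.path as osp        # bind the module to a shorter alias of our choosing

# All three routes lead to the very same function object
print(join is os.path.join)       # True
print(osp.join is os.path.join)   # True

# 'some_file.txt' is a hypothetical file name; join only builds the path string
full_path = os.path.join(os.path.curdir, 'some_file.txt')
print(full_path)
```

The first form keeps our own name-space uncluttered; the second is shorter to type but makes it less obvious where join came from.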

You can easily define Python functions

No need for other files, compilation steps, or anything. Just define it and it's there:

In [27]:
def add_two_numbers(num1, num2):
    """
    This is the function docstring. Ideally it is informative. For example: 

    Add two numbers to each other. 

    Parameters
    ----------
    num1 : int/float
       A number
    
    num2 : int/float
       Another number
    
    Returns
    -------
    sum_it : the sum of the inputs
    
    """
    sum_it = num1 + num2
    return sum_it
In [28]:
add_two_numbers(1,2)
Out[28]:
3
In [29]:
add_two_numbers?
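
The ? syntax only works in IPython. In plain Python, the docstring is available through help() or the __doc__ attribute. The sketch below shows this on a hypothetical variant of the function, add_with_default, which also illustrates default values and keyword arguments (the default of 10 is arbitrary):

```python
def add_with_default(num1, num2=10):
    """Add two numbers; num2 defaults to 10 if it is not provided."""
    return num1 + num2

print(add_with_default(1, 2))             # 3
print(add_with_default(1))                # 11, because num2 falls back to its default
print(add_with_default(num2=5, num1=1))   # 6, keyword arguments can come in any order

# help(add_with_default) would print the same text in a formatted way
print(add_with_default.__doc__)
```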

If we have time, let's look at a slightly more interesting example:

In [30]:
import numpy as np # The array library - we'll come back to this one

def word_count(url): 
    """ 
    Count word frequencies in the text body of a given url

    Parameters 
    ----------
    url : string
        The address of a url to be scraped 
    
    Returns
    -------
    vals : array
       Word counts, sorted from most to least frequent

    keys : array
       The words, in the same order as vals

    """
    import urllib
    url_get = urllib.urlretrieve(url)
    f = file(url_get[0])
    txt = f.read()
    
    start_idx = txt.find('<body>')
    end_idx = txt.find('</body>')
    
    new_txt = txt[start_idx:end_idx]
    new_txt = new_txt.split(' ')
    
    word_dict = {}
    for word in new_txt:
        # Get rid of all kinds of html crud and the empty character:
        if not('>' in word or '<' in word or '=' in word or '-' in word or '&' in word or word == ''): 
            if word in word_dict.keys():
                word_dict[word] += 1
            else: 
                word_dict[word] = 1

    vals = np.array(word_dict.values())
    keys = np.array(word_dict.keys())
    sort_idx = np.argsort(vals)

    return (vals[sort_idx][::-1], keys[sort_idx[::-1]])
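
The function above relies on Python 2 names (urllib.urlretrieve and file) and needs a network connection. As a hedged sketch of just the counting step, here is a variant that takes the HTML as a string and uses collections.Counter from the standard library. The helper name count_words_in_html and the sample string are made up for illustration, and the slicing and filtering deliberately mirror the crude approach above, so a body tag with attributes would not be found:

```python
from collections import Counter


def count_words_in_html(html):
    """Count words between <body> and </body> in an HTML string.

    Returns a list of (word, count) pairs, most frequent first.
    """
    start_idx = html.find('<body>')
    end_idx = html.find('</body>')
    body = html[start_idx:end_idx]

    counts = Counter()
    for word in body.split(' '):
        # Same crude filter as above: skip empty strings and markup-like fragments
        if word == '' or any(ch in word for ch in '<>=-&'):
            continue
        counts[word] += 1

    # most_common() sorts the pairs by descending count
    return counts.most_common()


sample = '<html><body> the cat saw the dog </body></html>'
top = count_words_in_html(sample)
print(top[0])   # ('the', 2)
```

Counter takes care of the "have I seen this word before?" bookkeeping, and most_common replaces the argsort step.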

Let's apply this to a paper about fMRI:

In [31]:
word_arr = word_count('http://www.journalofvision.org/content/11/5/12.full?')

For simplicity, let's examine the top 50 results:

In [32]:
to_plot_vals = word_arr[0][:50]
to_plot_words = word_arr[1][:50]
print to_plot_vals
print to_plot_words
[557 492 491 336 213 209 198 149 148 134 134 120 105  91  81  77  77  77
  76  74  71  62  62  54  53  47  46  46  42  42  41  39  38  37  35  34
  34  34  33  33  33  32  31  30  30  30  29  29  28  27]
['the' 'in' 'of' '\n' 'to' 'and' 'a' 'is' 'that' 'text"\n' 'reference'
 'visual' 'fMRI' 'with' 'V1' 'by' 'response' 'attention' 'for' 'be' 'BOLD'
 'human' 'signal' 'responses' 'on' 'not' 'are' 'et' 'neurons' 'stimulus'
 'as' 'effects' 'from' 'This' 'electrophysiological' 'signals' 'cortex'
 'between' 'macaque' 'or' 'an' 'neuronal' 'effect' 'but' 'The' 'spatial'
 'time' 'this' 'primary' 'may']

I think that it's fair to say that the first 11 words are not very interesting, so let's ignore those and plot the others, wordle-style:

In [33]:
%pylab inline
Welcome to pylab, a matplotlib-based Python environment [backend: module://IPython.zmq.pylab.backend_inline].
For more information, type 'help(pylab)'.
In [34]:
fig, ax = plt.subplots(1)
for word_idx in range(len(to_plot_words[11:])):
    ax.text(np.random.rand(), np.random.rand(), to_plot_words[11:][word_idx], fontsize=to_plot_vals[11:][word_idx])
ax.set_axis_off()

For help: tab complete, something?, and when in doubt, Google is your friend.

Summary

Python is an object-oriented language with powerful libraries for representation of numerical objects, and for scientific analysis and visualization. Using the IPython notebook, we can interactively analyze and visualize data in an iterative fashion. We can now move on to examine some more interesting data. Namely, we will start looking at some MRI data, using neuroimaging-specific libraries.