In [1]:
%matplotlib inline
In [2]:
import numpy as np
import matplotlib.pyplot as plt
In [3]:
import scipy.stats
In [4]:
t_dist = scipy.stats.t
norm_dist = scipy.stats.norm
In [5]:
x_values = np.linspace(-4, 4, 100)

The $t$ distribution at different degrees of freedom:

In [6]:
t_prob_3 = t_dist.pdf(x_values, 3)
t_prob_10 = t_dist.pdf(x_values, 10)
t_prob_40 = t_dist.pdf(x_values, 40)
z_prob = norm_dist.pdf(x_values)
In [7]:
plt.figure(figsize=(10, 8))
plt.plot(x_values, t_prob_3, 'r:', label='t df=3')
plt.plot(x_values, t_prob_10, 'g:', label='t df=10')
plt.plot(x_values, t_prob_40, 'b:', label='t df=40')
plt.plot(x_values, z_prob, 'k', label='z')
plt.xlabel('$z$ or $t$ value')
plt.ylabel('probability density')
plt.legend()
Out[7]:
<matplotlib.legend.Legend at 0x490bfd0>

This is the probability density function (PDF). Its height at a given $z$ or $t$ value is a probability density rather than a probability; probabilities correspond to areas under the curve.

The cumulative distribution function (CDF) is the area under the curve of the PDF up to a particular value, and is the probability of observing a $z$ or $t$ value less than or equal to the value on the x axis:

In [8]:
t_cdf_10 = t_dist.cdf(x_values, 10)
plt.figure(figsize=(10, 8))
plt.plot(x_values, t_cdf_10, 'b', label='t cdf df=10')
plt.plot(x_values, t_prob_10, 'k', label='t pdf df=10')
plt.xlabel('$z$ or $t$ value')
plt.ylabel('probability (cdf) or density (pdf)')
plt.legend()
Out[8]:
<matplotlib.legend.Legend at 0x54d6b10>

The CDF maps a $z$ or $t$ value to a probability. That is, it is a function $f$ such that $p = f(t)$ returns the probability of observing a $t$ value less than or equal to $t$. Here is the probability of observing a $t$ value of 2 or less in a $t$ distribution with 10 degrees of freedom:

In [9]:
p = t_dist.cdf(2, 10)
p
Out[9]:
0.96330598261462974
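
As a cross-check on the "area under the PDF" description, here is a minimal sketch that integrates the PDF numerically and compares the result with the CDF. The use of `scipy.integrate.quad` is my choice for illustration and is not part of the original notebook:

```python
import numpy as np
import scipy.stats
from scipy.integrate import quad

t_dist = scipy.stats.t

# Integrate the t PDF (10 degrees of freedom) from -infinity up to t = 2.
# quad returns the estimated integral and an estimate of its absolute error.
area, abs_err = quad(t_dist.pdf, -np.inf, 2, args=(10,))

# The area should agree closely with the CDF evaluated at the same point.
cdf_value = t_dist.cdf(2, 10)
print(area, cdf_value)
```

The two printed numbers should match to many decimal places, differing only by numerical integration error.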

What if we want to go in the opposite direction? That is, we want a function $g$ such that $t = g(p)$, where $p$ is a probability and $t$ is the corresponding $t$ value.

In this case our function $g$ is the inverse of $f$ - that is $g(f(t)) = t$. In scipy the inverse of the CDF is the percent point function or ppf (see: http://www.itl.nist.gov/div898/handbook/eda/section3/eda362.htm).

The $t$ value corresponding to $p$ above should be 2:

In [10]:
t_dist.ppf(p, 10)
Out[10]:
1.9999999999960625

It isn't exactly 2, because the inverse is computed numerically to finite precision, but as you can see, it is very close.
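
The same round trip can be checked for several values at once. This is a small sketch; the particular $t$ values are arbitrary choices for illustration:

```python
import numpy as np
import scipy.stats

t_dist = scipy.stats.t

# A handful of arbitrary t values, all for 10 degrees of freedom.
t_values = np.array([-3.0, -1.0, 0.0, 0.5, 2.0])

# Forward direction: t value -> cumulative probability.
p_values = t_dist.cdf(t_values, 10)

# Reverse direction: cumulative probability -> t value.
recovered = t_dist.ppf(p_values, 10)

# The recovered values should match the originals up to small numerical error.
print(np.allclose(recovered, t_values))
```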

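The ppf (inverse cumulative distribution function) allows us to find the $t$ threshold for a given probability. For example, the threshold that leaves a probability of 0.05 in the upper tail is the $t$ value that has 0.95 of the distribution at or below it: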

In [11]:
t_for_05 = t_dist.ppf(0.95, 10)
t_for_05
Out[11]:
1.8124611228107335
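
The threshold above is for a one-tailed probability. As a hedged sketch (the names `alpha` and `df` are mine, and the two-tailed case is not covered in the original), the same function gives a two-tailed threshold if we put half of the probability in each tail:

```python
import scipy.stats

t_dist = scipy.stats.t

alpha = 0.05  # total tail probability we are willing to accept
df = 10       # degrees of freedom

# One-tailed: all of alpha sits in the upper tail.
t_one_tailed = t_dist.ppf(1 - alpha, df)

# Two-tailed: alpha is split equally between the lower and upper tails.
t_two_tailed = t_dist.ppf(1 - alpha / 2, df)

# Check the tail areas by going back through the CDF.
upper_tail = 1 - t_dist.cdf(t_one_tailed, df)
both_tails = 2 * (1 - t_dist.cdf(t_two_tailed, df))
print(t_one_tailed, t_two_tailed, upper_tail, both_tails)
```

The two-tailed threshold is larger than the one-tailed threshold, because each tail now holds only 0.025 of the distribution.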

Just for completeness, scipy also has the survival function (sf), which is just $1 - cdf(t)$, the probability of observing a value greater than $t$:

In [12]:
p_sf = t_dist.sf(2, 10)
p_sf
Out[12]:
0.036694017385370196
In [13]:
1 - t_dist.cdf(2, 10)
Out[13]:
0.036694017385370259
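
The two results differ slightly in the last digits. For values far out in the tail the difference can matter a great deal. Here is a small sketch using the normal distribution, where the extreme value of 10 is my own choice for illustration:

```python
import scipy.stats

norm_dist = scipy.stats.norm

# Far in the upper tail the CDF is so close to 1 that it rounds to
# exactly 1.0 in double precision, so the subtraction loses everything.
p_from_cdf = 1 - norm_dist.cdf(10)

# The survival function computes the upper tail area directly.
p_from_sf = norm_dist.sf(10)

print(p_from_cdf, p_from_sf)
```

Here `1 - cdf` gives 0.0, whereas `sf` returns a tiny but nonzero probability, so `sf` is usually the safer choice for small upper-tail probabilities.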

scipy also has the inverse of the survival function, isf:

In [14]:
t_dist.isf(p_sf, 10)
Out[14]:
1.9999999999960638

scipy has these functions for lots of distributions, including the normal distribution.

So - what is the probability of observing a $z$ value (from the normal distribution) of 0.9 or lower?

In [15]:
norm_dist.cdf(0.9)
Out[15]:
0.81593987465324047

What $z$ value gives me a probability of 0.95 of seeing this $z$ value or less?

In [16]:
norm_dist.ppf(0.95)
Out[16]:
1.6448536269514722
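
To tie this back to the first plot, where the $t$ curves approach the $z$ curve as the degrees of freedom increase, here is a brief sketch comparing the 0.95 thresholds. The degrees of freedom beyond 3, 10 and 40 are my own additions for illustration:

```python
import scipy.stats

t_dist = scipy.stats.t
norm_dist = scipy.stats.norm

# Threshold with 0.95 of the standard normal distribution at or below it.
z_threshold = norm_dist.ppf(0.95)

# The same threshold for t distributions with increasing degrees of freedom.
dfs = (3, 10, 40, 1000)
t_thresholds = [t_dist.ppf(0.95, df) for df in dfs]

for df, t_threshold in zip(dfs, t_thresholds):
    # The gap to the z threshold shrinks as df grows.
    print(df, t_threshold, t_threshold - z_threshold)
```

The $t$ thresholds are always larger than the $z$ threshold, reflecting the heavier tails of the $t$ distribution, but the gap becomes small at high degrees of freedom.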