%pylab inline

Populating the interactive namespace from numpy and matplotlib

import numpy as np
import matplotlib.pyplot as plt

import scipy.stats

t_dist = scipy.stats.t
norm_dist = scipy.stats.norm

x_values = np.linspace(-4, 4, 100)

The $t$ distribution at different degrees of freedom:

t_prob_3 = t_dist.pdf(x_values, 3)
t_prob_10 = t_dist.pdf(x_values, 10)
t_prob_40 = t_dist.pdf(x_values, 40)
z_prob = norm_dist.pdf(x_values)

plt.figure(figsize=(10, 8))
plt.plot(x_values, t_prob_3, 'r:', label='t df=3')
plt.plot(x_values, t_prob_10, 'g:', label='t df=10')
plt.plot(x_values, t_prob_40, 'b:', label='t df=40')
plt.plot(x_values, z_prob, 'k', label='z')
plt.xlabel('$z$ or $t$ value')
plt.ylabel('probability')
plt.legend()

<matplotlib.legend.Legend at 0x490bfd0>

This is the probability density function (PDF) - the probability of observing a $z$ or $t$ value.

The cumulative density function (CDF) is the area under the curve of the PDF up until a particular value, and is the probability of observing a $z$ or $t$ value less than or equal to the value on the x axis:

t_cdf_10 = t_dist.cdf(x_values, 10)
plt.figure(figsize=(10, 8))
plt.plot(x_values, t_cdf_10, 'b', label='t cdf df=10')
plt.plot(x_values, t_prob_10, 'k', label='t pdf df=10')
plt.xlabel('$z$ or $t$ value')
plt.ylabel('probability')
plt.legend()

<matplotlib.legend.Legend at 0x54d6b10>

The CDF maps $z$ or $t$ value to a probability. That is, it is a function $f$ such that $p = f(t)$ returns the propability of observing (say) $t$ value greater than or equal to $t$. Here is the probability of observing a t value of 2 or less in a t distribution with 10 degrees of freedom:

p = t_dist.cdf(2, 10)
p

0.96330598261462974

What if we want to go in the opposite direction? I mean, we want a function $g$ such that $t = g(p)$ - where $p$ is a probability and $t$ is the $t$ value?

In this case our function $g$ is the inverse of $f$ - that is $g(f(t)) = t$. In scipy the inverse of the CDF is the percent point function or ppf (see: http://www.itl.nist.gov/div898/handbook/eda/section3/eda362.htm).

The $t$ value corresponding to $p$ above should be 2:

t_dist.ppf(p, 10)

1.9999999999960625

It isn't exactly 2, because of floating point error, but as you can see, it is very close.

The ppf (inverse cumulative density function) allows us to find a $t$ threshold for a probility of - say - 0.05:

t_for_05 = t_dist.ppf(0.95, 10)
t_for_05

1.8124611228107335

Just for completeness, scipy also has the survival function which is just $1 - cdf(t)$:

p_sf = t_dist.sf(2, 10)
p_sf

0.036694017385370196

1 - t_dist.cdf(2, 10)

0.036694017385370259

scipy also has the inverse of the survival function, isf:

t_dist.isf(p_sf, 10)

1.9999999999960638

scipy has these functions for lots of distributions, including the normal distribution.

So - what is the probability of observing a $z$ value (from the normal distribution) of 0.9 or lower?

norm_dist.cdf(0.9)

0.81593987465324047

What $z$ value gives me a probability of 0.95 of seeing this $z$ value or less?

norm_dist.ppf(0.95)

1.6448536269514722