Probability Primer

Discrete Random Variables (RV) and Probability Mass Function (PMF)
Continuous Random Variables and Probability Density Functions (PDF)
Other probability concepts
Bayes’ Rule
Resources

Discrete Random Variables (RV) and Probability Mass Function (PMF)

A probability mass function (PMF) $P$ maps each state of a discrete random variable to the probability of that random variable taking on that state.

The domain of $P$ is the set of all possible states of the random variable $\mathbf{x}$, written $\Omega$.
Probabilities range from 0 to 1. $\forall{x}\in\mathbf{x}, 0 \le P(x) \le 1$
Probabilities sum up to 1. $\sum_{x \in \mathbf{x}}P(x) = 1$
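
The two numeric conditions above can be checked mechanically. The following is a minimal sketch, assuming a PMF is stored as a plain dictionary from state to probability; the helper name `is_valid_pmf` and both example distributions are hypothetical and used only for illustration.

```python
import math

def is_valid_pmf(pmf, tol=1e-9):
    """Return True if `pmf` (a dict mapping state -> probability) is a valid PMF."""
    # Every probability must lie in [0, 1].
    in_range = all(0.0 <= p <= 1.0 for p in pmf.values())
    # The probabilities must sum to 1, up to floating-point error.
    sums_to_one = math.isclose(sum(pmf.values()), 1.0, abs_tol=tol)
    return in_range and sums_to_one

# A fair six-sided die: every face has probability 1/6.
fair_die = {face: 1 / 6 for face in range(1, 7)}
# Not a PMF: the probabilities only sum to 0.9.
broken = {"heads": 0.5, "tails": 0.4}

print(is_valid_pmf(fair_die))  # True
print(is_valid_pmf(broken))    # False
```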

Discrete Uniform Distribution

\[P(\mathbf{x}=x_i) = \frac{1}{k}, \text{where } k \text{ is the number of possible states}\]

import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as stats
import seaborn as sns
sns.set()

# Discrete Uniform distributions
uniform1 = stats.randint(low=3, high=8)
uniform2 = stats.randint(low=4, high=12)

# Plot the PMF function
# Note: scipy's randint excludes `high`, so uniform1 covers the integers 3..7
a = np.arange(15)
plt.xticks(a)  # bars are centered on the integer positions by default
plt.bar(a, uniform1.pmf(a), color='#348ABD', edgecolor='#348ABD', alpha=0.60, lw=3, label='low=3, high=8')
plt.bar(a, uniform2.pmf(a), color='#A60628', edgecolor='#A60628', alpha=0.60, lw=3, label='low=4, high=12')
plt.ylim(0, 0.4)
plt.xlabel('Events')
plt.ylabel('Probability of an event')
plt.title('PMF $P(x)$ for Discrete Uniform Distribution')
plt.legend()

Bernoulli Distribution

\[P(\mathbf{x}=k) = \begin{cases} 1-p, & \text{if $k=0$} \\ p, & \text{if $k=1$} \end{cases}\]

$p$ is the parameter of the distribution: the probability of the outcome $k=1$. The expected value of the distribution is $p$.
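
The claim that the expected value is $p$ follows from $E[\mathbf{x}] = 0 \cdot (1-p) + 1 \cdot p$. A small sketch of that arithmetic, assuming a hand-written `bernoulli_pmf` helper (a hypothetical name, not part of SciPy); it also computes the variance, which works out to $p(1-p)$.

```python
def bernoulli_pmf(k, p):
    """PMF of a Bernoulli variable: P(x=1) = p, P(x=0) = 1 - p, 0 elsewhere."""
    if k == 1:
        return p
    if k == 0:
        return 1 - p
    return 0.0

p = 0.25
states = (0, 1)

# Expected value: sum of state * probability over both states.
expected = sum(k * bernoulli_pmf(k, p) for k in states)
# Variance: expected squared deviation from the mean.
variance = sum((k - expected) ** 2 * bernoulli_pmf(k, p) for k in states)

print(expected)  # 0.25, equal to p
print(variance)  # 0.1875, equal to p * (1 - p)
```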

import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as stats
import seaborn as sns
sns.set()

# Bernoulli distributions
bernoulli1 = stats.bernoulli(p=0.25)
bernoulli2 = stats.bernoulli(p=0.5)

# Plot the PMF function
a = np.arange(2)
plt.xticks(a)  # bars are centered on the integer positions by default
plt.bar(a, bernoulli1.pmf(a), color='#348ABD', edgecolor='#348ABD', alpha=0.60, lw=3, label='$p=0.25$')
plt.bar(a, bernoulli2.pmf(a), color='#A60628', edgecolor='#A60628', alpha=0.60, lw=3, label='$p=0.5$')
plt.xlim(-1, 4)
plt.ylim(0, 1)
plt.xlabel('Events')
plt.ylabel('Probability of an event')
plt.title('PMF $P(x)$ for Bernoulli Distribution')
plt.legend()

Binomial Distribution

A binomial distribution is parameterized by $n$ and $p$. It models the number of successes in a sequence of $n$ independent experiments, each with success probability $p$. For $n=1$, the binomial distribution is a Bernoulli distribution.

\[P(\mathbf{x} = k) = {n \choose k}p^kq^{n-k}, \text{where } q = 1 - p \text{ and } k \in \{0, 1, 2, \dots, n\}\]
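
The formula can be evaluated directly with the standard library. This is a minimal sketch, assuming Python 3.8+ for `math.comb`; the function name `binomial_pmf` is hypothetical. It also checks the statement that $n=1$ reduces to a Bernoulli distribution.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(x = k) for n independent trials with success probability p."""
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# PMF for n=5, p=0.5 (the second distribution plotted in this section).
n, p = 5, 0.5
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
print(pmf)       # [0.03125, 0.15625, 0.3125, 0.3125, 0.15625, 0.03125]
print(sum(pmf))  # 1.0

# With n=1 the binomial PMF is the Bernoulli PMF: P(0) = 1 - p, P(1) = p.
print(binomial_pmf(0, 1, 0.25), binomial_pmf(1, 1, 0.25))  # 0.75 0.25
```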

import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as stats
import seaborn as sns
sns.set()

# Binomial distributions
binomial1 = stats.binom(15, 0.75)
binomial2 = stats.binom(5, 0.5)

# Plot the PMF function
a = np.arange(-2, 20)
plt.xticks(a)  # bars are centered on the integer positions by default
plt.bar(a, binomial1.pmf(a), color='#348ABD', edgecolor='#348ABD', alpha=0.60, lw=3, label='$n=15, p=0.75$')
plt.bar(a, binomial2.pmf(a), color='#A60628', edgecolor='#A60628', alpha=0.60, lw=3, label='$n=5, p=0.5$')
plt.ylim(0, 0.5)
plt.xlabel('Events')
plt.ylabel('Probability of an event')
plt.title('PMF $P(x)$ for Binomial Distribution')
plt.legend()

Poisson Distribution

A random variable $\mathbf{z}$ is Poisson-distributed if:

\[P(\mathbf{z}=k) = \frac{\lambda^k e^{-\lambda}}{k!}, k = 0, 1, 2, \dots\]

$\lambda$ is the parameter of the distribution; it can be any positive number. The expected value of $\mathbf{z}$ is:

\[E[\mathbf{z}] = \lambda\]
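
The identity $E[\mathbf{z}] = \lambda$ can be checked numerically by summing $k \cdot P(\mathbf{z}=k)$. The sketch below assumes that truncating the infinite sum at $k = 99$ is accurate enough for $\lambda = 6.5$ (the remaining tail is negligibly small); `poisson_pmf` is a hypothetical helper name.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(z = k) for a Poisson variable with parameter lam > 0."""
    return lam**k * exp(-lam) / factorial(k)

lam = 6.5
ks = range(100)  # truncation of k = 0, 1, 2, ... (an approximation)

total = sum(poisson_pmf(k, lam) for k in ks)     # should be very close to 1
mean = sum(k * poisson_pmf(k, lam) for k in ks)  # should be very close to lam

print(total, mean)
```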

import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as stats
import seaborn as sns
sns.set()

# Poisson distributions
poisson1 = stats.poisson(0.75)
poisson2 = stats.poisson(6.5)

# Plot the PMF function
a = np.arange(-2, 10)
plt.xticks(a)  # bars are centered on the integer positions by default
# Raw strings keep the backslash in \lambda from being read as an escape sequence
plt.bar(a, poisson1.pmf(a), color='#348ABD', edgecolor='#348ABD', alpha=0.60, lw=3, label=r'$\lambda=0.75$')
plt.bar(a, poisson2.pmf(a), color='#A60628', edgecolor='#A60628', alpha=0.60, lw=3, label=r'$\lambda=6.5$')
plt.ylim(0, 0.5)
plt.xlabel('Events')
plt.ylabel('Probability of an event')
plt.title('PMF $P(x)$ for Poisson Distribution')
plt.legend()

Continuous Random Variables and Probability Density Functions (PDF)

We describe continuous random variables using a probability density function (PDF) $p$ instead of a probability mass function.

It’s important to note that $p(x)$ is not a probability. $p(x)$ is a probability density, and it can be greater than 1.
Densities are non-negative. $\forall{x}\in\mathbf{x}, p(x) \ge 0$
$p(x)\delta{x}$ approximates the probability of landing inside an infinitesimal region of size $\delta{x}$ around $x$. Since these probabilities must add up to 1 over all regions, the density integrates to 1. $\int p(x)dx = 1$
Example: If a random variable $\mathbf{x}$ follows the uniform distribution on the interval from a real value $a$ to a real value $b$, with $b > a$, we write $\mathbf{x} \sim U(a, b)$.
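
To make the difference between a density and a probability concrete, here is a rough sketch for $U(0, 0.5)$, where the density is 2 everywhere on the interval. The midpoint-rule `integrate` helper and the function name `uniform_pdf` are assumptions made for this illustration, not library functions.

```python
def uniform_pdf(x, a, b):
    """Density of U(a, b): 1 / (b - a) inside [a, b], 0 outside."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def integrate(f, lo, hi, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    width = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width

a, b = 0.0, 0.5
pdf = lambda x: uniform_pdf(x, a, b)

density = uniform_pdf(0.25, a, b)  # 2.0: a density, so values above 1 are fine
total = integrate(pdf, -1.0, 1.0)  # close to 1: the density integrates to 1
prob = integrate(pdf, 0.1, 0.2)    # close to 0.2: probability of landing in [0.1, 0.2]

print(density, total, prob)
```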

Uniform Distribution

\[p(x) = \begin{cases} \frac{1}{b - a}, & \text{if $a \le x \le b$} \\ 0, & \text{otherwise} \end{cases}\]

Exponential Distribution

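\[p(x) = e^{-x}, \text{where } x \ge 0\]

This is the exponential distribution with rate 1. The general form with rate $\lambda > 0$ is $p(x) = \lambda e^{-\lambda x}$.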

Normal Distribution

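\[p(x) = \frac{e^{-x^2/2}}{\sqrt{2\pi}}\]

This is the standard normal distribution, with mean 0 and variance 1. The general form with mean $\mu$ and variance $\sigma^2$ is $p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-(x-\mu)^2/(2\sigma^2)}$.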
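
The three densities in this section can be compared against `scipy.stats`, the library used for the discrete plots earlier. This is a sketch under two assumptions: the interval endpoints `a = 2.0` and `b = 5.0` are arbitrary illustrative values, and SciPy's default `expon()` and `norm()` are taken to be the unit-rate and standard forms. SciPy parameterizes the uniform distribution by `loc` and `scale`, so $U(a, b)$ is `uniform(loc=a, scale=b - a)`.

```python
import numpy as np
import scipy.stats as stats

a, b = 2.0, 5.0                 # illustrative interval for U(a, b)
x = np.linspace(2.5, 4.5, 5)    # evaluation points inside [a, b]

# Densities written out from the closed-form expressions.
uniform_manual = np.full_like(x, 1.0 / (b - a))
expon_manual = np.exp(-x)
normal_manual = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

# The same densities from scipy.stats.
uniform_scipy = stats.uniform(loc=a, scale=b - a).pdf(x)
expon_scipy = stats.expon().pdf(x)
normal_scipy = stats.norm().pdf(x)

print(np.allclose(uniform_manual, uniform_scipy))
print(np.allclose(expon_manual, expon_scipy))
print(np.allclose(normal_manual, normal_scipy))
```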

Other probability concepts

Marginal Probability

Sometimes we know the probability distribution over a set of variables and we want to know the probability distribution over just a subset of them. The probability distribution over the subset is known as the marginal probability distribution.

\[p(x) = \int p(x, y)dy\]
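
For discrete variables the integral becomes a sum, $P(x) = \sum_y P(x, y)$, which is easy to see on a small table. The joint probabilities below are made-up values chosen only so that they sum to 1.

```python
import numpy as np

# Hypothetical joint PMF P(x, y): rows index x in {0, 1}, columns index y in {0, 1, 2}.
joint = np.array([[0.10, 0.20, 0.10],
                  [0.30, 0.10, 0.20]])

# Marginalize by summing out the variable we are not interested in.
p_x = joint.sum(axis=1)  # sum over y -> [0.4, 0.6]
p_y = joint.sum(axis=0)  # sum over x -> [0.4, 0.3, 0.3]

print(p_x, p_y)
```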

Conditional Probability

In many cases, we are interested in the probability of some event, given that some other event has happened. This is called a conditional probability.

\[P(\mathbf{y} = y | \mathbf{x} = x) = \frac{P(\mathbf{y} = y, \mathbf{x} = x)}{P(\mathbf{x} = x)}\]
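
As a worked instance of this formula, the sketch below divides one row of a joint table by the matching marginal. The table holds made-up values, and the formula only applies when $P(\mathbf{x} = x) > 0$.

```python
import numpy as np

# Hypothetical joint PMF P(x, y): rows index x in {0, 1}, columns index y in {0, 1, 2}.
joint = np.array([[0.10, 0.20, 0.10],
                  [0.30, 0.10, 0.20]])

p_x = joint.sum(axis=1)  # marginal P(x)

# P(y | x=0) = P(y, x=0) / P(x=0); requires P(x=0) > 0.
p_y_given_x0 = joint[0] / p_x[0]
# P(y | x=1) = P(y, x=1) / P(x=1); requires P(x=1) > 0.
p_y_given_x1 = joint[1] / p_x[1]

print(p_y_given_x0)  # approximately [0.25, 0.5, 0.25]
print(p_y_given_x1)  # approximately [0.5, 0.1667, 0.3333]
```

Each conditional distribution is itself a valid PMF over $y$, so it sums to 1.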

Chain Rule of Probabilities

\[P(a,b,c) = P(a|b,c)P(b|c)P(c)\]
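
The chain rule can be verified numerically by rebuilding a joint distribution from its conditional factors. This sketch assumes a randomly generated joint table over three discrete variables (axes ordered as $a$, $b$, $c$); the seed and the table shape are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random joint PMF P(a, b, c) with 2, 3 and 4 states respectively.
joint = rng.random((2, 3, 4))
joint /= joint.sum()

p_c = joint.sum(axis=(0, 1))   # P(c), shape (4,)
p_bc = joint.sum(axis=0)       # P(b, c), shape (3, 4)
p_b_given_c = p_bc / p_c       # P(b | c) = P(b, c) / P(c)
p_a_given_bc = joint / p_bc    # P(a | b, c) = P(a, b, c) / P(b, c)

# Chain rule: P(a, b, c) = P(a | b, c) P(b | c) P(c)
reconstructed = p_a_given_bc * p_b_given_c * p_c

print(np.allclose(reconstructed, joint))  # True
```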

Independence and Conditional Independence

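Two random variables $\mathbf{x}$ and $\mathbf{y}$ are independent if their joint probability distribution can be expressed as a product of two factors, one involving only $\mathbf{x}$ and one involving only $\mathbf{y}$.

\[\forall x\in\mathbf{x},y\in\mathbf{y}, p(\mathbf{x}=x, \mathbf{y}=y) = p(\mathbf{x}=x)p(\mathbf{y}=y)\]

They are conditionally independent given a random variable $\mathbf{z}$ if the conditional distribution factorizes in this way for every value of $\mathbf{z}$.

\[\forall x\in\mathbf{x},y\in\mathbf{y},z\in\mathbf{z}, p(\mathbf{x}=x, \mathbf{y}=y | \mathbf{z}=z) = p(\mathbf{x}=x | \mathbf{z}=z)p(\mathbf{y}=y | \mathbf{z}=z)\]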
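
For discrete variables, the factorization test for independence amounts to comparing the joint table with the outer product of its marginals. A minimal sketch, assuming made-up tables and a hypothetical helper `is_independent`:

```python
import numpy as np

def is_independent(joint, tol=1e-9):
    """Check whether a 2-D joint PMF equals the outer product of its marginals."""
    p_x = joint.sum(axis=1)
    p_y = joint.sum(axis=0)
    return bool(np.allclose(joint, np.outer(p_x, p_y), atol=tol))

# Built as a product of marginals, so independent by construction.
independent = np.outer([0.4, 0.6], [0.5, 0.3, 0.2])

# Same row marginals, but P(x=0, y=0) = 0.10 while P(x=0) P(y=0) = 0.16.
dependent = np.array([[0.10, 0.20, 0.10],
                      [0.30, 0.10, 0.20]])

print(is_independent(independent))  # True
print(is_independent(dependent))    # False
```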

Expectation, Variance and Covariance

The expected value of a function $f(x)$ with respect to the distribution $p$ of the random variable $\mathbf{x}$ is the average value that $f$ takes when $x$ is drawn from $p$:

\[E_{x \sim p}[f(x)] = \int p(x)f(x)dx\]
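
For discrete variables the integral becomes a sum over states. The sketch below uses that sum to compute an expectation, and then the standard definitions of variance, $E[(x - E[x])^2]$, and covariance, $E[(x - E[x])(y - E[y])]$, which this section's heading names but does not spell out. The joint table and the helper `expectation` are illustrative assumptions.

```python
import numpy as np

# Hypothetical joint PMF P(x, y) with x in {0, 1} (rows) and y in {0, 1, 2} (columns).
x_vals = np.array([0.0, 1.0])
y_vals = np.array([0.0, 1.0, 2.0])
joint = np.array([[0.10, 0.20, 0.10],
                  [0.30, 0.10, 0.20]])

def expectation(f):
    """E[f(x, y)]: sum over all (x, y) pairs of P(x, y) * f(x, y)."""
    xx, yy = np.meshgrid(x_vals, y_vals, indexing="ij")  # grids matching joint's shape
    return float(np.sum(joint * f(xx, yy)))

mean_x = expectation(lambda x, y: x)                            # 0.6
mean_y = expectation(lambda x, y: y)                            # 0.9
var_x = expectation(lambda x, y: (x - mean_x) ** 2)             # 0.24
cov_xy = expectation(lambda x, y: (x - mean_x) * (y - mean_y))  # -0.04

print(mean_x, mean_y, var_x, cov_xy)
```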

Bayes’ Rule

\[P(x|y) = \frac{P(x)P(y|x)}{P(y)}\]
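
A worked example may help here. The numbers below are invented for illustration: $x$ is "has a condition" with prior 0.01, and $y$ is "test is positive" with $P(y|x) = 0.95$ and a false positive rate of 0.05. The denominator $P(y)$ is obtained by marginalizing over both values of $x$.

```python
# Hypothetical inputs (not from any real test).
p_x = 0.01              # prior P(x): has the condition
p_y_given_x = 0.95      # likelihood P(y | x): positive test given the condition
p_y_given_not_x = 0.05  # P(y | not x): false positive rate

# Evidence P(y) = P(y | x) P(x) + P(y | not x) P(not x)
p_y = p_y_given_x * p_x + p_y_given_not_x * (1 - p_x)

# Bayes' rule: P(x | y) = P(x) P(y | x) / P(y)
p_x_given_y = p_x * p_y_given_x / p_y

print(p_y)          # about 0.059
print(p_x_given_y)  # about 0.161
```

Even with a fairly accurate test, the posterior stays low because the prior is small.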

Resources

Probability Cheatsheet
Probability and Information Theory chapter in the Deep Learning book.
Introduction to Probability and Statistics course from MIT OpenCourseWare.

Hardik Patel