Chapter 2: Conditional distributions and Bayes rule

Author

Shao-Ting Chiu

Published

August 30, 2022

2.1 Axioms of probability

Let \(F\), \(G\) and \(H\) be three possibly overlapping statements.

  1. \(0 = Pr(\text{not } H|H) \leq Pr(F|H) \leq Pr(H|H) = 1\)
  2. \(Pr(F\cup G|H) = Pr(F|H) + Pr(G|H)\) if \(F\cap G=\emptyset\)
  3. \(Pr(F\cap G|H)=Pr(G|H)Pr(F|G\cap H)\)

2.2 Events and partition

  • Sample space \(S\)
  • Partition
    • a collection of sets \(A_1,\dots, A_m\)
    • \(A_{i} \cap A_j = \emptyset\) for all \(i\neq j\)
    • \(\cup_{i=1}^{m} A_i = S\)
  • Conditional probability
    • Let \(B\) be an event, and \(A_1,\dots,A_m\) be a partition of \(S\)
    • \(P(B|A_i) = \frac{P(B\cap A_i)}{P(A_i)}\)
  • Bayes rule
    • \(P(A_j |B) = \frac{P(B|A_j)P(A_j)}{P(B)} = \frac{P(B|A_j)P(A_j)}{\sum^{m}_{i=1}P(B|A_i)P(A_i)}\)
Example: COVID test

A college with a COVID-19 prevalence of 10% is using a test that is positive with probability 90% if an individual is infected (event \(C\)), and positive with probability 5% if the individual is not (event \(H\)).

  • For a randomly selected student at the college, what is the probability that the test will be positive?

\[\begin{align} P(+) &= P(+|C)P(C) + P(+|H)P(H)\\ &= 0.9\times 0.1 + 0.05\times 0.9\\ &= 0.135 \end{align}\]

  • Given that a student has tested positive, what is the probability that the student is actually infected?

\[\begin{align} P(C|+) &= \frac{P(+|C)P(C)}{P(+)}\\ &= \frac{0.9\times 0.1}{0.135}\\ &\approx 0.67 \end{align}\]
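A quick numerical check of both calculations (a minimal R sketch; the names `prev`, `sens`, and `fpr` are just labels for the three probabilities given in the problem):

```r
prev <- 0.10   # P(C): prevalence of infection
sens <- 0.90   # P(+|C): probability of a positive test if infected
fpr  <- 0.05   # P(+|H): probability of a positive test if not infected

# Law of total probability: P(+) = P(+|C)P(C) + P(+|H)P(H)
p_pos <- sens * prev + fpr * (1 - prev)
p_pos                    # 0.135

# Bayes rule: P(C|+) = P(+|C)P(C) / P(+)
sens * prev / p_pos      # ~0.667
```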

2.3 Random variables and univariate distributions

A random variable is an unknown quantity characterized by a probability distribution.

| RV | Discrete | Continuous |
|---|---|---|
| Outcome \(y\) | countable | uncountable |
| Properties of pdf | \(0\leq p(y) \leq 1\); \(\sum_{y\in \mathcal{Y}}p(y)=1\) | \(0\leq p(y)\); \(\int_{\mathcal{Y}}p(y)\,dy = 1\) |
| cdf \(F(a)\) | \(F(a) = \sum_{y\leq a}p(y)\) | \(F(a)=\int^{a}_{-\infty}p(y)\,dy\) |
| mean | \(E(Y)=\sum_{y\in \mathcal{Y}}y\,p(y)\) | \(E(Y)=\int_{\mathcal{Y}}y\,p(y)\,dy\) |
  • CDF: \(F(a) = P(Y\leq a)\)
  • Variance: \(Var(Y) = E(Y-E(Y))^2 = E(Y^2) - (E(Y))^2\)
Binomial distribution

\[p(Y=y|\theta) = dbinom(y,n,\theta) = {n\choose y} \theta^y (1-\theta)^{n-y}\]
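In R, `dbinom(y, n, theta)` computes exactly this pmf; a minimal check that the explicit formula agrees with the built-in and that the pmf sums to one (the values of `n` and `theta` below are arbitrary):

```r
n <- 10; theta <- 0.3
y <- 0:n                                    # the support {0, 1, ..., n}

p_manual <- choose(n, y) * theta^y * (1 - theta)^(n - y)
all.equal(p_manual, dbinom(y, n, theta))    # TRUE: formula matches dbinom
sum(dbinom(y, n, theta))                    # 1: the pmf sums to one
```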

Poisson distribution

Let \(\mathcal{Y} = \{0,1,2,\dots\}\). The uncertain quantity \(Y\in \mathcal{Y}\) has a Poisson distribution with mean \(\theta\) if

\[p(Y=y|\theta) = dpois(y,\theta) = \theta^{y}\frac{e^{-\theta}}{y!}\]
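The analogous check for `dpois` (the support is countably infinite, so it is truncated at a large value where the remaining tail mass is negligible; `theta = 4` is arbitrary):

```r
theta <- 4
y <- 0:100                                  # truncated support

p_manual <- theta^y * exp(-theta) / factorial(y)
all.equal(p_manual, dpois(y, theta))        # TRUE: formula matches dpois
sum(y * dpois(y, theta))                    # ~4: the mean E(Y) equals theta
```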

2.4 Description of distributions

  • Expectation
    • \(E[Y] = \sum_{y\in\mathcal{Y}} y\,p(y)\) if \(Y\) is discrete.
    • \(E[Y] = \int_{\mathcal{Y}} y\,p(y)\,dy\) if \(Y\) is continuous.
  • Mode
    • The most probable value of \(Y\)
  • Median
    • The value of \(Y\) in the middle of the distribution: half the probability lies at or below it, and half at or above it
  • Variance \[\begin{align} Var[Y] &= E[(Y-E[Y])^2]\\ &= E[Y^2-2YE[Y] + E[Y]^2]\\ &= E[Y^2] - 2E[Y]^2 + E[Y]^2\\ &= E[Y^2] - E[Y]^2 \end{align}\]
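The two forms of the variance can be checked against each other numerically; a sketch using the Poisson pmf above, for which \(Var[Y] = \theta\):

```r
theta <- 4
y <- 0:100                      # truncated Poisson support
p <- dpois(y, theta)

EY  <- sum(y * p)               # E[Y]
EY2 <- sum(y^2 * p)             # E[Y^2]

sum((y - EY)^2 * p)             # E[(Y - E[Y])^2], ~4
EY2 - EY^2                      # E[Y^2] - E[Y]^2, the same value
```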

2.5 Joint distribution

Marginal

| Discrete | Continuous |
|---|---|
| \(p_{Y_1}(y_1)=\sum_{y_2\in \mathcal{Y}_2}p_{Y_1, Y_2}(y_1, y_2)\) | \(p_{Y_1}(y_1) = \int_{\mathcal{Y}_2}p_{Y_1,Y_2}(y_1,y_2)\,dy_2\) |
  • Conditional: \(p_{Y_2|Y_1}(y_2|y_1) = \frac{p_{Y_1,Y_2}(y_1,y_2)}{p_{Y_1}(y_1)}\)
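A small discrete joint distribution makes the two formulas concrete; the 2×3 table of joint probabilities below is made up for illustration:

```r
# Hypothetical joint pmf p(y1, y2): rows index y1, columns index y2
joint <- matrix(c(0.10, 0.20, 0.10,
                  0.25, 0.15, 0.20), nrow = 2, byrow = TRUE)

# Marginal of Y1: sum the joint pmf over y2 (row sums)
p_y1 <- rowSums(joint)          # 0.40 0.60

# Conditional of Y2 given Y1 = its first value: joint row / marginal
joint[1, ] / p_y1[1]            # 0.25 0.50 0.25, sums to one
```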

2.6 Proportionality

  • A function \(f(x)\) is proportional to \(g(x)\), denoted by \(f(x) \propto g(x)\), if there exists a constant \(c>0\) such that \(f(x) = cg(x)\) for all \(x\).

2.7 A Bayesian model

  • Random vector of data: \(Y\)
  • Probability distribution of \(Y\): \(p(y|\theta)\)
Bayes theorem applied to a statistical model

\[p(\theta|y) = \frac{p(y,\theta)}{m(y)} = \frac{p(y|\theta)p(\theta)}{\int_{\Theta}p(y|\theta)p(\theta)d\theta}\]

  • \(p(\theta)\): the prior distribution
  • \(\Theta\): parameter space
  • \(p(y|\theta)\): the likelihood function.
  • \(p(\theta|y)\): the posterior distribution
  • \(m(y)\): the marginal distribution of \(Y\)

The posterior distribution expresses the experimenter’s updated beliefs about \(\theta\) in light of the observed data \(y\).

  • \(m(y)\) doesn’t depend on \(\theta\), so \(p(\theta|y) \propto p(y|\theta)p(\theta)\).
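Because \(m(y)\) is only a normalizing constant, the posterior over a discrete grid of \(\theta\) values can be computed by normalizing prior × likelihood. A sketch assuming a binomial sampling model with a uniform prior (the grid and the data, \(y = 7\) successes in \(n = 10\) trials, are made up):

```r
theta <- seq(0, 1, by = 0.01)                   # grid over the parameter space
prior <- rep(1 / length(theta), length(theta))  # uniform prior p(theta)
y <- 7; n <- 10                                 # hypothetical data

lik  <- dbinom(y, n, theta)             # likelihood p(y|theta)
post <- prior * lik / sum(prior * lik)  # normalize; the sum plays the role of m(y)

theta[which.max(post)]                  # posterior mode: 0.7
```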

2.8 Conditional independence and exchangeability

Conditional independence

\[P(A\cap B|C) = P(A|C)P(B|C)\]

  • Two events \(A\) and \(B\) are conditionally independent given event \(C\)
    • \(A\perp B|C\)

Knowing \(C\) and \(B\) gives no more information about \(A\) than does \(C\) by itself:

  • \[P(A|B\cap C) = P(A|C)\]
  • Conditional independence of random variables (see the simulation sketch below)
    • Let \(Y_1,\dots,Y_n\) be conditionally independent given \(\theta\): for every collection \(A_1,\dots,A_n\) of sets,
    • \[P(Y_1\in A_1,\dots, Y_n\in A_n |\theta) = \prod^{n}_{i=1} P(Y_i\in A_i |\theta)\]
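A simulation sketch of the product rule for two coin flips that are conditionally i.i.d. given \(\theta\) (the value of \(\theta\) and the simulation size are arbitrary):

```r
set.seed(1)
theta <- 0.6
y1 <- rbinom(1e6, 1, theta)     # Y1 | theta ~ Bernoulli(theta)
y2 <- rbinom(1e6, 1, theta)     # Y2 | theta ~ Bernoulli(theta), independent

mean(y1 == 1 & y2 == 1)         # P(Y1 = 1, Y2 = 1 | theta), ~0.36
mean(y1 == 1) * mean(y2 == 1)   # P(Y1 = 1|theta) P(Y2 = 1|theta), same ~0.36
```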

2.8.1 Exchangeability

\[p(y_1,\dots,y_n) = p(y_{\pi_1},\dots,y_{\pi_n})\]

for all permutations \(\pi\) of \(\{1,\dots,n\}\).

If we think of \(Y_1,\dots,Y_n\) as data, exchangeability says that the ordering of the data conveys no extra information than that in the observations themselves.

  • For example: a time series of weather measurements is not exchangeable, since the ordering carries information

  • i.i.d. data are exchangeable.

  • Exchangeability does not imply unconditional independence, as the sketch below shows.
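Sampling without replacement is a standard illustration of that last point: every ordering of the draws is equally likely (exchangeable), yet the draws are dependent. A sketch with a hypothetical urn of 3 red and 7 blue balls:

```r
set.seed(2)
urn <- c(rep(1, 3), rep(0, 7))              # 3 red (1), 7 blue (0)
draws <- t(replicate(1e5, sample(urn, 2)))  # two draws without replacement

# Exchangeable: both draws have the same marginal P(red) = 3/10
mean(draws[, 1]); mean(draws[, 2])          # both ~0.30

# Not independent: P(red on draw 2 | red on draw 1) = 2/9, not 3/10
mean(draws[draws[, 1] == 1, 2])             # ~0.22
```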