2 Conditional distributions and Bayes rule
2.1 Axioms of probability
Let \(F\), \(G\) and \(H\) be three possibly overlapping statements.
- \(0 = \Pr(\text{not } H | H) \leq \Pr(F|H) \leq \Pr(H|H) = 1\)
- \(\Pr(F\cup G|H) = \Pr(F|H) + \Pr(G|H)\) if \(F\cap G=\emptyset\)
- \(\Pr(F\cap G|H)=\Pr(G|H)\,\Pr(F|G\cap H)\)
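A minimal numeric check of these axioms in Python, on a made-up sample space of two fair coin flips (the events \(F\), \(G\), \(H\) below are arbitrary choices):

```python
from itertools import product

# Hypothetical sample space: two fair coin flips, all outcomes equally likely.
outcomes = list(product("HT", repeat=2))
prob = {w: 0.25 for w in outcomes}

def pr(event, given=None):
    """Pr(event | given), computed by summing outcome probabilities."""
    given = given if given is not None else set(outcomes)
    return sum(prob[w] for w in event & given) / sum(prob[w] for w in given)

H = set(outcomes)                         # conditioning event (here: everything)
F = {w for w in outcomes if w[0] == "H"}  # first flip is heads
G = {w for w in outcomes if w[0] == "T"}  # first flip is tails (disjoint from F)
G2 = {w for w in outcomes if w[1] == "H"} # second flip is heads

# Axiom 1: 0 = Pr(not H | H) <= Pr(F | H) <= Pr(H | H) = 1
assert 0 <= pr(F, H) <= 1
# Axiom 2: additivity for disjoint F and G
assert abs(pr(F | G, H) - (pr(F, H) + pr(G, H))) < 1e-12
# Axiom 3: Pr(F ∩ G2 | H) = Pr(G2 | H) Pr(F | G2 ∩ H)
assert abs(pr(F & G2, H) - pr(G2, H) * pr(F, G2 & H)) < 1e-12
```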
2.2 Events and partition
- Sample space \(S\)
- Partition
- a collection of sets \(A_1,\dots, A_m\) with \(\cup_{i=1}^{m}A_i = S\)
- \(A_{i} \cap A_j = \emptyset\) for \(i \neq j\)
- Conditional probability
- Let \(B\) be an event, and \(A_1,\dots,A_m\) be a partition of \(S\)
- \(P(B|A_i) = \frac{P(B\cap A_i)}{P(A_i)}\)
- Bayes rule
- \(P(A_j |B) = \frac{P(B|A_j)P(A_j)}{P(B)} = \frac{P(B|A_j)P(A_j)}{\sum^{m}_{i=1}P(B|A_i)P(A_i)}\)
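A short numeric sketch of Bayes rule over a two-element partition; the prior and likelihood values below are made-up numbers for illustration:

```python
# Hypothetical partition of S: A1 = "has condition", A2 = "does not"
prior = {"A1": 0.01, "A2": 0.99}
# Hypothetical likelihoods P(B | A_i), where B = "test is positive"
like = {"A1": 0.95, "A2": 0.05}

# P(B) by the law of total probability (denominator of Bayes rule)
p_b = sum(like[a] * prior[a] for a in prior)

# Posterior P(A_j | B) for each element of the partition
posterior = {a: like[a] * prior[a] / p_b for a in prior}
print(posterior)  # {'A1': ~0.161, 'A2': ~0.839}
```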
2.3 Random variables and univariate distributions
RV | Discrete | Continuous |
---|---|---|
Outcome \(y\) | countable | uncountable |
Properties of pdf | \(0\leq p(y) \leq 1\), \(\sum_{y\in \mathcal{Y}}p(y)=1\) | \(0\leq p(y)\), \(\int_{\mathcal{Y}}p(y)\,dy = 1\) |
cdf \(F(a)\) | \(F(a) = \sum_{y\leq a}p(y)\) | \(F(a)=\int^{a}_{-\infty}p(y)\,dy\) |
mean | \(E(Y)=\sum_{y\in \mathcal{Y}}y\,p(y)\) | \(E(Y)=\int_{\mathcal{Y}}y\,p(y)\,dy\) |
- CDF: \(F(a) = P(Y\leq a)\)
- Variance: \(Var(Y) = E[(Y-E(Y))^2] = E(Y^2) - (E(Y))^2\)
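A small sketch of the table's quantities using `scipy.stats`; the Binomial(10, 0.3) and Normal(0, 1) examples are arbitrary choices:

```python
import numpy as np
from scipy import stats

# Discrete example: Y ~ Binomial(n=10, p=0.3)
y = np.arange(0, 11)
pmf = stats.binom.pmf(y, 10, 0.3)
assert np.isclose(pmf.sum(), 1.0)              # sum_y p(y) = 1
cdf_at_3 = pmf[y <= 3].sum()                   # F(3) = sum_{y<=3} p(y)
mean = (y * pmf).sum()                         # E(Y) = sum_y y p(y)
var = ((y - mean) ** 2 * pmf).sum()            # Var(Y) = E[(Y - E(Y))^2]

# Continuous example: Y ~ Normal(0, 1); integrals become a fine Riemann sum
grid, dy = np.linspace(-8, 8, 16001, retstep=True)
pdf = stats.norm.pdf(grid)
assert np.isclose((pdf * dy).sum(), 1.0)       # int p(y) dy = 1
F_at_1 = (pdf[grid <= 1.0] * dy).sum()         # F(1) = int_{-inf}^{1} p(y) dy
print(cdf_at_3, mean, var, F_at_1)             # ~0.650, 3.0, 2.1, ~0.841
```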
2.4 Description of distributions
- Expectation
- \(E[Y] = \sum_{y\in\mathcal{Y}} y\,p(y)\) if \(Y\) is discrete.
- \(E[Y] = \int_{\mathcal{Y}} y\,p(y)\,dy\) if \(Y\) is continuous.
- Mode
- The most probable value of \(Y\), i.e. the value maximizing \(p(y)\)
- Median
- The value of \(Y\) in the middle of the distribution, i.e. the 50th percentile \(m\) with \(F(m)=1/2\)
- Variance \[\begin{align} Var[Y] &= E[(Y-E[Y])^2]\\ &= E[Y^2-2YE[Y] + E[Y]^2]\\ &= E[Y^2] - 2E[Y]^2 + E[Y]^2\\ &= E[Y^2] - E[Y]^2 \end{align}\]
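A quick sketch of these summaries for an arbitrary example distribution (Gamma with shape 2), checking the variance identity above numerically:

```python
import numpy as np
from scipy import stats

dist = stats.gamma(a=2.0)        # arbitrary example distribution, Gamma(shape=2)

# Mode: value maximizing the density (for Gamma(2, 1) it is 1)
grid = np.linspace(0.0, 20.0, 20001)
mode = grid[np.argmax(dist.pdf(grid))]

median = dist.ppf(0.5)                             # F(median) = 1/2
mean = dist.mean()                                 # E[Y]
var_identity = dist.moment(2) - dist.mean() ** 2   # E[Y^2] - E[Y]^2
assert np.isclose(dist.var(), var_identity)        # the identity derived above
print(mode, median, mean, dist.var())              # ~1.0, ~1.68, 2.0, 2.0
```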
2.5 Joint distribution
Marginal | Discrete | Continuous |
---|---|---|
\(p_{Y_1}(y_1)\) | \(\sum_{y_2\in \mathcal{Y}_2}p_{Y_1, Y_2}(y_1, y_2)\) | \(\int_{\mathcal{Y}_2}p_{Y_1,Y_2}(y_1,y_2)\,dy_2\) |
- Conditional: \(p_{Y_2|Y_1}(y_2|y_1) = \frac{p_{Y_1,Y_2}(y_1,y_2)}{p_{Y_1}(y_1)}\)
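A minimal sketch of marginalizing and conditioning a discrete joint pmf; the 2×3 table of probabilities is made up:

```python
import numpy as np

# Hypothetical joint pmf p(y1, y2) over y1 in {0, 1}, y2 in {0, 1, 2} (rows = y1)
joint = np.array([[0.10, 0.20, 0.10],
                  [0.15, 0.30, 0.15]])
assert np.isclose(joint.sum(), 1.0)

# Marginal of Y1: sum y2 out (the discrete column of the table above)
p_y1 = joint.sum(axis=1)                 # [0.40, 0.60]

# Conditional of Y2 given Y1 = 0: the joint row divided by the marginal
p_y2_given_y1_0 = joint[0] / p_y1[0]     # [0.25, 0.50, 0.25]
print(p_y1, p_y2_given_y1_0)
```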
2.6 Proportionality
- A function \(f(x)\) is proportional to \(g(x)\), denoted by \(f(x) \propto g(x)\), if there exists a constant \(c>0\) such that
- \[f(x) = cg(x) \quad \text{for all } x\]
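A small sketch of why proportionality matters: an unnormalized kernel determines the density once it is rescaled to integrate to 1 (the Beta(3, 2) kernel is an arbitrary example):

```python
import numpy as np
from scipy.special import beta

# Unnormalized kernel g(x) = x^2 (1 - x) on (0, 1): proportional to a Beta(3, 2) density
x, dx = np.linspace(0.0, 1.0, 100001, retstep=True)
g = x**2 * (1 - x)

# f(x) = c g(x) with c = 1 / int g(x) dx is the corresponding normalized density
c = 1.0 / (g * dx).sum()
# c recovers the Beta(3, 2) normalizing constant 1 / B(3, 2) = 12
assert np.isclose(c, 1.0 / beta(3.0, 2.0))
print(c)
```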
2.7 A Bayesian model
- Random vector of data: \(Y\)
- Sampling model, the probability distribution of \(Y\): \(p(y|\theta)\)
- Prior distribution on the parameter: \(p(\theta)\)
- Posterior, by Bayes rule: \(p(\theta|y) = \frac{p(y|\theta)p(\theta)}{m(y)}\), where \(m(y)=\int p(y|\theta)p(\theta)\,d\theta\) doesn't depend on \(\theta\), so \(p(\theta|y) \propto p(y|\theta)p(\theta)\) (see the sketch below)
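A minimal sketch of these ingredients in a beta-binomial model; the Beta(1, 1) prior, \(n = 10\) trials and \(y = 7\) successes are made-up choices:

```python
import numpy as np
from scipy import stats

y, n = 7, 10                                              # hypothetical observed data
theta, dth = np.linspace(0.0, 1.0, 10001, retstep=True)   # grid over the parameter

prior = stats.beta.pdf(theta, 1.0, 1.0)        # p(theta): a uniform Beta(1, 1) prior
like = stats.binom.pmf(y, n, theta)            # p(y | theta): the sampling model

# m(y) = int p(y | theta) p(theta) dtheta: a single number, free of theta
m_y = (like * prior * dth).sum()

posterior = like * prior / m_y                 # p(theta | y) via Bayes rule
# matches the conjugate closed form Beta(1 + y, 1 + n - y)
assert np.allclose(posterior, stats.beta.pdf(theta, 1 + y, 1 + n - y), atol=1e-3)
```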
2.8 Conditional independence and Exchangeability
- Conditional independence
- Let \(Y_1,\dots,Y_n\) be conditionally independent given \(\theta\): for every collection \(A_1,\dots,A_n\) of sets,
- \[P(Y_1\in A_1,\dots, Y_n\in A_n |\theta) = \prod^{n}_{i=1} P(Y_i\in A_i |\theta)\]
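Conditional independence is what lets the joint sampling density factor into a product over observations; a short sketch for Bernoulli(\(\theta\)) data, where the data vector and \(\theta = 0.4\) are made up:

```python
import numpy as np
from scipy import stats

y = np.array([1, 0, 0, 1, 1])        # hypothetical binary observations
theta = 0.4                          # arbitrary parameter value

# p(y_1, ..., y_n | theta) = prod_i p(y_i | theta) under conditional independence
per_obs = stats.bernoulli.pmf(y, theta)
joint = np.prod(per_obs)
print(per_obs, joint)                # joint = theta^3 * (1 - theta)^2 = 0.02304
```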
2.8.1 Exchangeability
\(Y_1,\dots,Y_n\) are exchangeable if
\[p(y_1,\dots,y_n) = p(y_{\pi_1},\dots,y_{\pi_n})\]
for all permutations \(\pi\) of \(\{1,\dots,n\}\)
If we think of \(Y_1,\dots,Y_n\) as data, exchangeability says that the ordering of the data conveys no information beyond what is in the observations themselves.
For example, a time series of weather measurements is not exchangeable: the ordering (seasonality, trends) is itself informative.
i.i.d. data are exchangeable.
Exchangeability does not imply unconditional independence: for example, \(Y_1,\dots,Y_n\) that are i.i.d. given \(\theta\) (with \(\theta\) random) are exchangeable, yet dependent once \(\theta\) is integrated out, as shown in the sketch below.
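A small simulation sketch of the last point: two flips that are i.i.d. Bernoulli(\(\theta\)) given \(\theta \sim \text{Beta}(1,1)\) are exchangeable, but marginally dependent (all numbers are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n_sim = 200_000

theta = rng.beta(1.0, 1.0, size=n_sim)       # draw theta from the prior
y1 = rng.random(n_sim) < theta               # Y1 | theta ~ Bernoulli(theta)
y2 = rng.random(n_sim) < theta               # Y2 | theta ~ Bernoulli(theta), indep. given theta

# Exchangeability: (Y1, Y2) and (Y2, Y1) have the same joint distribution
print(np.mean(y1 & ~y2), np.mean(~y1 & y2))  # both ~ 1/6

# But marginally Y1 and Y2 are dependent: P(Y1=1, Y2=1) != P(Y1=1) P(Y2=1)
print(np.mean(y1 & y2), np.mean(y1) * np.mean(y2))   # ~1/3 vs ~1/4
```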