2 Conditional distributions and Bayes rule
2.1 Axioms of probability
Let \(F\), \(G\) and \(H\) be three possibly overlapping statements.
- \(0 = \Pr(\text{not } H | H) \leq \Pr(F|H) \leq \Pr(H|H) = 1\)
- \(\Pr(F\cup G|H) = \Pr(F|H) + \Pr(G|H)\) if \(F\cap G=\emptyset\)
- \(\Pr(F\cap G|H)=\Pr(G|H)\,\Pr(F|G\cap H)\)
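A minimal numeric check of these axioms in Python, on a made-up sample space of two fair coin flips (the events \(F\), \(G\), \(H\) below are arbitrary choices):

```python
from itertools import product

# Hypothetical sample space: two fair coin flips, all outcomes equally likely.
outcomes = list(product("HT", repeat=2))
prob = {w: 0.25 for w in outcomes}

def pr(event, given=None):
    """Pr(event | given), computed by summing outcome probabilities."""
    given = given if given is not None else set(outcomes)
    return sum(prob[w] for w in event & given) / sum(prob[w] for w in given)

H = set(outcomes)                         # conditioning event (here: everything)
F = {w for w in outcomes if w[0] == "H"}  # first flip is heads
G = {w for w in outcomes if w[0] == "T"}  # first flip is tails (disjoint from F)
G2 = {w for w in outcomes if w[1] == "H"} # second flip is heads

# Axiom 1: 0 = Pr(not H | H) <= Pr(F | H) <= Pr(H | H) = 1
assert 0 <= pr(F, H) <= 1
# Axiom 2: additivity for disjoint F and G
assert abs(pr(F | G, H) - (pr(F, H) + pr(G, H))) < 1e-12
# Axiom 3: Pr(F ∩ G2 | H) = Pr(G2 | H) Pr(F | G2 ∩ H)
assert abs(pr(F & G2, H) - pr(G2, H) * pr(F, G2 & H)) < 1e-12
```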
2.2 Events and partition
- Sample space \(S\)
- Partition
- a collection of sets \(A_1,\dots, A_m\) with \(\cup_{i=1}^{m}A_i = S\)
- \(A_{i} \cap A_j = \emptyset\) for \(i \neq j\)
- Conditional probability
- Let \(B\) be an event, and \(A_1,\dots,A_m\) be a partition of \(S\)
- \(P(B|A_i) = \frac{P(B\cap A_i)}{P(A_i)}\)
- Bayes rule
- \(P(A_j |B) = \frac{P(B|A_j)P(A_j)}{P(B)} = \frac{P(B|A_j)P(A_j)}{\sum^{m}_{i=1}P(B|A_i)P(A_i)}\)
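A short numeric sketch of Bayes rule over a two-element partition; the prior and likelihood values below are made-up numbers for illustration:

```python
# Hypothetical partition of S: A1 = "has condition", A2 = "does not"
prior = {"A1": 0.01, "A2": 0.99}
# Hypothetical likelihoods P(B | A_i), where B = "test is positive"
like = {"A1": 0.95, "A2": 0.05}

# P(B) by the law of total probability (denominator of Bayes rule)
p_b = sum(like[a] * prior[a] for a in prior)

# Posterior P(A_j | B) for each element of the partition
posterior = {a: like[a] * prior[a] / p_b for a in prior}
print(posterior)  # {'A1': ~0.161, 'A2': ~0.839}
```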
2.3 Random variables and univariate distributions
RV | Discrete | Continuous |
---|---|---|
Outcome \(y\) | countable | uncountable |
Properties of pdf | \(0\leq p(y) \leq 1\), \(\sum_{y\in \mathcal{Y}}p(y)=1\) | \(0\leq p(y)\), \(\int_{\mathcal{Y}}p(y)\,dy = 1\) |
cdf \(F(a)\) | \(F(a) = \sum_{y\leq a}p(y)\) | \(F(a)=\int^{a}_{-\infty}p(y)\,dy\) |
mean | \(E(Y)=\sum_{y\in \mathcal{Y}}y\,p(y)\) | \(E(Y)=\int_{\mathcal{Y}}y\,p(y)\,dy\) |
- CDF: \(F(a) = P(Y\leq a)\)
- Variance: \(Var(Y) = E[(Y-E(Y))^2] = E(Y^2) - (E(Y))^2\)
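A small sketch of the table's quantities using `scipy.stats`; the Binomial(10, 0.3) and Normal(0, 1) examples are arbitrary choices:

```python
import numpy as np
from scipy import stats

# Discrete example: Y ~ Binomial(n=10, p=0.3)
y = np.arange(0, 11)
pmf = stats.binom.pmf(y, 10, 0.3)
assert np.isclose(pmf.sum(), 1.0)              # sum_y p(y) = 1
cdf_at_3 = pmf[y <= 3].sum()                   # F(3) = sum_{y<=3} p(y)
mean = (y * pmf).sum()                         # E(Y) = sum_y y p(y)
var = ((y - mean) ** 2 * pmf).sum()            # Var(Y) = E[(Y - E(Y))^2]

# Continuous example: Y ~ Normal(0, 1); integrals become a fine Riemann sum
grid, dy = np.linspace(-8, 8, 16001, retstep=True)
pdf = stats.norm.pdf(grid)
assert np.isclose((pdf * dy).sum(), 1.0)       # int p(y) dy = 1
F_at_1 = (pdf[grid <= 1.0] * dy).sum()         # F(1) = int_{-inf}^{1} p(y) dy
print(cdf_at_3, mean, var, F_at_1)             # ~0.650, 3.0, 2.1, ~0.841
```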
2.4 Description of distributions
- Expectation
- \(E[Y] = \sum_{y\in\mathcal{Y}} y\,p(y)\) if \(Y\) is discrete.
- \(E[Y] = \int_{\mathcal{Y}} y\,p(y)\,dy\) if \(Y\) is continuous.
- Mode
- The most probable value of \(Y\), i.e. the value maximizing \(p(y)\)
- Median
- The value of \(Y\) in the middle of the distribution, i.e. the 50th percentile \(m\) with \(F(m)=1/2\)
- Variance \[\begin{align} Var[Y] &= E[(Y-E[Y])^2]\\ &= E[Y^2-2YE[Y] + E[Y]^2]\\ &= E[Y^2] - 2E[Y]^2 + E[Y]^2\\ &= E[Y^2] - E[Y]^2 \end{align}\]
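A quick sketch of these summaries for an arbitrary example distribution (Gamma with shape 2), checking the variance identity above numerically:

```python
import numpy as np
from scipy import stats

dist = stats.gamma(a=2.0)        # arbitrary example distribution, Gamma(shape=2)

# Mode: value maximizing the density (for Gamma(2, 1) it is 1)
grid = np.linspace(0.0, 20.0, 20001)
mode = grid[np.argmax(dist.pdf(grid))]

median = dist.ppf(0.5)                             # F(median) = 1/2
mean = dist.mean()                                 # E[Y]
var_identity = dist.moment(2) - dist.mean() ** 2   # E[Y^2] - E[Y]^2
assert np.isclose(dist.var(), var_identity)        # the identity derived above
print(mode, median, mean, dist.var())              # ~1.0, ~1.68, 2.0, 2.0
```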
2.5 Joint distribution
Marginal | Discrete | Continuous |
---|---|---|
\(p_{Y_1}(y_1)\) | \(\sum_{y_2\in \mathcal{Y}_2}p_{Y_1, Y_2}(y_1, y_2)\) | \(\int_{\mathcal{Y}_2}p_{Y_1,Y_2}(y_1,y_2)\,dy_2\) |
- Conditional: \(p_{Y_2|Y_1}(y_2|y_1) = \frac{p_{Y_1,Y_2}(y_1,y_2)}{p_{Y_1}(y_1)}\)
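A minimal sketch of marginalizing and conditioning a discrete joint pmf; the 2×3 table of probabilities is made up:

```python
import numpy as np

# Hypothetical joint pmf p(y1, y2) over y1 in {0, 1}, y2 in {0, 1, 2} (rows = y1)
joint = np.array([[0.10, 0.20, 0.10],
                  [0.15, 0.30, 0.15]])
assert np.isclose(joint.sum(), 1.0)

# Marginal of Y1: sum y2 out (the discrete column of the table above)
p_y1 = joint.sum(axis=1)                 # [0.40, 0.60]

# Conditional of Y2 given Y1 = 0: the joint row divided by the marginal
p_y2_given_y1_0 = joint[0] / p_y1[0]     # [0.25, 0.50, 0.25]
print(p_y1, p_y2_given_y1_0)
```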
2.6 Proportionality
- A function \(f(x)\) is proportional to \(g(x)\), denoted by \(f(x) \propto g(x)\), if there exists a constant \(c>0\) such that
- \[f(x) = cg(x) \quad \text{for all } x\]
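A small sketch of why proportionality matters: an unnormalized kernel determines the density once it is rescaled to integrate to 1 (the Beta(3, 2) kernel is an arbitrary example):

```python
import numpy as np
from scipy.special import beta

# Unnormalized kernel g(x) = x^2 (1 - x) on (0, 1): proportional to a Beta(3, 2) density
x, dx = np.linspace(0.0, 1.0, 100001, retstep=True)
g = x**2 * (1 - x)

# f(x) = c g(x) with c = 1 / int g(x) dx is the corresponding normalized density
c = 1.0 / (g * dx).sum()
# c recovers the Beta(3, 2) normalizing constant 1 / B(3, 2) = 12
assert np.isclose(c, 1.0 / beta(3.0, 2.0))
print(c)
```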
2.7 A Bayesian model
- Random vector of data: \(Y\)
- Sampling model, the probability distribution of \(Y\): \(p(y|\theta)\)
- Prior distribution on the parameter: \(p(\theta)\)
- Posterior, by Bayes rule: \(p(\theta|y) = \frac{p(y|\theta)p(\theta)}{m(y)}\), where \(m(y)=\int p(y|\theta)p(\theta)\,d\theta\) doesn't depend on \(\theta\), so \(p(\theta|y) \propto p(y|\theta)p(\theta)\) (see the sketch below)
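A minimal sketch of these ingredients in a beta-binomial model; the Beta(1, 1) prior, \(n = 10\) trials and \(y = 7\) successes are made-up choices:

```python
import numpy as np
from scipy import stats

y, n = 7, 10                                              # hypothetical observed data
theta, dth = np.linspace(0.0, 1.0, 10001, retstep=True)   # grid over the parameter

prior = stats.beta.pdf(theta, 1.0, 1.0)        # p(theta): a uniform Beta(1, 1) prior
like = stats.binom.pmf(y, n, theta)            # p(y | theta): the sampling model

# m(y) = int p(y | theta) p(theta) dtheta: a single number, free of theta
m_y = (like * prior * dth).sum()

posterior = like * prior / m_y                 # p(theta | y) via Bayes rule
# matches the conjugate closed form Beta(1 + y, 1 + n - y)
assert np.allclose(posterior, stats.beta.pdf(theta, 1 + y, 1 + n - y), atol=1e-3)
```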
2.8 Conditional independence and Exchangeability
- Conditional independence
- Let \(Y_1,\dots,Y_n\) be conditionally independent given \(\theta\): for every collection \(A_1,\dots,A_n\) of sets,
- \[P(Y_1\in A_1,\dots, Y_n\in A_n |\theta) = \prod^{n}_{i=1} P(Y_i\in A_i |\theta)\]
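Conditional independence is what lets the joint sampling density factor into a product over observations; a short sketch for Bernoulli(\(\theta\)) data, where the data vector and \(\theta = 0.4\) are made up:

```python
import numpy as np
from scipy import stats

y = np.array([1, 0, 0, 1, 1])        # hypothetical binary observations
theta = 0.4                          # arbitrary parameter value

# p(y_1, ..., y_n | theta) = prod_i p(y_i | theta) under conditional independence
per_obs = stats.bernoulli.pmf(y, theta)
joint = np.prod(per_obs)
print(per_obs, joint)                # joint = theta^3 * (1 - theta)^2 = 0.02304
```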
2.8.1 Exchangeability
\(Y_1,\dots,Y_n\) are exchangeable if
\[p(y_1,\dots,y_n) = p(y_{\pi_1},\dots,y_{\pi_n})\]
for all permutations \(\pi\) of \(\{1,\dots,n\}\)
If we think of \(Y_1,\dots,Y_n\) as data, exchangeability says that the ordering of the data conveys no information beyond what is in the observations themselves.
For example, a time series of weather measurements is not exchangeable: the ordering (seasonality, trends) is itself informative.
i.i.d. data are exchangeable.
Exchangeability does not imply unconditional independence: for example, \(Y_1,\dots,Y_n\) that are i.i.d. given \(\theta\) (with \(\theta\) random) are exchangeable, yet dependent once \(\theta\) is integrated out, as shown in the sketch below.
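A small simulation sketch of the last point: two flips that are i.i.d. Bernoulli(\(\theta\)) given \(\theta \sim \text{Beta}(1,1)\) are exchangeable, but marginally dependent (all numbers are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n_sim = 200_000

theta = rng.beta(1.0, 1.0, size=n_sim)       # draw theta from the prior
y1 = rng.random(n_sim) < theta               # Y1 | theta ~ Bernoulli(theta)
y2 = rng.random(n_sim) < theta               # Y2 | theta ~ Bernoulli(theta), indep. given theta

# Exchangeability: (Y1, Y2) and (Y2, Y1) have the same joint distribution
print(np.mean(y1 & ~y2), np.mean(~y1 & y2))  # both ~ 1/6

# But marginally Y1 and Y2 are dependent: P(Y1=1, Y2=1) != P(Y1=1) P(Y2=1)
print(np.mean(y1 & y2), np.mean(y1) * np.mean(y2))   # ~1/3 vs ~1/4
```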