Chapter 3: One-parameter models

Author: Shao-Ting Chiu

Published: September 1, 2022

3.1 Key messages

  • One-parameter models
    • Binomial model
    • Poisson model
  • Bayesian data analysis
    • Conjugate prior distribution
    • Predictive distribution
    • Confidence regions

3.2 The binomial model

By Bayes’ rule, \(p(\theta|y) \propto p(y|\theta)\,p(\theta)\); under a uniform prior this reduces to

\[p(\theta|y) \propto p(y|\theta)\]

Calculus

\[\int^{1}_{0}\theta^{a-1}(1-\theta)^{b-1}d\theta = \frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)}\]

where \(\Gamma(n) = (n-1)!\) for positive integers \(n\).
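
This identity is easy to verify numerically in R (a quick sketch; the values of \(a\) and \(b\) are arbitrary):

a <- 3; b <- 5
# numerically integrate theta^(a-1) * (1-theta)^(b-1) over [0, 1]
lhs <- integrate(function(theta) theta^(a - 1) * (1 - theta)^(b - 1), 0, 1)$value
rhs <- gamma(a) * gamma(b) / gamma(a + b)   # closed form
c(lhs, rhs)   # both approximately 0.00952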

3.3 The beta distribution

\[p(\theta) = dbeta(\theta, a, b) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\theta^{a-1}(1-\theta)^{b-1}\quad \text{for}~0\leq \theta\leq 1\]

  • \(E[\theta]=\frac{a}{a+b}\)
  • \(Var[\theta] = \frac{ab}{(a+b+1)(a+b)^2} = \frac{E[\theta]E[1-\theta]}{a+b+1}\)
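
Both moments are easy to confirm by simulation (a minimal sketch; the shape parameters are arbitrary):

a <- 2; b <- 3
draws <- rbeta(1e5, a, b)                          # sample from beta(a, b)
c(mean(draws), a / (a + b))                        # sample mean vs. a/(a+b) = 0.4
c(var(draws), a * b / ((a + b + 1) * (a + b)^2))   # sample variance vs. 0.04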

3.4 Inference for exchangeable binary data

If \(Y_{1},\dots,Y_n|\theta\) are i.i.d. binary(\(\theta\)), then:

\[p(\theta|y_1,\dots,y_n) = \frac{\theta^{\sum y_i}(1-\theta)^{n-\sum y_{i}} \times p(\theta)}{p(y_1,\dots,y_n)} \tag{3.1}\]
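
Equation 3.1 can be evaluated directly on a grid of \(\theta\) values. A minimal sketch, assuming a uniform beta(1, 1) prior and a made-up binary dataset:

theta <- seq(0, 1, length.out = 500)           # grid over the parameter space
y <- c(1, 0, 1, 1, 0, 1, 1, 0, 1, 1)           # hypothetical binary data
s <- sum(y); n <- length(y)
post <- theta^s * (1 - theta)^(n - s) * dbeta(theta, 1, 1)   # numerator of Equation 3.1
post <- post / sum(post * (theta[2] - theta[1]))             # normalize on the grid
plot(theta, post, type = "l")                  # matches dbeta(theta, 1 + s, 1 + n - s)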

3.5 Sufficient statistics

Comparing the relative probabilities of any two \(\theta\)-values, \(\theta_a\) and \(\theta_b\) (from Equation 3.1):

\[\frac{p(\theta_a|y_1,\dots,y_n)}{p(\theta_b|y_1,\dots,y_n)} = \left(\frac{\theta_{a}}{\theta_b}\right)^{\sum y_i}\left(\frac{1-\theta_{a}}{1-\theta_b}\right)^{n - \sum y_i}\frac{p(\theta_a)}{p(\theta_b)} \tag{3.2}\]

Equation 3.2 shows that

\[p(\theta\in A|Y_1=y_1,\dots,Y_n = y_n) = p(\theta \in A|\sum^{n}_{i=1} Y_i=\sum^{n}_{i=1}y_i)\]

\(\sum^{n}_{i=1} Y_i\) is a sufficient statistic for \(\theta\) and \(p(y_1,\dots,y_n|\theta)\): it is sufficient to know \(\sum Y_i\) in order to make inferences about \(\theta\).

In this case, where \(Y_1, \dots, Y_n|\theta\) are i.i.d. binary(\(\theta\)) random variables, the sufficient statistic \(Y=\sum^{n}_{i=1} Y_i\) has a binomial distribution with parameters \((n,\theta)\).
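
A quick illustration (a sketch with a uniform prior and two hypothetical datasets): any two binary sequences with the same sum yield the same posterior.

theta <- seq(0.001, 0.999, length.out = 200)
y1 <- c(1, 1, 1, 0, 0)                 # sum = 3, n = 5
y2 <- c(0, 1, 0, 1, 1)                 # same sum, different order
post <- function(y) dbeta(theta, 1 + sum(y), 1 + length(y) - sum(y))
max(abs(post(y1) - post(y2)))          # 0: the posteriors coincide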

3.6 Conjugacy

  • A beta prior and binomial sampling lead to a beta posterior
    • the beta prior is conjugate for the binomial sampling model.
Definition: Conjugate

A class \(\mathcal{P}\) of prior distributions for \(\theta\) is called conjugate for a sampling model \(p(y|\theta)\) if

\[p(\theta) \in \mathcal{P} \Rightarrow p(\theta|y) \in \mathcal{P}\]

  • Conjugate priors make posterior calculations easy.
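
A minimal sketch of the conjugate update in R (the prior parameters and data below are made up): with a beta\((a, b)\) prior and \(y\) successes in \(n\) trials, the posterior is beta\((a+y,\, b+n-y)\).

a <- 2; b <- 2                          # beta prior parameters
n <- 20; y <- 14                        # hypothetical data: 14 successes in 20 trials
curve(dbeta(x, a, b), 0, 1, lty = 2)    # prior (dashed)
curve(dbeta(x, a + y, b + n - y), 0, 1, add = TRUE)   # posterior (solid)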

3.7 Combining information

If \(\theta|Y=y \sim beta(a+y, b+n-y)\), then

\[\begin{aligned} E[\theta|y] &=\frac{a+y}{a+b+n}\\ &= \frac{a+b}{a+b+n}\underbrace{\frac{a}{a+b}}_{\text{prior expectation}} + \frac{n}{a+b+n}\underbrace{\frac{y}{n}}_{\text{data average}}\\ \end{aligned} \tag{3.3}\]

From Equation 3.3, the posterior expectation is a weighted average of the prior expectation and the sample average. This leads to an interpretation of \(a\) and \(b\) as “prior data”:

  • \(a\): prior number of 1’s
  • \(b\): prior number of 0’s
  • \(a+b\): prior sample size
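
The weighted-average identity in Equation 3.3 is easy to check numerically (a sketch with arbitrary values):

a <- 2; b <- 6; n <- 10; y <- 7
direct <- (a + y) / (a + b + n)                    # posterior mean
weighted <- (a + b) / (a + b + n) * a / (a + b) +  # prior weight x prior mean
  n / (a + b + n) * y / n                          # data weight x data average
c(direct, weighted)                                # both 0.5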

3.8 Predictive distribution

The predictive distribution of \(\tilde{Y}\) is the conditional distribution of \(\tilde{Y}\) given \(\{Y_1=y_1,\dots,Y_n=y_n\}\):

\[Pr(\tilde{Y}=1|y_1,\dots,y_n) = E[\theta|y_1,\dots,y_n] = \frac{a+\sum^{n}_{i=1}y_i}{a+b+n}\]

  1. The predictive distribution does not depend on any unknown quantities.
  2. The predictive distribution depends on the observed data.
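
The closed form can be checked by Monte Carlo (a sketch; the prior parameters and data are arbitrary): draw \(\theta\) from the posterior, then draw \(\tilde{Y}\) given \(\theta\).

a <- 1; b <- 1; n <- 10; s <- 7               # s = observed sum of the y_i (hypothetical)
theta_draws <- rbeta(1e5, a + s, b + n - s)   # posterior draws of theta
y_pred <- rbinom(1e5, 1, theta_draws)         # predictive draws of Y-tilde
c(mean(y_pred), (a + s) / (a + b + n))        # both approximately 8/12 = 0.667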

3.9 Confidence regions

Bayesian coverage

An interval \([l(y), u(y)]\), based on the observed data \(Y=y\), has \(95\%\) Bayesian coverage for \(\theta\) if \[Pr(l(y)<\theta<u(y)|Y=y) = .95 \tag{3.4}\]

  • Equation 3.4 describes the information about the true value of \(\theta\) after observing \(Y=y\).
  • post-experimental coverage

Frequentist coverage

A random interval \([l(Y), u(Y)]\) has \(95\%\) frequentist coverage for \(\theta\) if, before the data are gathered,

\[Pr(l(Y) < \theta < u(Y)|\theta) = .95 \tag{3.5}\]

  • Equation 3.5 describes the probability that the interval will cover the true value before the data are observed.
  • pre-experimental coverage
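
For the beta posterior, a 95% quantile-based Bayesian interval comes directly from qbeta (a sketch with hypothetical data):

a <- 1; b <- 1; n <- 10; s <- 7
qbeta(c(0.025, 0.975), a + s, b + n - s)   # 95% posterior interval for theta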

3.10 Binomial distribution

\[p(Y=y|\theta) = dbinom(y,n,\theta) = {n\choose y}\theta^{y}(1-\theta)^{n-y},\quad y\in\{0,1,\dots, n\}\]

3.11 The Poisson model

Poisson distribution

\[Pr(Y=y|\theta) = \theta^{y}\frac{e^{-\theta}}{y!}\quad \text{for } y\in \{0,1,2,\dots\}\]

  • \(E[Y|\theta] = \theta\)
  • \(Var[Y|\theta] = \theta\)
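
A one-line simulation check that the mean and variance are both \(\theta\) (a sketch; \(\theta = 4\) is arbitrary):

y <- rpois(1e5, 4)       # 100,000 Poisson(4) draws
c(mean(y), var(y))       # both approximately 4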

3.11.1 Posterior inference

Let \(Y_1,\dots,Y_n\) be i.i.d. Poisson with mean \(\theta\); then the joint pmf is

\[\begin{aligned} Pr(Y_1 =y_1,\dots, Y_n = y_n |\theta) &= \prod^{n}_{i=1} p(y_i|\theta)\\ &= \prod^{n}_{i=1} \frac{1}{y_{i}!} \theta^{y_i}e^{-\theta}\\ &= c(y_1, \dots, y_n)\theta^{\sum y_i}e^{-n\theta} \end{aligned}\]

3.12 Some one-parameter models

3.13 Bayesian prediction

3.13.1 The marginal

\[\begin{aligned} p(y) &= \int p(y,\theta)d\theta\\ &= \int_{\Theta}p(y|\theta)p(\theta)d\theta \end{aligned}\]
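
For the beta-binomial pair this marginal has a closed form, \(p(y) = {n\choose y}B(a+y,\, b+n-y)/B(a,b)\). A numerical sketch (with arbitrary \(a, b, n, y\)) confirming it:

a <- 2; b <- 3; n <- 10; y <- 4
num <- integrate(function(t) dbinom(y, n, t) * dbeta(t, a, b), 0, 1)$value
closed <- choose(n, y) * beta(a + y, b + n - y) / beta(a, b)   # beta-binomial pmf
c(num, closed)   # the two values agree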

3.13.2 Posterior predictive distribution

Let \(\tilde{Y}\) be a data point that is yet to be observed.

\[\begin{aligned} p(\tilde{y}|y) &= \int_{\Theta} p(\tilde{y}, \theta|y)d\theta\\ &= \int_{\Theta} p(\tilde{y}|\theta,y)p(\theta|y)d\theta \end{aligned}\]

3.13.3 Sufficient statistics

Comparing two values of \(\theta\) a posteriori,

\[\frac{p(\theta_a|y_1,\dots,y_n)}{p(\theta_b|y_1,\dots,y_n)} = \frac{e^{-n\theta_a}}{e^{-n\theta_b}}\frac{\theta_{a}^{\sum y_i}}{\theta_{b}^{\sum y_i}}\frac{p(\theta_a)}{p(\theta_b)}\]

3.13.4 Conjugate prior

\[p(\theta|y_1,\dots,y_n) \propto p(\theta) \times \underbrace{p(y_1,\dots,y_n|\theta)}_{\theta^{\sum y_i} e^{-n\theta}}\]

  • \(\theta^{c_1}e^{-c_2 \theta}\): the kernel of a gamma distribution
Gamma distribution

\[p(\theta) = \frac{b^a}{\Gamma(a)}\theta^{a-1}e^{-b\theta} \quad \text{for } \theta, a, b > 0\]

  • \(E[\theta] = \frac{a}{b}\)
  • \(Var[\theta] = \frac{a}{b^2}\)
Gamma pdf integration

\[\int^{\infty}_{0} \theta^{a-1}e^{-b\theta}d\theta = \frac{\Gamma(a)}{b^a}\]
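
Putting the pieces together: a gamma\((a, b)\) prior with i.i.d. Poisson data yields a gamma\((a + \sum y_i,\, b + n)\) posterior. A minimal sketch (the prior and data are made up):

a <- 2; b <- 1                     # gamma prior parameters
y <- c(3, 5, 2, 4, 6)              # hypothetical Poisson counts
curve(dgamma(x, a, b), 0, 10, lty = 2)                          # prior (dashed)
curve(dgamma(x, a + sum(y), b + length(y)), 0, 10, add = TRUE)  # posterior (solid)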

3.14 Jeffreys prior

3.15 Gamma Distribution

Conjugate prior for Poisson data

\[p(\theta) = \frac{b^a}{\Gamma(a)}\theta^{a-1} e^{-b\theta}I_{(0,\infty)}(\theta)\]

  • Posterior mean for Poisson data:

\[E(\theta|y) = \frac{a+n\bar{y}}{b+n} = \frac{b}{b+n}\frac{a}{b} + \frac{n}{b+n}\frac{n\bar{y}}{n} = (1-\omega_n)E(\theta) + \omega_n \bar{y}\]

# Gamma densities for different shape (a) and rate (b) parameters
a <- 1; b <- 1
curve(dgamma(x, a, b), 0, 10)    # mean a/b = 1

a <- 4; b <- 4
curve(dgamma(x, a, b), 0, 10)    # same mean, smaller variance a/b^2 = 0.25

a <- 16; b <- 4
curve(dgamma(x, a, b), 0, 10)    # mean a/b = 4

3.16 Exponential Families and conjugate priors

  • \(p(y|\phi) = h(y)c(\phi)e^{\phi t(y)}\)
    • \(\phi\) is the unknown parameter
    • \(t(y)\) is the sufficient statistic
  • For a general exponential family model, the conjugate prior takes the form
    • \(p(\phi|n_0,t_0) = \kappa(n_0,t_0)c(\phi)^{n_0}e^{n_0 t_0 \phi}\)
    • which leads to the posterior distribution

\[\begin{align} p(\phi|y_1,\dots, y_n) &\propto p(\phi)p(y_1,\dots,y_n|\phi)\\ &\propto c(\phi)^{n_0 + n} \exp \left(\phi \times \left[ n_0 t_0 + \sum_{i=1}^{n}t(y_i) \right]\right)\\ &\propto p(\phi|n_0 +n, n_0t_0 +n \bar{t}(y)) \end{align}\]

where \(\bar{t}(y)=\frac{\sum t(y_i)}{n}\). The update is simply \(n_0 \to n_0+n\) and \(n_0t_0 \to n_0t_0 + n\bar{t}(y)\); a concrete instance follows below.
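
As a concrete instance (a sketch, assuming the Poisson model in its natural parameterization \(\phi = \log\theta\), so that \(t(y) = y\) and \(c(\phi) = e^{-e^{\phi}}\)): the conjugate prior \(c(\phi)^{n_0}e^{n_0t_0\phi}\), transformed back to \(\theta\), is the gamma\((n_0t_0,\, n_0)\) prior of Section 3.15. A numerical check on a grid:

n0 <- 2; t0 <- 3                       # prior sample size and prior guess for t(y)
phi <- seq(-4, 3, length.out = 2000)   # grid in the natural parameter
p_phi <- exp(n0 * t0 * phi - n0 * exp(phi))        # unnormalized conjugate prior
p_phi <- p_phi / sum(p_phi * (phi[2] - phi[1]))    # normalize on the grid
theta <- exp(phi)
p_theta <- p_phi / theta               # change of variables: |d(phi)/d(theta)| = 1/theta
max(abs(p_theta - dgamma(theta, n0 * t0, n0)))     # near 0, up to grid error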

3.17 Mixture distribution

http://www.mas.ncl.ac.uk/~nmf16/teaching/mas3301/week11.pdf

3.18 Installation

R installation: https://www.drdataking.com/post/how-to-add-existing-r-to-jupyter-notebook/