12  Homework 8

Author

Department of Electrical and Computer Engineering, Texas A&M University

Published

January 20, 2023

12.1 Description

  • Course: STAT638, 2022 Fall

Read Chapter 8 in the Hoff book. Then do the following exercises in Hoff: 8.1 and 8.3.

Please note some typos in 8.1: All θi’s should be θj’s.

For 8.1(c), you may find the law of total (co-)variance useful. In addition, remember that all of these laws also hold for conditional distributions (e.g., when conditioning on additional quantities such as μ and τ2 in all terms on the left- and right-hand side of the equation).
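For reference, the conditional forms of these laws used in 8.1(c) are, with everything additionally conditioned on $\mu$ and $\tau^2$:

$$\text{Var}[Y \mid \mu, \tau^2] = \text{E}\big(\text{Var}[Y \mid X, \mu, \tau^2] \mid \mu, \tau^2\big) + \text{Var}\big(\text{E}[Y \mid X, \mu, \tau^2] \mid \mu, \tau^2\big)$$

$$\text{Cov}[Y_1, Y_2 \mid \mu, \tau^2] = \text{E}\big(\text{Cov}[Y_1, Y_2 \mid X, \mu, \tau^2] \mid \mu, \tau^2\big) + \text{Cov}\big(\text{E}[Y_1 \mid X, \mu, \tau^2],\, \text{E}[Y_2 \mid X, \mu, \tau^2] \mid \mu, \tau^2\big)$$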

12.2 Computational Environment Setup

12.2.1 Third-party libraries

using Pkg
Pkg.activate("hw8")
using Distributions
using DataFrames
using Turing
using Plots
using DelimitedFiles
using LinearAlgebra
using Statistics
using StatsBase
using StatsPlots
import Random
Random.seed!(2022)
  Activating project at `~/Documents/GitHub/STAT638_Applied-Bayes-Methods/hw/hw8`
Random.TaskLocalRNG()

12.2.2 Version

Pkg.status()
VERSION
Status `~/Documents/GitHub/STAT638_Applied-Bayes-Methods/hw/hw8/Project.toml`
⌃ [a93c6f00] DataFrames v1.4.2
⌃ [31c24e10] Distributions v0.25.76
⌃ [be115224] MCMCDiagnosticTools v0.1.4
⌃ [91a5bcdd] Plots v1.35.5
  [2913bbd2] StatsBase v0.33.21
  [f3b207a7] StatsPlots v0.15.4
⌃ [fce5fe82] Turing v0.21.12
  [8bb1440f] DelimitedFiles
  [10745b16] Statistics
Info Packages marked with ⌃ have new versions available and may be upgradable.
v"1.8.2"

12.3 Problem 8.1

Components of variance: Consider the hierarchical model where

$$\theta_1,\dots,\theta_m \mid \mu, \tau^2 \overset{\text{i.i.d.}}{\sim} \text{normal}(\mu, \tau^2)$$

$$y_{1,j},\dots,y_{n_j,j} \mid \theta_j, \sigma^2 \overset{\text{i.i.d.}}{\sim} \text{normal}(\theta_j, \sigma^2)$$

For this problem, we will eventually compute the following:

  • $\text{Var}[y_{i,j} \mid \theta_j, \sigma^2]$, $\text{Var}[\bar{y}_{\cdot,j} \mid \theta_j, \sigma^2]$, $\text{Cov}[y_{i_1,j}, y_{i_2,j} \mid \theta_j, \sigma^2]$
  • $\text{Var}[y_{i,j} \mid \mu, \tau^2]$, $\text{Var}[\bar{y}_{\cdot,j} \mid \mu, \tau^2]$, $\text{Cov}[y_{i_1,j}, y_{i_2,j} \mid \mu, \tau^2]$

First, let's use our intuition to guess at the answers:

12.3.1 (a)

Which do you think is bigger, $\text{Var}[y_{i,j} \mid \theta_j, \sigma^2]$ or $\text{Var}[y_{i,j} \mid \mu, \tau^2]$? To guide your intuition, you can interpret the first as the variability of the $Y$'s when sampling from a fixed group, and the second as the variability in first sampling a group, then sampling a unit from within the group.

  • $\text{Var}[y_{i,j} \mid \mu, \tau^2]$, because $\theta_j$ is uncertain and the between-group variability creates additional uncertainty.

12.3.2 (b)

Do you think $\text{Cov}[y_{i_1,j}, y_{i_2,j} \mid \theta_j, \sigma^2]$ is negative, positive, or zero? Answer the same for $\text{Cov}[y_{i_1,j}, y_{i_2,j} \mid \mu, \tau^2]$. You may want to think about what $y_{i_2,j}$ tells you about $y_{i_1,j}$ if $\theta_j$ is known, and what it tells you when $\theta_j$ is unknown.

$\text{Cov}[y_{i_1,j}, y_{i_2,j} \mid \theta_j, \sigma^2]$

Because $y_{i_1,j}$ and $y_{i_2,j}$ are sampled i.i.d. given $\theta_j$, I expect $\text{Cov}[y_{i_1,j}, y_{i_2,j} \mid \theta_j, \sigma^2]$ to be zero.

$\text{Cov}[y_{i_1,j}, y_{i_2,j} \mid \mu, \tau^2]$

When $\theta_j$ is unknown, $y_{i_1,j}$ does carry information about $y_{i_2,j}$. The covariance $\text{Cov}[y_{i_1,j}, y_{i_2,j} \mid \mu, \tau^2]$ is likely to be positive, because values drawn from the same $\theta_j$ tend to be close together.

12.3.3 (c)

Now compute each of the six quantities above and compare to your answers in (a) and (b).

$$\text{Var}[y_{i,j} \mid \theta_j, \sigma^2] = \sigma^2$$

$$\text{Var}[\bar{y}_{\cdot,j} \mid \theta_j, \sigma^2]
= \text{Var}\!\left[\frac{1}{n_j}\sum_{i=1}^{n_j} y_{i,j} \,\middle|\, \theta_j, \sigma^2\right]
= \frac{1}{n_j^2}\sum_{i=1}^{n_j}\text{Var}[y_{i,j} \mid \theta_j, \sigma^2]
= \frac{\sigma^2}{n_j}$$

$$\text{Cov}[y_{i_1,j}, y_{i_2,j} \mid \theta_j, \sigma^2]
= \text{E}[y_{i_1,j}\, y_{i_2,j} \mid \theta_j, \sigma^2] - \text{E}[y_{i_1,j} \mid \theta_j, \sigma^2]\,\text{E}[y_{i_2,j} \mid \theta_j, \sigma^2]
= 0,$$

since $y_{i_1,j}$ and $y_{i_2,j}$ are conditionally independent given $\theta_j$, so the first term factors into the product of the two expectations.

Using the law of total (co-)variance, conditioning on $\theta_j$:

$$\text{Var}[y_{i,j} \mid \mu, \tau^2]
= \text{E}\big(\text{Var}[y_{i,j} \mid \theta_j, \sigma^2] \mid \mu, \tau^2\big) + \text{Var}\big(\text{E}[y_{i,j} \mid \theta_j, \sigma^2] \mid \mu, \tau^2\big)
= \sigma^2 + \text{Var}(\theta_j \mid \mu, \tau^2)
= \sigma^2 + \tau^2$$

$$\text{Var}[\bar{y}_{\cdot,j} \mid \mu, \tau^2]
= \text{E}\big(\text{Var}[\bar{y}_{\cdot,j} \mid \theta_j, \sigma^2] \mid \mu, \tau^2\big) + \text{Var}\big(\text{E}[\bar{y}_{\cdot,j} \mid \theta_j, \sigma^2] \mid \mu, \tau^2\big)
= \frac{\sigma^2}{n_j} + \tau^2$$

$$\text{Cov}[y_{i_1,j}, y_{i_2,j} \mid \mu, \tau^2]
= \text{E}\big(\text{Cov}[y_{i_1,j}, y_{i_2,j} \mid \theta_j, \sigma^2] \mid \mu, \tau^2\big)
+ \text{Cov}\big(\text{E}[y_{i_1,j} \mid \theta_j, \sigma^2],\, \text{E}[y_{i_2,j} \mid \theta_j, \sigma^2] \mid \mu, \tau^2\big)
= 0 + \text{Var}(\theta_j \mid \mu, \tau^2)
= \tau^2$$

These results match the intuition from (a) and (b): the marginal variance $\sigma^2 + \tau^2$ is larger than the within-group variance $\sigma^2$, and observations from the same group are uncorrelated given $\theta_j$ but positively correlated marginally (covariance $\tau^2$).
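As a quick, optional sanity check of the marginal results above, one can simulate the hierarchical model and compare empirical moments to $\sigma^2 + \tau^2$, $\sigma^2/n_j + \tau^2$, and $\tau^2$. The parameter values below are hypothetical and chosen only for this check:

using Distributions, Statistics, Random
Random.seed!(1)

# Hypothetical parameter values, chosen only for this illustration
μ_true, τ²_true, σ²_true, nj, nsim = 0.0, 4.0, 9.0, 5, 200_000

# Simulate the hierarchical model: draw θⱼ, then nj observations within group j
θ = rand(Normal(μ_true, sqrt(τ²_true)), nsim)
y = [rand(Normal(θ[s], sqrt(σ²_true)), nj) for s in 1:nsim]

y1 = [y[s][1] for s in 1:nsim]   # first observation in each group
y2 = [y[s][2] for s in 1:nsim]   # second observation in each group
ȳ  = mean.(y)                    # group averages

var(y1)      # should be ≈ σ²_true + τ²_true
var(ȳ)       # should be ≈ σ²_true/nj + τ²_true
cov(y1, y2)  # should be ≈ τ²_true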

12.3.4 (d)

Now assume we have a prior $p(\mu)$ for $\mu$. Using Bayes' rule, show that

$$p(\mu \mid \theta_1,\dots,\theta_m, \sigma^2, \tau^2, y_1,\dots,y_m) = p(\mu \mid \theta_1,\dots,\theta_m, \tau^2)$$

Interpret in words what this means.

$$p(\mu \mid \theta_1,\dots,\theta_m, \sigma^2, \tau^2, y_1,\dots,y_m)
= \frac{p(\sigma^2, y_1,\dots,y_m \mid \mu, \theta_1,\dots,\theta_m, \tau^2)\; p(\mu \mid \theta_1,\dots,\theta_m, \tau^2)}{p(\sigma^2, y_1,\dots,y_m \mid \theta_1,\dots,\theta_m, \tau^2)}
= p(\mu \mid \theta_1,\dots,\theta_m, \tau^2)$$

where $p(\sigma^2, y_1,\dots,y_m \mid \mu, \theta_1,\dots,\theta_m, \tau^2) = p(\sigma^2, y_1,\dots,y_m \mid \theta_1,\dots,\theta_m, \tau^2)$, because knowing $\mu$ provides no additional information about the data (or $\sigma^2$) once $\theta_1,\dots,\theta_m$ are known. In words: given the group means, $\mu$ is conditionally independent of the data, so the full conditional of $\mu$ depends only on $\theta_1,\dots,\theta_m$ and $\tau^2$.

12.4 Problem 8.3

Hierarchical modeling: The files school1.dat through school8.dat give weekly hours spent on homework for students sampled from eight different schools. Obtain posterior distributions for the true means for the eight different schools using a hierarchical normal model with the following prior parameters:

$$\mu_0 = 7,\quad \gamma_0^2 = 5,\quad \tau_0^2 = 10,\quad \eta_0 = 2,\quad \sigma_0^2 = 15,\quad \nu_0 = 2$$

# Read the homework-hours data for each of the eight schools
dsch = Dict()
nsch = 8
for i in 1:nsch
   dsch[i] = readdlm("data/school$i.dat")
end

12.4.1 (a)

Run a Gibbs sampling algorithm to approximate the posterior distribution of $\{\theta, \sigma^2, \mu, \tau^2\}$. Assess the convergence of the Markov chain, and find the effective sample size for $\{\sigma^2, \mu, \tau^2\}$. Run the chain long enough so that the effective sample sizes are all above 1000.
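For reference, the Gibbs sampler below draws from the standard full conditional distributions for this model (as derived in Hoff, Ch. 8), where the second argument of each normal is its variance:

$$
\begin{aligned}
\theta_j \mid \cdots &\sim \text{normal}\!\left(\frac{n_j\bar{y}_j/\sigma^2 + \mu/\tau^2}{n_j/\sigma^2 + 1/\tau^2},\; \big(n_j/\sigma^2 + 1/\tau^2\big)^{-1}\right) \\
\mu \mid \cdots &\sim \text{normal}\!\left(\frac{m\bar{\theta}/\tau^2 + \mu_0/\gamma_0^2}{m/\tau^2 + 1/\gamma_0^2},\; \big(m/\tau^2 + 1/\gamma_0^2\big)^{-1}\right) \\
\sigma^2 \mid \cdots &\sim \text{inverse-gamma}\!\left(\frac{\nu_0 + \sum_j n_j}{2},\; \frac{\nu_0\sigma_0^2 + \sum_j\sum_i (y_{i,j}-\theta_j)^2}{2}\right) \\
\tau^2 \mid \cdots &\sim \text{inverse-gamma}\!\left(\frac{\eta_0 + m}{2},\; \frac{\eta_0\tau_0^2 + \sum_j (\theta_j-\mu)^2}{2}\right)
\end{aligned}
$$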

# Prior
μ0 = 7.
γ0² = 5.
τ0² = 10.
η0 = 2.
σ0² = 15.
ν0 = 2.

# Data
ns = [ length(dsch[i]) for i in 1:nsch]
n = sum(ns)
m = length(dsch)
ȳs = [mean(dsch[i]) for i in 1:nsch]
s²s = [ (ns[i] - 1)^-1 * sum( (dsch[i] .- ȳs[i]).^2) for i in 1:nsch]
# posterior

# Full conditional of τ² given θ₁,…,θ_m and μ
function τ²_pos(m, η0, θv, μ, τ0²)
    ths² = sum([(θ - μ)^2 for θ in θv])
    α = (m + η0) * 0.5
    β = (ths² + η0*τ0²) / 2
    return InverseGamma(α, β)
end

# Full conditional of σ² given θ₁,…,θ_m and the data
function σ²_pos(n, ν0, σ0², ns, s²s, ȳs, θs)
    α = (n + ν0) / 2
    β = (sum((ns .- 1) .* s²s .+ ns .* (ȳs .- θs).^2) + ν0*σ0²) / 2
    return InverseGamma(α, β)
end

# Full conditional of μ given θ₁,…,θ_m and τ²
function μ_pos(m, τ², θs, γ0², μ0)
    γm² = (m/τ² + 1/γ0²)^-1
    θ̄ = mean(θs)

    a = γm²*(m*θ̄/τ² + μ0/γ0²)
    return Normal(a, sqrt(γm²))   # Normal takes the standard deviation
end

# Full conditional of θⱼ given μ, τ², σ² and the data of school j
function θ_pos(τ², ȳ, n, σ², μ)
    τ̃² = (n/σ² + 1/τ²)^-1
    a = τ̃²*(n*ȳ/σ² + μ/τ²)
    return Normal(a, sqrt(τ̃²))
end

"""
Effective Sample Size
"""
function ess(v)
    n = length(v)
    c = sum(autocov(v, collect(1:n-1)))
    return n/(1+2*c)
end

# Sampling
smp = 4000
τ²s = zeros(smp)
σ²s = zeros(smp)
μs = zeros(smp)
θs = zeros(smp, m)


# Initialize the chain with draws from the priors
σ²s[1] = rand(InverseGamma(ν0/2, ν0*σ0²/2))
τ²s[1] = rand(InverseGamma(η0/2, η0*τ0²/2))
μs[1] = rand(Normal(μ0, sqrt(γ0²)))
θs[1,:] = rand(Normal(μs[1], sqrt(τ²s[1])), m)

# Gibbs sampling: draw each parameter from its full conditional,
# using the most recent values of the other parameters
for s in 2:smp
    σ²s[s] = rand(σ²_pos(n, ν0, σ0², ns, s²s, ȳs, θs[s-1,:]))
    τ²s[s] = rand(τ²_pos(m, η0, θs[s-1,:], μs[s-1], τ0²))
    θs[s,:] = [rand(θ_pos(τ²s[s], ȳs[i], ns[i], σ²s[s], μs[s-1])) for i in 1:m]
    μs[s] = rand(μ_pos(m, τ²s[s], θs[s,:], γ0², μ0))
end
# Trace plots to assess convergence (drop the initial draw from the priors)
p1 = plot(τ²s[2:end], label="τ²")
p2 = plot(σ²s[2:end], label="σ²")
p3 = plot(μs[2:end], label="μ")

plot(p1, p2, p3)

  • Effective Sample Size

  • τ²

ess(τ²s)
1523.525
  • σ²
ess(σ²s)
1234.234
  • μ
ess(μs)
1045.242

12.4.2 (b)

Compute posterior means and 95% confidence regions for $\{\sigma^2, \mu, \tau^2\}$. Also, compare the posterior densities to the prior densities, and discuss what was learned from the data.

  • σ²
quantile(σ²s, [0.025, 0.5, 0.975])
3-element Vector{Float64}:
 11.492557006363636
 14.15210768607951
 17.713837318481733
  • μ
quantile(μs, [0.025, 0.5, 0.975])
3-element Vector{Float64}:
 5.3131625605052655
 7.688237183428129
 8.736460757511468
  • τ²
quantile(τ²s, [0.025, 0.5, 0.975])
3-element Vector{Float64}:
  1.8676028002352072
  4.466482143729323
 15.698240842165873
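The exercise also asks for posterior means, which can be read off the same Gibbs draws:

# Posterior means of σ², μ, and τ²
mean(σ²s), mean(μs), mean(τ²s)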
pu = density(μs, label="μₙ")
pt = density(σ²s, label="σ²ₙ")
ps = density(τ²s, label="τ²ₙ")

# Overlay the corresponding prior densities
plot!(pu, Normal(μ0, sqrt(γ0²)), label="μ₀")
plot!(pt, InverseGamma(ν0/2, ν0*σ0²/2), label="σ²₀")
plot!(ps, InverseGamma(η0/2, η0*τ0²/2), label="τ²₀")

xlims!(pt, 0, 30)
xlims!(ps, 0, 30)

plot(pu, pt, ps)

The posterior densities of $\mu$ and $\tau^2$ remain fairly similar to their priors, whereas the posterior of $\sigma^2$ is much more concentrated than its prior; the data are therefore most informative about the within-school variance $\sigma^2$.

12.4.3 (c)

Plot the posterior density of $R = \dfrac{\tau^2}{\sigma^2 + \tau^2}$ and compare it to a plot of the prior density of $R$. Describe the evidence for between-school variation.

# Draw from the priors of σ² and τ² to obtain the prior distribution of R
σ²_prs = rand(InverseGamma(ν0/2, ν0*σ0²/2), 1000)
τ²_prs = rand(InverseGamma(η0/2, η0*τ0²/2), 1000)



R_prs = τ²_prs ./ (σ²_prs .+ τ²_prs)
R_pos = τ²s ./ (σ²s .+ τ²s)

pr = density(R_prs, label="R Prior", xlabel="R", ylabel="density")
density!(pr, R_pos, label="R posterior")

$R$ represents the proportion of the total variance that is due to between-group variation. The prior is quite uncertain about this quantity, but the posterior density of $R$ is sharply peaked around 0.3. The data therefore provide clear evidence of between-school variation, although most of the variability remains within schools.
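To make this comparison quantitative, one can also compare prior and posterior quantiles of $R$ using the draws already computed:

# Prior vs. posterior uncertainty about R
quantile(R_prs, [0.025, 0.5, 0.975])   # prior draws of R
quantile(R_pos, [0.025, 0.5, 0.975])   # posterior draws of R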

12.4.4 (d)

Obtain the posterior probability that $\theta_7$ is smaller than $\theta_6$, as well as the posterior probability that $\theta_7$ is smaller than all of the other $\theta$'s.

  • Posterior probability that $\theta_7 < \theta_6$
mean(θs[:,7] .< θs[:,6])
0.53075
  • Posterior probability that $\theta_7$ is smaller than all of the other $\theta$'s
# Fraction of posterior draws in which θ₇ is the smallest of the eight school means
res = zeros(size(θs)[1])
for i in 1:size(θs)[1]
    if argmin(θs[i,:]) == 7
        res[i] = 1
    end
end
mean(res)
0.339
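Equivalently, the same probability can be computed in a single line:

# Fraction of posterior draws in which θ₇ is the smallest of the eight means
mean(argmin(θs[i, :]) == 7 for i in 1:size(θs, 1))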

12.4.5 (e)

Plot the sample averages $\bar{y}_1,\dots,\bar{y}_8$ against the posterior expectations of $\theta_1,\dots,\theta_8$, and describe the relationship. Also compute the sample mean of all observations and compare it to the posterior mean of $\mu$.

psmp = scatter(ȳs, mean(θs, dims=1)[1,:], xlabel="Sample Average", ylabel="Posterior Expectation")

hline!(psmp, [mean(μs)], label="Posterior mean of μ")
hline!(psmp, [sum(ȳs .* ns)/n], label="Pooled sample mean (ȳ)")
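To make the comparison asked for in the exercise explicit, the two horizontal-line values can also be printed directly:

# Pooled sample mean of all observations vs. posterior mean of μ
(sum(ȳs .* ns)/n, mean(μs))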


  1. I use special characters in the Julia code; unfortunately, they are not displayed in the PDF version.↩︎

  2. $\text{Var}(Y) = \text{E}[\text{Var}(Y \mid X)] + \text{Var}(\text{E}[Y \mid X])$↩︎