Bayesian Linear Regression with Gibbs Sampling

An in-depth guide on implementing Bayesian linear regression and Gibbs sampling for parameter estimation.

TL;DR

  • Bayesian linear regression assumes data follows a normal distribution given parameters.

  • The regression coefficients are given a normal prior and the variance an inverse-gamma prior.

  • Gibbs sampling is used to sample from the posterior distribution when direct computation is infeasible.

  • The full conditional distributions for coefficients and variance are derived to facilitate sampling.

  • The Gibbs sampling procedure iterates between sampling regression coefficients and variance to approximate the posterior distribution.


Bayesian Linear Regression and Gibbs Sampling

Bayesian linear regression is linear regression carried out within the framework of Bayesian inference. When performing Bayesian linear regression, we assume that the observed data $y$, given the parameters $\beta$ and $\sigma^2$, follow a normal distribution:

$$y \mid \beta, \sigma^2 \sim N(X\beta, \sigma^2 I)$$
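As an illustration, we can simulate data from this model. This is just a sketch: the problem size, true coefficients, and noise level below are arbitrary choices, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

T, k = 200, 3                       # observations, predictors (arbitrary)
X = rng.normal(size=(T, k))         # design matrix
beta_true = np.array([1.0, -2.0, 0.5])
sigma2_true = 0.25

# y | beta, sigma^2 ~ N(X beta, sigma^2 I)
y = X @ beta_true + rng.normal(scale=np.sqrt(sigma2_true), size=T)
```

This simulated `(X, y)` pair is the kind of input the Gibbs sampler developed below operates on.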

Prior Distributions

The prior distributions for the parameters are set as follows:

  • For $\beta$:

    $$\beta \sim N(\beta_0, \Lambda_0)$$

    with the prior probability density function (pdf) given by:

    $$f(\beta; \beta_0, \Lambda_0) = (2\pi)^{-\frac{k}{2}} \det(\Lambda_0)^{-\frac{1}{2}} \exp\left(-\frac{1}{2} (\beta - \beta_0)' \Lambda_{0}^{-1} (\beta - \beta_0)\right)$$
  • For $\sigma^2$:

    $$\sigma^2 \sim IG\left(\frac{a_0}{2}, \frac{b_0}{2}\right)$$

    with the prior pdf given by:

    $$f(x; \tfrac{a_0}{2}, \tfrac{b_0}{2}) = \frac{\left(\frac{b_0}{2}\right)^{\frac{a_0}{2}}}{\Gamma\left(\frac{a_0}{2}\right)} x^{-\left(\frac{a_0}{2} + 1\right)} \exp\left(-\frac{b_0}{2x}\right)$$
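The inverse-gamma pdf above can be checked numerically against `scipy.stats.invgamma`, whose shape/scale parameterization (`a`, `scale`) corresponds to our $\frac{a_0}{2}$ and $\frac{b_0}{2}$. The hyperparameter values here are arbitrary choices for the check:

```python
import numpy as np
from math import gamma
from scipy.stats import invgamma

a0, b0 = 6.0, 4.0                  # arbitrary prior hyperparameters
x = np.linspace(0.1, 5.0, 50)

# The pdf written out exactly as in the text
pdf_manual = ((b0 / 2) ** (a0 / 2) / gamma(a0 / 2)) \
    * x ** (-(a0 / 2 + 1)) * np.exp(-b0 / (2 * x))

# scipy's invgamma with shape a0/2 and scale b0/2 gives the same density
pdf_scipy = invgamma.pdf(x, a=a0 / 2, scale=b0 / 2)
```

Agreement between the two arrays confirms the parameterization used throughout this article.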
Bayes' Theorem and the Posterior Distribution

By Bayes' theorem, the posterior distribution is proportional to the product of the likelihood and the prior distributions:

$$\pi(\beta, \sigma^2 \mid y) \propto f(y \mid \beta, \sigma^2)\, \pi(\beta)\, \pi(\sigma^2)$$

Where:

  • The likelihood, with $T$ the number of observations, is given by:

    $$f(y \mid \beta, \sigma^2) \propto (\sigma^2)^{-\frac{T}{2}} \exp\left(-\frac{(y-X\beta)' (y-X\beta)}{2\sigma^2}\right)$$

  • The prior for $\beta$ is:

    $$\pi(\beta) \propto \exp\left(-\frac{1}{2} (\beta - \beta_0)' \Lambda_{0}^{-1} (\beta - \beta_0)\right)$$

  • The prior for $\sigma^2$ is:

    $$\pi(\sigma^2) \propto (\sigma^2)^{-\left(\frac{a_0}{2} + 1\right)} \exp\left(-\frac{b_0}{2\sigma^2}\right)$$

Gibbs Sampling

Directly computing the joint posterior $\beta, \sigma^2 \mid y$ can be challenging, so we use Gibbs sampling, drawing alternately from the full conditional distributions $\beta \mid \sigma^2, y$ and $\sigma^2 \mid \beta, y$.

Full Conditional Distributions

The full conditional for $\beta$ is:

$$p(\beta \mid \sigma^2, y) \propto \exp\left(-\frac{(y-X\beta)' (y-X\beta)}{2\sigma^2}\right) \exp\left(-\frac{1}{2} (\beta - \beta_0)' \Lambda_{0}^{-1} (\beta - \beta_0)\right)$$

This simplifies to:

$$\propto \exp\left(-\frac{1}{2} \left[ \beta' \left(\frac{X'X}{\sigma^2} + \Lambda_0^{-1}\right)\beta - 2\beta' \left(\frac{X'y}{\sigma^2} + \Lambda_0^{-1} \beta_0\right) \right]\right)$$

The full conditional for $\beta$ is therefore a normal distribution. If we denote $\beta \mid \sigma^2, y \sim N(\beta_1, \Lambda_1)$, then:

$$\Lambda_1 = \left(\frac{X'X}{\sigma^2} + \Lambda_0^{-1}\right)^{-1}$$

$$\beta_1 = \Lambda_1 \left(\frac{X'y}{\sigma^2} + \Lambda_0^{-1} \beta_0\right)$$
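A single draw from this conditional can be sketched as follows. The function name and the simulated data in the usage example are illustrative, not from the text:

```python
import numpy as np

def sample_beta(X, y, sigma2, beta0, Lambda0_inv, rng):
    """One draw of beta from its full conditional N(beta1, Lambda1)."""
    Lambda1 = np.linalg.inv(X.T @ X / sigma2 + Lambda0_inv)
    beta1 = Lambda1 @ (X.T @ y / sigma2 + Lambda0_inv @ beta0)
    return rng.multivariate_normal(beta1, Lambda1)

# Illustrative usage: with plenty of data and a weak prior,
# a draw lands close to the true coefficients.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
beta_true = np.array([2.0, -1.0])
y = X @ beta_true + rng.normal(scale=0.1, size=100)
draw = sample_beta(X, y, sigma2=0.01, beta0=np.zeros(2),
                   Lambda0_inv=1e-6 * np.eye(2), rng=rng)
```

Passing the prior precision $\Lambda_0^{-1}$ rather than $\Lambda_0$ avoids inverting the prior covariance on every iteration.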

The full conditional for $\sigma^2$ is:

$$p(\sigma^2 \mid \beta, y) \propto (\sigma^2)^{-\frac{T}{2}} \exp\left(-\frac{(y-X\beta)' (y-X\beta)}{2\sigma^2}\right) (\sigma^2)^{-\left(\frac{a_0}{2} + 1\right)} \exp\left(-\frac{b_0}{2\sigma^2}\right)$$

This simplifies to an inverse-gamma distribution:

$$\sigma^2 \mid \beta, y \sim IG\left(\frac{a_1}{2}, \frac{b_1}{2}\right)$$

Where:

$$a_1 = a_0 + T$$

$$b_1 = b_0 + (y-X\beta)'(y-X\beta)$$
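This draw can likewise be sketched in a few lines, using the standard fact that the reciprocal of a gamma variate is inverse-gamma distributed (names and the usage values are illustrative):

```python
import numpy as np

def sample_sigma2(X, y, beta, a0, b0, rng):
    """One draw of sigma^2 from its full conditional IG(a1/2, b1/2)."""
    T = len(y)
    resid = y - X @ beta
    a1 = a0 + T
    b1 = b0 + resid @ resid
    # If G ~ Gamma(shape=a1/2, scale=2/b1), then 1/G ~ IG(a1/2, b1/2)
    return 1.0 / rng.gamma(shape=a1 / 2, scale=2.0 / b1)

# Illustrative usage: residuals with variance 4 should give draws near 4
rng = np.random.default_rng(2)
T = 2000
X = np.zeros((T, 1))
y = rng.normal(scale=2.0, size=T)
draw = sample_sigma2(X, y, beta=np.zeros(1), a0=1.0, b0=1.0, rng=rng)
```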

Gibbs Sampling Procedure

The Gibbs sampling procedure for obtaining samples from $\beta, \sigma^2 \mid y$ is as follows:

  1. Sample $\beta^{(1)}$ from $\beta \mid \sigma^{2(0)}, y$, where $\sigma^{2(0)}$ is an initial value (e.g., drawn from the prior).

  2. Sample $\sigma^{2(1)}$ from $\sigma^2 \mid \beta^{(1)}, y$.

  3. Sample $\beta^{(2)}$ from $\beta \mid \sigma^{2(1)}, y$.

  4. Sample $\sigma^{2(2)}$ from $\sigma^2 \mid \beta^{(2)}, y$.

  5. Continue this process for $n$ iterations.

After $n$ iterations, we have $n$ correlated draws whose distribution approximates the full posterior; in practice, the earliest draws are typically discarded as burn-in.
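The steps above can be put together into a minimal sampler. This is a sketch with illustrative names and simulated data standing in for a real dataset; the hyperparameters and iteration counts are arbitrary:

```python
import numpy as np

def gibbs_sampler(X, y, beta0, Lambda0, a0, b0, n_iter=2000,
                  sigma2_init=1.0, seed=0):
    """Alternate between the two full conditionals for n_iter iterations."""
    rng = np.random.default_rng(seed)
    T, k = X.shape
    Lambda0_inv = np.linalg.inv(Lambda0)
    beta_draws = np.empty((n_iter, k))
    sigma2_draws = np.empty(n_iter)

    sigma2 = sigma2_init
    for i in range(n_iter):
        # beta | sigma^2, y ~ N(beta1, Lambda1)
        Lambda1 = np.linalg.inv(X.T @ X / sigma2 + Lambda0_inv)
        beta1 = Lambda1 @ (X.T @ y / sigma2 + Lambda0_inv @ beta0)
        beta = rng.multivariate_normal(beta1, Lambda1)

        # sigma^2 | beta, y ~ IG(a1/2, b1/2) with a1 = a0 + T,
        # b1 = b0 + residual sum of squares
        resid = y - X @ beta
        sigma2 = 1.0 / rng.gamma((a0 + T) / 2, 2.0 / (b0 + resid @ resid))

        beta_draws[i], sigma2_draws[i] = beta, sigma2
    return beta_draws, sigma2_draws

# Example on simulated data (true values chosen arbitrarily)
rng = np.random.default_rng(42)
T, k = 500, 2
X = rng.normal(size=(T, k))
beta_true = np.array([1.5, -0.7])
y = X @ beta_true + rng.normal(scale=0.5, size=T)

betas, sigma2s = gibbs_sampler(X, y, beta0=np.zeros(k),
                               Lambda0=10 * np.eye(k), a0=2.0, b0=2.0)
burn = 500
post_beta = betas[burn:].mean(axis=0)
post_sigma2 = sigma2s[burn:].mean()
```

With enough data, the post-burn-in means recover the simulated coefficients and noise variance, which is a quick sanity check for any implementation of this procedure.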
