Bayesian Linear Regression with Gibbs Sampling

An in-depth guide on implementing Bayesian linear regression and Gibbs sampling for parameter estimation.

TL;DR

  • Bayesian linear regression assumes data follows a normal distribution given parameters.

  • The regression coefficients are given a normal prior and the variance an inverse-gamma prior.

  • Gibbs sampling is used to sample from the posterior distribution when direct computation is infeasible.

  • The full conditional distributions for coefficients and variance are derived to facilitate sampling.

  • The Gibbs sampling procedure iterates between sampling regression coefficients and variance to approximate the posterior distribution.


Bayesian Linear Regression and Gibbs Sampling

Bayesian linear regression is linear regression carried out within the framework of Bayesian inference. When performing Bayesian linear regression, we assume that the observed data $y$, given the parameters $\beta$ and $\sigma^2$, follow a normal distribution:

$$y \mid \beta, \sigma^2 \sim N(X\beta, \sigma^2 I)$$
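As an illustration, we can simulate data from this model. This is just a sketch: the problem size, true coefficients, and noise level below are arbitrary choices, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

T, k = 200, 3                       # observations, predictors (arbitrary)
X = rng.normal(size=(T, k))         # design matrix
beta_true = np.array([1.0, -2.0, 0.5])
sigma2_true = 0.25

# y | beta, sigma^2 ~ N(X beta, sigma^2 I)
y = X @ beta_true + rng.normal(scale=np.sqrt(sigma2_true), size=T)
```

This simulated `(X, y)` pair is the kind of input the Gibbs sampler developed below operates on.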

Prior Distributions

The prior distributions for the parameters are set as follows:

  • For $\beta$:

    $$\beta \sim N(\beta_0, \Lambda_0)$$

    with the prior probability density function (pdf) given by:

    $$f(\beta; \beta_0, \Lambda_0) = (2\pi)^{-\frac{k}{2}} \det(\Lambda_0)^{-\frac{1}{2}} \exp\left(-\frac{1}{2} (\beta - \beta_0)' \Lambda_{0}^{-1} (\beta - \beta_0)\right)$$
  • For $\sigma^2$:

    $$\sigma^2 \sim IG\left(\frac{a_0}{2}, \frac{b_0}{2}\right)$$

    with the prior pdf given by:

    $$f(x; \tfrac{a_0}{2}, \tfrac{b_0}{2}) = \frac{\left(\frac{b_0}{2}\right)^{\frac{a_0}{2}}}{\Gamma\left(\frac{a_0}{2}\right)} x^{-\left(\frac{a_0}{2} + 1\right)} \exp\left(-\frac{b_0}{2x}\right)$$
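The inverse-gamma pdf above can be checked numerically against `scipy.stats.invgamma`, whose shape/scale parameterization (`a`, `scale`) corresponds to our $\frac{a_0}{2}$ and $\frac{b_0}{2}$. The hyperparameter values here are arbitrary choices for the check:

```python
import numpy as np
from math import gamma
from scipy.stats import invgamma

a0, b0 = 6.0, 4.0                  # arbitrary prior hyperparameters
x = np.linspace(0.1, 5.0, 50)

# The pdf written out exactly as in the text
pdf_manual = ((b0 / 2) ** (a0 / 2) / gamma(a0 / 2)) \
    * x ** (-(a0 / 2 + 1)) * np.exp(-b0 / (2 * x))

# scipy's invgamma with shape a0/2 and scale b0/2 gives the same density
pdf_scipy = invgamma.pdf(x, a=a0 / 2, scale=b0 / 2)
```

Agreement between the two arrays confirms the parameterization used throughout this article.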
Bayes' Theorem and the Posterior Distribution

By Bayes' theorem, the posterior distribution is proportional to the product of the likelihood and the prior distributions:

$$\pi(\beta, \sigma^2 \mid y) \propto f(y \mid \beta, \sigma^2)\, \pi(\beta)\, \pi(\sigma^2)$$

Where:

  • The likelihood, with $T$ the number of observations, is given by:

    $$f(y \mid \beta, \sigma^2) \propto (\sigma^2)^{-\frac{T}{2}} \exp\left(-\frac{(y-X\beta)' (y-X\beta)}{2\sigma^2}\right)$$

  • The prior for $\beta$ is:

    $$\pi(\beta) \propto \exp\left(-\frac{1}{2} (\beta - \beta_0)' \Lambda_{0}^{-1} (\beta - \beta_0)\right)$$

  • The prior for $\sigma^2$ is:

    $$\pi(\sigma^2) \propto (\sigma^2)^{-\left(\frac{a_0}{2} + 1\right)} \exp\left(-\frac{b_0}{2\sigma^2}\right)$$

Gibbs Sampling

Directly computing the joint posterior $\beta, \sigma^2 \mid y$ can be challenging, so we use Gibbs sampling, drawing alternately from the full conditional distributions $\beta \mid \sigma^2, y$ and $\sigma^2 \mid \beta, y$.

Full Conditional Distributions

The full conditional for $\beta$ is:

$$p(\beta \mid \sigma^2, y) \propto \exp\left(-\frac{(y-X\beta)' (y-X\beta)}{2\sigma^2}\right) \exp\left(-\frac{1}{2} (\beta - \beta_0)' \Lambda_{0}^{-1} (\beta - \beta_0)\right)$$

This simplifies to:

$$\propto \exp\left(-\frac{1}{2} \left[ \beta' \left(\frac{X'X}{\sigma^2} + \Lambda_0^{-1}\right)\beta - 2\beta' \left(\frac{X'y}{\sigma^2} + \Lambda_0^{-1} \beta_0\right) \right]\right)$$

The full conditional for $\beta$ is therefore a normal distribution. If we denote $\beta \mid \sigma^2, y \sim N(\beta_1, \Lambda_1)$, then:

$$\Lambda_1 = \left(\frac{X'X}{\sigma^2} + \Lambda_0^{-1}\right)^{-1}$$

$$\beta_1 = \Lambda_1 \left(\frac{X'y}{\sigma^2} + \Lambda_0^{-1} \beta_0\right)$$
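A single draw from this conditional can be sketched as follows. The function name and the simulated data in the usage example are illustrative, not from the text:

```python
import numpy as np

def sample_beta(X, y, sigma2, beta0, Lambda0_inv, rng):
    """One draw of beta from its full conditional N(beta1, Lambda1)."""
    Lambda1 = np.linalg.inv(X.T @ X / sigma2 + Lambda0_inv)
    beta1 = Lambda1 @ (X.T @ y / sigma2 + Lambda0_inv @ beta0)
    return rng.multivariate_normal(beta1, Lambda1)

# Illustrative usage: with plenty of data and a weak prior,
# a draw lands close to the true coefficients.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))
beta_true = np.array([2.0, -1.0])
y = X @ beta_true + rng.normal(scale=0.1, size=100)
draw = sample_beta(X, y, sigma2=0.01, beta0=np.zeros(2),
                   Lambda0_inv=1e-6 * np.eye(2), rng=rng)
```

Passing the prior precision $\Lambda_0^{-1}$ rather than $\Lambda_0$ avoids inverting the prior covariance on every iteration.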

The full conditional for $\sigma^2$ is:

$$p(\sigma^2 \mid \beta, y) \propto (\sigma^2)^{-\frac{T}{2}} \exp\left(-\frac{(y-X\beta)' (y-X\beta)}{2\sigma^2}\right) (\sigma^2)^{-\left(\frac{a_0}{2} + 1\right)} \exp\left(-\frac{b_0}{2\sigma^2}\right)$$

This simplifies to an inverse-gamma distribution:

$$\sigma^2 \mid \beta, y \sim IG\left(\frac{a_1}{2}, \frac{b_1}{2}\right)$$

Where:

$$a_1 = a_0 + T$$

$$b_1 = b_0 + (y-X\beta)'(y-X\beta)$$
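This draw can likewise be sketched in a few lines, using the standard fact that the reciprocal of a gamma variate is inverse-gamma distributed (names and the usage values are illustrative):

```python
import numpy as np

def sample_sigma2(X, y, beta, a0, b0, rng):
    """One draw of sigma^2 from its full conditional IG(a1/2, b1/2)."""
    T = len(y)
    resid = y - X @ beta
    a1 = a0 + T
    b1 = b0 + resid @ resid
    # If G ~ Gamma(shape=a1/2, scale=2/b1), then 1/G ~ IG(a1/2, b1/2)
    return 1.0 / rng.gamma(shape=a1 / 2, scale=2.0 / b1)

# Illustrative usage: residuals with variance 4 should give draws near 4
rng = np.random.default_rng(2)
T = 2000
X = np.zeros((T, 1))
y = rng.normal(scale=2.0, size=T)
draw = sample_sigma2(X, y, beta=np.zeros(1), a0=1.0, b0=1.0, rng=rng)
```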

Gibbs Sampling Procedure

The Gibbs sampling procedure for obtaining samples from $\beta, \sigma^2 \mid y$ is as follows:

  1. Sample $\beta^{(1)}$ from $\beta \mid \sigma^{2(0)}, y$, where $\sigma^{2(0)}$ is an initial value (e.g., drawn from the prior).

  2. Sample $\sigma^{2(1)}$ from $\sigma^2 \mid \beta^{(1)}, y$.

  3. Sample $\beta^{(2)}$ from $\beta \mid \sigma^{2(1)}, y$.

  4. Sample $\sigma^{2(2)}$ from $\sigma^2 \mid \beta^{(2)}, y$.

  5. Continue this process for $n$ iterations.

After $n$ iterations, we have $n$ correlated draws whose distribution approximates the full posterior; in practice, the earliest draws are typically discarded as burn-in.
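The steps above can be put together into a minimal sampler. This is a sketch with illustrative names and simulated data standing in for a real dataset; the hyperparameters and iteration counts are arbitrary:

```python
import numpy as np

def gibbs_sampler(X, y, beta0, Lambda0, a0, b0, n_iter=2000,
                  sigma2_init=1.0, seed=0):
    """Alternate between the two full conditionals for n_iter iterations."""
    rng = np.random.default_rng(seed)
    T, k = X.shape
    Lambda0_inv = np.linalg.inv(Lambda0)
    beta_draws = np.empty((n_iter, k))
    sigma2_draws = np.empty(n_iter)

    sigma2 = sigma2_init
    for i in range(n_iter):
        # beta | sigma^2, y ~ N(beta1, Lambda1)
        Lambda1 = np.linalg.inv(X.T @ X / sigma2 + Lambda0_inv)
        beta1 = Lambda1 @ (X.T @ y / sigma2 + Lambda0_inv @ beta0)
        beta = rng.multivariate_normal(beta1, Lambda1)

        # sigma^2 | beta, y ~ IG(a1/2, b1/2) with a1 = a0 + T,
        # b1 = b0 + residual sum of squares
        resid = y - X @ beta
        sigma2 = 1.0 / rng.gamma((a0 + T) / 2, 2.0 / (b0 + resid @ resid))

        beta_draws[i], sigma2_draws[i] = beta, sigma2
    return beta_draws, sigma2_draws

# Example on simulated data (true values chosen arbitrarily)
rng = np.random.default_rng(42)
T, k = 500, 2
X = rng.normal(size=(T, k))
beta_true = np.array([1.5, -0.7])
y = X @ beta_true + rng.normal(scale=0.5, size=T)

betas, sigma2s = gibbs_sampler(X, y, beta0=np.zeros(k),
                               Lambda0=10 * np.eye(k), a0=2.0, b0=2.0)
burn = 500
post_beta = betas[burn:].mean(axis=0)
post_sigma2 = sigma2s[burn:].mean()
```

With enough data, the post-burn-in means recover the simulated coefficients and noise variance, which is a quick sanity check for any implementation of this procedure.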
