Bayesian Linear Regression

This lab is focused on Bayesian linear regression. Our goal is to estimate a function $f:\mathbb{R}\rightarrow\mathbb{R}$ given $N$ observations $\{(x_1,f(x_1)),\ldots,(x_N,f(x_N))\}$.

Assume an approximation $\tilde{f}(x)$ of the form

$$\tilde{f}(x) = \sum_{p=0}^{P} m_p x^p;$$

for example, with $P=2$, $\tilde{f}(x) = m_0 + m_1 x + m_2 x^2$. The regression problem is to characterize the coefficients $m=[m_0,\ldots,m_P]$, which we will do in a Bayesian setting.

Goals:

By the end of this lab, you will be able to

- formulate a linear regression problem in a Bayesian setting, with a Gaussian prior on the coefficients and additive Gaussian observation noise,
- compute the posterior distribution over polynomial coefficients in closed form using Gaussian conditioning identities,
- compute and visualize prior and posterior predictive distributions, and
- repeat the analysis using MUQ's Gaussian modeling tools.

Formulation:

Let $y$ denote the "observable" random variable representing the model's outputs, and let $\bar{y}\in\mathbb{R}^N$ denote the specific observations $f(x_1), \ldots, f(x_N)$. Our goal is to characterize the distribution of $m$ given $y=\bar{y}$. Bayes' rule gives

$$p(m \mid y=\bar{y}) = \frac{p(\bar{y} \mid m)\, p(m)}{p(\bar{y})}.$$

The two major components of this density are the prior density over the coefficients, $p(m)$, and the likelihood function, $p(y \mid m)$. The prior should contain any information we have before obtaining the observation $\bar{y}$. The likelihood function $p(y \mid m)$ defines a statistical model for the data.

For the prior distribution, we choose a Gaussian prior with a large variance. In particular,

$$m \sim N(0, \sigma_m^2 I),$$

with $\sigma_m^2 = 100$.

Question: Do you think this is a valid prior? What if we knew $\frac{df}{dx}>0$? What would change?

To form the likelihood function, we need to relate the expansion coefficients $m$ to the observable random variable $y$. To do this, consider the Vandermonde matrix

$$V = \begin{bmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^P \\ 1 & x_2 & x_2^2 & \cdots & x_2^P \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_N & x_N^2 & \cdots & x_N^P \end{bmatrix}.$$

Using $V$, a common choice is to assume $y$ is given by

$$y = Vm + \epsilon,$$

where $\epsilon \sim N(0,\Sigma_y)$ is a zero-mean random variable with covariance $\Sigma_y$. The additive error $\epsilon$ is often called the observation noise. Here, we will further assume that $\Sigma_y = \sigma_y^2 I$.
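To make this statistical model concrete, the short sketch below (our own illustration, not part of the lab code) draws a hypothetical coefficient vector from the prior and simulates observations through the Vandermonde matrix:

import numpy as np

# Illustrative only: simulate y = V m + eps for P = 3 on five points
x = np.linspace(0, 1, 5)
P = 3
V = np.vander(x, P+1, increasing=True)   # columns 1, x, x^2, x^3
m = np.random.multivariate_normal(np.zeros(P+1), 100.0*np.eye(P+1))
eps = 0.02*np.random.randn(len(x))       # additive noise with sigma_y = 0.02
y = V @ m + eps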

Question: Is this a reasonable model of $y$? What if the noise was multiplicative?

With this additive error, the likelihood function takes the form

$$p(y \mid m) = N(Vm,\, \Sigma_y) \propto \exp\!\left(-\frac{1}{2}(y - Vm)^T \Sigma_y^{-1} (y - Vm)\right).$$
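As an optional aside (our own addition, assuming SciPy is available), this density can be evaluated numerically:

import numpy as np
from scipy.stats import multivariate_normal

def LogLikelihood(m, V, yObs, sigmaY):
    # log p(y | m) with y ~ N(V m, sigmaY^2 I), evaluated at the observations
    return multivariate_normal.logpdf(yObs, mean=V @ m, cov=sigmaY**2 * np.eye(len(yObs)))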

Useful Identities

Let $\theta$ denote an arbitrary Gaussian random variable with mean $\mu_\theta$ and covariance $\Sigma_\theta$. Then, for a matrix $A$,

$$A\theta \sim N\!\left(A\mu_\theta,\; A\Sigma_\theta A^T\right).$$

Let $\eta = A\theta + e$, where $e\sim N(0,\Sigma_e)$; then the posterior $p(\theta \mid \eta=\bar{\eta})$ is Gaussian, $N(\mu_{post}, \Sigma_{post})$, where

$$\mu_{post} = \mu_\theta + \Sigma_\theta A^T\left(A\Sigma_\theta A^T + \Sigma_e\right)^{-1}\left(\bar{\eta} - A\mu_\theta\right)$$

and

$$\Sigma_{post} = \Sigma_\theta - \Sigma_\theta A^T\left(A\Sigma_\theta A^T + \Sigma_e\right)^{-1} A\Sigma_\theta.$$

These notes by Sam Roweis also provide some useful identities.
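These identities translate directly into a few lines of NumPy; the helper below is a minimal sketch of the conditioning formulas (the name GaussianCondition is ours), which the lab code later mirrors:

import numpy as np

def GaussianCondition(mu, Sigma, A, SigmaE, etaBar):
    # Posterior of theta given etaBar, where eta = A theta + e, e ~ N(0, SigmaE)
    S = A @ Sigma @ A.T + SigmaE            # covariance of eta
    K = np.linalg.solve(S, A @ Sigma).T     # Sigma A^T S^{-1}
    postMean = mu + K @ (etaBar - A @ mu)
    postCov = Sigma - K @ A @ Sigma
    return postMean, postCov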

Useful software references

This lab relies on NumPy, Matplotlib, and MUQ (the MIT Uncertainty Quantification library); their documentation is a useful reference throughout.

Imports

# Make the local MUQ installation importable
import sys
sys.path.insert(0,'/home/fenics/Installations/MUQ_INSTALL/lib')

import numpy as np
import matplotlib.pyplot as plt

from PlotUtilities import PlotGaussianPDF

Generate Synthetic Observations

# The function we will try to recover
def TrueFunc(x):
    return np.sin(4.5*x)

numObs = 20           # number of observations
trueNoiseStd = 2e-2   # std. dev. of the additive observation noise
obsLocs = np.linspace(0,1,numObs)

# Noisy observations of the true function
obsData = TrueFunc(obsLocs) + trueNoiseStd*np.random.randn(numObs)
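# Note: obsData is random, so figures will vary from run to run; for
# reproducible results you could call, e.g., np.random.seed(0) beforehand.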
numPlot = 100

plotx = np.linspace(-0.1, 1.1, numPlot)

plt.figure(figsize=(12,6))
plt.plot(plotx, TrueFunc(plotx),label='True Function')
plt.plot(obsLocs,obsData,'.', markersize=15, label = 'Observations')

plt.ylim([-1.2,1.2])
plt.legend(fontsize=16)

*(Figure: the true function and the noisy observations)*

Form the prior

polyOrder = 4   # polynomial order P

priorMean = np.zeros(polyOrder+1)
priorCov = (10*10)*np.eye(polyOrder+1)   # sigma_m^2 = 100
PlotGaussianPDF(priorMean,priorCov)

*(Figure: the Gaussian prior over the coefficients)*

Prior predictive

Before seeing any data, the prior on $m$ induces a distribution over function values: by the identity above, $V_{pred}\,m$ is Gaussian with mean $V_{pred}\,\mu_m$ and covariance $V_{pred}\,\Sigma_m V_{pred}^T$, where $V_{pred}$ is the Vandermonde matrix at the plotting locations.

# Vandermonde matrix at the plotting locations
predV = np.ones((numPlot,polyOrder+1))
for p in range(1,polyOrder+1):
    predV[:,p] = np.power(plotx, p)

# Push the prior through the linear map: mean V mu, covariance V Sigma V^T
predMean = np.dot(predV, priorMean)
predCov = np.dot(predV, np.dot(priorCov, predV.T))

predStd = np.sqrt(np.diag(predCov))
plt.figure(figsize=(12,6))

plt.fill_between(plotx, predMean-2.0*predStd, predMean+2.0*predStd, alpha=0.5, label=r'Predictive $\mu\pm2\sigma$')
plt.plot(plotx, TrueFunc(plotx), 'k', linewidth=3, label='True Function')
plt.plot(plotx, predMean, linewidth=3, label=r'Predictive $\mu$')
plt.plot(obsLocs,obsData,'.', markersize=15, label='Obs.')

plt.legend(bbox_to_anchor=(0., 1.02, 1., .102), loc=3, ncol=4, mode="expand", borderaxespad=0., fontsize=16)
plt.xlabel('Location $x$')
plt.ylabel('Function value $f(x)$')
plt.show()

*(Figure: prior predictive mean and $\mu\pm2\sigma$ band with the true function and observations)*

Form the polynomial model

# Vandermonde matrix at the observation locations
V = np.ones((numObs,polyOrder+1))
for p in range(1,polyOrder+1):
    V[:,p] = np.power(obsLocs, p)
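As an aside (our own note), NumPy can build this matrix in one line; the following should produce the same V, with powers increasing across the columns:

V = np.vander(obsLocs, polyOrder+1, increasing=True)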

Compute the posterior distribution

We now apply the conditioning identities with $\theta = m$, $A = V$, $\Sigma_e = \sigma_y^2 I$, and observation $\bar{\eta} = \bar{y}$.

modelNoiseStd = 1.0*trueNoiseStd                # assume the noise level is known
noiseCov = modelNoiseStd**2 * np.eye(numObs)    # Sigma_y = sigma_y^2 I (a covariance, so the std is squared)

Vsigma = np.dot(V,priorCov)                     # V Sigma_m
predCov = np.dot(Vsigma, V.T) + noiseCov        # V Sigma_m V^T + Sigma_y

# Gaussian conditioning identities with A = V (the prior mean is zero here,
# but we subtract V*priorMean anyway so the formula stays general)
postCov = priorCov - np.dot(Vsigma.T, np.linalg.solve(predCov, Vsigma))
postMean = priorMean + np.dot(Vsigma.T, np.linalg.solve(predCov, obsData - np.dot(V, priorMean)))
PlotGaussianPDF(postMean,postCov)

*(Figure: the Gaussian posterior over the coefficients)*
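As an optional sanity check (our addition, not part of the original lab): because the prior variance is large relative to the noise, the posterior mean should be close to the ordinary least-squares polynomial fit:

# np.polyfit returns coefficients from highest degree to lowest, so reverse it
lsqCoeffs = np.polyfit(obsLocs, obsData, polyOrder)[::-1]
print(np.max(np.abs(lsqCoeffs - postMean)))   # should be small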

Posterior Predictive

Pushing the posterior through the plotting-grid Vandermonde matrix, exactly as we did for the prior, gives the posterior predictive distribution.

# Push the posterior through the plotting Vandermonde matrix
predMean = np.dot(predV, postMean)
predCov = np.dot(predV, np.dot(postCov, predV.T))

predStd = np.sqrt(np.diag(predCov))
plt.figure(figsize=(12,6))

plt.fill_between(plotx, predMean-2.0*predStd, predMean+2.0*predStd, alpha=0.5, label=r'Predictive $\mu\pm2\sigma$')
plt.plot(plotx, TrueFunc(plotx), 'k', linewidth=3, label='True Function')
plt.plot(plotx, predMean, linewidth=3, label=r'Predictive $\mu$')
plt.plot(obsLocs,obsData,'.', markersize=15, label='Obs.')

plt.legend(bbox_to_anchor=(0., 1.02, 1., .102), loc=3, ncol=4, mode="expand", borderaxespad=0., fontsize=16)
plt.xlabel('Location $x$')
plt.ylabel('Function value $f(x)$')
plt.show()

*(Figure: posterior predictive mean and $\mu\pm2\sigma$ band with the true function and observations)*

Repeat with MUQ

import pymuqModeling as mm   # MUQ's modeling module

Set up prior

priorMean = np.zeros(polyOrder+1)
priorCov = (10*10)*np.eye(polyOrder+1)

prior = mm.Gaussian(priorMean, priorCov)
PlotGaussianPDF(prior.GetMean(), prior.GetCovariance())

*(Figure: the prior distribution, constructed with MUQ)*

Compute posterior

# Condition the Gaussian prior on the observations: y = V m + eps, eps ~ N(0, noiseCov)
post = prior.Condition(V, obsData, noiseCov)

PlotGaussianPDF(post.GetMean(),post.GetCovariance())

*(Figure: the posterior distribution, computed with MUQ)*
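If everything is consistent, the MUQ posterior should agree with the hand-computed one from earlier; a quick optional check (our addition, assuming postMean and postCov are still in scope; np.ravel guards against shape differences in what MUQ returns):

print(np.allclose(np.ravel(post.GetMean()), postMean))   # expect True
print(np.allclose(post.GetCovariance(), postCov))        # expect True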

Prior Predictive Samples

numSamps = 10

priorSamps = np.zeros((priorMean.shape[0], numSamps))
for i in range(numSamps):
    priorSamps[:,i] = prior.Sample()
    
# Evaluate each sampled coefficient vector on the plotting grid
priorPredSamps = np.dot(predV, priorSamps)

plt.figure(figsize=(12,6))
for i in range(numSamps):
    plt.plot(plotx, priorPredSamps[:,i])
    
plt.xlabel('Location $x$')
plt.ylabel('Function value $f(x)$')
plt.show()

*(Figure: ten polynomial samples from the prior predictive)*

Posterior Predictive Samples

numSamps = 10

postSamps = np.zeros((priorMean.shape[0], numSamps))
for i in range(numSamps):
    postSamps[:,i] = post.Sample()

# Evaluate each sampled coefficient vector on the plotting grid
postPredSamps = np.dot(predV, postSamps)

plt.figure(figsize=(12,6))
for i in range(numSamps):
    plt.plot(plotx, postPredSamps[:,i])
    
plt.xlabel('Location $x$')
plt.ylabel('Function value $f(x)$')
plt.show()

*(Figure: ten polynomial samples from the posterior predictive)*