Gibbs sampling on a bimodal, highly-correlated posterior with lots of autocorrelation.
mcmc
sampling
Demonstrates Gibbs sampling on a bimodal, highly-correlated posterior distribution. Shows that conditional posteriors are Gaussian despite an ugly joint form, and illustrates the autocorrelation challenges that arise in difficult sampling problems.
Published
April 30, 2025
A gibbs sampler with lots of autocorrelation
%matplotlib inlineimport numpy as npfrom scipy import statsfrom scipy.stats import normfrom scipy.stats import betafrom scipy.stats import distributionsimport matplotlib.pyplot as pltimport timeimport seaborn as snssns.set_style('whitegrid')sns.set_context('poster')
//anaconda/envs/py35/lib/python3.5/site-packages/matplotlib/__init__.py:872: UserWarning: axes.color_cycle is deprecated and replaced with axes.prop_cycle; please use the latter.
warnings.warn(self.msg_depr % (key, alt_key))
A tetchy posterior
Imagine your posterior distribution has the following form:
\[ f(x, y \mid data) = (1/C)*e^{-\frac{(x^2*y^2+x^2+y^2-8x-8y)}{2}} \]
As is typical in Bayesian inference, you don’t know what C (the normalizing constant) is, so you can’t sample from this distribution using conventional methods. However, MCMC techniques allow us to sample from probability distributions without knowing this constant, and we will use oGibbs sampling, to do this here.
Gibbs sampling allows you to sample from a probability distribution by iteratively sampling from its conditional distributions. This strategy is very useful in problems where each unknown would have a very simple distribution if we knew all of the other unknowns. In this problem, the posterior distribution \(f(x, y \mid data)\) is over two unknowns, \(x\) and \(y\). To perform Gibbs sampling, we sample from the distribution of \(x\) holding \(y\) constant at its current value, then sample from the distribution of \(y\) holding \(x\) constant at its current value. As it turns out, even though \(f(x, y \mid data)\) is incredibly ugly, the conditional distributions are relatively simple.
Conditionals
After some simplification (completing the square and throwing all factors that do not involve \(x\) into \(g(y)\) for the first equation, and vice versa for the second), we find that the conditional distributions have a relatively simple form.
\[ p(x \mid y, data) = g(y) e^{-\left(x-\frac{4}{(1+y^2)}\right)^{2}\frac{(1+y^2)}{2}} \]
To assess how the sampler is exploring the space, we can plot a traceplot for each dimension. A traceplot plots the value of each sample against the iteration number and gives a sense of how well the sampler is exploring the space.
def traceplot(z): plt.plot(z, alpha=0.3);
traceplot(x)
You can see from the traceplot the when sampling \(x\), the sampler spends long periods of time near zero, and occasionally moves to and hangs out at higher values. These correspond to the two areas of high density in the countour plot.
Marginals
We can also draw a histogram of \(x\) to get an estimate of its marginal distribution.
plt.hist(x, bins=50);
This is exactly what we would expect if we projected the distribution in the contour plot down to the \(x\) axis.
We can do the same plots for \(y\).
traceplot(y)
plt.hist(y, bins=50);
How we move
Because we are in two dimensions, we can also plot the path that the sampler took through the \(xy\) plane. Note that the path always takes right angles, because we are alternating between moves that only move in the \(x\) direction and only move in the \(y\) direction.
To see how effective the samples we have drawn will be at approximating summaries of the posterior distribution (for example the posterior mean), we can look at the autocorrelation of the samples. High autocorrelation would mean that the sample average that we take to approximate the posterior mean would be higher than expected if we had taken independent samples from the posterior distribution.
In both \(x\) and \(y\), we can see that the autocorrelation is quite high. This is not a big problem though because the sampler is so simple that we can draw millions of samples to make up for the high autocorrelation.
To figure out exactly how many samples we would have to draw, we can compute effective sample size, a measure of how many independent samples our our samples are equivalent to. This usese the same quantities that were used to compute the autocorrelation plot above. The following code is taken from https://code.google.com/p/biopy/source/browse/trunk/biopy/bayesianStats.py?r=67. You don’t need to try to understand the function – it’s just here to run it, and this is a rather slow implementation.
def effectiveSampleSize(data, stepSize =1) :""" Effective sample size, as computed by BEAST Tracer.""" samples =len(data)assertlen(data) >1,"no stats for short sequences" maxLag =min(samples//3, 1000) gammaStat = [0,]*maxLag#varGammaStat = [0,]*maxLag varStat =0.0;iftype(data) != np.ndarray : data = np.array(data) normalizedData = data - data.mean()for lag inrange(maxLag) : v1 = normalizedData[:samples-lag] v2 = normalizedData[lag:] v = v1 * v2 gammaStat[lag] =sum(v) /len(v)#varGammaStat[lag] = sum(v*v) / len(v)#varGammaStat[lag] -= gammaStat[0] ** 2# print lag, gammaStat[lag], varGammaStat[lag]if lag ==0 : varStat = gammaStat[0]elif lag %2==0 : s = gammaStat[lag-1] + gammaStat[lag]if s >0 : varStat +=2.0*selse :break# standard error of mean# stdErrorOfMean = Math.sqrt(varStat/samples);# auto correlation time act = stepSize * varStat / gammaStat[0]# effective sample size ess = (stepSize * samples) / actreturn ess
Now we can compute effective sample size for x and y.
esx = effectiveSampleSize(xall)esy = effectiveSampleSize(yall)print("Effective Size for x: ", esx, " of ", len(x), " samples, rate of", esx/len(x)*100, "%.")print("Effective Size for y: ", esy, " of ", len(y), " samples, rate of", esy/len(y)*100, "%.")
Effective Size for x: 10115.8236073 of 36001 samples, rate of 28.0987295001 %.
Effective Size for y: 10264.5088434 of 36001 samples, rate of 28.5117325726 %.
Note that while the effective size is only just over 25% of the actual sample size, we can draw samples so quickly from the posterior that this is not a major hindrance.