Apr 10, 2019

Hamiltonian MCMC Part 2


In part one, we reviewed the Metropolis-Hastings algorithm and discussed the phenomena of random walk behaviour on the one hand, and high rejection rates leading to repeated draws on the other.

In this second part we look at a modification of this algorithm aimed at mitigating these issues by making decorrelated proposals while maintaining a high acceptance rate.

This method has an interpretation in the dynamics of a physical system, and was originally proposed by Duane et al. (1987) in the context of lattice field theory simulations of quantum chromodynamics.

A quick review of Hamiltonian mechanics

First, we quickly review some classical mechanics.  If it seems easier, you can simply take the mathematics at face value, as a system of coupled first-order differential equations.

Hamiltonian mechanics is a theoretical reformulation of the laws of classical mechanics in physics.  In the Hamiltonian scheme, the state of a physical system of particles is represented by the point $(q_1,\ldots,q_N,p_1,\ldots,p_N)$ in $2N$-dimensional phase space.  The $q_i$s and $p_i$s are the positions and momenta of our particles.  Roughly speaking, phase space is much like the familiar multi-dimensional Euclidean space.

The time-evolution of this state is governed by a set of coupled first-order differential equations in terms of a quantity $H$, called the Hamiltonian:

$$\frac{dp_i}{dt} = -\frac{\partial H}{\partial q_i}$$

and

$$\frac{dq_i}{dt} = \frac{\partial H}{\partial p_i}$$

Often, as in our case, $H$ can be regarded as the total energy of the system, i.e. the sum of the kinetic energy $K(\mathbf{p})$ and the potential energy $U(\mathbf{q})$, where $\mathbf{q}$ and $\mathbf{p}$ are shorthand for $q_1,\ldots,q_N$ and $p_1,\ldots,p_N$.
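For a concrete example, consider a single unit-mass particle in one dimension with $H(q,p) = K(p) + U(q) = p^2/2 + q^2/2$.  Hamilton's equations then read

$$\frac{dq}{dt} = p, \qquad \frac{dp}{dt} = -q,$$

so the state $(q,p)$ traces out a circle in phase space while the total energy $H$ stays constant: this is the familiar harmonic oscillator.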

Mathematically, we consider these equations to describe a mapping $T$ between points in phase space over a given time interval.  The properties of this formulation which lend it to MCMC are the following:

  1. The determinant of the Jacobian of $T$ is 1, which means that volumes in phase space are preserved under Hamiltonian dynamics (a result known as Liouville's theorem).
  2. $T$ is continuous and invertible, which means the dynamics are reversible.

These properties are used to demonstrate that detailed balance holds when proposals are made using Hamiltonian dynamics, and hence the existence of a stationary distribution (see Neal (2010)).
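As a reminder, for a target density $\pi$ and transition kernel $k$, detailed balance is the condition

$$\pi(x)\,k(x \to x') = \pi(x')\,k(x' \to x),$$

and a chain satisfying it leaves $\pi$ invariant.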

Perhaps one of the most remarkable properties of these dynamics is that we can arrange for properties (1) and (2) to hold exactly, even when the differential equations are only numerically approximated!  This is accomplished with leapfrog integration, which goes as follows:

$$p_i(t + \Delta t/2) \leftarrow p_i(t) - \frac{\Delta t}{2} \left.\frac{\partial U}{\partial q_i}\right|_t$$

$$q_i(t + \Delta t) \leftarrow q_i(t) + \Delta t \left.\frac{\partial K}{\partial p_i}\right|_{t + \Delta t/2}$$

$$p_i(t + \Delta t) \leftarrow p_i(t + \Delta t/2) - \frac{\Delta t}{2} \left.\frac{\partial U}{\partial q_i}\right|_{t + \Delta t}$$

For a proof of this result, see Neal (2010).
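To make the scheme concrete, here is a minimal Python sketch of a leapfrog integrator for the kinetic energy used later in this post, $K(\mathbf{p})=\mathbf{p}\cdot\mathbf{p}/2$ (so $\partial K/\partial p_i = p_i$); the function name and signature are our own illustrative choices, not notation from the papers cited here:

```python
import numpy as np

def leapfrog(q, p, grad_U, step_size, n_steps):
    """Evolve (q, p) through n_steps leapfrog updates of size step_size.

    grad_U(q) should return the gradient of the potential energy U at q.
    With K(p) = p.p/2 we have dK/dp = p, so each position update simply
    adds the momentum.  Intermediate half-steps for the momentum are
    combined into full steps.
    """
    q, p = q.copy(), p.copy()
    p -= 0.5 * step_size * grad_U(q)       # opening half-step for momentum
    for _ in range(n_steps - 1):
        q += step_size * p                 # full step for position
        p -= step_size * grad_U(q)         # two consecutive half-steps combined
    q += step_size * p                     # final full step for position
    p -= 0.5 * step_size * grad_U(q)       # closing half-step for momentum
    return q, p
```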

The canonical ensemble

The final physics background for this algorithm is a result taken from statistical mechanics.

In Hamiltonian Monte Carlo, we consider our system of particles to be in thermal contact and equilibrium with a (very) large heat reservoir, where the motions of the particles in our system undergo random perturbations due to thermal interactions.  The probability of finding the system in a state with energy $E$ when in contact with a heat reservoir at fixed temperature $T>0$ is given by the canonical distribution:

$$p = \frac{1}{Z}e^{-E/T}$$

where $T$ has dimensions such that the Boltzmann constant is one.  For our purposes $Z$ is a normalizing constant.
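For completeness, the normalizing constant $Z$ is the partition function; for a continuous state space it is

$$Z = \int e^{-E(\mathbf{x})/T}\, d\mathbf{x},$$

which ensures the probabilities integrate to one.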

Hamiltonian Monte Carlo algorithm

Finally we are equipped with the required physics to implement Hamiltonian Monte Carlo.  The algorithm defines a Hamiltonian with potential energy

$$U(\mathbf{q}):=-\log[f(\mathbf{q})]$$,

where $f$ is the pdf of the distribution we wish to sample from.  In the Bayesian context, for example, $f$ could be proportional to the posterior distribution.

A set of auxiliary variables $\mathbf{p}$ is also introduced; these are interpreted physically as momenta (as noted above).  For the kinetic energy we pick $K(\mathbf{p})=\mathbf{p}\cdot\mathbf{p}/2$.  This corresponds physically to the classical kinetic energy of particles each of mass $m=1$, and probabilistically to the $p_i$s being independently normally distributed.
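As a concrete instance, here is a sketch of these energy functions for a two-dimensional correlated Gaussian target; the particular covariance matrix below is purely an illustrative assumption:

```python
import numpy as np

# Illustrative target: a correlated 2-D Gaussian, f(q) ∝ exp(-q·Σ⁻¹q / 2).
# The covariance Σ is an assumption chosen for demonstration.
Sigma = np.array([[1.0, 0.9],
                  [0.9, 1.0]])
Sigma_inv = np.linalg.inv(Sigma)

def U(q):
    """Potential energy: U(q) = -log f(q), up to an additive constant."""
    return 0.5 * q @ Sigma_inv @ q

def grad_U(q):
    """Gradient of the potential energy with respect to q."""
    return Sigma_inv @ q

def K(p):
    """Kinetic energy for unit masses; the negative log-density of a
    standard normal, up to an additive constant."""
    return 0.5 * p @ p
```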

We consider this system in thermal equilibrium with a heat reservoir, which means the canonical distribution gives the probabilities for the state $(\mathbf{q},\mathbf{p})$ as

$$P(\mathbf{q},\mathbf{p}) = \frac{1}{Z} \, f(\mathbf{q}) \, \exp(-\mathbf{p}\cdot\mathbf{p}/2)$$

(Here we've set the temperature to $T=1$.)  Ultimately, we marginalize over the auxiliary momentum variables $\mathbf{p}$, which in practice amounts to simply discarding them after they have served their purpose in the dynamics.
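Indeed, because the momentum factor integrates to a constant, marginalizing over $\mathbf{p}$ gives

$$\int P(\mathbf{q},\mathbf{p})\,d\mathbf{p} = \frac{f(\mathbf{q})}{Z}\int \exp(-\mathbf{p}\cdot\mathbf{p}/2)\,d\mathbf{p} = \frac{(2\pi)^{N/2}}{Z}\,f(\mathbf{q}) \propto f(\mathbf{q}),$$

so the positions alone are distributed according to the (normalized) target.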

The algorithm then proceeds as follows:

  1. Initialize $x_0$
  2. For each iteration $t=0,1,2,\ldots$ do:
  3. $q_0 \leftarrow x_t$
  4. Draw each component of $p_0$ independently from $N(0,1)$  (given our choice of $K$ above)
  5. Evolve $(q_0, p_0)$ through $L$ iterations of leapfrog integration, combining intermediate half-steps, to obtain the proposed state $(q, p)$
  6. $\alpha \leftarrow \min[1,\exp(U(q_0)-U(q)+K(p_0)-K(p))]$
  7. Draw $u$ uniformly from $[0,1]$
  8. If $u \leq \alpha$ then $x_{t+1} \leftarrow q$
  9. Else $x_{t+1} \leftarrow q_0$
  10. End
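Putting the pieces together, here is a minimal sketch of this loop in Python, reusing the `leapfrog` and energy sketches above (with `step_size` playing the role of $\Delta t$ and `n_steps` the role of $L$); the function name, defaults, and use of NumPy's `default_rng` are our own choices:

```python
import numpy as np

def hmc_sample(x0, U, grad_U, n_iter, step_size=0.1, n_steps=20, rng=None):
    """Run the Hamiltonian Monte Carlo loop described in steps 1-10 above.

    Assumes K(p) = p.p/2, so the momenta are drawn from a standard normal
    and the kinetic-energy terms in alpha are computed inline.  Requires
    the leapfrog function sketched earlier to be in scope.
    """
    rng = np.random.default_rng() if rng is None else rng
    samples = [np.asarray(x0, dtype=float)]       # step 1: initialize x_0
    for _ in range(n_iter):                       # step 2
        q0 = samples[-1]                          # step 3
        p0 = rng.standard_normal(q0.shape)        # step 4: p_0 ~ N(0, I)
        q, p = leapfrog(q0, p0, grad_U, step_size, n_steps)  # step 5
        # Step 6: acceptance probability from the change in total energy.
        log_alpha = U(q0) - U(q) + 0.5 * (p0 @ p0) - 0.5 * (p @ p)
        if rng.uniform() <= np.exp(min(0.0, log_alpha)):     # steps 7-8
            samples.append(q)                     # accept the proposed point
        else:
            samples.append(q0)                    # step 9: repeat current point
    return np.array(samples)
```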

Example of Hamiltonian Monte Carlo simulation

Here is an animation showing simulations from the same correlated Gaussian distribution we considered in part one of this blog, but this time being sampled using the Hamiltonian Monte Carlo algorithm:

Again we set the simulation to start from (-4, 4) and record the evolution.  The serial correlation is now much lower than in the Metropolis example from part one, since proposed points can be quite distant from the current point.  (This also shows up in the corresponding plots of the autocorrelation function.)
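The animation itself isn't reproduced here, but a comparable run can be set up with the sketches above; the starting point matches the text, while the chain length and leapfrog settings are illustrative assumptions:

```python
chain = hmc_sample(x0=np.array([-4.0, 4.0]), U=U, grad_U=grad_U,
                   n_iter=1000, step_size=0.1, n_steps=20,
                   rng=np.random.default_rng(0))
print(chain.mean(axis=0))  # should approach the target mean, (0, 0)
```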

Why Hamiltonian Monte Carlo works

As noted in part one, a sufficient condition for the existence of a stationary distribution is detailed balance, and properties of Hamiltonian dynamics (namely Liouville's theorem and reversibility) can be used to show that detailed balance holds.

On a physical level, the algorithm models the classical evolution of a system of particles in thermal equilibrium with a heat reservoir, such that the marginal distribution of particle positions equals the distribution from which we want to sample.  To make this the case, the potential energy $U$ is constructed explicitly as the negative log of our target distribution.

Algorithmically, the evolution is performed using leapfrog integration, chosen in particular for its remarkable property of preserving phase-space volumes exactly.

The dynamics allow for distant proposals, which reduces the serial correlation between successive samples, while maintaining a high acceptance rate.  This high acceptance rate is a consequence of the Hamiltonian being a constant of the motion (conservation of energy), which in principle leaves the canonical distribution exactly invariant.  Because leapfrog integration only approximately conserves the Hamiltonian, however, the algorithm still includes a Metropolis accept-reject step to correct for the discretization error.

As we've mentioned, the dynamics play out in the thermal context of a large heat reservoir, which provides for random resampling of the momenta of the particles.  The momenta are not directly of interest (and in a probabilistic sense they are marginalized out) but, by being coupled with the positions, they induce the randomness we require.

Finally, we note that the leapfrog integration step size, $\Delta t$, and the number of steps, $L$, are the algorithm's tunable parameters, which can mean substantially fewer quantities to tune than in random-walk Metropolis-Hastings.  Again, Neal (2010) provides a detailed discussion of practicalities of using the algorithm that we don't have space to address here.
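Continuing the illustrative run above, one quick diagnostic when tuning $\Delta t$ and $L$ is the realized acceptance rate, which can be estimated from how often the chain actually moved:

```python
# Fraction of iterations in which the proposal was accepted (the chain moved).
accept_rate = np.mean(np.any(chain[1:] != chain[:-1], axis=1))
print(f"acceptance rate: {accept_rate:.2f}")
```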

Summary

We've seen how a modification of the classic Metropolis-Hastings algorithm can lead to more efficient sampling with a smaller number of tunable parameters. The modification uses the dynamics of a physical system of particles in thermal contact with a large heat reservoir to achieve this, because the mathematical properties of such a system allow it to make much more distant (and therefore decorrelated) proposals, which will generally be accepted with a higher probability.

References

  1. Duane, S., Kennedy, A. D., Pendleton, B. J., and Roweth, D. (1987) "Hybrid Monte Carlo", Physics Letters B, 195:216-222.
  2. Neal, R. M. (2010) "MCMC using Hamiltonian dynamics", in Handbook of Markov Chain Monte Carlo, S. Brooks, A. Gelman, G. L. Jones, and X.-L. Meng (editors), Chapman & Hall / CRC Press, pp. 113-162.
  3. Neal, R. M. (1993) Probabilistic Inference Using Markov Chain Monte Carlo Methods, Technical Report CRG-TR-93-1, Dept. of Computer Science, University of Toronto.