Random Growth and KPZ Universality

The KPZ universality class is expected to govern the fluctuations of a wide range of random growth phenomena. The same statistics also appear in other settings, including eigenvalue statistics of sample covariance matrices arising in principal component analysis.

First-passage percolation

A metric is a rule that assigns a distance between points. First-passage percolation is a model of a random metric, obtained by placing i.i.d. random weights on the edges of the lattice and defining the distance between two sites to be the smallest total weight of a path joining them.

First-passage percolation

Each nearest-neighbor edge \(e\) of \(\mathbb{Z}^2\) is assigned an i.i.d. nonnegative random weight \(\tau_e\). For lattice sites \(x\) and \(y\),

\[ T_{x,y}=\inf_{\pi:x\to y}\sum_{e\in\pi}\tau_e, \qquad B(t)=\{y\in\mathbb{Z}^2:T_{0,y}\le t\}. \]

The set \(B(t)\) is the ball of radius \(t\) in this random metric. It is the collection of sites that can be reached from the origin by time \(t\). The large-scale questions are then geometric: what deterministic shape approximates \(B(t)\), how does the boundary fluctuate around that shape, and what do the minimizing paths look like?

The first question asks for the large-scale behavior of the passage time itself. If one fixes a macroscopic direction \(x\) and looks at lattice points near \(nx\), then the passage time from the origin to those points grows linearly in \(n\) to first order. Under standard moment assumptions on the edge-weight distribution, the coefficient of that linear growth is deterministic.

Shape theorem

The first statement is the passage-time limit. The second line defines the deterministic limit shape through the corresponding sublevel set.

\[ \frac{T_{0,\lfloor nx\rfloor}}{n}\to \mu(x), \qquad \mathcal{B}=\{x\in\mathbb{R}^2:\mu(x)\le 1\}. \]

The function \(\mu\) is the time constant of the model. Once \(\mu\) is known, its unit sublevel set \(\mathcal{B}=\{x:\mu(x)\le 1\}\) gives the deterministic shape of the random ball, and the rescaled balls \(t^{-1}B(t)\) converge to \(\mathcal{B}\). So the passage-time limit and the limit shape are two ways of expressing the same first-order behavior.

Three still frames from the first-passage percolation growth video, showing the ball at increasing times. — Three still frames from the same simulation, showing the geometry of the growing ball at increasing times.

Figure 1. Evolution of the FPP ball

An animation of the growing ball \(B(t)\) in first-passage percolation. The plotted set is \(B(t)\), and the video shows how that set changes as \(t\) varies.

Last-passage percolation

Last-passage percolation is a related directed growth model on the lattice, in which admissible paths move only by the coordinate steps \(e_1=(1,0)\) and \(e_2=(0,1)\). When the weights are nonnegative, it can be interpreted as an infection process: \(G_{x,y}\) is the time that \(y\) is infected by an infection started at \(x\).

Last-passage percolation

Each site \(z\in\mathbb{Z}^2\) is assigned an i.i.d. random weight \(\omega_z\). For an up-right path \(\pi:x\nearrow y\), define

\[ W(\pi)=\sum_{z\in\pi\setminus\{x\}}\omega_z, \qquad G_{x,y}=\max_{\pi:x\nearrow y}W(\pi). \]

Every up-right path from \(x\) to \(y\) collects the same number of vertex weights, so the path sums appearing in the maximum have the same distribution. The maximum is taken over exponentially many path sums, and these random variables are strongly correlated. It is then natural to ask whether the limiting statistics are still given by the extreme-value laws discussed on the universality page. The answer, as we will shortly see, is no.

A second basic observable is obtained by changing the terminal set: instead of fixing one endpoint, one takes the maximum passage time over all points at distance \(n\) from the origin.

Point-to-line last passage

Instead of fixing a single endpoint, this model maximizes over all endpoints on the line \(x_1+x_2=n\), or equivalently over all up-right paths of length \(n\) starting from the origin.

\[ G_{(n)}^{\textup{line}} = \max_{0\le k\le n}G_{0,(k,n-k)} = \max_{\pi:0\nearrow\{x_1+x_2=n\}}W(\pi). \]

This is a maximum over more paths than the point-to-point model. Its limiting statistics will be discussed later. The passage times can be computed using dynamic programming.

Dynamic programming

This recursion expresses the passage time to \((i,j)\) in terms of the two sites that can reach it in one directed step.

\[ G_{0,(i,j)}=\omega_{(i,j)}+\max\{G_{0,(i-1,j)},\,G_{0,(i,j-1)}\}. \]

On the coordinate axes there is only one admissible directed path, so one starts from \(G_{0,0}=0\) and then accumulates the weights encountered along that unique path.

As in FPP, there is a deterministic first-order limit. If \(x_n\in\mathbb{Z}_+^2\) satisfies \(x_n/n\to \xi\), where \(\xi\) lies on the line segment joining \(e_1=(1,0)\) and \(e_2=(0,1)\), then the passage time to \(x_n\) grows linearly in \(n\), with a deterministic growth rate depending only on \(\xi\). This first-order law also requires standard moment assumptions on the weight distribution.

Shape function

The shape function records the first-order growth rate in each direction, and the point-to-line first-order growth is obtained by maximizing that function over all directed endpoints at level \(n\).

\[ \frac{G_{0,x_n}}{n}\to g(\xi), \qquad \frac{G_{(n)}^{\textup{line}}}{n}\to \max_{\xi\in[e_1,e_2]} g(\xi). \]

The function \(g\) is the shape function for the directed model. It describes the first-order growth of the infection times, and the corresponding sublevel sets of the infection process converge, after rescaling, to deterministic sets.

Three still frames from the last-passage percolation infection-growth video, showing the infected region at increasing times. — Three still frames from the same simulation, showing the infected region at increasing times.

Figure 3. LPP as an infection model

An animation of the infected region in exponential last-passage percolation. The plotted set is \(\mathcal{I}_0(t)\), and the video shows how that set changes as \(t\) varies.

Project 2

Exponential point-to-point and point-to-line LPP

Work here with i.i.d. exponential(1) weights. The recursion above makes this model easy to simulate, and the first-order shape function is known explicitly: \[ g(x_1,x_2)=(\sqrt{x_1}+\sqrt{x_2})^2. \] Along the diagonal this gives \(g(1,1)=4\), so \(G_{0,(n,n)}/n\) should approach \(4\). On the line \(x_1+x_2=1\), the maximum is attained at \((1/2,1/2)\), so \(G_{(n)}^{\textup{line}}/n\) should approach \(2\).

Generate i.i.d. exponential(1) weights on square boxes and compute \(G_{0,(i,j)}\) by dynamic programming.
Make one picture or three still frames of the infected region \(\mathcal I_0(t)\) at increasing times, in the style of Figures 3 and 4.
For a sequence of \(n\), compute both \(G_{0,(n,n)}\) and \(G_{(n)}^{\textup{line}}\).
Plot \(G_{0,(n,n)}/n\) against \(n\) and check that it moves toward \(4\).
Plot \(G_{(n)}^{\textup{line}}/n\) against \(n\) and check that it moves toward \(2\).
Include one short comparison of the point-to-point and point-to-line models at the first-order level.

Geodesics

In both FPP and LPP, a geodesic is a path that realizes the optimizing passage time. When geodesics to each target are unique, the union of the geodesics from a fixed root to many targets forms a finite geodesic tree. These trees encode how optimizing paths branch and merge.

Competition interfaces

In the infection picture rooted at the origin, each geodesic begins with either \(e_1\) or \(e_2\). When geodesics are unique, this gives a genuine tree from the origin and assigns each site to one of two classes: those whose geodesic starts with \(e_1\), and those whose geodesic starts with \(e_2\). The competition interface is the boundary between those two classes.

This boundary is a geometric object in its own right. It records how the two lineages separate, and it leads to questions about direction, coalescence, and fluctuations that are closely related to the geodesic questions above. A longer discussion appears on the competition-interface page.

The geodesic tree from the origin, split into sites whose geodesic starts with e1 in red and sites whose geodesic starts with e2 in blue, with the competition interface in black between them. — The geodesic tree at the origin, with the \(e_1\) and \(e_2\) regions colored red and blue and the competition interface in black.

Geodesics to distant endpoints are typically close to the straight line joining their endpoints. The transversal fluctuation is the size of the geodesic's deviation from that straight line.

Project 3

Measure geodesic straightness

The goal here is to look for the transversal fluctuation exponent. For a geodesic from \(0\) to \((n,n)\), measure how far the path wanders from the straight line \(x_1=x_2\), and then study how that deviation grows with \(n\). A reasonable first test is to use sizes such as \(n=64,128,256,512\).

Work in exponential(1) LPP and, for each \(n\), generate many geodesics from \(0\) to \((n,n)\).
For each sampled geodesic, record the maximal distance from the path to the diagonal line \(x_1=x_2\).
At each \(n\), compute a summary statistic of that distance, such as the sample mean or median, and make a log-log plot against \(n\).
Fit a line to that log-log plot and compare the fitted slope with the KPZ prediction \(2/3\). Include one overlay of several geodesics to \((n,n)\) together with the diagonal line.

As one representative example, using \(n=64,128,256,512\) and \(150\) samples at each size produced fitted slopes of about \(0.676\) for the mean deviation and \(0.686\) for the median deviation. A log-log plot in that range should already give a slope close to \(2/3\).

Several geodesics from the origin to (512,512) in exponential last-passage percolation, overlaid with the straight line x1 equals x2. — A representative overlay of sample geodesics from \(0\) to \((512,512)\), together with the diagonal line \(x_1=x_2\).

A log-log plot of mean and median geodesic deviation versus n, with fitted slopes near two thirds and a reference slope two thirds. — A representative log-log plot of the mean and median deviations against \(n\), with fitted slopes close to \(2/3\).

Centering and scaling

The shape limits above already identify the first-order growth of the model. For point-to-point and point-to-line passage times, the centering terms are the linear terms coming from those limits. The next question is how to scale the fluctuations around that first-order behavior.

Centered and scaled passage times

These are the KPZ analogues of subtracting the first-order growth term and dividing by a fluctuation scale.

\[ \frac{G_{0,(n,n)}-b_n}{a_n}, \qquad \frac{G_{(n)}^{\textup{line}}-\widetilde b_n}{\widetilde a_n}. \]

The previous shape statements say that these centering terms are linear in \(n\):

First-order growth

In exponential(1) LPP these centering terms are explicit: \(g(1,1)=4\) for point-to-point passage times, and the maximum of \(g(\xi)\) over the line segment from \(e_1\) to \(e_2\) is \(2\) for the point-to-line model.

\[ G_{0,(n,n)}=4n+\text{fluctuation}, \qquad G_{(n)}^{\textup{line}}=2n+\text{fluctuation}. \]

The shape function describes the first-order growth, but in most models it is not available in closed form. The next question is the fluctuation scale: what is the order of \(a_n\) and \(\widetilde a_n\)? That is one of the KPZ scaling exponents. Once the correct centering and scaling are identified, one can ask for the limiting distributions of the normalized passage times. Random matrix models provide the first examples where those limiting laws can be computed explicitly.

Random matrices

The basic objects here are large random matrices, and the statistic of interest is the largest eigenvalue.

Gaussian matrix ensembles

One way to define these ensembles is to start with a matrix of independent Gaussian entries and then impose the required symmetry.

\[ H_n=\frac{X_n+X_n^*}{2\sqrt{n}}, \qquad M_n=\frac{Y_n+Y_n^{\mathsf T}}{\sqrt{2n}}. \]

Here \(X_n\) has i.i.d. complex Gaussian entries of the form \(U+iV\), where \(U\) and \(V\) are independent \(N(0,1)\) random variables, and \(Y_n\) has i.i.d. \(N(0,1)\) entries. The Hermitian matrix \(H_n\) is the Gaussian Unitary Ensemble (GUE), and the real symmetric matrix \(M_n\) is the Gaussian Orthogonal Ensemble (GOE).

The factor \(1/\sqrt{n}\) is the normalization that keeps the spectrum on an order-one scale. With this convention, the empirical eigenvalue distribution, which is essentially the histogram of eigenvalues, converges as \(n\to\infty\) to a deterministic limiting density.

For these Gaussian ensembles, that limiting density is the semicircle law. The largest eigenvalue sits at the upper edge of this limiting density, so one first identifies that edge and then studies the fluctuations around it.

Side-by-side empirical eigenvalue histograms from one GOE matrix and one GUE matrix of size 500, with the semicircle density overlaid and the spectral edge at 2 marked by a dashed line. — Empirical eigenvalue histograms from one GOE matrix and one GUE matrix of size \(500\), together with the semicircle density and the spectral edge at \(2\).

Figure 7 shows this at the level of one sample: the eigenvalues fill an interval, and the histogram is already close to the semicircle density. The next question is how the edge of that spectrum fluctuates from sample to sample.

Edge heuristic

The limiting density is the semicircle law

\[ \rho_{\mathrm{sc}}(x)=\frac{1}{2\pi}\sqrt{4-x^2}\,\mathbf{1}_{|x|\le 2}. \]

Heuristically, one looks for the scale \(s\) at which the expected number of eigenvalues in \([2-s,2]\) is of order \(1\). That expected count is approximately \[ n\int_{2-s}^{2}\rho_{\mathrm{sc}}(x)\,dx. \] Near \(x=2\), the semicircle density behaves like a constant times \(\sqrt{2-x}\), so this integral is of order \(s^{3/2}\). Setting the expected count equal to \(1\) suggests \(s\approx n^{-2/3}\).

Side-by-side histograms built from 1500 GOE and 1500 GUE matrices of size 500, plotting n to the two thirds times the largest eigenvalue minus two, with the corresponding Tracy-Widom densities overlaid. — Histograms of \(n^{2/3}(\lambda_{\max}-2)\) built from \(1500\) GOE samples and \(1500\) GUE samples of size \(500\), together with the corresponding Tracy-Widom densities.

After centering at the spectral edge and scaling by \(n^{2/3}\), the largest eigenvalue converges to a non-Gaussian limit law. In the real symmetric case this is the Tracy-Widom GOE law, and in the Hermitian case it is the Tracy-Widom GUE law.

Project 4

Sample GUE and GOE matrices

Generate GUE matrices and GOE matrices at several sizes.
For one large matrix in each ensemble, plot the full eigenvalue histogram and compare it with the semicircle curve as in Figure 7.
Compute the largest eigenvalue from many matrix samples.
Center at the spectral edge \(2\) and scale by \(n^{2/3}\), as in Figure 8.
Plot separate histograms for the GOE and GUE edge statistics.
Compare the empirical distribution functions across two matrix sizes in each ensemble.

KPZ laws

Returning now to exponential(1) last-passage percolation, the same Tracy-Widom laws reappear there as well. In the KPZ class, the universal features are scaling exponents and limit laws. Passage-time fluctuations are expected on the \(n^{1/3}\) scale, and geodesic deviations from the straight line are expected on the \(n^{2/3}\) scale. In exponential(1) LPP these quantities can be computed explicitly, which makes it possible to identify the limiting laws.

Transversal scale

If \(\pi_n\) is a geodesic from \(0\) to \((n,n)\) and \(L_n\) is the straight line segment joining those endpoints, then the KPZ prediction is that the largest deviation of \(\pi_n\) from \(L_n\) is on the \(n^{2/3}\) scale.

\[ \max_{z\in \pi_n}\operatorname{dist}(z,L_n)\asymp n^{2/3}. \]

Point-to-point law

In exponential(1) point-to-point LPP, the centered and scaled passage time has a non-Gaussian limit law.

\[ \frac{G_{0,(n,n)}-4n}{2^{4/3}n^{1/3}} \Longrightarrow \mathrm{TW}_{\mathrm{GUE}}. \]

The same limit law appears for the largest eigenvalue in GUE after its own centering and scaling. For the point-to-line model, the corresponding exponential(1) LPP limit law is Tracy-Widom GOE.

Point-to-line law

Changing the geometry of the terminal set changes the limiting law. In the point-to-line case, the corresponding limit is Tracy-Widom GOE.

\[ \frac{G_{(n)}^{\textup{line}}-\widetilde b_n}{\widetilde a_n} \Longrightarrow \mathrm{TW}_{\mathrm{GOE}}. \]

The same scaling picture is conjectured in two-dimensional first-passage percolation for sufficiently nice i.i.d. weight distributions. There too one expects passage-time fluctuations on the \(n^{1/3}\) scale and geodesic deviations from the straight line on the \(n^{2/3}\) scale. Unlike exponential(1) LPP, the centering and scaling terms are not known explicitly.

Conjectural FPP law

For point-to-point first-passage percolation with sufficiently nice i.i.d. weights, one expects a model-dependent centering sequence \(b_n\) and scaling sequence \(a_n\) such that the limiting law is again Tracy-Widom GUE.

\[ \frac{T_{0,(n,n)}-b_n}{a_n} \Longrightarrow \mathrm{TW}_{\mathrm{GUE}}. \]

Empirical cumulants

If \(Y_1,\dots,Y_m\) are already centered-and-scaled samples, their empirical cumulants are estimated by

\[ \widehat\kappa_1=\overline Y=\frac{1}{m}\sum_{i=1}^m Y_i,\qquad \widehat\kappa_2=\frac{1}{m}\sum_{i=1}^m (Y_i-\overline Y)^2, \] \[ \widehat\kappa_3=\frac{1}{m}\sum_{i=1}^m (Y_i-\overline Y)^3,\qquad \widehat\kappa_4=\frac{1}{m}\sum_{i=1}^m (Y_i-\overline Y)^4-3\widehat\kappa_2^2. \]

These are the quantities compared with the Tracy-Widom table below.

Project 5

Point-to-point and GUE

Simulate \(G_{0,(n,n)}\) in exponential LPP for a sequence of \(n\).
Sample largest eigenvalues of GUE matrices.
Center and scale both collections using the explicit exponential-LPP and GUE normalizations.
Compare the resulting histograms directly.
Compute empirical mean, variance, third cumulant, and fourth cumulant for the normalized LPP passage times and compare them with the corresponding cumulants of \(\mathrm{TW}_{\mathrm{GUE}}\).

\(\mathrm{TW}_{\mathrm{GUE}}\) cumulants
Statistic	Value
Mean	-1.7711
Variance	0.8132
Third cumulant	0.1643
Fourth cumulant	0.0618

Project 6

Point-to-line and GOE

Compute \(G_{(n)}^{\textup{line}}\) in exponential LPP for a sequence of \(n\).
Sample largest eigenvalues of GOE matrices.
Center and scale both collections using the appropriate point-to-line and GOE normalizations.
Compare histograms and empirical distribution functions for the normalized samples.
Record how the point-to-line law differs numerically from the point-to-point law.

Other weights

Exponential(1) weights are special because the model is "solvable." This is a term of art meaning that some algebraic miracles happen which make exact computation possible. When the weight distribution is changed even slightly, the explicit formulas for the shape function and the fluctuation scales that give us very deep insight into the exponential(1) model are no longer available. The universality question remains the same, but it becomes harder to study analytically. The exercise below asks for numerical evidence for universality by comparing dimensionless cumulants of the passage-time fluctuations with the Tracy-Widom GUE values.

Once we change the distribution to be anything other than exponential(1), it becomes much more involved to do the kind of comparisons we did above. Numerically estimating the centering and scaling correctly can be a delicate problem. A simpler comparison is to study normalized cumulants, which do not require us to estimate those terms. After empirical centering and scaling, the first two cumulants are fixed at \(0\) and \(1\), so the skewness and excess kurtosis become dimensionless quantities that can be compared directly with Tracy-Widom GUE.

Standardized cumulants

If \(X_1,\dots,X_m\) are simulated passage times, first normalize them by the empirical mean and empirical standard deviation:

\[ \overline X=\frac{1}{m}\sum_{i=1}^m X_i, \qquad s^2=\frac{1}{m}\sum_{i=1}^m (X_i-\overline X)^2, \qquad Z_i=\frac{X_i-\overline X}{s}. \]

The standardized cumulants are then estimated from the empirical moments of the \(Z_i\):

\[ \widehat\kappa_3=\frac{1}{m}\sum_{i=1}^m Z_i^3, \qquad \widehat\kappa_4=\frac{1}{m}\sum_{i=1}^m Z_i^4-3, \]

Standardized \(\mathrm{TW}_{\mathrm{GUE}}\) cumulants
Statistic	Value
Skewness	0.2241
Excess kurtosis	0.0934

Project 7

Non-exponential LPP

Replace the exponential weights by another distribution such as the uniform, normal, gamma, or lognormal.
Simulate \(G_{0,(n,n)}\) for a sequence of \(n\) and collect many samples at each size.
At each \(n\), center and scale the passage times using the empirical sample mean and sample standard deviation.
Compute the empirical skewness and excess kurtosis and compare them with the \(\mathrm{TW}_{\mathrm{GUE}}\) values in the table above.
Record how stable those cumulant estimates are as \(n\) increases and as the weight distribution changes.

Random Growth and KPZ Universality

First-passage percolation

Figure 2. Three times in FPP growth

Figure 1. Evolution of the FPP ball

Last-passage percolation

Figure 4. Three times in LPP growth

Figure 3. LPP as an infection model

Exponential point-to-point and point-to-line LPP

Geodesics

Figure 5. A finite geodesic tree

Competition interfaces

Figure 6. A competition interface

Measure geodesic straightness

Centering and scaling

Random matrices

Figure 7. Single-sample spectra

Figure 8. Edge-scaled largest eigenvalues

Sample GUE and GOE matrices

KPZ laws

Point-to-point and GUE

Point-to-line and GOE

Other weights

Non-exponential LPP