Random Growth and KPZ Universality
The KPZ universality class is expected to govern the fluctuations of a wide range of random growth phenomena. The same statistics also appear in other settings, including eigenvalue statistics of sample covariance matrices arising in principal component analysis.
First-passage percolation
A metric is a rule that assigns a distance between points. First-passage percolation is a model of a random metric, obtained by placing i.i.d. random weights on the edges of the lattice and defining the distance between two sites to be the smallest total weight of a path joining them.
First-passage percolation
In first-passage percolation, each nearest-neighbor edge has an independent identically distributed nonnegative random weight. The passage time T sub x comma y is the minimum over all nearest-neighbor paths from x to y of the sum of the edge weights along the path. The ball B of t is the set of sites y whose passage time from the origin is at most t.Each nearest-neighbor edge \(e\) of \(\mathbb{Z}^2\) is assigned an i.i.d. nonnegative random weight \(\tau_e\). For lattice sites \(x\) and \(y\),
\[ T_{x,y}=\inf_{\pi:x\to y}\sum_{e\in\pi}\tau_e, \qquad B(t)=\{y\in\mathbb{Z}^2:T_{0,y}\le t\}. \]The set \(B(t)\) is the ball of radius \(t\) in this random metric. It is the collection of sites that can be reached from the origin by time \(t\). The large-scale questions are then geometric: what deterministic shape approximates \(B(t)\), how does the boundary fluctuate around that shape, and what do the minimizing paths look like?
The first question asks for the large-scale behavior of the passage time itself. If one fixes a macroscopic direction \(x\) and looks at lattice points near \(nx\), then the passage time from the origin to those points grows linearly in \(n\) to first order. Under standard moment assumptions on the edge-weight distribution, the coefficient of that linear growth is deterministic.
Shape theorem
There is a deterministic function mu so that the passage time from the origin to n x, divided by n, converges to mu of x. The limit shape is the sublevel set of mu at level 1.The first statement is the passage-time limit. The second line defines the deterministic limit shape through the corresponding sublevel set.
\[ \frac{T_{0,\lfloor nx\rfloor}}{n}\to \mu(x), \qquad \mathcal{B}=\{x\in\mathbb{R}^2:\mu(x)\le 1\}. \]The function \(\mu\) is the time constant of the model. Once \(\mu\) is known, its unit sublevel set \(\mathcal{B}=\{x:\mu(x)\le 1\}\) gives the deterministic shape of the random ball, and the rescaled balls \(t^{-1}B(t)\) converge to \(\mathcal{B}\). So the passage-time limit and the limit shape are two ways of expressing the same first-order behavior.
Figure 2. Three times in FPP growth
Figure 1. Evolution of the FPP ball
Last-passage percolation
Last-passage percolation is a related directed growth model on the lattice, in which admissible paths move only by the coordinate steps \(e_1=(1,0)\) and \(e_2=(0,1)\). When the weights are nonnegative, it can be interpreted as an infection process: \(G_{x,y}\) is the time that \(y\) is infected by an infection started at \(x\).
Last-passage percolation
In last-passage percolation, each lattice site has an independent identically distributed random weight. The passage time along an up-right path is the sum of the vertex weights on that path, excluding the starting point. The point-to-point last-passage time G sub x comma y is the maximum of those path passage times over all up-right paths from x to y.Each site \(z\in\mathbb{Z}^2\) is assigned an i.i.d. random weight \(\omega_z\). For an up-right path \(\pi:x\nearrow y\), define
\[ W(\pi)=\sum_{z\in\pi\setminus\{x\}}\omega_z, \qquad G_{x,y}=\max_{\pi:x\nearrow y}W(\pi). \]Every up-right path from \(x\) to \(y\) collects the same number of vertex weights, so the path sums appearing in the maximum have the same distribution. The maximum is taken over exponentially many path sums, and these random variables are strongly correlated. It is then natural to ask whether the limiting statistics are still given by the extreme-value laws discussed on the universality page. The answer, as we will shortly see, is no.
A second basic observable is obtained by changing the terminal set: instead of fixing one endpoint, one takes the maximum passage time over all points at distance \(n\) from the origin.
Point-to-line last passage
The point-to-line last-passage time G sub n superscript line is the maximum passage time from the origin to the line x one plus x two equals n. Equivalently, it is the maximum passage time over all directed paths of length n from the origin.Instead of fixing a single endpoint, this model maximizes over all endpoints on the line \(x_1+x_2=n\), or equivalently over all up-right paths of length \(n\) starting from the origin.
\[ G_{(n)}^{\textup{line}} = \max_{0\le k\le n}G_{0,(k,n-k)} = \max_{\pi:0\nearrow\{x_1+x_2=n\}}W(\pi). \]This is a maximum over more paths than the point-to-point model. Its limiting statistics will be discussed later. The passage times can be computed using dynamic programming.
Dynamic programming
In exponential last-passage percolation, the point-to-point passage times satisfy a recursion in which G from zero to i comma j equals the weight at i comma j plus the maximum of the two neighboring passage times.This recursion expresses the passage time to \((i,j)\) in terms of the two sites that can reach it in one directed step.
\[ G_{0,(i,j)}=\omega_{(i,j)}+\max\{G_{0,(i-1,j)},\,G_{0,(i,j-1)}\}. \]On the coordinate axes there is only one admissible directed path, so one starts from \(G_{0,0}=0\) and then accumulates the weights encountered along that unique path.
As in FPP, there is a deterministic first-order limit. If \(x_n\in\mathbb{Z}_+^2\) satisfies \(x_n/n\to \xi\), where \(\xi\) lies on the line segment joining \(e_1=(1,0)\) and \(e_2=(0,1)\), then the passage time to \(x_n\) grows linearly in \(n\), with a deterministic growth rate depending only on \(\xi\). This first-order law also requires standard moment assumptions on the weight distribution.
Shape function
If x sub n divided by n converges to xi on the line segment between e one and e two, then the last-passage time from the origin to x sub n, divided by n, converges to a deterministic shape function g of xi. The point-to-line first-order growth is the maximum of g over that line segment.The shape function records the first-order growth rate in each direction, and the point-to-line first-order growth is obtained by maximizing that function over all directed endpoints at level \(n\).
\[ \frac{G_{0,x_n}}{n}\to g(\xi), \qquad \frac{G_{(n)}^{\textup{line}}}{n}\to \max_{\xi\in[e_1,e_2]} g(\xi). \]The function \(g\) is the shape function for the directed model. It describes the first-order growth of the infection times, and the corresponding sublevel sets of the infection process converge, after rescaling, to deterministic sets.
Figure 4. Three times in LPP growth
Figure 3. LPP as an infection model
Project 2
Exponential point-to-point and point-to-line LPP
Work here with i.i.d. exponential(1) weights. The recursion above makes this model easy to simulate, and the first-order shape function is known explicitly: \[ g(x_1,x_2)=(\sqrt{x_1}+\sqrt{x_2})^2. \] Along the diagonal this gives \(g(1,1)=4\), so \(G_{0,(n,n)}/n\) should approach \(4\). On the line \(x_1+x_2=1\), the maximum is attained at \((1/2,1/2)\), so \(G_{(n)}^{\textup{line}}/n\) should approach \(2\).
- Generate i.i.d. exponential(1) weights on square boxes and compute \(G_{0,(i,j)}\) by dynamic programming.
- Make one picture or three still frames of the infected region \(\mathcal I_0(t)\) at increasing times, in the style of Figures 3 and 4.
- For a sequence of \(n\), compute both \(G_{0,(n,n)}\) and \(G_{(n)}^{\textup{line}}\).
- Plot \(G_{0,(n,n)}/n\) against \(n\) and check that it moves toward \(4\).
- Plot \(G_{(n)}^{\textup{line}}/n\) against \(n\) and check that it moves toward \(2\).
- Include one short comparison of the point-to-point and point-to-line models at the first-order level.
Geodesics
In both FPP and LPP, a geodesic is a path that realizes the optimizing passage time. When geodesics to each target are unique, the union of the geodesics from a fixed root to many targets forms a finite geodesic tree. These trees encode how optimizing paths branch and merge.
Figure 5. A finite geodesic tree
Competition interfaces
In the infection picture rooted at the origin, each geodesic begins with either \(e_1\) or \(e_2\). When geodesics are unique, this gives a genuine tree from the origin and assigns each site to one of two classes: those whose geodesic starts with \(e_1\), and those whose geodesic starts with \(e_2\). The competition interface is the boundary between those two classes.
This boundary is a geometric object in its own right. It records how the two lineages separate, and it leads to questions about direction, coalescence, and fluctuations that are closely related to the geodesic questions above. A longer discussion appears on the competition-interface page.
Figure 6. A competition interface
Geodesics to distant endpoints are typically close to the straight line joining their endpoints. The transversal fluctuation is the size of the geodesic's deviation from that straight line.
Project 3
Measure geodesic straightness
The goal here is to look for the transversal fluctuation exponent. For a geodesic from \(0\) to \((n,n)\), measure how far the path wanders from the straight line \(x_1=x_2\), and then study how that deviation grows with \(n\). A reasonable first test is to use sizes such as \(n=64,128,256,512\).
- Work in exponential(1) LPP and, for each \(n\), generate many geodesics from \(0\) to \((n,n)\).
- For each sampled geodesic, record the maximal distance from the path to the diagonal line \(x_1=x_2\).
- At each \(n\), compute a summary statistic of that distance, such as the sample mean or median, and make a log-log plot against \(n\).
- Fit a line to that log-log plot and compare the fitted slope with the KPZ prediction \(2/3\). Include one overlay of several geodesics to \((n,n)\) together with the diagonal line.
As one representative example, using \(n=64,128,256,512\) and \(150\) samples at each size produced fitted slopes of about \(0.676\) for the mean deviation and \(0.686\) for the median deviation. A log-log plot in that range should already give a slope close to \(2/3\).
Centering and scaling
The shape limits above already identify the first-order growth of the model. For point-to-point and point-to-line passage times, the centering terms are the linear terms coming from those limits. The next question is how to scale the fluctuations around that first-order behavior.
Centered and scaled passage times
One studies the centered and scaled point-to-point and point-to-line passage times by subtracting deterministic terms b sub n and b tilde sub n and dividing by fluctuation scales a sub n and a tilde sub n.These are the KPZ analogues of subtracting the first-order growth term and dividing by a fluctuation scale.
\[ \frac{G_{0,(n,n)}-b_n}{a_n}, \qquad \frac{G_{(n)}^{\textup{line}}-\widetilde b_n}{\widetilde a_n}. \]The previous shape statements say that these centering terms are linear in \(n\):
First-order growth
In exponential(1) last-passage percolation, the shape function gives the first-order growth explicitly. For point-to-point passage times from the origin to n comma n, the centering term is 4 n. For the point-to-line model, the centering term is 2 n.In exponential(1) LPP these centering terms are explicit: \(g(1,1)=4\) for point-to-point passage times, and the maximum of \(g(\xi)\) over the line segment from \(e_1\) to \(e_2\) is \(2\) for the point-to-line model.
\[ G_{0,(n,n)}=4n+\text{fluctuation}, \qquad G_{(n)}^{\textup{line}}=2n+\text{fluctuation}. \]The shape function describes the first-order growth, but in most models it is not available in closed form. The next question is the fluctuation scale: what is the order of \(a_n\) and \(\widetilde a_n\)? That is one of the KPZ scaling exponents. Once the correct centering and scaling are identified, one can ask for the limiting distributions of the normalized passage times. Random matrix models provide the first examples where those limiting laws can be computed explicitly.
Random matrices
The basic objects here are large random matrices, and the statistic of interest is the largest eigenvalue.
Gaussian matrix ensembles
A GUE matrix is obtained by symmetrizing a matrix of independent complex Gaussian entries, and a GOE matrix is obtained by symmetrizing a matrix of independent real Gaussian entries, with the normalization chosen so that the spectral edge is at 2.One way to define these ensembles is to start with a matrix of independent Gaussian entries and then impose the required symmetry.
\[ H_n=\frac{X_n+X_n^*}{2\sqrt{n}}, \qquad M_n=\frac{Y_n+Y_n^{\mathsf T}}{\sqrt{2n}}. \]Here \(X_n\) has i.i.d. complex Gaussian entries of the form \(U+iV\), where \(U\) and \(V\) are independent \(N(0,1)\) random variables, and \(Y_n\) has i.i.d. \(N(0,1)\) entries. The Hermitian matrix \(H_n\) is the Gaussian Unitary Ensemble (GUE), and the real symmetric matrix \(M_n\) is the Gaussian Orthogonal Ensemble (GOE).
The factor \(1/\sqrt{n}\) is the normalization that keeps the spectrum on an order-one scale. With this convention, the empirical eigenvalue distribution, which is essentially the histogram of eigenvalues, converges as \(n\to\infty\) to a deterministic limiting density.
For these Gaussian ensembles, that limiting density is the semicircle law. The largest eigenvalue sits at the upper edge of this limiting density, so one first identifies that edge and then studies the fluctuations around it.
Figure 7. Single-sample spectra
Figure 7 shows this at the level of one sample: the eigenvalues fill an interval, and the histogram is already close to the semicircle density. The next question is how the edge of that spectrum fluctuates from sample to sample.
Edge heuristic
The semicircle density is rho sub sc of x equals one over two pi times the square root of four minus x squared on the interval from minus two to two. To estimate the distance from the edge to the next eigenvalue, one integrates this density over the interval from 2 minus s to 2. The expected number of eigenvalues there is n times that integral. Since the integral is of order s to the three halves, setting that expected count equal to one suggests the scale n to the minus two thirds.The limiting density is the semicircle law
\[ \rho_{\mathrm{sc}}(x)=\frac{1}{2\pi}\sqrt{4-x^2}\,\mathbf{1}_{|x|\le 2}. \]Heuristically, one looks for the scale \(s\) at which the expected number of eigenvalues in \([2-s,2]\) is of order \(1\). That expected count is approximately \[ n\int_{2-s}^{2}\rho_{\mathrm{sc}}(x)\,dx. \] Near \(x=2\), the semicircle density behaves like a constant times \(\sqrt{2-x}\), so this integral is of order \(s^{3/2}\). Setting the expected count equal to \(1\) suggests \(s\approx n^{-2/3}\).
Figure 8. Edge-scaled largest eigenvalues
After centering at the spectral edge and scaling by \(n^{2/3}\), the largest eigenvalue converges to a non-Gaussian limit law. In the real symmetric case this is the Tracy-Widom GOE law, and in the Hermitian case it is the Tracy-Widom GUE law.
Project 4
Sample GUE and GOE matrices
- Generate GUE matrices and GOE matrices at several sizes.
- For one large matrix in each ensemble, plot the full eigenvalue histogram and compare it with the semicircle curve as in Figure 7.
- Compute the largest eigenvalue from many matrix samples.
- Center at the spectral edge \(2\) and scale by \(n^{2/3}\), as in Figure 8.
- Plot separate histograms for the GOE and GUE edge statistics.
- Compare the empirical distribution functions across two matrix sizes in each ensemble.
KPZ laws
Returning now to exponential(1) last-passage percolation, the same Tracy-Widom laws reappear there as well. In the KPZ class, the universal features are scaling exponents and limit laws. Passage-time fluctuations are expected on the \(n^{1/3}\) scale, and geodesic deviations from the straight line are expected on the \(n^{2/3}\) scale. In exponential(1) LPP these quantities can be computed explicitly, which makes it possible to identify the limiting laws.
Transversal scale
For a geodesic from the origin to n comma n, the maximal distance from the straight line segment joining the endpoints is expected to be of order n to the two thirds.If \(\pi_n\) is a geodesic from \(0\) to \((n,n)\) and \(L_n\) is the straight line segment joining those endpoints, then the KPZ prediction is that the largest deviation of \(\pi_n\) from \(L_n\) is on the \(n^{2/3}\) scale.
\[ \max_{z\in \pi_n}\operatorname{dist}(z,L_n)\asymp n^{2/3}. \]Point-to-point law
In exponential(1) last-passage percolation, the centered and scaled point-to-point passage time from the origin to n comma n converges in distribution to the Tracy-Widom GUE law.In exponential(1) point-to-point LPP, the centered and scaled passage time has a non-Gaussian limit law.
\[ \frac{G_{0,(n,n)}-4n}{2^{4/3}n^{1/3}} \Longrightarrow \mathrm{TW}_{\mathrm{GUE}}. \]The same limit law appears for the largest eigenvalue in GUE after its own centering and scaling. For the point-to-line model, the corresponding exponential(1) LPP limit law is Tracy-Widom GOE.
Point-to-line law
In exponential(1) last-passage percolation, the centered and scaled point-to-line passage time converges in distribution to the Tracy-Widom GOE law.Changing the geometry of the terminal set changes the limiting law. In the point-to-line case, the corresponding limit is Tracy-Widom GOE.
\[ \frac{G_{(n)}^{\textup{line}}-\widetilde b_n}{\widetilde a_n} \Longrightarrow \mathrm{TW}_{\mathrm{GOE}}. \]The same scaling picture is conjectured in two-dimensional first-passage percolation for sufficiently nice i.i.d. weight distributions. There too one expects passage-time fluctuations on the \(n^{1/3}\) scale and geodesic deviations from the straight line on the \(n^{2/3}\) scale. Unlike exponential(1) LPP, the centering and scaling terms are not known explicitly.
Conjectural FPP law
In point-to-point first-passage percolation with sufficiently nice independent identically distributed weights, after suitable centering and scaling, the passage time from the origin to n comma n is conjectured to converge in distribution to the Tracy-Widom GUE law.For point-to-point first-passage percolation with sufficiently nice i.i.d. weights, one expects a model-dependent centering sequence \(b_n\) and scaling sequence \(a_n\) such that the limiting law is again Tracy-Widom GUE.
\[ \frac{T_{0,(n,n)}-b_n}{a_n} \Longrightarrow \mathrm{TW}_{\mathrm{GUE}}. \]Empirical cumulants
If Y sub 1 through Y sub m are centered and scaled samples, then the empirical first cumulant is the sample mean, the second cumulant is the empirical variance, the third cumulant is the average of Y minus Y bar cubed, and the fourth cumulant is the average of Y minus Y bar to the fourth minus three times the empirical variance squared.If \(Y_1,\dots,Y_m\) are already centered-and-scaled samples, their empirical cumulants are estimated by
\[ \widehat\kappa_1=\overline Y=\frac{1}{m}\sum_{i=1}^m Y_i,\qquad \widehat\kappa_2=\frac{1}{m}\sum_{i=1}^m (Y_i-\overline Y)^2, \] \[ \widehat\kappa_3=\frac{1}{m}\sum_{i=1}^m (Y_i-\overline Y)^3,\qquad \widehat\kappa_4=\frac{1}{m}\sum_{i=1}^m (Y_i-\overline Y)^4-3\widehat\kappa_2^2. \]These are the quantities compared with the Tracy-Widom table below.
Project 5
Point-to-point and GUE
- Simulate \(G_{0,(n,n)}\) in exponential LPP for a sequence of \(n\).
- Sample largest eigenvalues of GUE matrices.
- Center and scale both collections using the explicit exponential-LPP and GUE normalizations.
- Compare the resulting histograms directly.
- Compute empirical mean, variance, third cumulant, and fourth cumulant for the normalized LPP passage times and compare them with the corresponding cumulants of \(\mathrm{TW}_{\mathrm{GUE}}\).
| Statistic | Value |
|---|---|
| Mean | -1.7711 |
| Variance | 0.8132 |
| Third cumulant | 0.1643 |
| Fourth cumulant | 0.0618 |
Project 6
Point-to-line and GOE
- Compute \(G_{(n)}^{\textup{line}}\) in exponential LPP for a sequence of \(n\).
- Sample largest eigenvalues of GOE matrices.
- Center and scale both collections using the appropriate point-to-line and GOE normalizations.
- Compare histograms and empirical distribution functions for the normalized samples.
- Record how the point-to-line law differs numerically from the point-to-point law.
Other weights
Exponential(1) weights are special because the model is "solvable." This is a term of art meaning that some algebraic miracles happen which make exact computation possible. When the weight distribution is changed even slightly, the explicit formulas for the shape function and the fluctuation scales that give us very deep insight into the exponential(1) model are no longer available. The universality question remains the same, but it becomes harder to study analytically. The exercise below asks for numerical evidence for universality by comparing dimensionless cumulants of the passage-time fluctuations with the Tracy-Widom GUE values.
Once we change the distribution to be anything other than exponential(1), it becomes much more involved to do the kind of comparisons we did above. Numerically estimating the centering and scaling correctly can be a delicate problem. A simpler comparison is to study normalized cumulants, which do not require us to estimate those terms. After empirical centering and scaling, the first two cumulants are fixed at \(0\) and \(1\), so the skewness and excess kurtosis become dimensionless quantities that can be compared directly with Tracy-Widom GUE.
Standardized cumulants
Given passage-time samples X sub 1 through X sub m, first compute the empirical mean X bar and empirical standard deviation s. Then standardize by setting Z sub i equals X sub i minus X bar divided by s. The standardized cumulants are estimated from sample averages of powers of Z sub i. The third is the average of Z cubed, and the fourth is the average of Z to the fourth minus three.If \(X_1,\dots,X_m\) are simulated passage times, first normalize them by the empirical mean and empirical standard deviation:
\[ \overline X=\frac{1}{m}\sum_{i=1}^m X_i, \qquad s^2=\frac{1}{m}\sum_{i=1}^m (X_i-\overline X)^2, \qquad Z_i=\frac{X_i-\overline X}{s}. \]The standardized cumulants are then estimated from the empirical moments of the \(Z_i\):
\[ \widehat\kappa_3=\frac{1}{m}\sum_{i=1}^m Z_i^3, \qquad \widehat\kappa_4=\frac{1}{m}\sum_{i=1}^m Z_i^4-3, \]| Statistic | Value |
|---|---|
| Skewness | 0.2241 |
| Excess kurtosis | 0.0934 |
Project 7
Non-exponential LPP
- Replace the exponential weights by another distribution such as the uniform, normal, gamma, or lognormal.
- Simulate \(G_{0,(n,n)}\) for a sequence of \(n\) and collect many samples at each size.
- At each \(n\), center and scale the passage times using the empirical sample mean and sample standard deviation.
- Compute the empirical skewness and excess kurtosis and compare them with the \(\mathrm{TW}_{\mathrm{GUE}}\) values in the table above.
- Record how stable those cumulant estimates are as \(n\) increases and as the weight distribution changes.