
The Universality Phenomenon in Probability

Why should average July temperatures and average human heights have fluctuations described by the same limiting law? Why should annual rainfall maxima and yearly maximum insurance losses exhibit a different kind of limiting behavior? This page explores those questions through two classical examples of universality: the Gaussian universality class and the extreme-value universality classes.

The Universality Phenomenon

Many systems in nature are built from a large number of microscopic "degrees of freedom." In probabilistic and statistical models, one often represents those degrees of freedom by random variables \(X_1,X_2,\dots\), and for each system size \(n\) studies an observable \[ F_n(X_1,\dots,X_n). \] We are interested in the "macroscopic" behavior, so we “zoom out” by sending \(n\to\infty\). Our interest is often in whether we can find parameters \(a_n\) and \(b_n\) so that when we center and scale appropriately, \[ \frac{F_n(X_1,\dots,X_n)-a_n}{b_n}, \] we see an interesting limit. If this feels abstract at first, don’t worry. We will turn to concrete examples that illustrate what this all means in a moment.

Universality is the observation that, very frequently, different microscopic models exhibit the same large-scale behavior after the correct normalization. In many examples, the precise constants in the normalization depend on the model, while the orders of growth are shared across a wide class of models and physical phenomena with the same large-scale behavior. Such a collection is called a "universality class." The shared growth rates are often expressed in terms of "scaling exponents." The universality phenomenon is ubiquitous and is responsible for much of the power of statistical methods. To illustrate the idea, we now turn to some concrete examples.

The Gaussian universality class

The most familiar universality class is the Gaussian one. It appears when a quantity is built by adding many random contributions that are roughly independent and similarly distributed, with no single term dominating the rest. In the classical setting one works with i.i.d. random variables, but exact independence and exact identical distribution are not strictly necessary. Here we stay in the finite-variance setting; if the variance is infinite, different stable-law behavior can appear.

Start with observations \(X_1,\dots,X_n\), where \(X_i\) is the \(i\)-th sample. The basic statistic is the sample mean \[ \overline{X}_n=\frac{X_1+\cdots+X_n}{n}. \] We expect \(\overline{X}_n\) to get close to the true mean \(\mu=\mathbb{E}[X_1]\) as \(n\) grows. The more refined question is how \(\overline{X}_n\) fluctuates around \(\mu\). Using the variance identity for uncorrelated random variables, if \(\sigma^2=\mathrm{Var}(X_1)\), then \[ \mathrm{Var}(\overline{X}_n)=\mathrm{Var}\left(\frac{X_1+\cdots+X_n}{n}\right)=\frac{\sigma^2}{n}. \] This tells us the fluctuations of \(\overline{X}_n\) are of order \(n^{-1/2}\). To see a non-trivial limit, it is natural to standardize: \[ \frac{\sqrt{n}(\overline{X}_n-\mu)}{\sigma}, \] which has mean \(0\) and variance \(1\). In the finite-variance setting, the Gaussian law describes the fluctuations of this normalized sample mean.
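As a quick sanity check of the \(\sigma^2/n\) scaling, one can estimate \(\mathrm{Var}(\overline{X}_n)\) by simulation. The following minimal NumPy sketch (not part of the original page; it uses Exp(1) draws, which have variance 1) compares the empirical variance of many sample means against \(\sigma^2/n\):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2 = 1.0  # Exp(1) has variance 1

for n in (10, 100, 1000):
    # 10,000 independent sample means, each built from n draws
    means = rng.exponential(scale=1.0, size=(10_000, n)).mean(axis=1)
    print(f"n={n:5d}  Var(mean)={means.var():.6f}  sigma^2/n={sigma2 / n:.6f}")
```

The two printed columns agree to within sampling error, matching the variance identity above.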

Central limit theorem

If \(X_1,\dots,X_n\) are i.i.d. observations with mean \(\mu=\mathbb{E}[X_1]\) and finite variance \(\sigma^2=\mathrm{Var}(X_1)\), then

\[ \frac{\sqrt{n}(\overline{X}_n-\mu)}{\sigma}\Longrightarrow N(0,1) \qquad\text{as } n\to\infty. \]

Here \(\Longrightarrow\) means convergence in distribution, and \(N(0,1)\) denotes the standard Gaussian distribution.

Figure 1. Gaussian density

The standard Gaussian density \(N(0,1)\), commonly known as the "bell curve", which describes the limiting fluctuations in the central limit theorem.

Convergence in distribution here means that if we repeatedly draw samples of size \(n\) and compute the normalized sample mean each time, then for large \(n\) the resulting values have a distribution that is close to \(N(0,1)\). Figure 2 illustrates this with simulations: averages of coin flips, uniform random variables, and exponential random variables begin with very different distributions, but after centering and scaling their fluctuations are described increasingly well by the same Gaussian law.
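A simulation in the spirit of Figure 2 can be sketched in a few lines of NumPy (an illustrative sketch, not the page's own code, using Exp(1) draws so that \(\mu=\sigma=1\)). For each sample size it draws many samples, standardizes each sample mean, and checks that the standardized values look increasingly like \(N(0,1)\):

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma = 1.0, 1.0  # mean and standard deviation of Exp(1)

def standardized_means(n, reps=50_000):
    """sqrt(n) * (sample mean - mu) / sigma, for `reps` samples of size n."""
    xbar = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)
    return np.sqrt(n) * (xbar - mu) / sigma

for n in (2, 10, 100):
    z = standardized_means(n)
    # For N(0,1), the fraction below 0 is Phi(0) = 1/2; the skew washes out as n grows
    print(f"n={n:3d}  mean={z.mean():+.3f}  std={z.std():.3f}  P(Z<0)={(z < 0).mean():.3f}")
```

For small \(n\) the standardized means inherit the exponential's skew; by \(n=100\) the summary statistics are close to those of a standard Gaussian.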

The same phenomenon appears in practical averages built from observed data. In that setting the population mean and standard deviation are not known, so we estimate them from the full dataset and then standardize the averages we want to study.

Empirical standardization

If an observed dataset consists of values \(Y_1,\dots,Y_m\), define the empirical mean \(\widehat{\mu}\) and empirical standard deviation \(\widehat{\sigma}\) by

\[ \widehat{\mu}=\frac{Y_1+\cdots+Y_m}{m}, \qquad \widehat{\sigma}=\left(\frac{1}{m-1}\sum_{i=1}^m (Y_i-\widehat{\mu})^2\right)^{1/2}. \]

If \(A_n\) is an average built from \(n\) observed values, we standardize it by

\[ \frac{\sqrt{n}(A_n-\widehat{\mu})}{\widehat{\sigma}}. \]

In the examples below, \(A_n\) is either the average height in a simple random sample without replacement from the NHANES cohort, or the average of \(n\) distinct July daily average temperatures sampled without replacement from the pooled Indianapolis July record across all years. Figure 3 shows the two observed datasets themselves, and Figure 4 shows the standardized averages obtained from repeated simple random samples.
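The empirical standardization above is straightforward to code. The sketch below uses a synthetic cohort as a stand-in for the observed datasets (the real page uses NHANES heights and Indianapolis July temperatures); note the \(1/(m-1)\) convention in \(\widehat{\sigma}\), which corresponds to NumPy's `ddof=1`:

```python
import numpy as np

def empirical_stats(y):
    """Empirical mean and standard deviation, with the 1/(m-1) convention."""
    y = np.asarray(y, dtype=float)
    return y.mean(), y.std(ddof=1)

def standardize_average(a_n, n, mu_hat, sigma_hat):
    """sqrt(n) * (A_n - mu_hat) / sigma_hat."""
    return np.sqrt(n) * (a_n - mu_hat) / sigma_hat

# Synthetic stand-in for an observed cohort of heights in centimeters
rng = np.random.default_rng(2)
cohort = rng.normal(170.0, 10.0, size=5_000)
mu_hat, sigma_hat = empirical_stats(cohort)

# One simple random sample without replacement, as in the examples below
n = 50
sample = rng.choice(cohort, size=n, replace=False)
z = standardize_average(sample.mean(), n, mu_hat, sigma_hat)
print(f"standardized average: {z:+.3f}")
```

Repeating the last three lines many times produces the cloud of standardized averages whose histogram is compared with \(N(0,1)\).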

Datasets

We will illustrate Gaussian universality using two unrelated datasets.

NHANES heights

The first empirical distribution uses the standing-height variable BMXHT from the NHANES 2021-2023 Body Measures file BMX_L. Each observation is a measured height in centimeters. Below, we form many simple random samples without replacement from this observed cohort and compute their average heights. The source file is available from the CDC NHANES Body Measures page.

July temperatures

The second empirical distribution uses the daily average temperature variable TAVG from NOAA station USW00093819, Indianapolis International Airport, over the date range 1950-01-01 through 2023-12-31. To avoid mixing all seasons together, we restrict to July observations and treat those daily average temperatures as the observed cohort. Below, we pool the July observations across all years, repeatedly choose \(n\) distinct July days from that pooled cohort, and average their TAVG values. The sampled days are not required to lie in the same calendar year. The underlying series can be obtained directly from NOAA Daily Summaries for Indianapolis temperatures.

The classical extreme-value universality classes

The Gaussian universality class is ubiquitous because sums and averages appear throughout statistics, but it is not the only recurring universal large-scale phenomenon. Another classical example appears when the statistic of interest is the maximum rather than the average. In the classical setting, one studies maxima of large i.i.d. samples and asks how those maxima behave after the right centering and scaling.

Start with the maximum \[ M_n=\max\{X_1,\dots,X_n\}. \] As in the Gaussian case, the question is not just whether \(M_n\) grows, but how it fluctuates after the right normalization.

Classical extreme-value theorem

If \(X_1,X_2,\dots\) are i.i.d. and there are constants \(a_n>0\) and \(b_n\) such that

\[ \frac{M_n-b_n}{a_n}\Longrightarrow G, \]

for some cumulative distribution function \(G\), then \(G\) must be a generalized extreme-value distribution. In standard notation,

\[ G_{\xi,\mu,\sigma}(x)= \begin{cases} \exp\!\left(-\left(1+\xi\dfrac{x-\mu}{\sigma}\right)^{-1/\xi}\right), & \xi\ne 0,\\[0.9em] \exp\!\left(-e^{-(x-\mu)/\sigma}\right), & \xi=0, \end{cases} \]

Here \(\sigma>0\), and in the case \(\xi\ne 0\) the formula is defined on the region where \(1+\xi(x-\mu)/\sigma>0\).

The parameters \(\mu\) and \(\sigma\) set the location and scale, while the sign of \(\xi\) distinguishes the three classical types: \(\xi=0\) gives the Gumbel case, \(\xi>0\) the Fréchet-type case, and \(\xi<0\) the Weibull-type case. After an affine normalization, one may reduce to the special case \(\mu=0\) and \(\sigma=1\).

The underlying distribution can vary enormously, but after centering and scaling the maxima, the limiting law must fall into this single generalized extreme-value family. Figure 5 illustrates the three classical parameter regimes in simulated data.
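The Gumbel case is easy to check by hand for Exp(1) variables, where one can take \(a_n=1\) and \(b_n=\log n\). The sketch below (illustrative only, with those explicit normalizing constants) compares the empirical distribution of \(M_n-\log n\) with the standard Gumbel CDF \(\exp(-e^{-x})\), i.e. the \(\xi=0\) member of the family above:

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 500, 20_000

# Maxima of Exp(1) samples, centered by b_n = log n (scale a_n = 1)
m = rng.exponential(scale=1.0, size=(reps, n)).max(axis=1) - np.log(n)

# Empirical CDF versus the standard Gumbel CDF exp(-exp(-x))
for x in (-1.0, 0.0, 1.0, 2.0):
    print(f"x={x:+.1f}  empirical={(m <= x).mean():.4f}  Gumbel={np.exp(-np.exp(-x)):.4f}")
```

The agreement follows from the exact computation \(\Pr(M_n-\log n\le x)=(1-e^{-x}/n)^n\to\exp(-e^{-x})\).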

Classical extreme-value ideas also appear in real data, but now one begins with an observed series and extracts maxima from blocks of time rather than from repeated i.i.d. samples generated by a model.

Empirical block maxima

If an observed series \(Y_1,\dots,Y_m\) is divided into blocks \(B_1,\dots,B_r\), define the block maxima by

\[ Z_k=\max\{Y_j:j\in B_k\}. \]

In the example below, the blocks are calendar years and calendar quarters of one rainfall record.

For an empirical illustration, we can take a long observed series, divide it into blocks of time, compute one maximum from each block, and compare the resulting histogram with a fitted extreme-value density.
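Extracting block maxima is a one-pass grouping operation. A minimal pure-Python sketch (the toy `(year, rainfall)` pairs below are a synthetic stand-in for the NOAA daily record) might look like this:

```python
from collections import defaultdict

def block_maxima(values, block_labels):
    """One maximum per block; block_labels could be calendar years or quarters."""
    blocks = defaultdict(list)
    for label, v in zip(block_labels, values):
        blocks[label].append(v)
    return {label: max(vs) for label, vs in blocks.items()}

# Toy daily series standing in for the observed rainfall record
series = [(1950, 0.2), (1950, 1.7), (1950, 0.0),
          (1951, 0.9), (1951, 2.4),
          (1952, 0.0), (1952, 0.3)]
years = [y for y, _ in series]
rain = [r for _, r in series]
print(block_maxima(rain, years))  # → {1950: 1.7, 1951: 2.4, 1952: 0.3}
```

Quarterly maxima come from the same function with `(year, quarter)` pairs as the block labels.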

Fitted generalized extreme-value law

If \(Z_1,\dots,Z_r\) are the observed block maxima, we fit a generalized extreme-value distribution by estimating a location parameter \(\widehat{\mu}\), a scale parameter \(\widehat{\sigma}>0\), and a shape parameter \(\widehat{\xi}\). On this page the fit is done by maximum likelihood using SciPy's genextreme.fit routine.

\[ G_{\widehat{\xi},\widehat{\mu},\widehat{\sigma}}(x)= \begin{cases} \exp\!\left(-\left(1+\widehat{\xi}\frac{x-\widehat{\mu}}{\widehat{\sigma}}\right)^{-1/\widehat{\xi}}\right), & 1+\widehat{\xi}\dfrac{x-\widehat{\mu}}{\widehat{\sigma}}>0,\ \widehat{\xi}\ne 0,\\[0.8em] \exp\!\left(-e^{-(x-\widehat{\mu})/\widehat{\sigma}}\right), & \widehat{\xi}=0. \end{cases} \]

The histogram shows the empirical block maxima, and the curve is the density of the fitted generalized extreme-value distribution. The case \(\widehat{\xi}=0\) is the Gumbel family.
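The maximum-likelihood fit itself is one call to SciPy, with one convention to watch: SciPy's `genextreme` parametrizes the shape as \(c=-\xi\), so the sign must be flipped to recover \(\widehat{\xi}\) in the notation above. The sketch below fits simulated block maxima (maxima of Exp(1) samples, which are approximately Gumbel, so \(\widehat{\xi}\) should come out near zero) rather than the actual rainfall record:

```python
import numpy as np
from scipy.stats import genextreme

# Simulated block maxima: 500 "yearly" maxima of 365 Exp(1) draws each
rng = np.random.default_rng(4)
z = rng.exponential(scale=1.0, size=(500, 365)).max(axis=1)

# Maximum-likelihood fit; SciPy's shape c equals -xi
c, loc, scale = genextreme.fit(z)
xi_hat = -c
print(f"xi_hat={xi_hat:+.3f}  mu_hat={loc:.3f}  sigma_hat={scale:.3f}")
```

For the rainfall application, `z` would instead be the array of observed yearly or quarterly maxima, and `genextreme.pdf(x, c, loc, scale)` gives the fitted density drawn over the histogram.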

Dataset

We will illustrate an empirical extreme-value calculation using one rainfall record.

Daily rainfall

The underlying series is the daily precipitation variable PRCP from NOAA station USW00093819, Indianapolis International Airport, over the date range 1950-01-01 through 2023-12-31. The source data are available directly from NOAA Daily Summaries for Indianapolis rainfall.

Derived maxima

From that daily series we form one maximum for each calendar year and one maximum for each calendar quarter. Figure 6 below compares the resulting histograms with fitted generalized extreme-value densities.

Toward the KPZ universality class

The classical extreme-value theorems describe maxima of independent samples, or of block maxima treated as approximately independent. A natural next question is what happens when the quantity of interest is still extremal, but the competing random quantities are tied together by geometry and strong dependence.

Some models of this kind fall into the KPZ universality class. Last-passage percolation is a basic example: one maximizes over path weights, but overlapping paths create strong correlations.


This material is based in part upon work supported by the Simons Foundation Grant MPS-TSM-00012155 and the National Science Foundation under Grant No. DMS-2125961. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation or the Simons Foundation.