
The Universality Phenomenon in Probability

Why should average July temperatures and average human heights have fluctuations described by the same limiting law? Why should annual rainfall maxima and yearly maximum insurance losses exhibit a different kind of limiting behavior? This page explores those questions through two classical examples of universality: the Gaussian universality class and the extreme-value universality classes.

The Universality Phenomenon

Many systems in nature are built from a large number of microscopic "degrees of freedom." In probabilistic and statistical models, one often represents those degrees of freedom by random variables \(X_1,X_2,\dots\), and for each system size \(n\) studies an observable \[ F_n(X_1,\dots,X_n). \] We are interested in the "macroscopic" behavior, so we “zoom out” by sending \(n\to\infty\). Our interest is often in whether we can find parameters \(a_n\) and \(b_n\) so that when we center and scale appropriately, \[ \frac{F_n(X_1,\dots,X_n)-a_n}{b_n}, \] we see an interesting limit. If this feels abstract at first, don’t worry. We will turn to concrete examples that illustrate what this all means in a moment.

Universality is the observation that, very frequently, different microscopic models exhibit the same large-scale behavior after the correct normalization. In many examples, the precise constants in the normalization depend on the model, while the orders of growth are shared across a wide class of models and physical phenomena with the same large-scale behavior. Such a collection is called a "universality class." The shared growth rates are often expressed in terms of "scaling exponents." The universality phenomenon is ubiquitous and is responsible for much of the power of statistical methods. To illustrate the idea, we now turn to some concrete examples.

The Gaussian universality class

The most familiar universality class is the Gaussian one. It appears when a quantity is built by adding many random contributions that are roughly independent and similarly distributed, with no single term dominating the rest. In the classical setting one works with i.i.d. random variables, but exact independence and exact identical distribution are not strictly necessary. Here we stay in the finite-variance setting; if the variance is infinite, different stable-law behavior can appear.

Start with observations \(X_1,\dots,X_n\), where \(X_i\) is the \(i\)-th sample. The basic statistic is the sample mean \[ \overline{X}_n=\frac{X_1+\cdots+X_n}{n}. \] We expect \(\overline{X}_n\) to get close to the true mean \(\mu=\mathbb{E}[X_1]\) as \(n\) grows. The more refined question is how \(\overline{X}_n\) fluctuates around \(\mu\). Using the variance identity for uncorrelated random variables, if \(\sigma^2=\mathrm{Var}(X_1)\), then \[ \mathrm{Var}(\overline{X}_n)=\mathrm{Var}\left(\frac{X_1+\cdots+X_n}{n}\right)=\frac{\sigma^2}{n}. \] This tells us the fluctuations of \(\overline{X}_n\) are of order \(n^{-1/2}\). To see a non-trivial limit, it is natural to standardize: \[ \frac{\sqrt{n}(\overline{X}_n-\mu)}{\sigma}, \] which has mean \(0\) and variance \(1\). In the finite-variance setting, the Gaussian law describes the fluctuations of this normalized sample mean.
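As a quick sanity check of the \(\sigma^2/n\) scaling, one can estimate \(\mathrm{Var}(\overline{X}_n)\) by simulation. The following minimal NumPy sketch (not part of the original page; it uses Exp(1) draws, which have variance 1) compares the empirical variance of many sample means against \(\sigma^2/n\):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2 = 1.0  # Exp(1) has variance 1

for n in (10, 100, 1000):
    # 10,000 independent sample means, each built from n draws
    means = rng.exponential(scale=1.0, size=(10_000, n)).mean(axis=1)
    print(f"n={n:5d}  Var(mean)={means.var():.6f}  sigma^2/n={sigma2 / n:.6f}")
```

The two printed columns agree to within sampling error, matching the variance identity above.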

Central limit theorem

If \(X_1,\dots,X_n\) are i.i.d. observations with mean \(\mu=\mathbb{E}[X_1]\) and finite variance \(\sigma^2=\mathrm{Var}(X_1)\), then

\[ \frac{\sqrt{n}(\overline{X}_n-\mu)}{\sigma}\Longrightarrow N(0,1) \qquad\text{as } n\to\infty. \]

Here \(\Longrightarrow\) means convergence in distribution, and \(N(0,1)\) denotes the standard Gaussian distribution.

Figure 1. Gaussian density

The standard Gaussian density \(N(0,1)\), commonly known as the "bell curve", which describes the limiting fluctuations in the central limit theorem.

Convergence in distribution here means that if we repeatedly draw samples of size \(n\) and compute the normalized sample mean each time, then for large \(n\) the resulting values have a distribution that is close to \(N(0,1)\). Figure 2 illustrates this with simulations: averages of coin flips, uniform random variables, and exponential random variables begin with very different distributions, but after centering and scaling their fluctuations are described increasingly well by the same Gaussian law.
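A simulation in the spirit of Figure 2 can be sketched in a few lines of NumPy (an illustrative sketch, not the page's own code, using Exp(1) draws so that \(\mu=\sigma=1\)). For each sample size it draws many samples, standardizes each sample mean, and checks that the standardized values look increasingly like \(N(0,1)\):

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma = 1.0, 1.0  # mean and standard deviation of Exp(1)

def standardized_means(n, reps=50_000):
    """sqrt(n) * (sample mean - mu) / sigma, for `reps` samples of size n."""
    xbar = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)
    return np.sqrt(n) * (xbar - mu) / sigma

for n in (2, 10, 100):
    z = standardized_means(n)
    # For N(0,1), the fraction below 0 is Phi(0) = 1/2; the skew washes out as n grows
    print(f"n={n:3d}  mean={z.mean():+.3f}  std={z.std():.3f}  P(Z<0)={(z < 0).mean():.3f}")
```

For small \(n\) the standardized means inherit the exponential's skew; by \(n=100\) the summary statistics are close to those of a standard Gaussian.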

The same phenomenon appears in practical averages built from observed data. In that setting the population mean and standard deviation are not known, so we estimate them from the full dataset and then standardize the averages we want to study.

Empirical standardization

If an observed dataset consists of values \(Y_1,\dots,Y_m\), define the empirical mean \(\widehat{\mu}\) and empirical standard deviation \(\widehat{\sigma}\) by

\[ \widehat{\mu}=\frac{Y_1+\cdots+Y_m}{m}, \qquad \widehat{\sigma}=\left(\frac{1}{m-1}\sum_{i=1}^m (Y_i-\widehat{\mu})^2\right)^{1/2}. \]

If \(A_n\) is an average built from \(n\) observed values, we standardize it by

\[ \frac{\sqrt{n}(A_n-\widehat{\mu})}{\widehat{\sigma}}. \]

In the examples below, \(A_n\) is either the average height in a simple random sample without replacement from the NHANES cohort, or the average of \(n\) distinct July daily average temperatures sampled without replacement from the pooled Indianapolis July record across all years. Figure 3 shows the two observed datasets themselves, and Figure 4 shows the standardized averages obtained from repeated simple random samples.
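The empirical standardization above is straightforward to code. The sketch below uses a synthetic cohort as a stand-in for the observed datasets (the real page uses NHANES heights and Indianapolis July temperatures); note the \(1/(m-1)\) convention in \(\widehat{\sigma}\), which corresponds to NumPy's `ddof=1`:

```python
import numpy as np

def empirical_stats(y):
    """Empirical mean and standard deviation, with the 1/(m-1) convention."""
    y = np.asarray(y, dtype=float)
    return y.mean(), y.std(ddof=1)

def standardize_average(a_n, n, mu_hat, sigma_hat):
    """sqrt(n) * (A_n - mu_hat) / sigma_hat."""
    return np.sqrt(n) * (a_n - mu_hat) / sigma_hat

# Synthetic stand-in for an observed cohort of heights in centimeters
rng = np.random.default_rng(2)
cohort = rng.normal(170.0, 10.0, size=5_000)
mu_hat, sigma_hat = empirical_stats(cohort)

# One simple random sample without replacement, as in the examples below
n = 50
sample = rng.choice(cohort, size=n, replace=False)
z = standardize_average(sample.mean(), n, mu_hat, sigma_hat)
print(f"standardized average: {z:+.3f}")
```

Repeating the last three lines many times produces the cloud of standardized averages whose histogram is compared with \(N(0,1)\).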

Datasets

We will illustrate Gaussian universality using two unrelated datasets.

NHANES heights

The first empirical distribution uses the standing-height variable BMXHT from the NHANES 2021-2023 Body Measures file BMX_L. Each observation is a measured height in centimeters. Below, we form many simple random samples without replacement from this observed cohort and compute their average heights. The source file is available from the CDC NHANES Body Measures page.

July temperatures

The second empirical distribution uses the daily average temperature variable TAVG from NOAA station USW00093819, Indianapolis International Airport, over the date range 1950-01-01 through 2023-12-31. To avoid mixing all seasons together, we restrict to July observations and treat those daily average temperatures as the observed cohort. Below, we pool the July observations across all years, repeatedly choose \(n\) distinct July days from that pooled cohort, and average their TAVG values. The sampled days are not required to lie in the same calendar year. The underlying series can be obtained directly from NOAA Daily Summaries for Indianapolis temperatures.

The classical extreme-value universality classes

The Gaussian universality class is ubiquitous because sums and averages appear throughout statistics, but it is not the only recurring universal large-scale phenomenon. Another classical example appears when the statistic of interest is the maximum rather than the average. In the classical setting, one studies maxima of large i.i.d. samples and asks how those maxima behave after the right centering and scaling.

Start with the maximum \[ M_n=\max\{X_1,\dots,X_n\}. \] As in the Gaussian case, the question is not just whether \(M_n\) grows, but how it fluctuates after the right normalization.

Classical extreme-value theorem

If \(X_1,X_2,\dots\) are i.i.d. and there are constants \(a_n>0\) and \(b_n\) such that

\[ \frac{M_n-b_n}{a_n}\Longrightarrow G, \]

for some cumulative distribution function \(G\), then \(G\) must be a generalized extreme-value distribution. In standard notation,

\[ G_{\xi,\mu,\sigma}(x)= \begin{cases} \exp\!\left(-\left(1+\xi\dfrac{x-\mu}{\sigma}\right)^{-1/\xi}\right), & \xi\ne 0,\\[0.9em] \exp\!\left(-e^{-(x-\mu)/\sigma}\right), & \xi=0, \end{cases} \]

Here \(\sigma>0\), and in the case \(\xi\ne 0\) the formula is defined on the region where \(1+\xi(x-\mu)/\sigma>0\).

The parameters \(\mu\) and \(\sigma\) set the location and scale, while the sign of \(\xi\) distinguishes the three classical types: \(\xi=0\) gives the Gumbel case, \(\xi>0\) the Fréchet-type case, and \(\xi<0\) the Weibull-type case. After an affine normalization, one may reduce to the special case \(\mu=0\) and \(\sigma=1\).

The underlying distribution can vary enormously, but after centering and scaling the maxima, the limiting law must fall into this single generalized extreme-value family. Figure 5 illustrates the three classical parameter regimes in simulated data.
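The Gumbel case is easy to check by hand for Exp(1) variables, where one can take \(a_n=1\) and \(b_n=\log n\). The sketch below (illustrative only, with those explicit normalizing constants) compares the empirical distribution of \(M_n-\log n\) with the standard Gumbel CDF \(\exp(-e^{-x})\), i.e. the \(\xi=0\) member of the family above:

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 500, 20_000

# Maxima of Exp(1) samples, centered by b_n = log n (scale a_n = 1)
m = rng.exponential(scale=1.0, size=(reps, n)).max(axis=1) - np.log(n)

# Empirical CDF versus the standard Gumbel CDF exp(-exp(-x))
for x in (-1.0, 0.0, 1.0, 2.0):
    print(f"x={x:+.1f}  empirical={(m <= x).mean():.4f}  Gumbel={np.exp(-np.exp(-x)):.4f}")
```

The agreement follows from the exact computation \(\Pr(M_n-\log n\le x)=(1-e^{-x}/n)^n\to\exp(-e^{-x})\).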

Classical extreme-value ideas also appear in real data, but now one begins with an observed series and extracts maxima from blocks of time rather than from repeated i.i.d. samples generated by a model.

Empirical block maxima

If an observed series \(Y_1,\dots,Y_m\) is divided into blocks \(B_1,\dots,B_r\), define the block maxima by

\[ Z_k=\max\{Y_j:j\in B_k\}. \]

In the example below, the blocks are calendar years and calendar quarters of one rainfall record.

For an empirical illustration, we can take a long observed series, divide it into blocks of time, compute one maximum from each block, and compare the resulting histogram with a fitted extreme-value density.
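Extracting block maxima is a one-pass grouping operation. A minimal pure-Python sketch (the toy `(year, rainfall)` pairs below are a synthetic stand-in for the NOAA daily record) might look like this:

```python
from collections import defaultdict

def block_maxima(values, block_labels):
    """One maximum per block; block_labels could be calendar years or quarters."""
    blocks = defaultdict(list)
    for label, v in zip(block_labels, values):
        blocks[label].append(v)
    return {label: max(vs) for label, vs in blocks.items()}

# Toy daily series standing in for the observed rainfall record
series = [(1950, 0.2), (1950, 1.7), (1950, 0.0),
          (1951, 0.9), (1951, 2.4),
          (1952, 0.0), (1952, 0.3)]
years = [y for y, _ in series]
rain = [r for _, r in series]
print(block_maxima(rain, years))  # → {1950: 1.7, 1951: 2.4, 1952: 0.3}
```

Quarterly maxima come from the same function with `(year, quarter)` pairs as the block labels.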

Fitted generalized extreme-value law

If \(Z_1,\dots,Z_r\) are the observed block maxima, we fit a generalized extreme-value distribution by estimating a location parameter \(\widehat{\mu}\), a scale parameter \(\widehat{\sigma}>0\), and a shape parameter \(\widehat{\xi}\). On this page the fit is done by maximum likelihood using SciPy's genextreme.fit routine.

\[ G_{\widehat{\xi},\widehat{\mu},\widehat{\sigma}}(x)= \begin{cases} \exp\!\left(-\left(1+\widehat{\xi}\frac{x-\widehat{\mu}}{\widehat{\sigma}}\right)^{-1/\widehat{\xi}}\right), & 1+\widehat{\xi}\dfrac{x-\widehat{\mu}}{\widehat{\sigma}}>0,\ \widehat{\xi}\ne 0,\\[0.8em] \exp\!\left(-e^{-(x-\widehat{\mu})/\widehat{\sigma}}\right), & \widehat{\xi}=0. \end{cases} \]

The histogram shows the empirical block maxima, and the curve is the density of the fitted generalized extreme-value distribution. The case \(\widehat{\xi}=0\) is the Gumbel family.
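The maximum-likelihood fit itself is one call to SciPy, with one convention to watch: SciPy's `genextreme` parametrizes the shape as \(c=-\xi\), so the sign must be flipped to recover \(\widehat{\xi}\) in the notation above. The sketch below fits simulated block maxima (maxima of Exp(1) samples, which are approximately Gumbel, so \(\widehat{\xi}\) should come out near zero) rather than the actual rainfall record:

```python
import numpy as np
from scipy.stats import genextreme

# Simulated block maxima: 500 "yearly" maxima of 365 Exp(1) draws each
rng = np.random.default_rng(4)
z = rng.exponential(scale=1.0, size=(500, 365)).max(axis=1)

# Maximum-likelihood fit; SciPy's shape c equals -xi
c, loc, scale = genextreme.fit(z)
xi_hat = -c
print(f"xi_hat={xi_hat:+.3f}  mu_hat={loc:.3f}  sigma_hat={scale:.3f}")
```

For the rainfall application, `z` would instead be the array of observed yearly or quarterly maxima, and `genextreme.pdf(x, c, loc, scale)` gives the fitted density drawn over the histogram.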

Dataset

We will illustrate an empirical extreme-value calculation using one rainfall record.

Daily rainfall

The underlying series is the daily precipitation variable PRCP from NOAA station USW00093819, Indianapolis International Airport, over the date range 1950-01-01 through 2023-12-31. The source data are available directly from NOAA Daily Summaries for Indianapolis rainfall.

Derived maxima

From that daily series we form one maximum for each calendar year and one maximum for each calendar quarter. Figure 6 below compares the resulting histograms with fitted generalized extreme-value densities.

Toward the KPZ universality class

The classical extreme-value theorems describe maxima of independent samples, or of block maxima treated as approximately independent. A natural next question is what happens when the quantity of interest is still extremal, but the competing random quantities are tied together by geometry and strong dependence.

Some models of this kind fall into the KPZ universality class. Last-passage percolation is a basic example: one maximizes over path weights, but overlapping paths create strong correlations.


This material is based in part upon work supported by the Simons Foundation Grant MPS-TSM-00012155 and the National Science Foundation under Grant No. DMS-2125961. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation or the Simons Foundation.