Why should average July temperatures and average human heights have fluctuations described by the same limiting law? Why should
annual rainfall maxima and yearly maximum insurance losses exhibit a different kind of limiting behavior? This page explores
those questions through two classical examples of universality: the Gaussian universality class and the extreme-value
universality classes.
The Universality Phenomenon
Many systems in nature are built from a large number of microscopic "degrees of freedom." In probabilistic and statistical models, one often
represents those degrees of freedom by random variables \(X_1,X_2,\dots\), and for each system size \(n\) studies an observable
\[
F_n(X_1,\dots,X_n).
\]
We are interested in the "macroscopic" behavior, so we “zoom out” by sending \(n\to\infty\). Our interest is often in whether we can find parameters \(a_n\) and \(b_n\) so that when we center and scale appropriately,
\[
\frac{F_n(X_1,\dots,X_n)-a_n}{b_n},
\]
we see an interesting limit. If this feels abstract at first, don’t worry. We will turn to concrete examples that illustrate what this all means in a moment.
Universality is the observation that, very frequently, different microscopic models exhibit the same large-scale behavior after the
correct normalization. In many examples the precise constants in the normalization depend on the model,
while their orders of growth are shared across a wide class of models and physical phenomena with the same behavior. Such a collection is called a "universality class," and the shared growth rates are often expressed in terms of "scaling exponents."
The universality phenomenon is ubiquitous and is responsible for much of the power of statistical methods. To illustrate the idea, we now turn to some concrete examples.
The Gaussian universality class
The most familiar universality class is the Gaussian one. It appears when a quantity is built from many random contributions that
are added together, are roughly independent and similarly distributed, and among which no single term dominates the rest. In the
classical setting one works with i.i.d. random variables, but exact independence and exactly identical distributions are not
strictly necessary. Here we stay in the finite-variance setting; if the variance is infinite, different stable-law behavior
can appear.
Start with observations \(X_1,\dots,X_n\). The basic statistic is the sample mean
\[
\overline{X}_n=\frac{X_1+\cdots+X_n}{n}.
\]
We expect \(\overline{X}_n\) to get close to the true mean \(\mu=\mathbb{E}[X_1]\) as \(n\) grows. The more refined question is how
\(\overline{X}_n\) fluctuates around \(\mu\). Using the variance identity for uncorrelated random variables, if
\(\sigma^2=\mathrm{Var}(X_1)\), then
\[
\mathrm{Var}(\overline{X}_n)=\mathrm{Var}\left(\frac{X_1+\cdots+X_n}{n}\right)=\frac{\sigma^2}{n}.
\]
This tells us the fluctuations of \(\overline{X}_n\) are of order \(n^{-1/2}\). To see a non-trivial limit, it is natural to
standardize:
\[
\frac{\sqrt{n}(\overline{X}_n-\mu)}{\sigma},
\]
which has mean \(0\) and variance \(1\). In the finite-variance setting, the Gaussian law describes the fluctuations of this
normalized sample mean.
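Both the \(n^{-1/2}\) order of the fluctuations and the unit variance of the normalized mean can be checked numerically. The following is a minimal sketch with NumPy (not code from this page); the exponential distribution and the sample sizes are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 100, 200_000

# Exponential(1) has mean 1 and variance 1, so Var(sample mean) should be ~ 1/n.
samples = rng.exponential(scale=1.0, size=(reps, n))
sample_means = samples.mean(axis=1)

empirical_var = sample_means.var()
predicted_var = 1.0 / n  # sigma^2 / n with sigma^2 = 1
```

With this many repetitions, the empirical variance of the sample means matches \(\sigma^2/n\) to well within sampling error.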
Central limit theorem
If \(X_1,\dots,X_n\) are i.i.d. observations with mean \(\mu=\mathbb{E}[X_1]\) and finite variance
\(\sigma^2=\mathrm{Var}(X_1)\), then
\[
\frac{\sqrt{n}(\overline{X}_n-\mu)}{\sigma}\Longrightarrow N(0,1)
\qquad\text{as } n\to\infty.
\]
Here \(\Longrightarrow\) means convergence in distribution, and \(N(0,1)\) denotes the standard Gaussian distribution.
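The convergence can be seen in simulation even when the starting distribution looks nothing like a Gaussian. A minimal NumPy sketch using fair coin flips, one of the inputs shown in Figure 2 (the sample sizes here are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 500, 20_000

# Fair coin flips: mu = 0.5, sigma = 0.5, far from Gaussian at the start.
mu, sigma = 0.5, 0.5
flips = rng.integers(0, 2, size=(reps, n))
normalized = np.sqrt(n) * (flips.mean(axis=1) - mu) / sigma

# If the CLT is at work, the normalized values should have mean ~ 0,
# variance ~ 1, and put about 95% of their mass inside [-1.96, 1.96].
frac_within = np.mean(np.abs(normalized) < 1.96)
```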
Figure 1. Gaussian density
The standard Gaussian density \(N(0,1)\), commonly known as the "bell curve", which describes the limiting fluctuations in the
central limit theorem.
Convergence in distribution here means that if we repeatedly draw samples of size \(n\) and compute the normalized sample mean
each time, then for large \(n\) the resulting values have a distribution that is close to \(N(0,1)\). Figure 2
illustrates this with simulations: averages of coin flips, uniform random variables, and exponential random variables begin with
very different distributions, but after centering and scaling their fluctuations are described increasingly well by the same
Gaussian law.
Figure 2. Simulated examples of Gaussian universality
The left column shows samples from the original distributions. In each row and for each value of \(n\), we generate 20,000
independent samples of size \(n\), compute the sample mean of each one, and then normalize that mean by subtracting \(\mu\)
and dividing by \(\sigma/\sqrt{n}\). The histograms in the right columns show the 20,000 resulting normalized values, which
become more similar as \(n\) increases.
The same phenomenon appears in practical averages built from observed data. In that setting the population mean and standard
deviation are not known, so we estimate them from the full dataset and then standardize the averages we want to study.
Empirical standardization
If an observed dataset consists of values \(Y_1,\dots,Y_m\), define the empirical mean \(\widehat{\mu}\) and empirical
standard deviation \(\widehat{\sigma}\) by
\[
\widehat{\mu}=\frac{Y_1+\cdots+Y_m}{m},
\qquad
\widehat{\sigma}=\left(\frac{1}{m-1}\sum_{i=1}^m (Y_i-\widehat{\mu})^2\right)^{1/2}.
\]
If \(A_n\) is an average built from \(n\) observed values, we standardize it by
\[
\frac{\sqrt{n}(A_n-\widehat{\mu})}{\widehat{\sigma}}.
\]
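As a sketch of this procedure, the following uses a synthetic cohort in place of the real datasets; the Gaussian stand-in, the cohort size, and the sample size are illustrative assumptions, not values from this page.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic stand-in cohort; the page's examples use NHANES heights and
# Indianapolis July temperatures instead.
cohort = rng.normal(loc=165.0, scale=10.0, size=8_000)

# Empirical mean and standard deviation of the full dataset
# (ddof=1 gives the 1/(m-1) divisor in the definition above).
mu_hat = cohort.mean()
sigma_hat = cohort.std(ddof=1)

# Repeated simple random samples without replacement, each standardized.
n, reps = 50, 5_000
standardized = np.empty(reps)
for k in range(reps):
    sample = rng.choice(cohort, size=n, replace=False)
    standardized[k] = np.sqrt(n) * (sample.mean() - mu_hat) / sigma_hat
```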
In the examples below, \(A_n\) is either the average height in a simple random sample without replacement from the NHANES
cohort, or the average of \(n\) distinct July daily average temperatures sampled without replacement from the pooled
Indianapolis July record across all years. Figure 3 shows the two observed datasets themselves, and Figure 4 shows the
standardized averages obtained from repeated simple random samples.
Datasets
We will illustrate Gaussian universality using two unrelated datasets.
NHANES heights
The first empirical distribution uses the standing-height variable BMXHT from the NHANES 2021-2023 Body
Measures file BMX_L. Each observation is a measured height in centimeters. Below, we form many simple random
samples without replacement from this observed cohort and compute their average heights. The source file is available from the
CDC NHANES Body Measures page.
July temperatures
The second empirical distribution uses the daily average temperature variable TAVG from NOAA station
USW00093819, Indianapolis International Airport, over the date range 1950-01-01 through 2023-12-31. To avoid
mixing all seasons together, we restrict to July observations and treat those daily average temperatures as the observed
cohort. Below, we pool the July observations across all years, repeatedly choose \(n\) distinct July days from that pooled
cohort, and average their TAVG values. The sampled days are not required to lie in the same calendar year.
The underlying series can be obtained directly from
NOAA Daily Summaries for Indianapolis temperatures.
Figure 3. Empirical data
These observed datasets do not start out looking alike. One records human heights, and the other records July daily average
temperatures from one airport station over many years.
Figure 4. Observed averages
In both rows, each histogram comes from many simple random samples without replacement from the observed dataset. For the
temperature row, each sample consists of \(n\) distinct July days drawn from the pooled Indianapolis July record across all
years, and the plotted value is the average of their TAVG values. The sample means are centered by \(\widehat{\mu}\), scaled by
\(\widehat{\sigma}/\sqrt{n}\), and compared with the standard Gaussian density.
The classical extreme-value universality classes
The Gaussian universality class is ubiquitous because sums and averages appear throughout statistics, but it is not the only
recurring universal large-scale phenomenon. Another classical example appears when the statistic of interest is the maximum rather than the
average. In the classical setting, one studies maxima of large i.i.d. samples and asks how those maxima behave after the right
centering and scaling.
Start with the maximum
\[
M_n=\max\{X_1,\dots,X_n\}.
\]
As in the Gaussian case, the question is not just whether \(M_n\) grows, but how it fluctuates after the right normalization.
Classical extreme-value theorem
If \(X_1,X_2,\dots\) are i.i.d. and there are constants \(a_n>0\) and \(b_n\) such that
\[
\frac{M_n-b_n}{a_n}\Longrightarrow G,
\]
where \(G\) is the cumulative distribution function of the limiting law, then \(G\) must be a generalized extreme-value
distribution. In standard notation,
\[
G_{\xi,\mu,\sigma}(x)=
\begin{cases}
\exp\!\left(-\left(1+\xi\dfrac{x-\mu}{\sigma}\right)^{-1/\xi}\right),
& \xi\ne 0,\\[0.9em]
\exp\!\left(-e^{-(x-\mu)/\sigma}\right), & \xi=0,
\end{cases}
\]
Here \(\mu\) and \(\sigma>0\) are the location and scale parameters, and in the case \(\xi\ne 0\) the formula is defined on the
region where \(1+\xi(x-\mu)/\sigma>0\). The sign of \(\xi\) distinguishes the three classical types: \(\xi=0\) gives the
Gumbel case, \(\xi>0\) gives the Fréchet-type case, and \(\xi<0\) gives the Weibull-type case. After an affine normalization,
one may reduce to the special case \(\mu=0\) and \(\sigma=1\).
The underlying distribution can vary enormously, but after centering and scaling the maxima, the limiting law must fall into
this one generalized extreme-value family. Figure 5 shows the three classical parameter regimes first in simulated data.
Figure 5. Extreme values
Three very different input distributions lead, after normalization, to three parameter regimes of the generalized extreme-value family: Gumbel, Weibull, and Fréchet. The curves shown here are the corresponding limiting densities.
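The three regimes can be reproduced in simulation with standard textbook inputs (not necessarily the ones behind Figure 5): exponential variables for the Gumbel regime, uniform variables for the Weibull-type regime, and Pareto variables for the Fréchet-type regime. A minimal NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 1_000, 10_000

# Gumbel regime (xi = 0): maxima of Exp(1) variables, centered by log n.
# The Gumbel limit has mean equal to the Euler-Mascheroni constant ~ 0.5772.
exp_max = rng.exponential(size=(reps, n)).max(axis=1) - np.log(n)

# Weibull-type regime (xi < 0): maxima of Uniform(0, 1) variables;
# n * (1 - M_n) converges to an Exp(1) law, so its mean is ~ 1.
uni_gap = n * (1.0 - rng.random(size=(reps, n)).max(axis=1))

# Frechet-type regime (xi > 0): maxima of Pareto variables with tail index 2,
# scaled by n^(1/2); the Frechet limit has median 1 / sqrt(log 2).
par_max = (rng.pareto(2.0, size=(reps, n)) + 1.0).max(axis=1) / np.sqrt(n)
```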
Classical extreme-value ideas also appear in real data, but now one begins with an observed series and extracts maxima from
blocks of time rather than from repeated i.i.d. samples generated by a model.
Empirical block maxima
If an observed series \(Y_1,\dots,Y_m\) is divided into blocks \(B_1,\dots,B_r\), define the block maxima by
\[
Z_k=\max\{Y_j:j\in B_k\}.
\]
In the example below, the blocks are calendar years and calendar quarters of one rainfall record.
For an empirical illustration, we can take a long observed series, divide it into blocks of time, compute one maximum from each
block, and compare the resulting histogram with a fitted extreme-value density.
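In code, extracting block maxima is a one-line reduction once the series is arranged into blocks. A sketch with a synthetic daily series (the gamma distribution and the 20-year length are illustrative stand-ins, not the NOAA data):

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic stand-in for a daily series; the page's example uses NOAA PRCP values.
n_years, days_per_year = 20, 365
series = rng.gamma(shape=0.3, scale=10.0, size=n_years * days_per_year)

# One maximum per yearly block of 365 consecutive observations.
blocks = series.reshape(n_years, days_per_year)
block_maxima = blocks.max(axis=1)
```

Real calendar years have unequal lengths, so in practice one groups by date (for example with pandas) rather than reshaping a flat array.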
Fitted generalized extreme-value law
If \(Z_1,\dots,Z_r\) are the observed block maxima, we fit a generalized extreme-value distribution by estimating a location
parameter \(\widehat{\mu}\), a scale parameter \(\widehat{\sigma}>0\), and a shape parameter \(\widehat{\xi}\). On this page
the fit is done by maximum likelihood using SciPy's genextreme.fit routine.
\[
G_{\widehat{\xi},\widehat{\mu},\widehat{\sigma}}(x)=
\begin{cases}
\exp\!\left(-\left(1+\widehat{\xi}\frac{x-\widehat{\mu}}{\widehat{\sigma}}\right)^{-1/\widehat{\xi}}\right),
& 1+\widehat{\xi}\dfrac{x-\widehat{\mu}}{\widehat{\sigma}}>0,\ \widehat{\xi}\ne 0,\\[0.8em]
\exp\!\left(-e^{-(x-\widehat{\mu})/\widehat{\sigma}}\right),
& \widehat{\xi}=0.
\end{cases}
\]
The histogram shows the empirical block maxima, and the curve is the density of the fitted generalized extreme-value
distribution. The case \(\widehat{\xi}=0\) is the Gumbel family.
Dataset
We will illustrate an empirical extreme-value calculation using one rainfall record.
Daily rainfall
The underlying series is the daily precipitation variable PRCP from NOAA station USW00093819,
Indianapolis International Airport, over the date range 1950-01-01 through 2023-12-31. The source data are available
directly from
NOAA Daily Summaries for Indianapolis rainfall.
Derived maxima
From that daily series we form one maximum for each calendar year and one maximum for each calendar quarter. Figure 6
below compares the resulting histograms with fitted generalized extreme-value densities.
Figure 6. Rainfall maxima
Each panel shows block maxima from the 1950-2023 Indianapolis rainfall record together with a generalized extreme-value
density fitted by maximum likelihood. The annual and quarterly block maxima are fitted separately, and the fitted
\((\xi,\mu,\sigma)\) values are displayed in the panels.
Toward the KPZ universality class
The classical extreme-value theorems describe maxima of independent samples, or of block maxima treated as approximately
independent. A natural next question is what happens when the quantity of interest is still extremal, but the competing random
quantities are tied together by geometry and strong dependence.
Some models of this kind fall into the KPZ universality class. Last-passage percolation is a basic example: one maximizes over
path weights, but overlapping paths create strong correlations.
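As a concrete sketch of what last-passage percolation means, the following computes the last-passage time over up-right paths in a grid of i.i.d. Exp(1) weights by dynamic programming; the grid size is an arbitrary illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(6)

# G[i, j] = maximal total weight over up-right lattice paths from (0, 0) to (i, j).
N = 200
w = rng.exponential(size=(N, N))

G = np.zeros((N, N))
for i in range(N):
    for j in range(N):
        best_prev = 0.0
        if i > 0:
            best_prev = G[i - 1, j]
        if j > 0:
            best_prev = max(best_prev, G[i, j - 1])
        G[i, j] = best_prev + w[i, j]

# For Exp(1) weights, G[N-1, N-1] / N converges to 4, and the fluctuations
# around that limit live on the KPZ scale N^(1/3), not the Gaussian scale N^(1/2).
lpp_time = G[-1, -1]
```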
The material above is based in part upon work supported by the Simons Foundation Grant MPS-TSM-00012155 and the National Science
Foundation under Grant No. DMS-2125961. Any opinions,
findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect
the views of the National Science Foundation or the Simons Foundation.