
Cointegration and Pairs Trading Statistics

Cointegration tests whether a linear combination of two or more non-stationary price series is itself stationary, producing a mean-reverting spread suitable for relative-value trading. Unlike correlation, which measures co-movement of returns, cointegration measures whether the price levels themselves share a long-run equilibrium. That property is the statistical foundation of most classical pairs trading strategies.

The Engle-Granger Two-Step Procedure

The standard test regresses one price series on another, then tests the residuals for stationarity using an Augmented Dickey-Fuller (ADF) test. The residual series is the candidate spread.

Y_t = α + β · X_t + ε_t

spread_t = Y_t − β · X_t − α
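A minimal sketch of the first step, assuming NumPy is available. The function name `engle_granger_step1` and the synthetic random-walk data are illustrative, not part of any library. It fits the regression above by least squares and returns the residual spread.

```python
import numpy as np

def engle_granger_step1(y, x):
    """Fit y_t = alpha + beta * x_t + eps_t by OLS; return (alpha, beta, spread)."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    # Design matrix with a column of ones for the intercept alpha
    X = np.column_stack([np.ones_like(x), x])
    (alpha, beta), *_ = np.linalg.lstsq(X, y, rcond=None)
    # The candidate spread is the OLS residual series
    spread = y - beta * x - alpha
    return float(alpha), float(beta), spread

# Illustrative data: x is a random walk, y tracks it with hedge ratio 1.5
rng = np.random.default_rng(0)
x = 50 + np.cumsum(rng.normal(size=500))
y = 2.0 + 1.5 * x + rng.normal(scale=0.5, size=500)
alpha, beta, spread = engle_granger_step1(y, x)
print(f"alpha={alpha:.2f} beta={beta:.3f}")
```

Because an intercept is included, the residual spread has mean zero by construction.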

The hedge ratio β is the OLS slope. The ADF test then evaluates whether the spread reverts to its mean by fitting:

Δspread_t = γ · spread_(t−1) + Σ_(i=1..p) φ_i · Δspread_(t−i) + u_t

Here the p lagged differences absorb serial correlation in the spread's changes. Under the null hypothesis, γ = 0: the spread has a unit root and is non-stationary. The alternative is γ < 0, meaning deviations shrink back toward equilibrium. Rejecting the null at a chosen significance level provides evidence of cointegration. The Johansen test generalizes this to multivariate systems and avoids the asymmetry of choosing which asset to regress on which.
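The second step can be sketched the same way, again assuming NumPy. `adf_tstat` is a hypothetical helper. It omits a constant, matching the regression above, which suits an OLS residual series that already has mean zero. It returns γ and its t-statistic only. Turning that into a p-value requires MacKinnon's response-surface tables, which statsmodels provides through `adfuller` and `coint` in `statsmodels.tsa.stattools`.

```python
import numpy as np

def adf_tstat(spread, lags=1):
    """OLS fit of  Δs_t = γ s_{t-1} + Σ_{i=1..lags} φ_i Δs_{t-i} + u_t  (no constant).

    Returns (gamma, t_stat) for the coefficient on the lagged level s_{t-1}.
    """
    s = np.asarray(spread, dtype=float)
    ds = np.diff(s)                       # ds[k] = s[k+1] - s[k], i.e. Δs_{k+1}
    target = ds[lags:]                    # Δs_t for t = lags+1 .. n-1
    cols = [s[lags:-1]]                   # lagged level s_{t-1}
    for i in range(1, lags + 1):
        cols.append(ds[lags - i:-i])      # lagged difference Δs_{t-i}
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    sigma2 = resid @ resid / (len(target) - X.shape[1])   # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)                 # OLS coefficient covariance
    gamma = float(coef[0])
    return gamma, gamma / float(np.sqrt(cov[0, 0]))

# Illustrative stationary spread: AR(1) with rho = 0.8, so gamma should be near -0.2
rng = np.random.default_rng(1)
e = rng.normal(size=1000)
s = np.zeros(1000)
for t in range(1, 1000):
    s[t] = 0.8 * s[t - 1] + e[t]
gamma, t_stat = adf_tstat(s, lags=1)
print(f"gamma={gamma:.3f} t={t_stat:.2f}")
```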

Interpreting the Test Statistics

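The ADF test produces a t-statistic on γ that is compared against critical values, which depend on sample size and on whether a constant or trend is included. More negative statistics indicate stronger evidence of stationarity. For a single observed series with a constant, the familiar Dickey-Fuller critical values are approximately −3.43 (1%), −2.86 (5%), and −2.57 (10%), and they shift only slightly for a typical pairs trading window of 252 daily observations. Engle-Granger residuals need stricter thresholds: OLS chooses β to minimize the residual variance, which biases the spread toward looking stationary, so MacKinnon's cointegration critical values for two series with a constant are roughly −3.90 (1%), −3.34 (5%), and −3.04 (10%).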
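To see why the choice of table matters, here is a small sketch holding both sets of approximate asymptotic critical values (constant included, from MacKinnon's published tables) and reporting the strictest level a statistic clears. The dictionary names and `strongest_rejection` are illustrative; for exact finite-sample values or p-values, use a library that implements MacKinnon's response surfaces.

```python
# Approximate asymptotic critical values with a constant (MacKinnon's tables).
ADF_CRIT = {0.01: -3.43, 0.05: -2.86, 0.10: -2.57}             # one observed series
EG_CRIT_TWO_SERIES = {0.01: -3.90, 0.05: -3.34, 0.10: -3.04}   # Engle-Granger residuals, 2 series

def strongest_rejection(t_stat, crit):
    """Smallest significance level whose critical value t_stat lies below, else None."""
    for level in sorted(crit):
        if t_stat < crit[level]:
            return level
    return None

# The same statistic is weaker evidence when the spread came from an estimated beta
print(strongest_rejection(-3.5, ADF_CRIT))            # → 0.01
print(strongest_rejection(-3.5, EG_CRIT_TWO_SERIES))  # → 0.05
```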

A p-value below 0.05 is the conventional threshold for declaring a pair cointegrated. Practitioners often demand p < 0.01 because scanning hundreds of pairs introduces multiple-testing bias: at the 5% level, some spurious rejections are all but certain. The half-life of mean reversion, computed from the spread's AR(1) coefficient ρ = 1 + γ, should typically fall between 1 and 30 trading days for the relationship to be tradable after costs.

half_life = −ln(2) / ln(1 + γ)
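The formula can be checked numerically with a minimal Python sketch; `half_life` is an illustrative name. It rejects values of γ for which 1 + γ is not strictly between 0 and 1, since outside that range the half-life is not a meaningful positive number.

```python
import math

def half_life(gamma):
    """Half-life in periods: -ln(2) / ln(1 + gamma), with rho = 1 + gamma the AR(1) coefficient."""
    rho = 1.0 + gamma
    if not 0.0 < rho < 1.0:
        raise ValueError("half-life requires 0 < 1 + gamma < 1")
    return -math.log(2) / math.log(rho)

print(round(half_life(-0.05), 1))    # → 13.5
print(round(half_life(-0.0115)))     # → 60
```

A γ of about −0.0115 therefore corresponds to roughly a 60-day half-life.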

A cointegrated pair with a 60-day half-life is statistically valid but economically marginal: the spread reverts too slowly to outpace financing costs and structural drift in the hedge ratio.

What Cointegration Does Not Capture

Cointegration is a statistical property of historical prices, not an economic guarantee. Two series can test as cointegrated purely by chance, especially when scanning large universes. This is the multiple comparisons problem, and it is severe. A universe of 500 assets produces 124,750 unordered pairs; at p < 0.05 you would expect roughly 6,240 false positives even if no pair shared a genuine relationship.
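The arithmetic, sketched with Python's standard library. The Bonferroni line is an illustration of how strict a per-pair threshold becomes if the family-wise error rate is held at 5% across the whole scan.

```python
from math import comb

n_assets = 500
n_pairs = comb(n_assets, 2)                  # unordered pairs: 500 * 499 / 2
alpha = 0.05
expected_false_positives = alpha * n_pairs   # if no pair is genuinely cointegrated
bonferroni_threshold = alpha / n_pairs       # per-pair cutoff for 5% family-wise error

print(n_pairs)                         # → 124750
print(expected_false_positives)        # → 6237.5
print(f"{bonferroni_threshold:.2e}")   # → 4.01e-07
```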

The hedge ratio β is estimated, not known. It drifts over time as fundamentals change, corporate actions occur, and market regimes shift. A pair cointegrated in-sample frequently fails out-of-sample because the equilibrium relationship was either spurious or has decayed.
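One way to watch the hedge ratio drift is to re-run the OLS slope over a trailing window. A minimal NumPy sketch; `rolling_beta` and the 50-day window are illustrative choices, and the synthetic example switches the true β halfway through to make the drift visible.

```python
import numpy as np

def rolling_beta(y, x, window):
    """OLS slope of y on x (intercept included) over each trailing window; NaN until full."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    betas = np.full(len(y), np.nan)
    for end in range(window, len(y) + 1):
        xs, ys = x[end - window:end], y[end - window:end]
        xc = xs - xs.mean()                  # demeaning is equivalent to fitting an intercept
        betas[end - 1] = xc @ (ys - ys.mean()) / (xc @ xc)
    return betas

# Illustrative regime change: true beta is 1.0 for 100 days, then 2.0
rng = np.random.default_rng(2)
x = 50 + np.cumsum(rng.normal(size=200))
y = np.where(np.arange(200) < 100, 1.0 * x, 2.0 * x)
betas = rolling_beta(y, x, window=50)
print(betas[99], betas[199])   # windows lying entirely inside each regime
```

Windows that straddle day 100 produce intermediate slopes, which is how a single in-sample fit hides drift.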

Cointegration says nothing about the magnitude or timing of reversion, only that reversion is statistically present. A spread can widen for months before reverting, and position sizing based on z-score alone routinely produces drawdowns that exceed typical strategy capital allocations.

The test also assumes linearity and constant parameters. Real spreads exhibit regime changes, volatility clustering, and asymmetric reversion — none of which are captured by Engle-Granger or Johansen. Cointegration is necessary for classical pairs trading; it is not sufficient.

Use in Kestrel Signal

Kestrel Signal reports cointegration diagnostics for every candidate pair in the research workspace: ADF statistic, p-value, estimated β, spread half-life, and Hurst exponent of the residual series. Pairs are flagged when the half-life exceeds the configured backtest horizon or when β instability — measured by rolling-window standard deviation of the hedge ratio — crosses a user-defined threshold.
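The two flag rules described above can be sketched as follows. This is a hypothetical illustration, not Kestrel Signal's implementation: the function name, the parameter names, and the reading of "rolling-window standard deviation" as the standard deviation of a series of rolling β estimates are all assumptions.

```python
import numpy as np

def pair_flags(half_life_days, rolling_betas, backtest_horizon_days, beta_std_threshold):
    """Hypothetical flag rules: slow reversion relative to the horizon, and unstable beta."""
    # Ignore the NaN warm-up entries a rolling estimator typically produces
    beta_std = float(np.nanstd(rolling_betas))
    return {
        "half_life_exceeds_horizon": half_life_days > backtest_horizon_days,
        "beta_unstable": beta_std > beta_std_threshold,
        "beta_std": beta_std,
    }

flags = pair_flags(45.0, [np.nan, 1.0, 1.2, 0.8],
                   backtest_horizon_days=30, beta_std_threshold=0.1)
print(flags["half_life_exceeds_horizon"], flags["beta_unstable"])   # → True True
```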

Backtests on cointegrated pairs include walk-forward re-estimation of β by default, so reported equity curves reflect the realistic cost of hedge ratio drift rather than a single in-sample fit. Multiple-testing correction via Bonferroni or Benjamini-Hochberg is available when scanning universes, with the corrected p-value displayed alongside the raw statistic.
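For reference, the two corrections named above can be written as adjusted p-values in a few lines of NumPy. This is a generic sketch of the standard Bonferroni and Benjamini-Hochberg formulas, not Kestrel Signal's code; `bonferroni_adjust` and `bh_adjust` are illustrative names. A pair is kept at level q when its adjusted p-value is at most q.

```python
import numpy as np

def bonferroni_adjust(pvalues):
    """Bonferroni: multiply each p-value by the number of tests, capped at 1."""
    p = np.asarray(pvalues, dtype=float)
    return np.minimum(p * p.size, 1.0)

def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (controls the false discovery rate)."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)                              # ascending p-values
    scaled = p[order] * m / np.arange(1, m + 1)        # p_(k) * m / k
    # Enforce monotonicity: adjusted p_(k) = min over j >= k of scaled_(j)
    adj_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(adj_sorted, 1.0)      # map back to input order
    return adjusted

raw = [0.01, 0.04, 0.03, 0.005]
print(bonferroni_adjust(raw))
print(bh_adjust(raw))
```

Benjamini-Hochberg is less conservative than Bonferroni because it bounds the expected share of false discoveries rather than the chance of any false discovery.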