Regime Detection and Conditional Performance

Regime detection partitions historical market data into distinct statistical states — typically defined by volatility, trend strength, correlation structure, or macro conditions — and evaluates strategy performance conditional on each state. A strategy with a 1.4 aggregate Sharpe may run at 2.3 in low-volatility trending regimes and -0.4 in high-volatility mean-reverting ones. Without this decomposition, you are averaging over experiments you would not knowingly repeat.

The premise is that financial time series are non-stationary: the data-generating process shifts, and a strategy's edge is rarely uniform across those shifts. Aggregate statistics conflate skill with regime luck, especially when the in-sample window happened to overweight a favorable state.

The computation

Define a regime classifier R(t) that maps each time index to a discrete label k ∈ {1, ..., K}. Common classifiers include rolling realized volatility bucketed by quantile, HMM states fit on returns, trend filters (e.g. sign of a 200-day slope), or exogenous indicators such as VIX terciles or yield-curve sign.
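As an illustration of one such classifier, here is a minimal sketch that labels each index by trailing realized volatility. The function name `rolling_vol_regimes`, the window length, and the fixed thresholds are assumptions made for this example; in practice the cut points would come from quantiles estimated on data that precedes the evaluation window.

```python
import statistics

def rolling_vol_regimes(returns, window, thresholds):
    """Label each index by trailing realized volatility.

    returns    : list of per-period returns
    window     : number of trailing observations in the volatility estimate
    thresholds : ascending volatility cut points (hypothetical values);
                 the number of regimes is K = len(thresholds) + 1
    Returns labels in {1, ..., K}, or None while the window is not yet full.
    """
    labels = []
    for t in range(len(returns)):
        if t < window:
            labels.append(None)  # not enough history yet
            continue
        # Use only returns strictly before t, so the label is known
        # before the return at t is earned.
        vol = statistics.pstdev(returns[t - window:t])
        # Regime index = 1 + number of thresholds the volatility reaches.
        labels.append(1 + sum(vol >= c for c in thresholds))
    return labels

# Toy data: a calm stretch followed by a turbulent one.
returns = [0.001, -0.001, 0.001, -0.001, 0.02, -0.02, 0.02, -0.02]
labels = rolling_vol_regimes(returns, window=2, thresholds=[0.005])
print(labels)  # [None, None, 1, 1, 1, 2, 2, 2]
```

The label at each index uses only earlier returns, which is why the first turbulent return at index 4 is still labelled as regime 1.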

Conditional performance for metric M and regime k is computed over only the timestamps assigned to that regime:

M_k = M( { r(t) : R(t) = k } )

For Sharpe specifically, with daily returns r(t) and trading-day annualization factor 252:

Sharpe_k = mean(r(t) | R(t)=k) / std(r(t) | R(t)=k) × sqrt(252)
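The formula can be sketched in a few lines of Python. This is an illustrative implementation, not a reference one: the function name `conditional_sharpe` is made up for the example, it uses the sample standard deviation, and it returns NaN for regimes with fewer than two observations or zero variance.

```python
import math
import statistics

def conditional_sharpe(returns, labels, periods_per_year=252):
    """Annualized Sharpe per regime, using only returns whose label matches."""
    by_regime = {}
    for r, k in zip(returns, labels):
        if k is not None:  # skip unlabelled timestamps
            by_regime.setdefault(k, []).append(r)

    out = {}
    for k, rs in by_regime.items():
        if len(rs) < 2:
            out[k] = float("nan")  # a standard deviation needs two points
            continue
        sd = statistics.stdev(rs)  # sample standard deviation
        if sd == 0:
            out[k] = float("nan")
        else:
            out[k] = statistics.mean(rs) / sd * math.sqrt(periods_per_year)
    return out

# Toy data, far too short to be meaningful; it only shows the mechanics.
returns = [0.01, 0.02, 0.03, -0.01, 0.01, -0.03]
labels = [1, 1, 1, 2, 2, 2]
sharpe_k = conditional_sharpe(returns, labels)
print(sharpe_k)
```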

The regime weight w_k = N_k / N gives the fraction of sample time spent in state k, and the law of total expectation requires that mean returns reconcile: mean(r) = Σ w_k × mean(r | R=k).
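A quick sanity check on any regime decomposition is to confirm this identity numerically. The sketch below assumes every timestamp carries a label; the function name and the toy data are illustrative.

```python
import statistics

def regime_weights_and_means(returns, labels):
    """Return {k: (w_k, mean return in regime k)} for fully labelled data."""
    n = len(returns)
    groups = {}
    for r, k in zip(returns, labels):
        groups.setdefault(k, []).append(r)
    return {k: (len(rs) / n, statistics.mean(rs)) for k, rs in groups.items()}

returns = [0.01, 0.02, 0.03, -0.01, 0.01, -0.03, 0.00, 0.02]
labels = [1, 1, 1, 2, 2, 2, 2, 1]

stats = regime_weights_and_means(returns, labels)
reconstructed = sum(w * m for w, m in stats.values())
aggregate = statistics.mean(returns)
gap = abs(reconstructed - aggregate)
print(gap)  # should be zero up to floating-point rounding
```

The same reconciliation does not hold for Sharpe: the aggregate standard deviation also includes the dispersion of regime means, so the aggregate Sharpe is generally not the weighted average of the Sharpe_k values.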

Interpretation

The first diagnostic is dispersion. A strategy whose per-regime Sharpe ranges from 1.1 to 1.6 across four regimes is robust; one ranging from -0.5 to 3.0 is a regime bet wearing a diversified costume. Compute the standard deviation of Sharpe_k weighted by w_k as a quick concentration measure.
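The weighted dispersion can be sketched as follows. The per-regime Sharpe values are invented for illustration and only match the end points of the two ranges quoted above; equal regime weights are also an assumption.

```python
import math

def weighted_sharpe_dispersion(sharpe_by_regime, weight_by_regime):
    """Weighted standard deviation of per-regime Sharpe ratios."""
    total_w = sum(weight_by_regime[k] for k in sharpe_by_regime)
    mean = sum(weight_by_regime[k] * s
               for k, s in sharpe_by_regime.items()) / total_w
    var = sum(weight_by_regime[k] * (s - mean) ** 2
              for k, s in sharpe_by_regime.items()) / total_w
    return math.sqrt(var)

weights = {1: 0.25, 2: 0.25, 3: 0.25, 4: 0.25}  # illustrative w_k
robust = weighted_sharpe_dispersion({1: 1.1, 2: 1.3, 3: 1.4, 4: 1.6}, weights)
regime_bet = weighted_sharpe_dispersion({1: -0.5, 2: 0.2, 3: 0.9, 4: 3.0}, weights)
print(round(robust, 2), round(regime_bet, 2))  # 0.18 1.31
```

The measure is most useful for comparing strategies evaluated on the same regime partition, since changing the partition changes the weights.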

The second is sample adequacy per regime. Under an i.i.d. approximation, the standard error of an annualized Sharpe estimated from N_k daily observations is roughly sqrt(252 / N_k), so a regime with fewer than about 60 observations carries a standard error above 2.0, and most apparent regime-specific "edges" at small N_k are noise. Demand at least 100–250 observations per regime before treating Sharpe_k as informative, and more for skewed return distributions.
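To put numbers on the sample-size concern, the sketch below uses the standard large-sample approximation for i.i.d. normal returns, in which the per-period Sharpe has variance of about (1 + SR²/2) / N. Real returns are autocorrelated and fat-tailed, so treat the result as a lower bound on uncertainty. The function name is an assumption for this example.

```python
import math

def sharpe_standard_error(annual_sharpe, n_obs, periods_per_year=252):
    """Approximate standard error of an annualized Sharpe ratio.

    Assumes i.i.d. normal per-period returns; skew, fat tails, and
    autocorrelation generally make the true uncertainty larger.
    """
    per_period = annual_sharpe / math.sqrt(periods_per_year)
    se_per_period = math.sqrt((1 + 0.5 * per_period ** 2) / n_obs)
    return se_per_period * math.sqrt(periods_per_year)

se_60 = sharpe_standard_error(1.0, 60)
se_250 = sharpe_standard_error(1.0, 250)
print(round(se_60, 2), round(se_250, 2))  # 2.05 1.0
```

For daily data the term in SR² is small, so the standard error is close to sqrt(252 / N) almost regardless of the Sharpe level being estimated.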

The third is regime persistence. If R(t) flips every 2-3 days, your regime variable is capturing noise, not state. Healthy regime classifiers produce average dwell times of weeks to months, with transition matrices whose diagonal entries exceed 0.9 at daily frequency.

A strategy that posts positive Sharpe in every regime, even if some are mediocre, is meaningfully different from one with a high average driven by a single regime. The first survives regime shifts; the second is short an unhedged macro factor.

What it does not capture

Regime detection does not prove causation. A strategy may underperform in high-vol regimes because of the regime itself, or because high-vol regimes happen to coincide with a microstructure change, a policy regime, or a crowding dynamic that is the actual driver. The regime label is a coarse proxy for whatever latent variable matters.

It also does not address regime forecasting. Knowing that a strategy earns 2.1 Sharpe in low-vol regimes is operationally useful only if you can detect the regime in real time without lookahead. HMM smoothed states, in particular, use future information; only filtered (causal) states are honest for live deployment.
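The difference between filtered and smoothed states can be seen in a small two-state example. The sketch below is a hand-rolled forward-backward pass for a zero-mean Gaussian HMM with fixed parameters. All parameter values are hypothetical, and a real workflow would also have to estimate them without lookahead.

```python
import math

def gauss_pdf(x, sd):
    """Density of a zero-mean normal with standard deviation sd."""
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def normalize(v):
    s = sum(v)
    return [x / s for x in v]

def filtered_and_smoothed(xs, sds, A, pi):
    """Return (filtered, smoothed) state probabilities for each timestamp."""
    n, K = len(xs), len(sds)
    lik = [[gauss_pdf(x, sd) for sd in sds] for x in xs]

    # Forward pass: P(state at t | data up to and including t).
    alpha = [normalize([pi[i] * lik[0][i] for i in range(K)])]
    for t in range(1, n):
        pred = [sum(alpha[-1][i] * A[i][j] for i in range(K)) for j in range(K)]
        alpha.append(normalize([pred[j] * lik[t][j] for j in range(K)]))

    # Backward pass: evidence from data after t (rescaled for stability).
    beta = [[1.0] * K for _ in range(n)]
    for t in range(n - 2, -1, -1):
        raw = [sum(A[i][j] * lik[t + 1][j] * beta[t + 1][j] for j in range(K))
               for i in range(K)]
        beta[t] = normalize(raw)

    # Smoothed: P(state at t | all data), which uses future observations.
    smoothed = [normalize([alpha[t][i] * beta[t][i] for i in range(K)])
                for t in range(n)]
    return alpha, smoothed

# Hypothetical parameters: state 0 is calm, state 1 is turbulent.
sds = [0.005, 0.03]
A = [[0.95, 0.05], [0.05, 0.95]]
pi = [0.5, 0.5]
xs = [0.001, -0.002, 0.001, 0.004, 0.03, -0.04, 0.035]

filtered, smoothed = filtered_and_smoothed(xs, sds, A, pi)
# At t=3 the smoothed estimate already leans toward the turbulent state
# because it has seen the large moves that follow.
print(round(filtered[3][1], 3), round(smoothed[3][1], 3))
```

Only the filtered probability at index 3 could have been computed on that day; a backtest that conditions on the smoothed value is implicitly trading on the next three returns.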

Regime boundaries fit to maximize performance separation are a form of data snooping. If you tune the volatility threshold to cleanly split your equity curve, the resulting per-regime Sharpes are inflated. Define regimes from exogenous logic, then measure — not the reverse.

Finally, conditional performance assumes regime membership is the only conditioning variable. Interactions — e.g. low vol AND inverted curve — produce thinner buckets and may reveal further structure that a single-axis decomposition hides. Two-way regime tables are informative when sample size permits.
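Before reading anything into a two-way table, it helps to count how many observations land in each cell. The sketch below assumes two aligned label sequences and a minimum of 100 observations per cell; the function names, the label values, and the threshold are illustrative.

```python
from itertools import product

def two_way_counts(labels_a, labels_b):
    """Observation count for every combination of two regime labels."""
    cells = product(sorted(set(labels_a)), sorted(set(labels_b)))
    counts = {cell: 0 for cell in cells}  # include empty combinations
    for cell in zip(labels_a, labels_b):
        counts[cell] += 1
    return counts

def thin_cells(counts, min_obs=100):
    """Cells with too few observations to support a per-cell estimate."""
    return sorted(cell for cell, n in counts.items() if n < min_obs)

# Synthetic labels on a 500-day sample.
vol = ["low"] * 300 + ["high"] * 200
curve = ["normal"] * 250 + ["inverted"] * 250

counts = two_way_counts(vol, curve)
print(counts[("low", "normal")], counts[("low", "inverted")])  # 250 50
print(thin_cells(counts))  # [('high', 'normal'), ('low', 'inverted')]
```

Here two of the four cells fall below the threshold even though each single-axis regime has at least 200 observations, which is the thinning effect described above.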

In Kestrel Signal

Kestrel Signal computes conditional performance across four regime axes: three built-in (realized volatility quartiles, 200-day trend sign, and a correlation regime based on rolling 60-day cross-sectional dispersion) plus a user-supplied custom classifier. It reports per-regime Sharpe, hit rate, max drawdown, and dwell-weighted contribution to aggregate return. Regimes with N_k below 100 are flagged. Custom regime functions can be passed as any callable returning an integer label aligned to the backtest index, and the conditional report is exportable alongside the standard performance summary.