
The Information Ratio

The information ratio (IR) measures the excess return of a strategy relative to a benchmark, normalized by the volatility of that excess return. It quantifies the consistency with which a manager or system generates active return — the alpha per unit of tracking error — and is the natural metric when the question is not "did this make money?" but "did this beat the benchmark reliably?"

Unlike the Sharpe ratio, which compares returns to a risk-free rate, the IR compares returns to an arbitrary benchmark: an index, a factor portfolio, a peer strategy, or a passive baseline. This makes it the standard yardstick for active management evaluation.

Formula

IR = E[Rp - Rb] / σ(Rp - Rb)

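Where Rp is the strategy return series, Rb is the benchmark return series, and the denominator, the tracking error, is the standard deviation of the period-by-period active return. To annualize, multiply the numerator by the number of periods per year (252 for daily data) and the denominator by the square root of that number; the net effect is that the annualized IR equals the per-period IR times the square root of the number of periods per year. The ratio is unit-free, but its value depends on the horizon at which it is quoted, so compare IRs only after putting them on the same annualized basis.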
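The formula and its annualization can be sketched in a few lines. This is a minimal illustration, assuming simple per-period returns supplied as two equal-length lists and the sample (n - 1) standard deviation; the function name information_ratio and its parameters are chosen here for illustration and are not taken from any particular library.

```python
import math
import statistics


def information_ratio(strategy, benchmark, periods_per_year=252):
    """Annualized IR from two equal-length, same-frequency return series."""
    if len(strategy) != len(benchmark):
        raise ValueError("series must have the same length and frequency")

    # Period-by-period active return: Rp - Rb.
    active = [rp - rb for rp, rb in zip(strategy, benchmark)]

    mean_active = statistics.fmean(active)
    tracking_error = statistics.stdev(active)  # sample std, n - 1 (an assumption)
    if tracking_error == 0:
        raise ValueError("tracking error is zero; IR is undefined")

    # Mean scales with the number of periods, volatility with its square root.
    annual_active = mean_active * periods_per_year
    annual_te = tracking_error * math.sqrt(periods_per_year)
    return annual_active / annual_te
```

Passing periods_per_year=1 returns the per-period IR, which differs from the annualized figure by exactly the square root of the number of periods per year.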

Tracking error and active return must be computed on the same return frequency. Mixing daily strategy returns with monthly benchmark returns produces a number, but not the IR.
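One way to bring a daily strategy series onto a monthly benchmark's frequency is to compound the daily returns within each calendar month and keep only the months both series cover. The sketch below assumes simple (not log) returns and does not check for partially observed months; the helper names compound_to_monthly and align and the sample figures are illustrative only.

```python
from datetime import date


def compound_to_monthly(dated_returns):
    """Compound (date, simple_return) pairs into {(year, month): return}."""
    growth = {}
    for day, r in dated_returns:
        key = (day.year, day.month)
        growth[key] = growth.get(key, 1.0) * (1.0 + r)
    return {key: g - 1.0 for key, g in growth.items()}


def align(strategy_monthly, benchmark_monthly):
    """Keep only months present in both series, in chronological order."""
    months = sorted(set(strategy_monthly) & set(benchmark_monthly))
    return (
        [strategy_monthly[m] for m in months],
        [benchmark_monthly[m] for m in months],
    )


# Made-up data: three daily strategy returns and three monthly benchmark returns.
daily = [
    (date(2024, 1, 2), 0.010),
    (date(2024, 1, 3), -0.005),
    (date(2024, 2, 1), 0.002),
]
benchmark = {(2024, 1): 0.004, (2024, 2): 0.001, (2024, 3): 0.003}

strategy_m, benchmark_m = align(compound_to_monthly(daily), benchmark)
```

After alignment both lists describe the same months at the same frequency, so active return and tracking error are computed on a like-for-like basis.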

Interpretation and Thresholds

Grinold and Kahn give widely cited benchmarks for institutional active managers: an annualized IR of 0.50 is good, 0.75 is very good, and 1.00 is exceptional. These thresholds assume long-only equity managers with multi-year track records and were calibrated against survivor-biased industry data, so treat them as an upper anchor, not a goal line.

For systematic strategies evaluated in backtest, an IR below 0.30 typically indicates the active return is statistically indistinguishable from noise over realistic sample sizes. An IR between 0.30 and 0.60 represents a credible edge if it survives out-of-sample testing and reasonable cost assumptions. An IR above 1.0 in a backtest, especially over short windows or with leverage, is a red flag for overfitting, look-ahead bias, or benchmark mismatch before it is evidence of skill.

The relationship to statistical significance is direct: the t-statistic of the mean active return is approximately the annualized IR × sqrt(T), where T is the number of years in the sample. An IR of 0.5 over 4 years yields a t-stat of 1.0, which is not significant. The same IR over 16 years yields a t-stat of 2.0, which is marginally significant. Short backtests cannot produce reliable IR estimates regardless of how high the point estimate looks.
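The arithmetic above, and its inverse (how many years a given IR needs before it reaches a target t-statistic), can be written out as a short sketch. The function names are illustrative, and the target of 2.0 is simply the conventional rough threshold used in the paragraph above.

```python
import math


def ir_t_stat(annualized_ir, years):
    """Approximate t-statistic of the mean active return."""
    return annualized_ir * math.sqrt(years)


def years_for_t_stat(annualized_ir, target_t=2.0):
    """Years of history needed for a positive IR to reach target_t."""
    if annualized_ir <= 0:
        raise ValueError("a non-positive IR never reaches a positive t-stat")
    return (target_t / annualized_ir) ** 2


t_short = ir_t_stat(0.5, 4)          # 1.0
t_long = ir_t_stat(0.5, 16)          # 2.0
years_needed = years_for_t_stat(0.3)  # about 44 years
```

The quadratic dependence on 1 / IR is the practical point: halving the IR quadruples the history required to reach the same t-statistic.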

What the IR Does Not Capture

The IR is built from only the first two moments of active return. It assumes active returns are well-described by their mean and standard deviation, which fails when active returns exhibit skewness, fat tails, or regime dependence. A strategy that quietly outperforms for years and then suffers a single catastrophic drawdown relative to the benchmark may post an attractive IR right up to the failure.
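A toy series makes the failure mode concrete. The numbers below are invented for illustration: 60 months of modest, symmetric active returns followed by one month with a 15% loss relative to the benchmark. Before the loss the annualized IR is roughly 1.0 and the skewness is essentially zero; including the loss drops the IR below 0.1 and the skewness turns sharply negative. Nothing in the pre-loss IR signalled the tail.

```python
import math
import statistics


def annualized_ir(active, periods_per_year=12):
    """Annualized IR from per-period active returns (sample std)."""
    per_period = statistics.fmean(active) / statistics.stdev(active)
    return per_period * math.sqrt(periods_per_year)


def skewness(values):
    """Population skewness: third central moment over variance ** 1.5."""
    mean = statistics.fmean(values)
    m2 = statistics.fmean([(v - mean) ** 2 for v in values])
    m3 = statistics.fmean([(v - mean) ** 3 for v in values])
    return m3 / m2 ** 1.5


# Hypothetical monthly active returns: 60 calm months, then one blow-up.
calm = [0.013, -0.007] * 30
with_blowup = calm + [-0.15]

ir_before = annualized_ir(calm)
ir_after = annualized_ir(with_blowup)
skew_before = skewness(calm)
skew_after = skewness(with_blowup)
```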

It also says nothing about the benchmark's appropriateness. A long-only equity strategy benchmarked against cash will show an inflated IR that reflects equity premium, not skill. A market-neutral strategy benchmarked against the S&P 500 will show a meaningless IR dominated by the absence of beta exposure. Benchmark selection is a modeling choice, not a measurement detail.

The IR does not adjust for transaction costs, capacity constraints, or factor exposures. Two strategies with identical IRs may have entirely different exposure profiles — one driven by stable alpha, the other by an unhedged value or momentum tilt that the benchmark fails to capture. Decomposing active return into factor-explained and residual components is necessary before attributing an IR to skill.
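A minimal version of that decomposition is a one-factor ordinary least squares regression of active return on a factor return series. The sketch below assumes a single factor and made-up data; real attribution would normally use several factors and a proper regression library. The name decompose_active_return is hypothetical.

```python
import statistics


def decompose_active_return(active, factor):
    """Fit active = alpha + beta * factor + residual by least squares."""
    if len(active) != len(factor):
        raise ValueError("series must have the same length")
    mean_a = statistics.fmean(active)
    mean_f = statistics.fmean(factor)
    cov = sum((a - mean_a) * (f - mean_f) for a, f in zip(active, factor))
    var = sum((f - mean_f) ** 2 for f in factor)
    if var == 0:
        raise ValueError("factor has no variation; beta is undefined")
    beta = cov / var
    alpha = mean_a - beta * mean_f
    residuals = [a - alpha - beta * f for a, f in zip(active, factor)]
    return alpha, beta, residuals


# Invented example: active return is mostly a 0.5 loading on the factor.
factor = [0.02, 0.0, 0.02, 0.0]
noise = [0.001, 0.001, -0.001, -0.001]
active = [0.001 + 0.5 * f + e for f, e in zip(factor, noise)]

alpha, beta, residuals = decompose_active_return(active, factor)
factor_explained = beta * statistics.fmean(factor)  # mean return from the tilt
```

In this constructed case the mean active return is 0.006 per period, of which 0.005 comes from the factor tilt and only 0.001 is residual alpha. An IR computed on the raw active return would credit the whole amount to skill.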

Maximizing IR in-sample is equivalent to minimizing tracking error subject to an excess return target. This objective is highly sensitive to estimation error in the covariance structure and routinely produces strategies that look superb in backtest and fail immediately out-of-sample.

In Kestrel Signal

Kestrel Signal reports the IR alongside its t-statistic and a bootstrap confidence interval, computed against a user-specified benchmark series at the same frequency as the strategy returns. The default benchmark is configurable per portfolio; the platform refuses to compute IR when the benchmark and strategy return series have fewer than 60 overlapping observations, since the resulting estimate is too noisy to be useful.
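For readers who want to reproduce a comparable interval outside the platform, the following is a generic percentile-bootstrap sketch. It is not Kestrel Signal's implementation, whose internals are not described here. It assumes per-period active returns as input, resamples them independently (which ignores autocorrelation), and borrows the 60-observation minimum from the paragraph above. All names and defaults are hypothetical.

```python
import math
import random
import statistics

MIN_OBSERVATIONS = 60  # minimum overlap quoted in the text above


def bootstrap_ir_interval(active, periods_per_year=252,
                          n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for the annualized IR."""
    n = len(active)
    if n < MIN_OBSERVATIONS:
        raise ValueError(f"need at least {MIN_OBSERVATIONS} observations, got {n}")

    rng = random.Random(seed)  # fixed seed so the result is reproducible
    estimates = []
    for _ in range(n_resamples):
        sample = rng.choices(active, k=n)  # resample with replacement
        te = statistics.stdev(sample)
        if te > 0:  # skip degenerate resamples with no variation
            ir = statistics.fmean(sample) / te * math.sqrt(periods_per_year)
            estimates.append(ir)
    if not estimates:
        raise ValueError("active returns have no variation; IR is undefined")

    estimates.sort()
    tail = (1.0 - level) / 2.0
    lower = estimates[int(tail * len(estimates))]
    upper = estimates[int((1.0 - tail) * len(estimates)) - 1]
    return lower, upper
```

A block bootstrap would be the more careful choice when active returns are serially correlated; the independent resampling here is only the simplest variant.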