Deflated Sharpe Ratio
You ran 50 strategy variations. One came out with a Sharpe ratio of 2.4. Should you be excited? Almost certainly not — and the Deflated Sharpe Ratio (DSR) is the mathematical explanation for why.
The multiple-testing problem
When you test many strategies on the same data, some will look good purely by chance. If 50 people each flip a fair coin ten times, the luckiest of them will very likely get eight or more heads, and that streak will look like edge. It isn't. The more variations you try, the higher the bar any one result must clear before you can conclude that it reflects real edge rather than data-mining luck.
A single-test p-value doesn't account for this. A Sharpe ratio of 1.5 may carry a p-value of 0.02 when it is the only strategy tested, but across 50 independent tests the probability that at least one reaches 0.02 by chance alone is 1 − 0.98^50 ≈ 64%. The Deflated Sharpe Ratio addresses this directly.
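The arithmetic behind that figure, and the effect it describes, can be checked in a few lines. The sketch below is illustrative only: the function names, the 1% daily volatility and the fixed seed are assumptions chosen for the example, and the tests are treated as independent.

```python
import random
from statistics import mean, stdev


def familywise_false_positive_rate(p_single: float, n_tests: int) -> float:
    """Chance that at least one of n independent tests reaches p_single by luck."""
    return 1.0 - (1.0 - p_single) ** n_tests


def best_sharpe_of_noise(n_strategies: int = 50, n_days: int = 252, seed: int = 0) -> float:
    """Best annualised Sharpe among strategies whose true Sharpe is exactly zero."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    sharpes = []
    for _ in range(n_strategies):
        # Zero-mean Gaussian daily returns: pure noise, no edge (assumed 1% daily vol).
        returns = [rng.gauss(0.0, 0.01) for _ in range(n_days)]
        sharpes.append(mean(returns) / stdev(returns) * 252 ** 0.5)
    return max(sharpes)


print(f"{familywise_false_positive_rate(0.02, 50):.3f}")  # 0.636
print(f"best Sharpe from 50 no-edge strategies: {best_sharpe_of_noise():.2f}")
```

The second number varies with the seed, but it is the best of 50 draws of pure noise, so it will usually look far better than the zero edge that generated it.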
What the PSR measures
Before DSR, there is the Probabilistic Sharpe Ratio (PSR), introduced by Bailey and Lopez de Prado in 2012. The PSR answers a single question: what is the probability that the true (population) Sharpe ratio exceeds a benchmark SR* (usually zero, in which case the question is whether the true Sharpe is positive), given the observed sample Sharpe, the number of bars observed, and the skewness and kurtosis of returns?
In the PSR formula, SR̂ is the observed Sharpe ratio measured at the same frequency as the returns (not annualised), T is the number of return observations, γ₃ is the skewness of returns, γ₄ is the excess kurtosis, SR* is the benchmark Sharpe (usually zero), and Φ is the standard normal CDF.
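Written out with these symbols, the PSR takes the form below. This is the standard expression from Bailey and Lopez de Prado, restated on the assumption that γ₄ is excess kurtosis as defined above; if γ₄ were raw kurtosis (3 for a Gaussian), the term (γ₄ + 2)/4 would read (γ₄ − 1)/4.

```latex
\widehat{PSR}(SR^{*}) = \Phi\!\left(
  \frac{\left(\widehat{SR} - SR^{*}\right)\sqrt{T - 1}}
       {\sqrt{1 - \gamma_3\,\widehat{SR} + \dfrac{\gamma_4 + 2}{4}\,\widehat{SR}^{2}}}
\right)
```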
The denominator inflates the standard error when returns are skewed or fat-tailed — which most strategy return distributions are. A strategy with sharp downside events (negative skew, positive kurtosis) needs a much higher observed Sharpe to achieve the same PSR as a strategy with Gaussian-looking returns.
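A minimal sketch of the PSR calculation shows the effect of skew and fat tails. It assumes the standard Bailey and Lopez de Prado form with excess kurtosis, a per-period (not annualised) Sharpe, and hypothetical input values; the function and parameter names are illustrative, not Kestrel Signal's API.

```python
from math import sqrt
from statistics import NormalDist


def probabilistic_sharpe_ratio(
    sr_hat: float,              # observed per-period Sharpe ratio (not annualised)
    n_obs: int,                 # number of return observations, T
    skew: float = 0.0,          # skewness of returns
    excess_kurt: float = 0.0,   # excess kurtosis (0 for Gaussian returns)
    sr_benchmark: float = 0.0,  # benchmark Sharpe SR*, same frequency as sr_hat
) -> float:
    """Probability that the true Sharpe ratio exceeds sr_benchmark."""
    # Non-normality adjustment to the variance of the Sharpe estimator.
    variance_term = 1.0 - skew * sr_hat + (excess_kurt + 2.0) / 4.0 * sr_hat ** 2
    z = (sr_hat - sr_benchmark) * sqrt(n_obs - 1) / sqrt(variance_term)
    return NormalDist().cdf(z)


# Hypothetical example: same observed Sharpe, different return shapes.
gaussian = probabilistic_sharpe_ratio(0.10, 252)
fat_tailed = probabilistic_sharpe_ratio(0.10, 252, skew=-1.0, excess_kurt=5.0)
print(f"Gaussian returns:   PSR = {gaussian:.3f}")
print(f"Negative skew/fat:  PSR = {fat_tailed:.3f}")
```

With identical observed Sharpe and sample size, the negatively skewed, fat-tailed series yields the lower PSR, which is the penalty the denominator applies.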
Deflating for the number of trials
The DSR takes PSR one step further. If you ran N strategies and are reporting the best one, you need to adjust the benchmark Sharpe upward to reflect what the best of N random outcomes would produce.
In the expected-maximum formula, N is the number of independent trials, γ is the Euler–Mascheroni constant (≈0.5772), Z[·] is the probit function (inverse normal CDF), e is Euler's number, and V[SR] is the variance of the estimated Sharpe ratios across the N trials.
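Written out with these symbols, the expected maximum Sharpe is commonly approximated as below. This is the standard approximation from the DSR literature, which assumes every trial has a true Sharpe of zero; it is restated here for reference and is not taken from Kestrel Signal's implementation.

```latex
E\!\left[\max_{n}\,\widehat{SR}_n\right] \approx
  \sqrt{V[SR]}\,
  \left( (1 - \gamma)\, Z\!\left[1 - \frac{1}{N}\right]
       + \gamma\, Z\!\left[1 - \frac{1}{N e}\right] \right)
```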
The DSR is then computed as PSR(E[max SR]) — the probability that the true Sharpe exceeds the expected maximum Sharpe ratio you'd observe by random chance across N trials.
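Putting the two steps together, a rough sketch of the DSR calculation might look like this. It assumes the standard expected-maximum approximation and the PSR with excess kurtosis; all names are illustrative, and the trial variance in the example (an annualised Sharpe standard deviation of 1.0 across trials) is a hypothetical input, not a figure from this article.

```python
from math import e, sqrt
from statistics import NormalDist

EULER_MASCHERONI = 0.5772156649015329


def expected_max_sharpe(n_trials: int, sharpe_variance: float) -> float:
    """Expected best Sharpe among n_trials strategies that all have zero true Sharpe."""
    if n_trials < 1:
        raise ValueError("n_trials must be at least 1")
    if n_trials == 1:
        return 0.0  # assumption: a single trial needs no deflation
    probit = NormalDist().inv_cdf
    return sqrt(sharpe_variance) * (
        (1.0 - EULER_MASCHERONI) * probit(1.0 - 1.0 / n_trials)
        + EULER_MASCHERONI * probit(1.0 - 1.0 / (n_trials * e))
    )


def probabilistic_sharpe_ratio(sr_hat, n_obs, skew=0.0, excess_kurt=0.0, sr_benchmark=0.0):
    """Probability that the true per-period Sharpe exceeds sr_benchmark."""
    variance_term = 1.0 - skew * sr_hat + (excess_kurt + 2.0) / 4.0 * sr_hat ** 2
    z = (sr_hat - sr_benchmark) * sqrt(n_obs - 1) / sqrt(variance_term)
    return NormalDist().cdf(z)


def deflated_sharpe_ratio(sr_hat, n_obs, n_trials, sharpe_variance, skew=0.0, excess_kurt=0.0):
    """PSR evaluated against the expected maximum Sharpe of n_trials random strategies."""
    benchmark = expected_max_sharpe(n_trials, sharpe_variance)
    return probabilistic_sharpe_ratio(sr_hat, n_obs, skew, excess_kurt, benchmark)


# Hypothetical example: annualised Sharpe 2.4 from one year of daily bars, best of 50 trials.
sr_daily = 2.4 / sqrt(252)      # convert to a per-bar Sharpe
trial_variance = 1.0 / 252      # assumed: annualised Sharpe std of 1.0 across trials
psr = probabilistic_sharpe_ratio(sr_daily, 252)
dsr = deflated_sharpe_ratio(sr_daily, 252, n_trials=50, sharpe_variance=trial_variance)
print(f"PSR vs zero benchmark: {psr:.3f}")
print(f"DSR after 50 trials:   {dsr:.3f}")
```

Under these assumed inputs the DSR lands only slightly above one half, while the undeflated PSR is close to 1: the same backtest looks near-certain in isolation and close to a coin flip once the 50 trials are accounted for.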
How to interpret DSR
DSR is a probability between 0 and 1. Kestrel Signal assigns four tiers based on DSR value:
Why sample size matters
The Sharpe ratio estimator is noisy. With 100 daily return observations (roughly 5 months), the standard error of the annualised Sharpe estimate is approximately 1.6 (about √(252/100) for i.i.d. returns). That means a strategy with a true Sharpe of 1.0 can plausibly produce observed values anywhere from about −2 to 4, taking two standard errors either side. Even a full year of daily data (252 bars) leaves a standard error near 1.0, so 252 bars is a bare minimum for an annual estimate rather than a guarantee of precision; this is why Kestrel Signal flags all backtests with fewer observations.
The same logic applies to trades: with fewer than 30 completed trades, the skewness and kurtosis estimates used in the PSR formula are themselves unreliable, which means the DSR figure is unreliable. Think of it as uncertainty about the uncertainty.
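The noise in the Sharpe estimator can be quantified with the usual large-sample approximation for i.i.d. Gaussian returns, where the per-period standard error is √((1 + SR²/2)/T). The sketch below annualises it under an assumed 252 bars per year; the function name and parameters are illustrative.

```python
from math import sqrt


def annualised_sharpe_standard_error(
    annual_sharpe: float, n_obs: int, periods_per_year: int = 252
) -> float:
    """Approximate standard error of an annualised Sharpe estimate (i.i.d. Gaussian returns)."""
    sr_period = annual_sharpe / sqrt(periods_per_year)        # per-bar Sharpe
    se_period = sqrt((1.0 + 0.5 * sr_period ** 2) / n_obs)    # per-bar standard error
    return se_period * sqrt(periods_per_year)                 # scale back to annual units


for bars in (100, 252, 1260):
    se = annualised_sharpe_standard_error(1.0, bars)
    print(f"{bars:5d} bars: standard error ≈ {se:.2f}")
```

Because the standard error shrinks only with the square root of the sample size, quadrupling the number of bars merely halves the uncertainty.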
The takeaway
A high Sharpe ratio on backtested data should be treated as a possible product of data-mining, not as edge, until it survives the DSR adjustment. Kestrel Signal computes DSR on every backtest result so you see the full picture before making any decision. A “Noise” result isn't a failure; it's correct information about a strategy that isn't ready.