For educational purposes only. This content does not constitute financial advice or a recommendation to buy or sell any security.
Methodology · 17 May 2026 · 6 min read

Why Optimising for Sharpe Ratio Produces Fragile Strategies

Selecting strategies on Sharpe ratio systematically favours overfit, negatively-skewed configurations whose apparent quality is an artefact of the metric.

Optimising a strategy for Sharpe ratio is one of the most common ways to produce a backtest that looks excellent and a live system that disappoints. The objective function rewards smoothness, penalises deviations from the mean equally in both directions, and is only a complete summary of risk when returns are approximately normal. Each of these properties fits real strategy returns poorly, and each biases the search toward overfitted, fragile parameterisations. The result is selection bias toward strategies whose apparent quality is an artefact of the metric itself.

The metric punishes the wrong kind of variance

The Sharpe ratio treats upside volatility identically to downside volatility. A strategy that occasionally produces large positive returns is penalised in the denominator the same as one that occasionally blows up. When you grid-search parameters, the optimiser learns to suppress all variance — including the right-tail variance that generates most of the long-run edge in trend-following, breakout, and convexity-seeking systems.

Sharpe = (E[r] - rf) / sqrt(Var[r])

Because Var[r] sums squared deviations symmetrically, maximising Sharpe over the parameter space pushes the search toward configurations that clip both tails. In practice this means tighter stops, smaller positions during regime transitions, and faster exits on winners. Each of these reduces measured volatility while quietly reducing expectancy.
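The symmetry can be checked with a small sketch. The returns below are made up for illustration, and the Sortino-style downside ratio is only one possible comparison metric; the helper names are not taken from any library. Reflecting a right-tailed series about its mean produces a left-tailed series with the same mean and variance, so Sharpe cannot tell them apart.

```python
import math
import statistics


def sharpe(returns, rf=0.0):
    """Per-period Sharpe ratio: mean excess return over its standard deviation."""
    excess = [r - rf for r in returns]
    return statistics.fmean(excess) / statistics.pstdev(excess)


def downside_ratio(returns, target=0.0):
    """Sortino-style ratio: only returns below the target enter the denominator."""
    squared_shortfalls = [min(r - target, 0.0) ** 2 for r in returns]
    downside_dev = math.sqrt(sum(squared_shortfalls) / len(returns))
    return (statistics.fmean(returns) - target) / downside_dev


# Hypothetical series: many small losses, one large win.
right_tailed = [-0.01] * 8 + [0.00] + [0.19]

# Reflect about the mean: same mean, same variance, but one large loss instead.
mu = statistics.fmean(right_tailed)
left_tailed = [2 * mu - r for r in right_tailed]

if __name__ == "__main__":
    for name, series in (("right-tailed", right_tailed), ("left-tailed", left_tailed)):
        print(name, round(sharpe(series), 4), round(downside_ratio(series), 4))
```

The two Sharpe values agree up to floating-point error, while the downside ratio is several times higher for the series whose large move is a gain.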

Why higher-Sharpe configurations are usually overfit

For any reasonable parameter grid, the configuration with the highest in-sample Sharpe is almost always the one that happened to avoid the largest drawdowns of the sample. This is not skill — it is the parameter set that best memorised the location of historical adverse events. The smoother the equity curve looks, the more the parameters have absorbed sample-specific noise.

This effect compounds with the number of configurations searched. The expected maximum Sharpe across N independent configurations grows roughly with sqrt(2 ln N), measured in units of the standard error of the Sharpe estimate, even when no configuration has any real edge. Selecting on Sharpe therefore makes a drop from in-sample to out-of-sample performance the expected outcome, and the drop is largest for the apparent winner.
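A rough simulation makes the scaling concrete. This is a sketch under simplifying assumptions: every "configuration" is an independent series of zero-mean Gaussian daily returns, so any positive Sharpe is pure sampling noise. The function name, the 1% daily volatility, and the sample sizes are illustrative choices.

```python
import math
import random


def best_sharpe_under_null(n_configs, n_obs, seed=0):
    """Highest per-period Sharpe among n_configs series that have no edge at all."""
    rng = random.Random(seed)
    best = -math.inf
    for _ in range(n_configs):
        returns = [rng.gauss(0.0, 0.01) for _ in range(n_obs)]
        mean = sum(returns) / n_obs
        var = sum((r - mean) ** 2 for r in returns) / (n_obs - 1)
        best = max(best, mean / math.sqrt(var))
    return best


if __name__ == "__main__":
    n_obs = 250
    for n_configs in (10, 100, 1000):
        simulated = best_sharpe_under_null(n_configs, n_obs)
        # The standard error of a per-period Sharpe near zero is about 1/sqrt(n_obs).
        guide = math.sqrt(2 * math.log(n_configs)) / math.sqrt(n_obs)
        print(n_configs, round(simulated, 3), round(guide, 3))
```

The sqrt(2 ln N) figure is an asymptotic guide, so the simulated maximum typically sits somewhat below it, but both rise with the size of the search even though the true Sharpe of every configuration is zero.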

If your optimiser is choosing between thousands of parameter combinations and you select the highest Sharpe, you are not selecting a strategy — you are selecting a sampling error. Deflated Sharpe ratio (Bailey & López de Prado, 2014) exists precisely to correct for this, and most practitioners ignore it.
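The benchmark that the deflated Sharpe ratio tests against can be sketched in a few lines. This follows the expected-maximum approximation from Bailey & López de Prado (2014), assuming the trials are independent and that the variance of the Sharpe estimates across trials is known; the function and argument names are illustrative.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def expected_max_sharpe(n_trials, sharpe_variance):
    """Approximate expected best Sharpe among n_trials configurations with no edge.

    sharpe_variance is the variance of the Sharpe estimates across the trials,
    expressed at the same frequency as the Sharpe ratios themselves.
    """
    if n_trials < 2:
        raise ValueError("need at least two trials")
    if sharpe_variance < 0:
        raise ValueError("variance cannot be negative")
    z = NormalDist()
    blend = ((1 - EULER_GAMMA) * z.inv_cdf(1 - 1 / n_trials)
             + EULER_GAMMA * z.inv_cdf(1 - 1 / (n_trials * math.e)))
    return math.sqrt(sharpe_variance) * blend


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        print(n, round(expected_max_sharpe(n, 1.0), 3))
```

The deflated Sharpe ratio then asks how likely it is that the observed Sharpe exceeds this benchmark rather than zero, so the bar rises with every additional configuration tried.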

The normality assumption breaks where it matters

Sharpe ratio is only a sufficient statistic for risk-adjusted return when returns are normally distributed. Real strategy returns are not. They exhibit skew, excess kurtosis, autocorrelation in volatility, and regime dependence. A strategy with Sharpe 2.0 and skew of -3 does not have the same risk profile as a strategy with Sharpe 1.2 and skew of +1, but the metric reports the first as objectively better.

Negative-skew strategies (short volatility, mean reversion in liquid instruments, carry trades) are systematically advantaged by Sharpe-based selection. They produce the steady drip of small wins that the variance term rewards, while concealing the magnitude of an eventual left-tail loss that the in-sample window may not contain at all. The 2007 quant crisis, the 2018 XIV unwind, and countless smaller blow-ups share this signature.

A strategy that has never experienced its worst day cannot have that day reflected in its Sharpe. Optimisers reliably find such strategies because they look optimal by construction.
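A toy illustration, using entirely made-up numbers: a return stream that alternates small wins and losses for 1,000 days, with and without a single 30% loss appended. The series and the size of the crash are assumptions chosen only to show how much one unobserved day can matter.

```python
import statistics


def sharpe(returns):
    """Per-period Sharpe ratio with a zero risk-free rate."""
    return statistics.fmean(returns) / statistics.pstdev(returns)


# Hypothetical "steady drip": +0.6% and -0.4% alternating for 1,000 days.
calm_sample = [0.006, -0.004] * 500

# The same history plus the one bad day the backtest window happened to miss.
with_worst_day = calm_sample + [-0.30]

if __name__ == "__main__":
    print("without worst day:", round(sharpe(calm_sample), 3))
    print("with worst day:   ", round(sharpe(with_worst_day), 3))
```

One additional observation out of 1,001 cuts the measured Sharpe to roughly a third of its original value, which is why a window that excludes the tail event flatters exactly the strategies most exposed to it.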

What to optimise instead

The correct objective depends on the strategy family, but the general principle is to use metrics that are either robust to tail behaviour or that explicitly model it. Probabilistic Sharpe ratio adjusts for sample size and higher moments. Deflated Sharpe ratio further corrects for the number of trials. For convexity-seeking strategies, optimising on Calmar (return over maximum drawdown) or on the lower partial moment preserves the upside variance that Sharpe destroys.

PSR(SR*) = Φ( (SR - SR*) · sqrt(n-1) / sqrt(1 - γ3·SR + (γ4-1)/4 · SR²) )

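The expression above incorporates skewness γ3 and kurtosis γ4 directly, where γ4 is raw kurtosis (3 for a normal distribution), SR is the observed Sharpe ratio, SR* is the benchmark it is tested against, n is the number of return observations, and Φ is the standard normal CDF. SR and SR* are expressed at the frequency of the observations, not annualised. Because the shape of the distribution enters the score, a strategy cannot earn a high score by hiding fat tails. When the optimiser is forced to account for the shape of the distribution it produces, the selected parameterisations tend to look less impressive on a tearsheet and degrade less out of sample. This is the trade you want.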
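The PSR expression translates almost line for line into code. The following is a minimal sketch using only the Python standard library; the function name, argument names, and example inputs are illustrative, and the Sharpe ratios are assumed to be per-observation (not annualised) values.

```python
import math
from statistics import NormalDist


def probabilistic_sharpe(sr, sr_benchmark, n_obs, skew=0.0, kurt=3.0):
    """Probability that the true Sharpe exceeds sr_benchmark.

    sr, sr_benchmark: per-observation Sharpe ratios.
    skew, kurt: sample skewness and raw kurtosis (kurt is 3.0 for a normal).
    """
    if n_obs < 2:
        raise ValueError("need at least two observations")
    denom_sq = 1.0 - skew * sr + (kurt - 1.0) / 4.0 * sr ** 2
    if denom_sq <= 0:
        raise ValueError("moment inputs give a non-positive variance term")
    z = (sr - sr_benchmark) * math.sqrt(n_obs - 1) / math.sqrt(denom_sq)
    return NormalDist().cdf(z)


if __name__ == "__main__":
    # Same daily Sharpe and sample size, different distribution shapes.
    print(probabilistic_sharpe(0.10, 0.0, 500))                        # normal
    print(probabilistic_sharpe(0.10, 0.0, 500, skew=-3.0, kurt=20.0))  # fat left tail
```

With a positive Sharpe, negative skew and high kurtosis both enlarge the denominator, so the fat-left-tailed case receives a lower score than the normal case despite an identical Sharpe ratio.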

Practical discipline

Treat Sharpe as a descriptive statistic, not an objective function. Use it to summarise a strategy after selection, not to drive selection itself. When running parameter sweeps in Kestrel Signal, define the objective in terms of out-of-sample robustness: median performance across walk-forward folds, deflated Sharpe across the search space, or a multi-metric Pareto frontier that exposes the tradeoffs between smoothness and tail behaviour explicitly.
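One way to express "median performance across walk-forward folds" is sketched below. This is a generic standard-library illustration, not Kestrel Signal's API: the function names, the dictionary of candidate return series, and the window lengths are all assumptions. In each fold the candidate with the best score on the training window is chosen, and only its score on the following unseen window is recorded.

```python
import statistics


def sharpe(returns):
    """Per-period Sharpe ratio; returns 0.0 for a flat series."""
    sd = statistics.pstdev(returns)
    return statistics.fmean(returns) / sd if sd > 0 else 0.0


def walk_forward_windows(n_obs, train_len, test_len):
    """Yield (train_start, train_end, test_end) index triples, rolling by test_len."""
    start = 0
    while start + train_len + test_len <= n_obs:
        yield start, start + train_len, start + train_len + test_len
        start += test_len


def median_oos_score(candidates, train_len, test_len, score=sharpe):
    """Median out-of-sample score of the in-sample winner across all folds.

    candidates maps a configuration label to its return series; all series
    are assumed to have the same length and alignment.
    """
    n_obs = len(next(iter(candidates.values())))
    oos_scores = []
    for a, b, c in walk_forward_windows(n_obs, train_len, test_len):
        winner = max(candidates, key=lambda k: score(candidates[k][a:b]))
        oos_scores.append(score(candidates[winner][b:c]))
    if not oos_scores:
        raise ValueError("series too short for the requested windows")
    return statistics.median(oos_scores)
```

The score argument is deliberately pluggable, so the same loop can rank folds by a drawdown-based or tail-aware metric instead of Sharpe.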

The strategies that survive contact with live markets are rarely the ones with the highest backtested Sharpe. They are the ones whose selection process did not actively reward the appearance of safety. Optimising for the right thing is harder, produces worse-looking equity curves in research, and is the difference between a system that compounds and one that quietly bleeds the edge that the metric pretended was there.
