Methodology · 17 May 2026 · 6 min read

CPCV vs Walk-Forward: When to Use Each

A practical comparison of combinatorial purged cross-validation and walk-forward analysis, with guidance on sequencing them in a research workflow.

Walk-forward and Combinatorial Purged Cross-Validation (CPCV) both attempt to answer the same question: how would this strategy have performed on data it was not fit to? They produce different answers, and the gap between those answers is where most overfit strategies hide. Choosing between them is a question of what bias you are willing to tolerate and what statistical power you need.

The short version: walk-forward respects causality and matches how you will actually deploy a model. CPCV gives you a distribution of out-of-sample paths instead of a single one, which is what you need to reason about deployment risk. Most serious research workflows use both, sequentially.

What walk-forward actually measures

Walk-forward analysis fits parameters on a rolling or expanding window, then evaluates on the immediately subsequent block. The procedure is honest about temporal ordering — no future data leaks into training. The output is a single concatenated equity curve representing the strategy as it would have been run if you had been disciplined enough to refit on schedule.
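As a minimal sketch of the splitting logic described above, the generator below yields chronological train/test index pairs. The function name, its parameters, and the choice to step forward by one test block at a time are illustrative assumptions, not a prescribed implementation.

```python
from typing import Iterator, Tuple

import numpy as np


def walk_forward_splits(
    n_obs: int, train_size: int, test_size: int, expanding: bool = False
) -> Iterator[Tuple[np.ndarray, np.ndarray]]:
    """Yield (train_idx, test_idx) pairs in chronological order.

    Hypothetical helper: rolling window by default, expanding if requested.
    """
    start = train_size  # first test observation
    while start + test_size <= n_obs:
        # Expanding windows always begin at 0; rolling windows keep a fixed length.
        train_start = 0 if expanding else start - train_size
        train_idx = np.arange(train_start, start)
        test_idx = np.arange(start, start + test_size)
        yield train_idx, test_idx
        start += test_size  # refit once per test block


splits = list(walk_forward_splits(n_obs=1000, train_size=500, test_size=100))
```

Every training index precedes every test index in the same split, and the test blocks tile the tail of the sample without overlap. Concatenating the test-block results is what produces the single equity curve.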

That single path is also walk-forward's weakness. You get one realization of the strategy's behavior, drawn from one ordering of history. If the test period happens to contain a regime the training windows never saw, the result tells you about that specific historical sequence, not about the strategy's underlying edge. Standard errors on walk-forward Sharpe ratios are large, often several tenths of a Sharpe point on a few years of daily data, and rarely reported.
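To put a rough number on that uncertainty, the sketch below uses the textbook large-sample approximation for the standard error of a Sharpe ratio under i.i.d. normal returns, sqrt((1 + SR²/2) / T), with SR measured per period. The i.i.d. assumption is optimistic for real strategy returns, so treat the result as a floor. The function name and the 252-day convention are assumptions made for illustration.

```python
import math


def sharpe_standard_error(
    annual_sharpe: float, n_obs: int, periods_per_year: int = 252
) -> float:
    """Approximate standard error of an annualized Sharpe ratio.

    Assumes i.i.d. normal returns; autocorrelation and fat tails widen it.
    """
    sr_period = annual_sharpe / math.sqrt(periods_per_year)
    se_period = math.sqrt((1.0 + 0.5 * sr_period**2) / n_obs)
    return se_period * math.sqrt(periods_per_year)


# Three years of daily out-of-sample returns with a measured Sharpe of 1.4.
se = sharpe_standard_error(1.4, n_obs=3 * 252)
```

With these inputs the standard error comes out near 0.58, so a two-standard-error band around 1.4 runs from roughly 0.2 to 2.6.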

A single walk-forward run with a Sharpe of 1.4 and a CPCV distribution with median Sharpe 1.4 and 5th percentile of -0.2 describe the same strategy. The first number is what marketing decks use. The second is what tells you whether to allocate capital.

What CPCV adds

CPCV, introduced by López de Prado, splits the time series into N blocks and forms every combination of k test blocks against the remaining N-k training blocks, giving C(N, k) train/test splits. Because each block lands in the test set of several splits, the out-of-sample predictions can be stitched into multiple complete backtest paths, each covering the full history. The result is not one equity curve but a family of them, from a handful to hundreds depending on N and k, each assembled from a different arrangement of training and test data.

Number of backtest paths = C(N, k) × k / N

For N=10 and k=2, that yields 9 paths; for N=12 and k=4, it yields 165. Each path uses the same total amount of out-of-sample data, but the temporal arrangement varies. The variance across paths is a direct estimate of how sensitive your strategy's measured performance is to the particular slice of history you tested on.
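The counting can be checked directly. The sketch below enumerates the C(N, k) splits and assigns each block's out-of-sample predictions to paths in the order the splits are generated. That assignment rule is one simple convention chosen for illustration, and the function names are hypothetical.

```python
from itertools import combinations
from math import comb


def cpcv_path_count(n_blocks: int, k_test: int) -> int:
    """Number of backtest paths: C(N, k) * k / N, which is always an integer."""
    return comb(n_blocks, k_test) * k_test // n_blocks


def cpcv_paths(n_blocks: int, k_test: int):
    """Return (splits, paths).

    splits[s] is the tuple of test blocks for split s.
    paths[j][g] is the split whose model supplies block g's predictions in path j.
    """
    splits = list(combinations(range(n_blocks), k_test))
    n_paths = cpcv_path_count(n_blocks, k_test)
    paths = [[None] * n_blocks for _ in range(n_paths)]
    next_free = [0] * n_blocks  # next unfilled path slot for each block
    for split_id, test_blocks in enumerate(splits):
        for g in test_blocks:
            paths[next_free[g]][g] = split_id
            next_free[g] += 1
    return splits, paths


splits, paths = cpcv_paths(n_blocks=6, k_test=2)
```

For N=6 and k=2 this gives 15 splits and 5 paths. Every path covers all six blocks, and each (split, test block) pair is used exactly once.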

Purging and embargoing are not optional. Purging removes training observations whose labels overlap in time with the test set; embargoing additionally drops training observations that fall immediately after the test set, to prevent serial-correlation leakage. Without them, CPCV reports inflated performance and you have built an elegant overfitting machine.
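A minimal sketch of both steps for a single contiguous test block follows. It assumes observations are indexed by bar number and that `label_end[i]` holds the last bar observation i's label depends on. Those conventions, the function name, and the embargo being counted in bars are all assumptions made for illustration.

```python
import numpy as np


def purged_train_indices(
    label_end: np.ndarray, test_start: int, test_end: int, embargo: int = 0
) -> np.ndarray:
    """Training indices left after purging and embargoing one test block.

    The test block covers observations test_start .. test_end - 1.
    """
    idx = np.arange(len(label_end))
    in_test = (idx >= test_start) & (idx < test_end)
    # Last bar that any test label depends on.
    test_span_end = label_end[test_start:test_end].max()
    # Purge: earlier observations whose labels reach into the test block.
    overlaps_before = (idx < test_start) & (label_end >= test_start)
    # Purge: later observations that begin before the test labels resolve.
    overlaps_after = (idx >= test_end) & (idx <= test_span_end)
    # Embargo: an extra buffer after the test labels resolve.
    embargoed = (idx > test_span_end) & (idx <= test_span_end + embargo)
    keep = ~(in_test | overlaps_before | overlaps_after | embargoed)
    return idx[keep]


# 100 observations with labels that look 5 bars ahead (capped at the last bar).
label_end = np.minimum(np.arange(100) + 5, 99)
train_idx = purged_train_indices(label_end, test_start=40, test_end=50, embargo=3)
```

Here observations 35 to 39 are purged because their labels reach into the test block, 50 to 54 because they start before the test labels resolve, and 55 to 57 fall under the embargo. With several test blocks, as in CPCV, one option is to apply the same function per block and intersect the surviving indices.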

When to use which

Use walk-forward when the production system will itself be walk-forward — periodic refits on expanding or rolling windows — and you want a faithful simulation of that operational reality. Use it when path-dependent metrics matter: maximum drawdown sequencing, time-to-recovery, capital deployment schedules. Walk-forward is also the right tool for sanity-checking the refit cadence itself.

Use CPCV when the question is statistical rather than operational: does this strategy have an edge that survives across many plausible historical orderings? Use it for parameter selection, feature ablation, and any comparison where you need a distribution of outcomes rather than a point estimate. CPCV also pairs well with the Deflated Sharpe Ratio: scoring every candidate configuration on the same splits gives you the number of trials and the spread of Sharpe ratios across them, which are the inputs deflation requires.

CPCV paths are not independent. They share training and test blocks, so the variance across paths tends to underestimate the true sampling variance. Treat the CPCV distribution as a lower bound on uncertainty, not a calibrated confidence interval. If you need calibrated intervals, bootstrap on top, for example by resampling blocks of out-of-sample returns.
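One way to bootstrap on top is a circular block bootstrap of a path's out-of-sample returns, sketched below. The block length, the number of resamples, and the function name are illustrative assumptions; the block length in particular should reflect how long serial dependence persists in your returns. The input series here is synthetic and exists only to make the example runnable.

```python
import numpy as np


def block_bootstrap_sharpe(
    returns, block_len: int = 20, n_boot: int = 2000,
    periods_per_year: int = 252, seed: int = 0,
) -> np.ndarray:
    """Annualized Sharpe ratios from circular block-bootstrap resamples."""
    rng = np.random.default_rng(seed)
    r = np.asarray(returns, dtype=float)
    n = len(r)
    n_blocks = int(np.ceil(n / block_len))
    offsets = np.arange(block_len)
    sharpes = np.empty(n_boot)
    for b in range(n_boot):
        # Draw random block starts, wrap around the end, trim to length n.
        starts = rng.integers(0, n, size=n_blocks)
        idx = ((starts[:, None] + offsets[None, :]) % n).ravel()[:n]
        sample = r[idx]
        sharpes[b] = sample.mean() / sample.std(ddof=1) * np.sqrt(periods_per_year)
    return sharpes


# Synthetic daily returns, for illustration only.
fake_returns = np.random.default_rng(1).normal(0.0005, 0.01, size=756)
sharpes = block_bootstrap_sharpe(fake_returns, n_boot=500)
lo, hi = np.percentile(sharpes, [2.5, 97.5])
```

The percentile interval `(lo, hi)` is a rough uncertainty band for the Sharpe ratio of that one return series. It still inherits whatever selection bias went into choosing the strategy.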

A practical workflow

Use CPCV during research to filter strategies. Reject anything whose 5th-percentile Sharpe across paths is negative, regardless of how attractive the median looks. Reject anything whose path-to-path variance is large relative to the mean — that variance is telling you the strategy's measured edge is an artifact of which years landed in training versus test.
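The two rejection rules can be written as a small filter. The 5th-percentile test comes straight from the rule above. The dispersion test uses the ratio of standard deviation to mean as a stand-in for "variance large relative to the mean", and its threshold `max_dispersion` is a placeholder assumption you would need to choose yourself.

```python
import numpy as np


def passes_cpcv_filter(path_sharpes, max_dispersion: float = 1.0) -> bool:
    """Hypothetical research filter over per-path Sharpe ratios."""
    s = np.asarray(path_sharpes, dtype=float)
    p05 = np.percentile(s, 5)
    mean = s.mean()
    # Dispersion relative to the mean; infinite if the mean is exactly zero.
    dispersion = s.std(ddof=1) / abs(mean) if mean != 0 else np.inf
    return bool(p05 >= 0 and mean > 0 and dispersion <= max_dispersion)


# Illustrative inputs: nine path Sharpes each, as in the N=10, k=2 case.
stable = passes_cpcv_filter([1.2, 1.4, 1.5, 1.3, 1.6, 1.1, 1.4, 1.5, 1.3])
fragile = passes_cpcv_filter([-0.5, -0.2, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0])
```

The first set passes. The second has a median of 1.6 but fails because its 5th percentile is negative.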

Then run walk-forward on the survivors with the exact refit schedule, transaction cost model, and capital constraints you plan to deploy with. This second pass is not for discovery; it is for verifying that the operational mechanics (refit lag, position sizing, slippage) do not destroy the edge that CPCV established. If the walk-forward Sharpe and the CPCV median Sharpe diverge sharply, suspect the deployment process itself before the underlying signal.

In Kestrel Signal, both methods share the same purging and embargo configuration so results are directly comparable. The asymmetry between the two outputs — a single path versus a distribution — is the whole point. One tells you what happened. The other tells you what could have.