052 - Martyn Tinsley - Beyond the BackTest
First the textbook, then add experience
“Even if you have skill, you can look wrong for a very long time.” – Cliff Asness
A backtest (or even many of them) can tell you whether a strategy survived a historical test. It cannot tell you whether you were testing the right idea, in the right way, for the right purpose.
That gap matters.
There are plenty of methodologies for minimising overfitting and increasing confidence that an idea might survive the real world: statistical tests, stress tests, multi-market testing, walk-forward analysis and so on. It all looks scientific. Sometimes it is. Sometimes, without experience, we confuse statistical validity with real-world operability and out-of-sample viability.
What I keep landing on is this: robust strategy development demands two complementary forms of competence, and only one of them is obvious at the beginning.
But first…
Algo Collective members get a private walk-through of Martyn Tinsley’s 14-step process for strategy development, and subscribers can get a substantial discount on his software at algoadvantage.io/toolbox.
The first competence is what I’d call the textbook side. Know your tools. Measure what matters. Define objectives. Run the optimisations, the noise tests, the correlation tests, skip trades, shuffle trades and so on. A certain class of avoidable mistakes can and must be caught by a proper research process with gates and decision points.
The second competence is what experience teaches, usually after a few expensive lessons. Which tests suit which strategy type? What is the underlying logic? What question is the model really answering? What role is the strategy meant to play? Those questions stay qualitative and they do not fit neatly into a strict checklist. Over time you also stop thinking in terms of single systems and start thinking at portfolio level, with risk treated as part of the architecture rather than a clean-up exercise.
The textbook gives structure. Experience gives judgement.
You need both.
A trader who has only the textbook can still produce elegant nonsense. A trader who has only intuition can produce persuasive nonsense. Robust strategies live in the overlap.
In episode 52 we are treated to the first of a two-part presentation by Martyn Tinsley, in which he shares his secrets to building robust models. It culminates in part 2 with his innovative new methodology: walk-forward correlation analysis. Stay tuned for that!
Part I: The Textbook Foundations
Robust research starts before the model
Most weak strategy research does not fail because the maths was too simple or the signal too subtle. It fails earlier, in the data, where the failure is less glamorous and therefore easier to ignore.
Traders often treat data work as plumbing. Necessary, boring, faintly beneath them. They want to get to the interesting bit: testing ideas and looking at results. I understand the impulse. But that is exactly how false confidence gets built. Bad inputs do not merely produce bad outputs. They produce plausible outputs, and that is far more dangerous.
Survivorship bias is the obvious example. If dead instruments vanish from the sample, the strategy often looks sturdier than it really was. Look-ahead bias does similar damage with a cleaner shirt on. Poor corporate-action adjustments, sloppy futures roll logic, missing data handled without thought, regime gaps hidden inside a convenient sample: each can turn research into a confidence-generation machine.
This is not clerical work. This is model risk in its earliest form.
A strategy cannot be more robust than the information it is built on. If the data is lying to you, the backtest will simply lie more elegantly. Serious strategy work starts with distrust. Where did this data come from? What is missing? What was adjusted? What was filtered?
The honest process starts by taking the core alpha and testing it as raw as possible, across as many relevant instruments as possible, before filters, regime adjustments or optimisation. If something has merit in a very raw state, your odds of overfitting are lower. This is observation and data collection, not curve worship.
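To make that "raw first" pass concrete, here is a minimal sketch in Python. It assumes daily data held in pandas DataFrames, one per instrument, and uses a deliberately crude 20-day momentum sign as a stand-in for whatever core alpha you are actually probing. The one-bar shift is the cheapest look-ahead guard there is.

```python
import numpy as np
import pandas as pd

def raw_edge_report(data_by_symbol: dict) -> pd.DataFrame:
    """Score an unfiltered signal across many instruments at once."""
    rows = []
    for symbol, df in data_by_symbol.items():
        daily = df["close"].pct_change()
        # Placeholder alpha: sign of the trailing 20-day return.
        signal = np.sign(df["close"].pct_change(20))
        # Shift one bar so today's signal only ever trades tomorrow's
        # return -- a basic guard against look-ahead bias.
        strat = (signal.shift(1) * daily).dropna()
        rows.append({
            "symbol": symbol,
            "days": len(strat),
            "mean_daily": strat.mean(),
            "hit_rate": (strat > 0).mean(),
        })
    return pd.DataFrame(rows).set_index("symbol")
```

No filters, no regime logic, no optimisation. If the table looks mediocre everywhere, that is useful information before a single parameter has been tuned.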
Simplicity, restraint, and justification
If the core idea looks promising, the next trap is complexity.
Students are often seduced by the belief that more features, more filters and more nuanced logic must produce a smarter model. Sometimes they do. Much more often they produce a more flattering description of the past.
Model selection should be governed by restraint. Begin with a narrow idea. Express it as simply as possible. Add complexity only when it solves a clearly identified problem, not when it merely improves the equity curve. A parameter should not exist because it can exist. It should exist because the model is materially worse without it, and because there is a defensible reason for its presence.
Complexity often enters dressed as intelligence. That is part of its charm. It lets the researcher feel sophisticated while quietly increasing the number of ways the model can overfit history. A system with too many moving parts becomes harder to interpret, harder to stress-test and much harder to trust when live behaviour diverges from the rehearsed script.
Simple models have an underrated virtue: they fail transparently. When something breaks, you have a fighting chance of understanding why. With highly ornamented systems, all you often know is that the magic stopped.
The better place for complexity is often at portfolio level. Add models. Add capital allocation logic. Add risk management that combines good ideas cleanly rather than hiding fragility inside one over-designed machine.
The goal is not to discover the model that explains history most beautifully. The goal is to select the model with the best chance of remaining useful when history stops being cooperative.
Push the strategy until it shows its weaknesses
A decent backtest should not make you relax. It should make you suspicious.
Once a strategy looks promising, the proper response is interrogation. Does the effect survive out of sample? Does it hold up under walk-forward analysis? Does performance collapse when parameters move a little? Does it degrade gracefully when costs or slippage worsen? Does the edge persist across regimes, instruments and sensible slices of history? Are you even working with a valid sample size?
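To make the walk-forward question concrete, here is a minimal skeleton, assuming hypothetical `fit` and `evaluate` callables that stand in for your own optimisation and scoring code.

```python
import pandas as pd

def walk_forward(prices: pd.Series, fit, evaluate,
                 train_len: int = 750, test_len: int = 250) -> list:
    """Roll a train window, then a test window, through history.

    Parameters are fitted in-sample only, then frozen and scored on
    the bars that follow; the test window never leaks into the fit.
    """
    scores = []
    start = 0
    while start + train_len + test_len <= len(prices):
        train = prices.iloc[start:start + train_len]
        test = prices.iloc[start + train_len:start + train_len + test_len]
        params = fit(train)                    # in-sample optimisation only
        scores.append(evaluate(test, params))  # genuine out-of-sample score
        start += test_len                      # advance by one test window
    return scores
```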
This is where robustness diagnostics earn their keep.
A beginner asks, “Does it work?” A better question is: how does it fail, and what does that failure reveal about the underlying idea? That question changes the quality of the entire research process.
A robust strategy is not one that looks smooth under a single simulation. It is one that continues to show some coherence when pushed around. Perturb the sequence. Disturb the assumptions. Worsen the fills. Shift the sample. Vary the parameters. Examine the weak spots rather than hiding them behind summary statistics.
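"Vary the parameters" can be as simple as the sketch below. A hypothetical `backtest` callable is scored across a band of nearby lookbacks, and it is the shape of that neighbourhood, not the best value, that tells you something.

```python
import numpy as np

def sensitivity_sweep(backtest, centre: int = 20, spread: int = 10) -> dict:
    """Score a strategy across nearby parameter values.

    A robust edge degrades gently as the lookback drifts away from
    its chosen value; a sharp, isolated peak is a classic
    overfitting signature.
    """
    scores = {lb: backtest(lookback=lb)
              for lb in range(max(2, centre - spread), centre + spread + 1)}
    values = np.array(list(scores.values()))
    print(f"best {values.max():.2f} | median {np.median(values):.2f} "
          f"| worst {values.min():.2f}")
    return scores
```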
A strategy must survive contact with reality
Even a well-tested strategy can still fail the oldest test in trading: reality.
Transaction costs are real. Slippage is real. Liquidity constraints are real. Execution frictions are real. Position limits are real. Monitoring burden is real. Platform constraints are real. Capacity is real. The market does not care that your backtest had generous fill assumptions and a charming ignorance of operational mess.
Newer traders often treat implementation as the final stage. That ordering is understandable, but it is wrong. Implementation realism is not a final polish. It is part of the model from the beginning because it determines whether the idea is viable at all.
A short-horizon strategy that requires perfect fills is not the same strategy once realistic slippage is introduced. A high-turnover effect that disappears after fees was never robust. A strategy that needs constant manual intervention may be psychologically or operationally untradeable even if the statistics survive.
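One way to see this early, sketched below under the assumption that you hold a simple array of per-trade returns before costs, is to reprice the same trades under progressively worse round-trip friction and watch where the expectancy dies.

```python
import numpy as np

def cost_stress(trade_returns: np.ndarray, worst_cost: float = 0.003) -> None:
    """Watch expectancy decay as round-trip cost per trade rises."""
    for cost in np.linspace(0.0, worst_cost, 7):
        net = trade_returns - cost
        print(f"cost {cost:.4f} -> expectancy {net.mean():+.5f}, "
              f"hit rate {(net > 0).mean():.1%}")
```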
By the end of this first half, the point should be clear: define a process, use proper tools, understand them well, and test harshly. That matters. It is also not enough.
Why the BackTest is not enough
Testing can tell you a great deal about the historical behaviour of a constructed model. It cannot decide whether the model deserved to be constructed in the first place.
That is a different question.
A backtest cannot tell you whether the underlying idea rests on credible logic, whether the strategy has a clear design objective, whether the model suits the kind of edge you claim to be capturing, or whether it adds anything useful to a portfolio rather than duplicating risks you already own.
Testing matters. Of course it does.
But what you choose to test, how you frame it, and what you are trying to build all matter first.
This is where strategy development stops being a software exercise and starts becoming a craft. Starting simple, building judgement and earning complexity over time will save you a lot of time and money.
Part II: Designing Beyond the BackTest
Every robust strategy needs a reason to exist
The best strategies do not begin with a pattern. They begin with a reason.
That reason might be behavioural, structural, institutional, microstructural or risk-based. I do not need a grand theory polished to academic shine. Markets are too untidy for that. But I do want coherent logic beneath the signal. Without it, the strategy is hard to trust, hard to diagnose and hard to defend when it inevitably enters a rough patch.
A pattern is not yet an edge.
It becomes an edge when there is some reason it should exist and some reason it might persist.
I’m not against discovering patterns in data. Sometimes you find the anomaly first and the reason later. But logic narrows the search space. It stops the researcher wandering through an enormous parameter garden hoping to discover something statistically flattering. It also gives you a basis for judgement later. If live behaviour deteriorates, a logic-led strategy lets you ask whether the source of edge has weakened, whether implementation changed, whether structure shifted, or whether you are simply living through a normal adverse period.
Traders often underestimate how much stability comes from explanation. Not perfect explanation. Plausible explanation.
Why should this edge exist at all? Why might it continue to exist despite being observed?
If you cannot answer those questions even roughly, you may still have an interesting backtest. You do not yet have much of a strategy.
Define the goal before you build
A robust strategy should have a job description before it has performance statistics.
A surprising amount of retail research works in reverse. The researcher stumbles onto something that backtests well and only afterwards invents a story about what the system is supposedly for. That is not design. That is rationalisation with charts.
Good strategy development is closer to engineering than treasure hunting.
What is this strategy meant to do? Capture trend? Harvest mean reversion? Diversify equity beta? Improve the return path of a broader portfolio? The answer matters because holding period, turnover tolerance, acceptable drawdown, implementation constraints and even the right metrics depend on the job.
Design intent imposes discipline. It prevents you admiring a lovely backtest that solves the wrong problem. It helps you reject models that are statistically attractive but strategically misaligned.
The strategy should know its job before it applies for promotion.
Structure must match the edge
Different edges require different architecture.
A mean reversion model should not be built, risk-managed or evaluated like a trend-following system. An intraday strategy should not inherit assumptions better suited to a weekly positional model. A short-horizon effect that depends on execution precision will demand a very different tolerance for slippage and delay from a slower-moving macro signal.
Robustness is not generic. It is contextual.
Students often borrow structures without noticing. They copy entry logic from one family, exits from another and risk overlays from a third, then wonder why the system feels unstable or incoherent. The issue is not that one component is absurd. It is that the structure no longer matches the nature of the edge.
A strategy can be well tested and still be badly built for what it is.
Model type should shape design choices from the beginning: how signals become trades, how risk is expressed, how much execution precision matters, how performance should be judged and what kind of adverse behaviour is natural rather than pathological.
A strategy’s role matters more than its vanity metrics
Newer traders usually evaluate strategies in isolation. Practitioners rarely do.
The more useful question is not whether a system looks good on its own. It is what it adds next to everything else I already run. Does it diversify the portfolio? Offset drawdowns elsewhere? Behave differently in stress periods? Improve the path, not just the endpoint? Contribute returns from a distinct source of edge? Or is it merely another attractive expression of risk I already own in abundance?
This is where portfolio-level thinking changes the game.
A mediocre standalone strategy can be an excellent portfolio component if its correlation structure is valuable enough. A beautiful standalone strategy can be close to useless if it is just a polished duplicate of exposures already present elsewhere.
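A minimal version of that judgement, assuming two aligned daily return series for the existing book and the candidate, is to compare the blend against the book alone rather than admiring the candidate's standalone numbers.

```python
import numpy as np
import pandas as pd

def marginal_contribution(book: pd.Series, candidate: pd.Series,
                          weight: float = 0.10) -> None:
    """Judge the candidate by what it does to the blend."""
    def sharpe(r: pd.Series) -> float:
        return r.mean() / r.std() * np.sqrt(252)  # annualised, rf ignored

    blend = (1 - weight) * book + weight * candidate
    print(f"correlation with book:   {book.corr(candidate):+.2f}")
    print(f"book Sharpe alone:       {sharpe(book):.2f}")
    print(f"with {weight:.0%} candidate:  {sharpe(blend):.2f}")
```

A low or negative correlation paired with a flat or improved blended Sharpe is exactly the "mediocre standalone, excellent component" case described above.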
A beginner asks, “Is this a good strategy?”
A practitioner asks, “What does this add?”
That shift in framing marks a real step up in maturity. The market does not pay you for owning handsome backtests. It pays you for constructing resilient portfolios.
Survivability is part of the edge
Risk is not something I sprinkle on at the end like compliance seasoning.
It is part of the design from the start.
Too many strategies are built as though the edge arrives first and the risk model shows up later with a mop and bucket. That is backwards. Drawdown shape, loss clustering, tail exposure, liquidity stress, correlation spikes, concentration, gap risk, position sizing and behavioural tolerability are not housekeeping details. They define whether the strategy can survive long enough for expectancy to matter.
I want to know what can hurt the model most. I want to know whether losses arrive gradually or violently, independently or all at once. I want position sizing to reflect the nature of the edge, not the researcher’s optimism.
And I want the strategy to be tradeable by the human being expected to run it. A strategy that cannot be lived with is not robust in any meaningful sense. The trader abandons it, overrides it or distrusts it at exactly the wrong time.
Survivability is part of the edge because only survivable strategies get the chance to realise their edge at all.
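As a small illustration of "gradually or violently", the sketch below, assuming a daily returns series, reports drawdown depth alongside the longest unbroken run of losing days.

```python
import pandas as pd

def pain_profile(returns: pd.Series) -> None:
    """Report how deep and how clustered the losses are."""
    equity = (1 + returns).cumprod()
    drawdown = equity / equity.cummax() - 1
    losing = returns < 0
    # Consecutive losing days share a group id; each group's sum is
    # the length of that losing streak.
    streaks = losing.groupby((~losing).cumsum()).sum()
    print(f"max drawdown:          {drawdown.min():.1%}")
    print(f"longest losing streak: {int(streaks.max())} days")
```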
Bringing the two together
The textbook teaches me how to test with discipline. Experience teaches me how to design with intelligence.
One without the other is not enough.
Testing without design judgement produces elegant but brittle systems: technically polished, strategically hollow and often doomed the moment market reality stops cooperating. Design instinct without testing discipline produces persuasive stories and beautifully delivered nonsense.
Robust strategies live in the overlap.
They begin with logic, purpose and clear goals. Their structure matches the kind of edge they are trying to express. Their risk is part of their architecture. Their value is judged at portfolio level, not by vanity metrics alone. Then, and only then, are they tested harshly, implemented realistically and reviewed through a process mature enough to learn without thrashing.
The textbook helps me avoid obvious analytical errors.
Experience helps me avoid building the wrong thing expertly.
That is the real meaning of going beyond the backtest.
A practical development framework for traders
Start by defining the market logic. What behaviour, structure or persistent feature are you trying to capture? Then define the strategy’s purpose. What job is it meant to do in the portfolio? What research backs the idea? Where do the problem periods arise, and why?
Only then should the formal work begin. Validate the data. Build the simplest model that honestly expresses the idea. Test the raw edge across every trade and as many relevant markets as possible. Stress-test it until the weak spots become visible. Evaluate what it contributes at portfolio level. Apply realistic implementation assumptions before becoming emotionally attached to the equity curve. Then, after deployment, keep researching. Live trading is not the end of development. It is where the next round of learning begins.
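For what it is worth, the whole thing can be wired as ordered gates. The sketch below is a toy, with each lambda standing in for real research code, but the ordering is the point: logic and purpose are checked before a single backtest is admired.

```python
def run_gates(idea: dict, gates: list) -> bool:
    """Walk an idea through ordered checks; stop at the first failure."""
    for name, check in gates:
        if not check(idea):
            print(f"stopped at gate: {name}")
            return False
    print("all gates passed -- deploy, then keep researching")
    return True

# Example wiring; every check here is a hypothetical stand-in.
gates = [
    ("market logic defined",    lambda i: i.get("rationale") is not None),
    ("portfolio job defined",   lambda i: i.get("purpose") is not None),
    ("data validated",          lambda i: i.get("data_ok", False)),
    ("raw edge present",        lambda i: i.get("raw_edge", False)),
    ("survives stress tests",   lambda i: i.get("stress_ok", False)),
    ("adds at portfolio level", lambda i: i.get("adds_value", False)),
    ("viable after real costs", lambda i: i.get("cost_ok", False)),
]
```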
That workflow is not glamorous.
It is far better than glamorous.
It is usable.
The Conclusion
Going beyond the backtest does not mean abandoning rigour. It means putting rigour in its proper place.
Backtests matter. Diagnostics matter. Data integrity matters. Implementation realism matters. None of that changes. But a robust trading strategy is not merely one that performs well in historical tests. It is one that is reasoned well, designed with intent, matched to its model type, understood in portfolio context, shaped by real risk thinking and developed through a research process mature enough to survive contact with both markets and human behaviour.
Beyond the backtest is where strategy development stops being software and starts becoming craft.
Commenting, asking questions and subscribing here and on YouTube all really help keep this thing alive! Thanks!
Simon
Get in Touch with Martyn

