In one way or another, trading is mainly about predicting the future from the past, and the main question is how likely our bet is to be successful. To try to answer this question without waiting for the future to happen, all we can do is look at the behaviour of historical data and make an educated guess.

We, as humans, are affected by so many cognitive biases that it shouldn’t really be surprising when our judgement betrays us, even when all the information needed to make a relatively accurate forecast is there (The Signal and the Noise by Nate Silver presents numerous examples and an in-depth discussion of such cases).

When bringing a backtested strategy to real trading, the way we try to overcome these biases is by inferring that a system which, under certain conditions, had a positive expectation in the past will keep having a positive expectation in the immediate future.

Of course, the choice of these **conditions** is key to the success of our inference.

Generally speaking, these conditions involve the **number of parameters** in our system, the **number of observations (trades)** we have, the **number of systems** we tested and the **degree of confidence** we require.

I won’t go into details here; interested readers can have a look at Aronson’s Evidence-Based Technical Analysis, at the two summary posts on Jez Liberty’s blog (part I and part II), or at the now closed blog PuppetMasterTrading for a non-technical discussion. What this post is really about is not so much describing exactly how to get an answer to overfitting, but rather understanding what the question we are asking really is and what shape we can expect the answer to have.
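To see why the **number of systems** we tested matters so much, here is a minimal sketch (my own, not from the references above) of the multiple-testing effect, assuming for simplicity that each backtest is an independent statistical test at significance level alpha:

```python
# Illustrative sketch: the chance that at least one of n worthless,
# independent systems passes a backtest "by luck" at level alpha.
def prob_false_discovery(n_systems: int, alpha: float = 0.05) -> float:
    """P(at least one spurious 'significant' result among n_systems
    independent tests, each with false-positive rate alpha)."""
    return 1.0 - (1.0 - alpha) ** n_systems

for n in (1, 10, 50, 100):
    print(f"{n:4d} systems tested -> "
          f"{prob_false_discovery(n):.1%} chance of a false positive")
```

With 100 systems tested on the same data, finding one that "works" at the 5% level is almost guaranteed even if none of them has any edge, which is exactly why the tests discussed in the links above adjust for the number of trials.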

What are we really looking for?

Financial markets are chaotic systems born from the interaction of millions of agents. Nobody can know for certain how another person will act in the future (gosh, we don’t even know how WE will act in the future!), which clearly precludes the possibility of knowing what millions of agents (be they algos or humans) will do.

What this means is that while exactly forecasting the market’s behaviour is impossible (as discussed in this previous post), it may be possible to catch some anomalies, provided that we are flexible enough to allow the market some freedom to move: anomalies or recurrences may well keep happening, but they will rarely happen in *exactly* the same shape every time.

I am of course referring to the degrees of freedom of our system: generally speaking, we want to leave our system “many” degrees of freedom, to make it robust to changes in market behaviour.

This is exactly the step at which I think it’s very easy to fail when backtesting a strategy. **One may try to find the best system that worked in the past, which for actual trading can be of little or no use:** the more accurately we fit the past, the more likely we are to have something with little or no predictive power (i.e. to be overfitting).

Rather, what we really want is to extract a signal from the past noise, and in particular a signal stable enough that we can reasonably expect it to continue to exist in the (near) future. The concept is less obvious than it seems, and fully understanding it is fundamental for having some chance of success in quantitative trading.
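The trade-off between fitting the past and extracting a stable signal can be seen in a toy example outside trading (entirely my own construction): fitting polynomials of increasing degree to noisy data shrinks the in-sample error, while the out-of-sample error tells a different story.

```python
# Sketch of overfitting: the "signal" is sin(3x), the rest is noise.
# Higher-degree fits chase the noise and lose predictive power.
import numpy as np

rng = np.random.default_rng(0)
x_train = rng.uniform(-1, 1, 30)
x_test = rng.uniform(-1, 1, 30)
signal = lambda x: np.sin(3 * x)
y_train = signal(x_train) + rng.normal(0, 0.3, 30)
y_test = signal(x_test) + rng.normal(0, 0.3, 30)

def fit_mse(degree: int) -> tuple[float, float]:
    """Fit a polynomial of the given degree on the training data and
    return (in-sample MSE, out-of-sample MSE)."""
    coeffs = np.polyfit(x_train, y_train, degree)
    mse = lambda x, y: float(np.mean((np.polyval(coeffs, x) - y) ** 2))
    return mse(x_train, y_train), mse(x_test, y_test)

for degree in (1, 3, 15):
    in_mse, out_mse = fit_mse(degree)
    print(f"degree {degree:2d}: in-sample MSE {in_mse:.3f}, "
          f"out-of-sample MSE {out_mse:.3f}")
```

The in-sample error can only decrease as we add parameters (degrees of freedom we consume rather than leave free), which is precisely why it is such a misleading measure of a strategy’s quality.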

Interestingly, the very same concepts apply to pretty much any field involving some kind of forecasting, from sports betting to weather forecasting. What makes overfitting trickier to overcome in financial markets is that we are betting on the results of human interactions, in contrast with sportsmen’s skills or natural phenomena.

How confident can we be we are not overfitting?

Although we can use a number of tests to get a certain degree of confidence on whether we are overfitting (see the links above), my opinion is that relying on these statistical tests alone is not enough.

Overfitting can happen even when our strategy passes all the tests, as some aspects can hardly be quantitatively accounted for:

- are we using too few data points / too many parameters? (can be accounted for in the statistical tests)

- did we test many different strategies on the same data? (can be accounted for in the statistical tests)

- did we test the same strategy on many markets before finding one in which it “works”? (can be accounted for in the statistical tests)

- do we have *any* knowledge of the data (or even related data) we are testing our strategy on? (difficult to quantify)

- do we have *any* insight into the behaviour of any endogenous or exogenous variables we are using? (difficult to quantify)
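A simple practical guard against several of the pitfalls above is to keep an untouched out-of-sample window: choose the strategy’s parameters on the in-sample data only, then judge it on data it has never seen. Here is a hedged sketch (the toy momentum rule, parameter grid and split point are all my own, purely for illustration):

```python
# Out-of-sample validation sketch on pure-noise "returns": the best
# in-sample parameter is selected, then re-evaluated on held-out data.
import numpy as np

rng = np.random.default_rng(1)
returns = rng.normal(0, 0.01, 1000)          # toy daily returns (pure noise)
split = 700
in_sample, out_sample = returns[:split], returns[split:]

def strategy_pnl(rets: np.ndarray, lookback: int) -> float:
    """Toy momentum rule: long when the trailing mean return is positive."""
    signal = np.array([np.mean(rets[max(0, t - lookback):t]) > 0
                       for t in range(1, len(rets))])
    return float(np.sum(rets[1:] * signal))

# Pick the 'best' lookback on the in-sample window only...
best = max(range(5, 100, 5), key=lambda lb: strategy_pnl(in_sample, lb))
# ...then check whether it survives out-of-sample (on noise, it usually won't).
print(f"best lookback in-sample: {best}")
print(f"in-sample PnL:    {strategy_pnl(in_sample, best):+.4f}")
print(f"out-of-sample PnL: {strategy_pnl(out_sample, best):+.4f}")
```

This does not replace the statistical tests, and it cannot capture the last two points above (our knowledge of the data and of the variables involved), but it at least keeps the parameter search honest.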

Because of this, much attention has to be paid at every stage of a strategy’s development, not only to some final statistical tests, and applying some old-school common sense becomes very important. **Totally avoiding overfitting is almost impossible; all we can do is try to reduce it as much as possible.**

I have written enough for this post; I will leave for the next one a couple of specific points that I’d like to remark on… stay tuned.

P.P.S (Personal Post Scriptum):

I have just resigned from my job to fully dedicate myself to algorithmic trading and strategy development in the short term, and have my own business on the medium-longer term. This also means that I should have more time to dedicate to this blog.

Part of the merit (blame? :D) goes to an inspiring post by Michael Bigger, which I feel I have to acknowledge: Starting Over.

Liberally quoting someone (to whom I apologise for not remembering the name): in life it’s surely important to try and to make mistakes, as from mistakes you learn. However, you don’t have enough time to make all the mistakes yourself… so learn from other people’s mistakes and be open to other people’s advice.

Wish me luck!

Andrea
