Underfitting, misfitting and understanding alpha’s drivers

While overfitting is certainly a challenge, falling into the opposite extreme is also a possibility.
Here is part of an interview with William Eckhardt from Futures magazine (which I would recommend reading in full here):

“I can talk a little more about over-fitting, if not my personal proprietary techniques. First of all I like the [term] over-fitting rather than curve-fitting because curve-fitting is a term from non-linear regression analysis. It is where you have a lot of data and you are fitting the data points to some curve. Well, you are not doing that with futures. Technically there is no curve-fitting here; the term does not apply. But what you can do is you can over-fit. The reason I like the term over-fit rather than curve-fit is that over-fit shows that you also can under-fit. The people who do not optimize are under-fitting.”


Underfitting and Misfitting

If we are using an insufficient number of degrees of freedom, so that our system doesn't differentiate between some key changes in the market's behaviour, then what we are doing is underfitting. A trivial example of underfitting would be buying a random stock from the stock universe at a random point in time and holding it for a random period.

Another possibility is that we are not using the right variables (or we have the right variables but we are using them in a poor way) – let's call this misfitting. Imagine a model on Italian BTPs that looks at Crude Oil prices and totally ignores the spread with German bonds (now, there could even be some exploitable relationship between BTPs and Crude Oil; I'm just trying to make a point).

Clearly, what makes a variable “right” for a given model and a given asset is highly arguable.
As with overfitting, I don't think we can easily tell in absolute terms whether a model is flawed by underfitting or misfitting (except in very obvious cases). Rather, I like to reason in terms of the possible existence of a better model specification that we are ignoring, e.g. there could be a key factor that our model is particularly sensitive to and that we are not accounting for (either in terms of the specific asset we apply the model to or in terms of the market's current dynamics). Or it could be the case that we are using variables that are merely correlated with the real factor, but are not the actual alpha driver.

Techniques to perform this kind of analysis include PCA and factor analysis, but depending on what exactly one is doing many other quantitative techniques can be applied (at a portfolio level, something like the market clustering presented by David Varadi seems promising).
Of course (and unfortunately), we have to keep in mind that the more we operate this kind of a posteriori analysis, the more likely we are to go from one extreme (underfitting/misfitting) to the other (overfitting).
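To give an idea of what I mean in practice, here is a minimal Matlab sketch of the PCA route (the data are simulated and the variable names are purely illustrative – this is the flavour of the analysis, not a recipe): we stack the returns of the asset we trade together with some candidate drivers and look at how much variance the leading components explain and how each series loads on them.

% A minimal PCA sketch. 'rets' is a T-by-3 matrix of daily returns:
% column 1 = the asset we trade, columns 2-3 = candidate drivers.
% The data below are simulated purely for illustration.
T = 500;
common = randn(T, 1);                        % hidden common factor
rets = [0.8*common + 0.5*randn(T, 1), ...    % our asset
        0.7*common + 0.6*randn(T, 1), ...    % candidate driver A
        0.1*common + 1.0*randn(T, 1)];       % candidate driver B

% standardise each series, then eigen-decompose the correlation matrix
Z = bsxfun(@rdivide, bsxfun(@minus, rets, mean(rets)), std(rets));
[V, D] = eig(cov(Z));
[eigval, order] = sort(diag(D), 'descend');
V = V(:, order);

explained = eigval / sum(eigval)             % share of variance explained by each component
loadings = V(:, 1)                           % loadings of each series on the first component

If the asset we trade and a variable our model ignores both load heavily on the same component, that is at least a hint that we might be misfitting.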


Fat tails and changing market dynamics

In another part of the interview mentioned above, Mr Eckhardt strictly relates the number of degrees of freedom to the number of trades in our backtest, arguing that one needs more trades than would be expected in a “Gaussian world” because of the fat tails of market returns. While I agree with the qualitative relationship between degrees of freedom and number of trades, I am not sure I agree with the strict quantitative relationship between the two variables.
The reason for this is twofold:

1) It’s not always possible to exactly quantify the actual number of degrees of freedom being used or how much hindsight we are pouring into our modelling (as discussed in my previous post);

2) I think fat tails are only part of the story. Another big part is the continuous change that markets go through (in the form of heteroskedasticity, but not only that).

Imagine you test a model over 2 years of data, and that because the model is a relatively high-frequency one (and so produces a very high number of trades) you think you are guarding yourself against overfitting. What you might be ignoring is that, having tested the model over a relatively short time window, you may not have tested it against different market conditions. It might well be that 2.5 years ago markets were somewhat different and your model was useless, which implies that as soon as markets change again you will lose your edge. An example could be a model that unknowingly takes advantage of some market behaviour born from the Fed having been on hold over that whole period.
This is another form of overfitting if you want, but one which can't be accounted for by simply looking at the number of trades versus the number of model parameters.

Because of this, I always like to test any new strategy on as much historical data as possible. In this regard, I am in partial disagreement with Dr Chan, who states that he seldom tests strategies with data older than 2007 (read more here: The Pseudo-science of Hypothesis Testing). All other things being equal, I find a strategy that worked well for a long time more likely to keep working in the near future than a strategy that worked well only over a short history (which is not to say that something that started working only recently can't keep working). Also, even if you have something that started working only recently, having a look at how it behaved when it didn't really perform can certainly offer some interesting insights – especially if you are not sure what the driver behind your alpha really is.


Alpha’s drivers

This leads me to the final point before concluding this long post: do we really have to understand what our model is doing and what kind of inefficiency we are exploiting?

Personally, I think that understanding the underlying driver of our alpha is certainly a big plus, as it lets you directly monitor the behaviour of the prime driver, which in turn could give you some practical insights in troubled times. However, this is not always enough – think of the quant funds during the 07-08 meltdown: they were fully aware of the drivers behind their equity stat arb strategies, but they still got trapped in order flows and forced liquidations. Another example could well be the blow-up of LTCM.

The moral of the story is that there could always be an additional layer of complexity not being considered, so that (partly) understanding our alpha's driver might not offer any additional upside.
Therefore, although nice, I don't deem it necessary to understand the real driver behind our alpha – provided that our statistical analysis gives us enough confidence to trade our strategy.

Schwager's Market Wizards series presents supporters of both sides, in the persons of D.E. Shaw and Jaffray Woodriff. You can read more about their views in William Hua's post on Adaptive Trader, Ensemble Methods with Jaffray Woodriff, or have a look at this QUSMA post for a more in-depth example of Woodriff's approach: Doing the Jaffray Woodriff Thing (Kinda).

Andrea


Overfitting, forecasting and trading

In one way or another, trading is mainly about predicting the future from the past, and the main question is how likely our bet is to be successful. To try to answer this question without waiting for the future to happen, all we can do is look at how the data behaved historically and make an educated guess.

We, as humans, are affected by so many cognitive biases that it shouldn't really be surprising if our judgement betrays us, even when all the information needed to make a relatively accurate forecast is there (The Signal and the Noise by Silver presents numerous examples and an in-depth discussion of such cases).
Within the framework of bringing a backtested strategy to real trading, the way we try to overcome these biases is by inferring that a system which, under certain conditions, had a positive expectation in the past will keep having a positive expectation in the immediate future.

Of course the choice of these conditions is key to the success of our inference.
Generally speaking, these conditions involve the number of parameters in our system, the number of observations (trades) we have, the number of systems we tested and the degree of confidence we require.

I won't go into details here; I would advise interested readers to have a look at Aronson's Evidence-Based Technical Analysis, at the two summary posts from Jez Liberty's blog (part I and part II), or at the now-closed blog PuppetMasterTrading for a non-technical discussion. What this post is really about is not so much describing exactly how to get an answer to overfitting, but rather trying to understand what the question we are asking really is and what shape we can expect the answer to have.


What are we really looking for?

Financial markets are chaotic systems born from the interaction of millions of agents. Nobody can know with certainty how another person will act in the future (gosh, we don't even know how WE will act in the future!), which clearly precludes the possibility of knowing what millions of agents (be they algos or humans) will do.

What this means is that while trying to exactly forecast market behaviour is impossible (as discussed in this previous post), it might be possible to catch some anomalies, provided that we are flexible enough to allow the market some freedom to move: anomalies or recurrences may well keep happening, but they will rarely happen in exactly the same shape every time.

I am of course referring to the degrees of freedom of our system: generally speaking, we want to leave our system “many” free degrees of freedom, to make it robust to changes in market behaviour.

This is exactly the step at which I think it's very easy to fail when backtesting a strategy. One may try to find the best system that worked in the past, which for actual trading can be of little or no use: the more accurately we fit the past, the more likely we are to have something with little or no predictive power (i.e. to be overfitting).
Rather, what we really want is to extract a signal from the past noise, and in particular a signal stable enough that we can reasonably expect it to continue to exist in the (near) future. The concept is less obvious than it seems, and fully understanding it is fundamental for having some chance of success in quantitative trading.

Interestingly, the very same concepts apply to pretty much any field involving some kind of forecasting, from sports betting to weather forecasting. What makes overfitting trickier to overcome in financial markets is that we are betting on the results of human interactions, in contrast with sportsmen's skills or natural phenomena.


How confident can we be that we are not overfitting?

Although we can make use of a number of tests to get a certain degree of confidence on whether we are overfitting (see links above), my opinion is that relying on these statistical tests alone is not enough.
Overfitting can happen even when our strategy passes all the tests, as some aspects can hardly be quantitatively accounted for:

  • are we using too few data points/too many parameters?
    (can be accounted for in the statistical tests)
  • did we test many different strategies on the same data?
    (can be accounted for in the statistical tests)
  • did we test the same strategy on many markets before finding one in which it “works”?
    (can be accounted for in the statistical tests)
  • do we have any knowledge on the data (or even related data) we are testing our strategy on?
    (difficult to quantify)
  • do we have any insight on the behaviour of any endogenous or exogenous variables we are using?
    (difficult to quantify)

Because of this, much attention has to be paid at all stages of a strategy's development, not only to some final statistical tests, and applying some old-school common sense becomes very important. Totally avoiding overfitting is almost impossible; all we can do is try to reduce it as much as possible.
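To make the first points of the list a bit more concrete, here is a minimal Matlab sketch of the simplest kind of test I have in mind: a bootstrap of the mean trade return (the trade returns below are simulated, and this is far from the full machinery described in Aronson's book – just a sketch):

% Bootstrap test of the mean trade return being positive.
% 'trades' would be the per-trade returns from the backtest;
% here they are simulated purely for illustration.
trades = 0.001 + 0.02*randn(200, 1);        % 200 hypothetical trade returns
nBoot = 10000;
n = numel(trades);

bootMeans = zeros(nBoot, 1);
for b = 1:nBoot
    resampled = trades(randi(n, n, 1));     % resample the trades with replacement
    bootMeans(b) = mean(resampled);
end

pValue = mean(bootMeans <= 0)               % share of resamples with a non-positive mean

Note that a test like this, taken alone, says nothing about how many other strategies or markets we tried before settling on this one – which is exactly why such tests can't be the whole story.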

I have written enough for this post; I will leave for the next one a couple of specific points that I'd like to make… stay tuned.

P.P.S (Personal Post Scriptum):
I have just resigned from my job to fully dedicate myself to algorithmic trading and strategy development in the short term, and to having my own business in the medium to longer term. This also means that I should have more time to dedicate to this blog.
Part of the merit (blame? :D) goes to an inspiring post by Michael Bigger, which I feel I have to acknowledge: Starting Over.
Liberally quoting someone (whom I apologise to for not remembering the name): in life it's surely important to try and to make mistakes, as you learn from mistakes. However, you don't have enough time to make all the mistakes yourself… so learn from other people's mistakes and be open to other people's advice.

Wish me luck!

Andrea


We don’t quite know what moving averages are

A couple of days ago I was reflecting on what the moving average of a price really is.
The idea stemmed from the consideration that any price can be expressed as a starting price plus the sum of the subsequent returns (returns here being simple price differences). From this, it's easy to see that:

MA(day 6,5) = (P1 + P2 + P3 + P4 + P5)/5

using the formalism MA(day X, 5) = the MA to be used to trade on day X, based on the previous 5 days' prices.
In the example above, the next closing price would be P6, and of course we can’t use the information given by P6 to trade on day 6.

Then, given that:
P2 = P1 + r2
P3 = P2 + r3 = P1 + r2 + r3
….

We have:
MA(day 6,5) = P1 + (4*r2 + 3*r3 + 2*r4 + r5)/5

I.e. the simple 5-day moving average is given by the price of 5 days ago plus a weighted combination of the previous 4 days' returns, with more weight given to the older returns.

Personally, I found this small result quite amusing. The fact that an unweighted moving average effectively gives the most weight to the oldest returns is a good example of how even simple data transformations can be misleading.
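The identity is also easy to verify numerically; a quick Matlab check on an arbitrary price series (the prices are made up, and any series will do):

% Numerical check of MA(day 6, 5) = P1 + (4*r2 + 3*r3 + 2*r4 + r5)/5
P = [100 102 101 104 107];                 % P1..P5, arbitrary prices
r = diff(P);                               % r(1) = r2, r(2) = r3, r(3) = r4, r(4) = r5
ma_direct = mean(P);                       % (P1 + P2 + P3 + P4 + P5)/5
ma_decomp = P(1) + (4*r(1) + 3*r(2) + 2*r(3) + 1*r(4))/5;
disp([ma_direct, ma_decomp])               % the two numbers coincide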

If we apply the same reasoning to a Moving Average cross-over “system”, say MA(3) vs MA(5), we get:

MA(3) = (P3 + P4 + P5)/3
MA(5) = (P1 + … + P5)/5

Skipping the calculations (you can easily reproduce them yourself using the above substitution) we get:

MA(3) – MA(5) > 0       if

3*r2 + 6*r3 + 4*r4 + 2*r5 > 0

That is, whenever this particular weighted sum of the previous 4 days' returns is positive we go long, otherwise we go short. From this, I think we can all agree that MA cross-over “systems” are not a particularly clever way of building a system (at best), no matter which combination of look-back periods we use.
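Again, a couple of lines of Matlab confirm the equivalence (the random-walk prices are just placeholders):

% Check: MA(3) - MA(5) has the same sign as 3*r2 + 6*r3 + 4*r4 + 2*r5
P = 100 + cumsum(randn(1, 5));             % P1..P5, an arbitrary random walk
r = diff(P);                               % r(1) = r2, ..., r(4) = r5
maDiff = mean(P(3:5)) - mean(P(1:5));      % MA(3) - MA(5)
weighted = (3*r(1) + 6*r(2) + 4*r(3) + 2*r(4))/15;
disp([maDiff, weighted])                   % the two quantities are identical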

Some further considerations:

1) the smoothing effect of an MA(N) comes from the fact that each day's return is used for N days with a gradually increasing weight, so that the daily rate of change of the MA is kept small. E.g. observe how r5 is used on successive days:

MA(day 6, 5) = P1 + (4*r2 + 3*r3 + 2*r4 + r5)/5
MA(day 7, 5) = P2 + (4*r3 + 3*r4 + 2*r5 + r6)/5
MA(day 8, 5) = P3 + (4*r4 + 3*r5 + 2*r6 + r7)/5
MA(day 9, 5) = P4 + (4*r5 + 3*r6 + 2*r7 + r8)/5

2) based on the above, if we use an inversely weighted scheme for the returns to give more weight to recent returns, we get a non-smoothed series, as each day the MA value changes significantly based on the newest return:

modified_MA(day 6, 5) = P1 + (r2 + 2*r3 + 3*r4 + 4*r5)/5
modified_MA(day 7, 5) = P2 + (r3 + 2*r4 + 3*r5 + 4*r6)/5

3) for MA-lovers, something that could be interesting to do is to project the moving average one day ahead. Given that an MA(N) assigns the lowest weight to the most recent return, it's easy to project the MA(N) one day forward with a low percentage error (with the error being lower the bigger the look-back period N). If we are at the beginning of day 6, the MA(5) we would use to trade is:

MA(day 6, 5) = P1 + (4*r2 + 3*r3 + 2*r4 + r5)/5

However, if we look at what tomorrow’s MA(5) would look like:

MA(day 7, 5) = P2 + (4*r3 + 3*r4 + 2*r5 + r6)/5

we notice that all we are missing is r6 – which is the return having the lowest weight in tomorrow's MA value. There are a number of ways in which we could proxy it, the simplest probably being to set it to 0 (after all, that's the standard expected value of a return):

projected_MA(day 7, 5) =  P2 + (4*r3 + 3*r4 + 2*r5)/5

or alternatively one could use the average of the previous N-1 days' returns (which would make the smoothing effect more substantial). Or really anything that comes to mind (e.g. some scheme based on volatility).

In the graph below I plotted the 10-day moving average of a price series and its one-day-ahead projection, calculated using a zero return for the missing day.

[Figure: Projected Moving Average]
Please note that I am in no way suggesting that this would make a better trading system than using MA cross-over, as it should be clear that I don’t consider MA cross-over a “system”. But it was nice to play around with it.
Interested readers can play around with the Matlab code I am attaching and see the effects of different weighting schemes:

movret_projection

(when I find some time I will figure out how to insert it directly into the post nicely formatted).
The function allows you to specify the look-back period, the projection period and (optionally) whether the series is already in returns format.
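In the meantime, here is a rough sketch of what the function does (this is a simplified reconstruction based on the description above, not the attachment itself – the default values and variable names are mine):

function [ma, projMA] = movret_projection(series, lookback, projFwd, isReturns)
% Simple moving average of a price series and its projection projFwd days
% ahead, obtained by assuming the missing future returns are zero
% (i.e. the last observed price is simply carried forward).
% projFwd is assumed to be smaller than lookback.
if nargin < 4, isReturns = false; end
if isReturns
    p = cumsum([100; series(:)]);          % rebuild a price-like series from the returns
else
    p = series(:);
end

n = numel(p);
ma = nan(n, 1);
projMA = nan(n, 1);
for t = lookback:n
    window = p(t-lookback+1:t);
    ma(t) = mean(window);
    % drop the oldest projFwd prices and repeat the last price projFwd times
    projMA(t) = mean([window(projFwd+1:end); repmat(p(t), projFwd, 1)]);
end
end

Something like [ma, proj] = movret_projection(closes, 10, 1) would then give a 10-day MA and its one-day-ahead projection, as in the plot above.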

PS: possibly all the above is quite obvious to some of you, probably even dull for those with a signal processing background…for those people, please don’t be offended by the catchy title.

Andrea


Defining a market VS characterizing a market

When trying to analyse market data it is common practice to use techniques borrowed from different fields to transform the data. Examples of these are probability distributions of price returns, moving averages, autocorrelations, rolling volatility estimates, data mining techniques and pretty much any kind of operation we use to treat the data/build an indicator.

There is nothing wrong per se in using any of these methods, but I see a problem arising when we try to define market behaviour by means of them. In general, most of these methods cause a great loss of information about the actual market structure. Again, this is not necessarily wrong, and actually I find it a necessary step in the analysis: markets are so complex that filtering out some “noise” is needed, but one has to be aware of what's being treated as noise.

Take empirical probability distributions/pdfs of daily returns: they are great in that they tell you more or less what the range of possible events on a daily basis is, but not much more. A market could trend up for some time and then sell off right back to its original level and have one probability distribution, or it could trade in a range for the whole of the same period and still have exactly the same distribution of returns. In this case, an important piece of information that gets lost (among others) is the time evolution of returns (and hence, of course, of their moments).
Here, combining the knowledge of the returns' probability distribution with an analysis of autocorrelations could clarify the picture, as could the use of a rolling probability distribution with a smaller time window. These expedients would allow us to use more information, but of course we would still suffer a loss of information (which is also part of our goal).
A similar argument goes for moving averages… all they tell us is how the average price has moved over the last X days. On their own, they don't tell us anything about the actual range of the price movements, nor about how the average price will move in the future.
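A quick way to convince yourself of the point about distributions: take a set of daily returns and reorder them. The distribution is exactly the same (it's the same set of numbers), but the resulting price paths and their autocorrelations can be completely different. A minimal Matlab sketch with simulated data:

% Same return distribution, very different market dynamics.
r1 = randn(250, 1);                  % a year of i.i.d. daily returns -> a choppy, range-like path
r2 = sort(r1, 'descend');            % the SAME numbers reordered -> a rally followed by a sell-off

p1 = 100 + cumsum(r1);               % price path 1
p2 = 100 + cumsum(r2);               % price path 2 (ends at the same level as path 1)
% plot([p1, p2]) shows how different the two paths look

disp(isequal(sort(r1), sort(r2)))    % 1: identical empirical distributions of returns
acf1 = @(x) sum((x(1:end-1) - mean(x)).*(x(2:end) - mean(x))) / sum((x - mean(x)).^2);
disp([acf1(r1), acf1(r2)])           % lag-1 autocorrelation: roughly 0 vs strongly positive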

What I am trying to say here is that markets have a very fine and complex structure, and trying to fully define them is not only almost impossible, but also not needed from a trading point of view.
If we consider markets as an ocean of information, trading consists in finding a small but meaningful and recurrent wave (or better, many waves) in this ocean and riding it until it changes direction. Spotting the wave is only half of the game; building an algorithm to ride it is the other half.
Likewise, my approach so far has been to look for small inefficiencies characterizing a market and to try to take advantage of them, while limiting the impact of other factors emerging from the inherent limitations of the data transformation technique in use.

PS: Happy New Year everyone!

Andrea
