We don’t quite know what moving averages are

A couple of days ago I was reflecting on what the moving average of a price really is.
The idea stemmed from the observation that any price can be expressed as a starting price plus the sum of the subsequent returns (taken here as day-to-day price changes). Start from the definition of the 5-day simple moving average:

MA(day 6,5) = (P1 + P2 + P3 + P4 + P5)/5

where MA(day X, 5) denotes the moving average used to trade on day X, computed from the closing prices of the previous 5 days.
In the example above, the next closing price would be P6, and of course we can’t use the information given by P6 to trade on day 6.

Then, given that:
P2 = P1 + r2
P3 = P2 + r3 = P1 + r2 + r3
….

We have:
MA(day 6,5) = P1 + (4*r2 + 3*r3 + 2*r4 + r5)/5

I.e. the simple 5-day moving average equals the price of 5 days ago plus a weighted sum of the 4 subsequent daily returns, with more weight given to the older returns.
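As a quick sanity check of the decomposition above, here is a minimal Python sketch (it is not the Matlab code mentioned later in the post). The prices are made-up illustrative values and `sma_from_returns` is a hypothetical helper name; returns are assumed to be plain day-to-day price differences, as in the formulas above.

```python
def sma(prices):
    """Plain (unweighted) average of a window of prices."""
    return sum(prices) / len(prices)


def sma_from_returns(first_price, returns):
    """Same average, rebuilt from the oldest price and the later returns.

    With n prices in the window there are n - 1 returns; the oldest return
    gets weight n - 1 and the most recent one gets weight 1.
    """
    n = len(returns) + 1
    weights = range(n - 1, 0, -1)  # n-1, n-2, ..., 1
    return first_price + sum(w * r for w, r in zip(weights, returns)) / n


# Illustrative closing prices P1..P5 (made-up numbers).
prices = [100.0, 102.0, 101.0, 105.0, 104.0]
# Additive returns r2..r5, i.e. r_k = P_k - P_(k-1).
returns = [b - a for a, b in zip(prices, prices[1:])]

direct = sma(prices)
rebuilt = sma_from_returns(prices[0], returns)
print(direct, rebuilt)
```

Both numbers should agree up to floating-point rounding, whatever prices are plugged in.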

Personally, I found this small result quite amusing. The fact that an unweighted moving average effectively gives the most weight to the return that occurred on one particular day (the oldest) is a good example of how even simple data transformations can be misleading.

If we apply the same reasoning to a Moving Average cross-over “system”, say MA(3) vs MA(5), we get:

MA(3) = (P3 + P4 + P5)/3
MA(5) = (P1 + … + P5)/5

Skipping the calculations (you can easily reproduce them yourself using the above substitution) we get:

MA(3) - MA(5) > 0   if   3*r2 + 6*r3 + 4*r4 + 2*r5 > 0

That is, whenever this particular weighted sum of the previous 4 days’ returns is positive we go long, otherwise we go short. From this, I think we can all agree that MA cross-over “systems” are (at best) not a particularly clever way of building a system, no matter which combination of look-back periods we use.
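For readers who would rather not do the substitution by hand, the weights of the cross-over condition can be rebuilt mechanically. The sketch below is only an illustration in Python; `crossover_weights` is a hypothetical helper, and returns are again assumed to be simple price differences. The idea is that a return r_i is contained in every price from P_i onwards, so its weight in each average is just the number of prices in that average that contain it, divided by the window length.

```python
from fractions import Fraction
from math import gcd


def crossover_weights(short, long):
    """Integer weights on r2..r_long such that
    MA(short) - MA(long) > 0  <=>  sum(weight_i * r_i) > 0.
    """
    diffs = []
    for i in range(2, long + 1):
        # Prices P_i..P_long contain the return r_i.
        prices_with_r = long - i + 1
        w_long = Fraction(prices_with_r, long)
        # The short average only sees the last `short` prices.
        w_short = Fraction(min(short, prices_with_r), short)
        diffs.append(w_short - w_long)

    # Scale by the least common denominator to get integer weights.
    scale = 1
    for d in diffs:
        scale = scale * d.denominator // gcd(scale, d.denominator)
    return [int(d * scale) for d in diffs]


print(crossover_weights(3, 5))
```

For MA(3) vs MA(5) this reproduces the weights 3, 6, 4, 2 on r2..r5 given above; other look-back pairs can be tried the same way.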

Some further considerations:

1) the smoothing effect of a MA(N) comes from the fact that a particular day’s return is used for N-1 consecutive days and with an increasing weight, so that the daily rate of change of the MA is kept small. E.g. observe how r5 is used on successive days:

MA(day 6, 5) = P1 + (4*r2 + 3*r3 + 2*r4 + r5)/5
MA(day 7, 5) = P2 + (4*r3 + 3*r4 + 2*r5 + r6)/5
MA(day 8, 5) = P3 + (4*r4 + 3*r5 + 2*r6 + r7)/5
MA(day 9, 5) = P4 + (4*r5 + 3*r6 + 2*r7 + r8)/5

2) based on the above, if we use an inversely weighted scheme for the returns, giving more weight to recent returns, we get a non-smoothed series, as each day the MA value changes significantly based on the newest return:

modified_MA(day 6, 5) = P1 + (r2 + 2*r3 + 3*r4 + 4*r5)/5
modified_MA(day 7, 5) = P2 + (r3 + 2*r4 + 3*r5 + 4*r6)/5

3) for MA-lovers, something that could be interesting to do is to project the moving average one day ahead. Given that MA(N) assigns the lowest weight to the most recent return, it’s easy to project MA(N) one day forward with a low % error (the bigger the look-back period N, the lower the error). If we are at the beginning of day 6, the MA(5) we would use to trade is:

MA(day 6, 5) = P1 + (4*r2 + 3*r3 + 2*r4 + r5)/5

However, if we look at what tomorrow’s MA(5) would look like:

MA(day 7, 5) = P2 + (4*r3 + 3*r4 + 2*r5 + r6)/5

we notice that all we are missing is r6, which is the return with the lowest weight in tomorrow’s MA value. There are a number of ways in which we could proxy it, the simplest probably being to set it to 0 (after all, that’s the standard expected return value):

projected_MA(day 7, 5) =  P2 + (4*r3 + 3*r4 + 2*r5)/5

or alternatively one could use the average of the N-1 previous days returns (which would make the smoothing effect more substantial). Or really anything that comes to mind (e.g. some scheme based on volatility).
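The projection in point 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the attached Matlab function: `projected_sma` and its `next_return` parameter are hypothetical names, the prices are made up, and returns are additive price differences as in the formulas above.

```python
def projected_sma(prices, n, next_return=0.0):
    """Tomorrow's MA(n), with the unknown next return replaced by a proxy.

    `prices` holds the closes known so far (most recent last). The missing
    close is approximated as last close + next_return.
    """
    if n < 2 or len(prices) < n - 1:
        raise ValueError("need n >= 2 and at least n - 1 known prices")
    proxy_close = prices[-1] + next_return
    return (sum(prices[-(n - 1):]) + proxy_close) / n


# Illustrative closes P1..P5; we are at the beginning of day 6.
prices = [100.0, 102.0, 101.0, 105.0, 104.0]
n = 5

# Simplest proxy: r6 = 0.
zero_proxy = projected_sma(prices, n)

# Alternative proxy: average of the previous n - 1 returns.
avg_return = (prices[-1] - prices[-n]) / (n - 1)
avg_proxy = projected_sma(prices, n, next_return=avg_return)

# Once P6 is known, the projection error is (r6 - proxy) / n.
p6 = 106.0  # made-up close for day 6
actual = sum(prices[-(n - 1):] + [p6]) / n
print(zero_proxy, avg_proxy, actual)
```

Since the missing return enters with weight 1/N, the absolute error of the zero-return projection is simply r6/N, which is why it shrinks as the look-back period grows.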

In the graph below I plotted the 10-day moving average of a price series and its one-day-ahead projection, calculated using a zero return for the missing day.

[Figure: Projected Moving Average]
Please note that I am in no way suggesting that this would make a better trading system than using MA cross-over, as it should be clear that I don’t consider MA cross-over a “system”. But it was nice to play around with it.
Interested readers can play around with the Matlab code I am attaching and see the effects of different weighting schemes:

movret_projection

(when I find some time I will figure out how to insert it directly into the post nicely formatted).
The function lets you specify the look-back period, the forward projection period and (optionally) whether the series is already in returns format or not.

PS: possibly all the above is quite obvious to some of you, probably even dull for those with a signal processing background…for those people, please don’t be offended by the catchy title.

Andrea

Posted in Trading Strategies Design

Defining a market VS characterizing a market

When trying to analyse market data it is common practice to use techniques borrowed from different fields to transform the data. Examples of these are probability distributions of price returns, moving averages, autocorrelations, rolling volatility estimates,  data mining techniques and pretty much any kind of operation we use to treat the data/build an indicator.

There is nothing wrong per se in using any of these methods, but I see a problem arising when we try to define the market behaviour by means of them. In general, most of these methods cause a great loss of information about the actual market structure. Again, this is not necessarily wrong, and actually I find it a needed step in the analysis: markets are so complex that filtering out some “noise” is necessary, but one has to be aware of what’s being considered as noise.

Take empirical probability distributions/pdfs of daily returns: they are great in that they tell you more or less what the range of possible events is on a daily basis, but nothing more really. A market could trend up for some time and then sell off right back to its original level and have one probability distribution, or it could trade in a range for the whole same period and still have exactly the same distribution of returns. In this case, an important piece of information that gets lost (among others) is the time evolution of returns (and hence of course of their moments).
In this case, combining the knowledge of the returns’ probability distribution with an analysis of autocorrelations could clarify the picture, as could the use of a rolling probability distribution with a smaller time window. These expedients would allow us to use more information, but of course we would still suffer a loss of information (which is also part of our goal).
A similar argument goes for moving averages: all they tell us is how the average price has moved in the last X days. Alone, they don’t tell us anything about the actual range of the price movements, nor about how the average price will move in the future.
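A toy Python example may make the point concrete. The two return series below are made up for illustration (unit up and down moves), and the helper names are hypothetical: both series contain exactly the same returns, so their empirical distributions are identical, yet one path trends up and sells off while the other stays in a range. The lag-1 autocorrelation is one simple statistic that tells them apart.

```python
def price_path(start, returns):
    """Cumulate additive returns into a price path."""
    path = [start]
    for r in returns:
        path.append(path[-1] + r)
    return path


def lag1_autocorr(x):
    """Sample autocorrelation of a series at lag 1."""
    m = sum(x) / len(x)
    num = sum((a - m) * (b - m) for a, b in zip(x, x[1:]))
    den = sum((a - m) ** 2 for a in x)
    return num / den


# Same multiset of returns, different ordering in time.
trend_then_selloff = [1.0] * 5 + [-1.0] * 5
range_bound = [1.0, -1.0] * 5

same_distribution = sorted(trend_then_selloff) == sorted(range_bound)
print(same_distribution)
print(max(price_path(100.0, trend_then_selloff)),
      max(price_path(100.0, range_bound)))
print(lag1_autocorr(trend_then_selloff), lag1_autocorr(range_bound))
```

The trending series shows a clearly positive lag-1 autocorrelation and the range-bound one a clearly negative value, even though a histogram of the returns could not distinguish them.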

What I am trying to say here is that markets have a very fine and complex structure and trying to fully define them is not only almost impossible, but also not needed from a trading point of view.
If we consider markets as an ocean of information, trading consists of finding a small but meaningful and recurrent wave (or better, many waves) in this ocean and riding it until it changes direction. Spotting the wave is only half of the game; building an algorithm to ride it is the other half.
Likewise, my approach so far has been to look for small inefficiencies characterizing a market and try to take advantage of them while limiting the impact of other factors emerging from the inherent limitations of the data transformation technique in use.

PS: Happy New Year everyone!

Andrea

Posted in Trading Strategies Design

Quantile normalization: a simple trick to reduce overfitting

To understand whether a strategy is able to perform in the future, the first question to ask is probably whether our strategy really showed great performance in the historic back-test or all it was doing was just describing past data accurately.

In this regard, something I recently started doing when searching the strategy parameter space (i.e. “optimizing the strategy”) is to evaluate only the returns up to the 98% quantile, that is, leaving out the top 2% of trades/days.

This approach has 2 main advantages:

1- It reduces the chances that the search algorithm gets stuck on some particular parameter values that catch one-off/rare patterns in market history (e.g. the 1987 crash, the 2010 flash crash, or just particular squeezes).

2- If the search has a positive outcome, what you find is a strategy that is profitable even if the top 2% of its trades never occurred, which means that the strategy is relatively robust.

Of course, depending on the trading strategy in use, you may want to change the actual percentage value (if you are using a mean-reversion (MR) strategy, which usually has a high win %, you might want to use a higher threshold), but you get the idea.

To some extent, this is similar to using the Omega ratio with a variable threshold level (given by the 98% quantile). However, I find this quantile normalization approach somewhat more powerful, as it allows you to calculate the full range of strategy statistics (including your chosen fitness function) across these “normalised” returns, in contrast with the single number produced by the Omega ratio.
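A minimal Python sketch of the idea follows. It is an assumption-laden illustration rather than the exact procedure used here: `drop_top_fraction` and `sharpe_like` are hypothetical helpers, the return series is synthetic, and the fitness function is just mean over standard deviation.

```python
from statistics import mean, stdev


def drop_top_fraction(returns, top_fraction=0.02):
    """Return the series without its largest `top_fraction` of observations.

    The number of dropped observations is rounded to the nearest integer.
    The original ordering is preserved, so path-dependent statistics
    (e.g. drawdown) can still be computed on the result.
    """
    n_drop = round(len(returns) * top_fraction)
    by_size = sorted(range(len(returns)), key=lambda i: returns[i])
    dropped = set(by_size[len(returns) - n_drop:])
    return [r for i, r in enumerate(returns) if i not in dropped]


def sharpe_like(returns):
    """Toy fitness function: mean return over its standard deviation."""
    return mean(returns) / stdev(returns)


# Synthetic daily returns: small moves plus two rare windfall days.
returns = [0.001 * ((i * 7) % 11 - 5) for i in range(100)]
returns[10] = 0.15
returns[60] = 0.12

trimmed = drop_top_fraction(returns, top_fraction=0.02)
print(len(returns), len(trimmed))
print(sharpe_like(returns), sharpe_like(trimmed))
```

In a parameter search, the fitness would be evaluated on `trimmed` rather than on `returns`, so a parameter set that only looks good because of the two windfall days stops looking good.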

Andrea

Posted in On backtesting, Performance Metrics

The perils and rewards of Diversification: improve your system’s Sharpe Ratio

Without exaggeration, diversification is one of the most powerful tools available to a trader, making it possible to enhance many characteristics of a system with relatively little effort.
However, as with many things, one has to be very careful to use it properly.

How to exactly define and approach diversification is not trivial, and I may have a series of posts on this. For now, as an example, I’ll try to increase the Sharpe Ratio of a strategy through diversification (whether Sharpe’s improvement is something to aim at is another matter).

To prove the point, I generated two negatively correlated (rho = -0.4) series of numbers and created two virtual strategy performance series out of them.

From these, I then created a “diversified system” composed of the two sub strategies. To make things comparable, the diversified system weights the two strategies so that the total weight sums up to 1, so that the virtual allocated capital is the same in the 3 cases.

As you may have guessed, this system shows a higher Sharpe than either of the two individual strategies (*note of caution on this at the bottom of the post).

What’s really interesting, however, is that one of the 2 starting strategies is a loser. Here are the Sharpe ratios of the 3 strategies:

Sharpe ratio 1:            0.86
Sharpe ratio 2:           -0.19
Sharpe ratio diversified:  0.88

Adding a losing strategy to a profitable strategy increased the Sharpe Ratio of our system.
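The effect can also be reproduced analytically. The Python sketch below is only an illustration: the two Sharpe ratios and rho = -0.4 are taken from the post, while the unit volatilities, the 85/15 weighting and the zero risk-free rate are assumptions made here for simplicity (the weights actually used are not stated), and `combined_stats` is a hypothetical helper.

```python
import math


def combined_stats(w, mean1, vol1, mean2, vol2, rho):
    """Mean, volatility and Sharpe ratio of w * strategy1 + (1 - w) * strategy2.

    Uses the standard two-asset portfolio variance formula and assumes a
    zero risk-free rate.
    """
    mean = w * mean1 + (1 - w) * mean2
    var = ((w * vol1) ** 2 + ((1 - w) * vol2) ** 2
           + 2 * w * (1 - w) * rho * vol1 * vol2)
    vol = math.sqrt(var)
    return mean, vol, mean / vol


# With unit volatility, the mean equals the Sharpe ratio (assumption).
mean1, vol1 = 0.86, 1.0    # profitable strategy
mean2, vol2 = -0.19, 1.0   # losing strategy
rho = -0.4

mean_c, vol_c, sharpe_c = combined_stats(0.85, mean1, vol1, mean2, vol2, rho)
print(mean_c, vol_c, sharpe_c)
```

Under these assumptions the combined Sharpe ratio comes out above 0.86: the negative correlation reduces volatility proportionally more than the losing leg reduces the mean. The function also returns the combined mean, so the effect on expected return can be inspected alongside the Sharpe ratio.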

Put this way, improving the “performance” of a system seems quite simple: we don’t even need to find a profitable strategy!
Is it really that simple? Well, no…it really depends on your definition of “performance”.

As discussed extensively (here and here),  Sharpe Ratio is not a perfect measure of performance.
In this case, the catch is that the total return of the diversified system is lower than the total return of the profitable strategy. Here is a plot of the strategies results:

Although at the end of the day the evaluation of a system boils down to the utility function of your choice, personally I would not add a losing strategy to a system just to improve its Sharpe Ratio. I would rather allocate part of the capital to cash, or even just hide it under the mattress.

Andrea

*Note of caution: combining two negatively correlated strategies doesn’t always produce a Sharpe Ratio higher than those of the individual strategies, the main reason being that mere correlation cannot exhaustively quantify the relationship between two time series (looking at the covariance matrix of the strategies’ returns and at its evolution over time may be a more sophisticated approach).

Posted in Trading Strategies Design