Feature selection in trading algorithms

Lately I have been looking for a more systematic way to get around overfitting and in my quest I found it useful to borrow some techniques from the Machine Learning field.

If you think about it, a trading algorithm is just a form of AI applied to price series. This statement, although possibly obvious, puts us in a position to apply a number of Machine Learning techniques to the design of our trading strategies.

Expanding on what was discussed here (and here), it seems intuitive that, generally speaking, the more features a model has, the more it is subject to overfitting. This problem is known as the bias-variance trade-off and is usually summarised by the graph below.

[Figure: bias-variance tradeoff. As complexity increases, performance on the training set increases while prediction power degrades.]
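To make the shape of that graph concrete, here is a minimal Python sketch on synthetic data (nothing to do with real prices). The sine-shaped "true" dynamics, the noise level and the sample sizes are all assumptions made up for illustration. It fits polynomials of increasing degree on a small training sample and measures the error both in-sample and on fresh data.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def make_data(n):
    # Hypothetical "true" dynamics: a smooth curve plus noise.
    x = rng.uniform(-1.0, 1.0, n)
    y = np.sin(3.0 * x) + rng.normal(0.0, 0.3, n)
    return x, y

x_train, y_train = make_data(30)     # small in-sample set
x_test, y_test = make_data(1000)     # fresh out-of-sample data

degrees = list(range(1, 11))         # model complexity = polynomial degree
train_mse, test_mse = [], []
for d in degrees:
    # Fit on the training sample only, then score on both samples.
    p = Polynomial.fit(x_train, y_train, d)
    train_mse.append(float(np.mean((p(x_train) - y_train) ** 2)))
    test_mse.append(float(np.mean((p(x_test) - y_test) ** 2)))

for d, tr, te in zip(degrees, train_mse, test_mse):
    print(f"degree {d:2d}  train MSE {tr:.3f}  test MSE {te:.3f}")
```

The in-sample error can only go down as the degree grows, because each model contains the previous one. The out-of-sample column is the one to watch.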

What’s possibly less intuitive is that the specific features used, in relation to the dynamics we want to predict, play a key role in determining whether we are overfitting past data, so that the error behaviour shown in the graph is just a generalization.

Something particularly interesting is that the use of the very same feature (e.g. in our application an indicator, a take profit or stop loss mechanism, etc) might or might not cause overfitting depending on the dynamics we are trying to fit.

The reason behind this is that some phenomena (or sometimes even variants of the same phenomenon) simply can’t be described by some features.

As an example, imagine you are trying to forecast the future sales of a sportswear store in Australia. A “good” feature to use could be the season of the year, as (say) Aussies are particularly keen on water sports and so springs and summers tend to show the best sales for the year.
Now imagine trying to forecast the future sales of a similar sportswear store located somewhere in the US. It might be the case that US citizens don’t have a preference for any particular season, as in the summer they practice water sports and in the winter they go skiing. In this new scenario, a model using the season of the year as a feature is more likely to be overfitted because of the different underlying dynamics.

Back to financial markets, an example of this could be how a stop loss mechanism tends to be (generally speaking and in my experience) a good feature for trend-following strategies, but not for mean-reversion strategies (and vice versa for target profit orders). A possible explanation could be that trends are well described by the absence of big adverse movements, while their full extension can’t be known beforehand (but this is just me trying to rationalize my empirical findings).

So, how do you understand which features are good candidates?
Luckily for us, there is a whole bunch of techniques developed in the Machine Learning field to perform feature selection. I recommend the following 2003 paper for an overview of the methods: An Introduction to Variable and Feature Selection by Isabelle Guyon and André Elisseeff. Any Machine Learning text should also cover some of the techniques, as does the exceptional Stanford Machine Learning class on Coursera.
Any other recommendations (or comments) from readers are of course very welcome.
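As a taste of what such techniques look like, here is a minimal Python sketch of one of the simplest approaches covered in overviews of this kind: ranking candidate features by the absolute value of their correlation with the target. The function name and the synthetic data are my own assumptions for illustration. Note that ranking each feature in isolation can miss features that are only useful in combination with others, so treat it as a first filter rather than a verdict.

```python
import numpy as np

def rank_features_by_correlation(X, y):
    """Return (feature indices sorted best-first, |Pearson correlation| per feature).

    Assumes no feature column is constant (that would divide by zero).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)          # centre each feature
    yc = y - y.mean()                # centre the target
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    scores = np.abs(Xc.T @ yc) / denom
    order = np.argsort(scores)[::-1]
    return order, scores

# Hypothetical data: only feature 0 actually drives the target.
rng = np.random.default_rng(1)
n = 500
X = rng.normal(size=(n, 3))
y = 2.0 * X[:, 0] + rng.normal(size=n)

order, scores = rank_features_by_correlation(X, y)
print("ranking:", order, "scores:", np.round(scores, 3))
```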


Posted in On backtesting, Trading Strategies Design

Resources for Quantitative Trading

Since a few readers asked me about this, here is a list of resources/tools useful in approaching quantitative trading.

For a similar (and more detailed in some aspects) list, you can have a look at this series of 3 posts from Quantivity:

With regard to books, there are obviously many worth including. Here I am selecting only among those that I have personally read and hence feel comfortable recommending.
For reviews of more financial books than you’ll ever be able to read, I suggest having a look at Reading the Markets.

If there is some particular resource/link that you feel I am missing, please feel free to share it and I will include it in this list.

Where to look for ideas

  • Observing the markets
  • Understanding the ins and outs of financial instruments
  • Academic Papers and Journals in general
  • Newspapers
  • Blogs (see blog roll)
  • Forums (Elite Trader, Tradersplace, Trade2win, Big Mike, Wilmott, Quantnet, Nuclear Phynance)

Statistical/Mathematical tools

Really anything coming from a number of fields:

  • Statistics
  • Machine Learning
  • Signal Processing
  • Physics (Complex Systems, Chaos Theory,…)
  • Econometrics
  • Stochastic Calculus

Market Data

Main Programming Languages used

  • C++ (execution mainly)
  • C# (execution mainly)
  • Java (execution mainly)
  • Python (execution and research)
  • Matlab (execution and research)
  • R (research mainly)





Posted in Resources for trading, Trading Strategies Design

Trimmed performance estimators

This is a quick follow-up on my previous post on Quantile normalization.

Instead of removing just the top X quantile of returns/trades when optimizing a strategy’s parameter space, my recent approach has been to remove both the top and bottom X quantiles, effectively using a robust trimmed estimator of performance instead of the estimator itself.

The advantages are symmetric to those discussed in the previous post, as long as your backtest allows for realistic modelling of trade execution. For example, if you are using stop orders and trade bars (as opposed to tick data), you probably want to add an amount of slippage that is in some way proportional to the size of the bar (I specify stop orders because a conservative modelling of limit orders is easier to achieve).

Trimming out the worst returns is particularly useful for strategies with single big losses (as is usually the case for some kinds of mean-reversion strategies), whereas trimming the best returns is more useful for strategies with big positive days (e.g. trend-following strategies).

Two (of many) possible variants are:

-To preserve autocorrelations of a strategy’s returns, one could decide to remove blocks of trades/days, instead of individual trades/days (in a similar fashion to what one does when bootstrapping blocks of trades/days).

-To preserve the number of samples in our results, instead of removing the best (worst) days one could replace them with the average/median winning (losing) day.
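A rough Python sketch of the two variants above follows. The function names, the default block length of 5 days and the choice of ranking blocks by their summed P&L are assumptions of mine, made only to have something concrete to run.

```python
import numpy as np

def trim_blocks(pnl, q=0.98, block=5):
    """Drop whole blocks of consecutive days whose summed P&L falls in either tail."""
    pnl = np.asarray(pnl, dtype=float)
    n_blocks = pnl.size // block
    # Leftover days that do not fill a complete block are discarded.
    blocks = pnl[: n_blocks * block].reshape(n_blocks, block)
    totals = blocks.sum(axis=1)
    hi, lo = np.quantile(totals, [q, 1.0 - q])
    keep = (totals < hi) & (totals > lo)
    return blocks[keep].ravel()

def replace_tails(pnl, q=0.98):
    """Replace the best/worst days with the median winning/losing day."""
    pnl = np.asarray(pnl, dtype=float)
    hi, lo = np.quantile(pnl, [q, 1.0 - q])
    out = pnl.copy()
    winners = pnl[pnl > 0]
    losers = pnl[pnl < 0]
    if winners.size:
        out[pnl >= hi] = np.median(winners)
    if losers.size:
        out[pnl <= lo] = np.median(losers)
    return out

# Hypothetical daily P&L, just to exercise the two functions.
rng = np.random.default_rng(5)
pnl_daily = rng.normal(0.0, 1.0, 1000)
print("original:", pnl_daily.size,
      "block-trimmed:", trim_blocks(pnl_daily).size,
      "tail-replaced:", replace_tails(pnl_daily).size)
```

The first function shortens the series but leaves the order of days inside each surviving block untouched. The second keeps the sample size unchanged.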

Something else to note is that if your performance measure makes use of the standard deviation (as is the case for the Sharpe Ratio), trimming the tails of the returns from its computation is likely to result in an overestimation of the performance.
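The effect on the standard deviation is easy to see on synthetic data. The sketch below is in Python and uses made-up fat-tailed returns (Student-t noise with a small drift, both assumptions of mine). It computes a per-period Sharpe ratio with a zero risk-free rate before and after trimming both tails.

```python
import numpy as np

def sharpe(returns):
    """Per-period Sharpe ratio (no annualisation, zero risk-free rate assumed)."""
    returns = np.asarray(returns, dtype=float)
    return returns.mean() / returns.std(ddof=1)

def trim_both_tails(returns, q=0.98):
    """Remove the observations at or beyond the q and 1-q quantiles."""
    returns = np.asarray(returns, dtype=float)
    hi, lo = np.quantile(returns, [q, 1.0 - q])
    return returns[(returns < hi) & (returns > lo)]

rng = np.random.default_rng(2)
# Hypothetical fat-tailed daily returns: Student-t noise plus a small positive drift.
daily = 0.0005 + 0.01 * rng.standard_t(df=3, size=5000)
trimmed = trim_both_tails(daily)

print(f"std    raw {daily.std(ddof=1):.4f}  trimmed {trimmed.std(ddof=1):.4f}")
print(f"Sharpe raw {sharpe(daily):.3f}  trimmed {sharpe(trimmed):.3f}")
```

Since the denominator shrinks once the extremes are gone, a trimmed Sharpe Ratio is best used to rank parameter sets against each other rather than read as an absolute figure.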

Finally, here’s the Matlab code:

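normalise_excess_pnl = 1;        % 1 = trim the tails, 0 = leave the P&L untouched
normalisation_quantile = 0.98;   % upper cut-off; the lower one is 1 - this value

if normalise_excess_pnl

    % Compute both thresholds before removing anything
    best_daily_pnl = quantile(pnl_daily,normalisation_quantile);
    worst_daily_pnl = quantile(pnl_daily,1-normalisation_quantile);

    % Drop the days at or beyond either threshold
    pnl_daily(pnl_daily>=best_daily_pnl ) = [];
    pnl_daily(pnl_daily<=worst_daily_pnl ) = [];

end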



(I usually have the variable normalise_excess_pnl automatically initialised to 1 or 0 from the external environment, according to whether or not I’m running an optimisation).


Posted in On backtesting, Performance Metrics

Underfitting, misfitting and understanding alpha’s drivers

While overfitting is certainly a challenge, falling for the opposite extreme is also a possibility.
Here is part of an interview with William Eckhardt from Futures magazine (which I recommend reading in full here):

“I can talk a little more about over-fitting, if not my personal proprietary techniques. First of all I like the [term] over-fitting rather than curve-fitting because curve-fitting is a term from non-linear regression analysis. It is where you have a lot of data and you are fitting the data points to some curve. Well, you are not doing that with futures. Technically there is no curve-fitting here; the term does not apply. But what you can do is you can over-fit. The reason I like the term over-fit rather than curve-fit is that over-fit shows that you also can under-fit. The people who do not optimize are under-fitting.”

Underfitting and Misfitting

If we are using an insufficient number of degrees of freedom, so that our system doesn’t differentiate between some key changes in market’s behaviour, then what we are doing is underfitting. A trivial example of underfitting could be buying a random stock from the stock universe at a random point in time and holding it for a random time period.

Another possibility is that we are not using the right variables (or we have the right variables but we are using them in a poor way) – let’s call this misfitting. Imagine a model on Italian BTPs that looks at Crude Oil prices and totally ignores the spread with German bonds (now, there could even be some exploitable relationship between BTPs and Crude Oil, just trying to make a point).

Clearly, what makes a variable “right” for a given model and a given asset is highly arguable.
As with overfitting, I don’t think we can easily tell in absolute terms whether a model suffers from underfitting or misfitting (except for very obvious cases). Rather, I like to reason in terms of the possible existence of a better model specification that we are ignoring, e.g. there could be a key factor that our model is particularly sensitive to and that we are not accounting for (either in terms of the specific asset we apply the model to or in terms of the market’s current dynamics). Or it could be the case that we are using some variables that are only linked to the real factor, but are not the actual alpha driver.

Techniques to perform this kind of analysis include PCA and factor analysis, but depending on what exactly one is doing many other quantitative techniques can be applied (at a portfolio level, something like the market clustering presented by David Varadi seems promising).
Of course (and unfortunately), we have to keep in mind that the more we perform this kind of a posteriori analysis, the more likely we are to go from one extreme (underfitting/misfitting) to the other (overfitting).
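As a rough illustration of the PCA idea, here is a minimal Python sketch run on synthetic returns. The single shared driver, the betas and the noise levels are all assumptions invented for the example. It checks how much of the variance of a small basket of assets is explained by each principal component.

```python
import numpy as np

def pca_explained_variance(returns):
    """Return (explained variance ratios, principal directions) of asset returns."""
    R = np.asarray(returns, dtype=float)
    R = R - R.mean(axis=0)            # centre each asset's returns
    # SVD of the centred data: the rows of Vt are the principal directions.
    _, s, Vt = np.linalg.svd(R, full_matrices=False)
    var = s ** 2
    return var / var.sum(), Vt

rng = np.random.default_rng(3)
n_days, n_assets = 1000, 5
common = rng.normal(0.0, 0.01, size=(n_days, 1))          # hypothetical shared driver
betas = np.array([[1.0, 0.8, 1.2, 0.9, 1.1]])             # hypothetical exposures
specific = rng.normal(0.0, 0.003, size=(n_days, n_assets))
returns = common * betas + specific

ratios, loadings = pca_explained_variance(returns)
print("explained variance ratios:", np.round(ratios, 3))
print("first component loadings: ", np.round(loadings[0], 2))
```

If one component dominates and your model has no variable related to it, that is at least a hint that a key factor may be missing from the specification.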

Fat tails and changing market dynamics

In another part of the interview mentioned above, Mr Eckhardt ties the number of degrees of freedom strictly to the number of trades in our backtest, arguing that one needs more trades than expected in a “Gaussian world” because of the fat tails of market returns. While I agree with the qualitative relationship between degrees of freedom and number of trades, I am not sure I agree with the strict quantitative relationship between the two variables.
The reason for this is twofold:

1) It’s not always possible to exactly quantify the actual number of degrees of freedom being used or how much hindsight we are pouring into our modelling (as discussed in my previous post);

2) I think fat tails are only part of the story. Another big part is the continuous changes that markets go through (in the form of heteroskedasticity, but not only).

Imagine you test a model over 2 years of data, and that because the model is relatively high-frequency (and so produces a very high number of trades) you think you are guarding yourself against overfitting. What you might be ignoring is that, having tested the model over a relatively short time window, you may not have tested it against different market conditions. It might well be that 2.5 years ago markets were somewhat different and your model was useless, which implies that as soon as markets change again you will lose your edge. An example could be a model that unknowingly takes advantage of some market behaviour born from the Fed being on hold over such a long time period.
This is another form of overfitting if you want, but one which can’t be accounted for by simply looking at the number of trades vs the number of the model’s parameters.

Because of this, I always like to test any new strategy on as much historical data as possible. In this regard, I am in partial disagreement with Dr Chan, who states that he seldom tests strategies with data older than 2007 (read more here: The Pseudo-science of Hypothesis Testing). All other things being equal, I find a strategy that worked well for a long time to be more likely to work in the near future than a strategy that worked well over a short history (which is not to say that something that started working only recently can’t keep working). Also, even if you have something that started working only recently, having a look at how it behaved when it didn’t really perform can certainly offer some interesting insights, especially if you are not sure what the driver behind your alpha really is.
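One simple way to look at a long history is to split it into consecutive windows and compute the performance measure on each of them, rather than on the whole sample at once. Below is a minimal Python sketch. The 250-day window, the synthetic two-regime returns and the function name are my own assumptions.

```python
import numpy as np

def sharpe_by_window(returns, window):
    """Per-period Sharpe ratio of each consecutive, non-overlapping window."""
    r = np.asarray(returns, dtype=float)
    n_windows = r.size // window
    # Trailing observations that do not fill a whole window are ignored.
    chunks = r[: n_windows * window].reshape(n_windows, window)
    return chunks.mean(axis=1) / chunks.std(axis=1, ddof=1)

rng = np.random.default_rng(4)
no_edge = rng.normal(0.0, 0.01, 750)       # hypothetical older regime: no edge
with_edge = rng.normal(0.002, 0.01, 500)   # hypothetical recent regime: an edge appears
history = np.concatenate([no_edge, with_edge])

# Testing only the recent 2 years hides the windows where the model did nothing.
print("recent data only:", np.round(sharpe_by_window(with_edge, 250), 2))
print("full history:    ", np.round(sharpe_by_window(history, 250), 2))
```

The number of observations in the recent sample may look large, yet only the longer history shows the windows where the edge was absent.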

Alpha’s drivers

This leads me to the final point before concluding this long post: do we really have to understand what our model is doing and what kind of inefficiency we are exploiting?

Personally, I think that understanding the underlying driver of our alpha is certainly a big plus, as it lets you directly monitor the behaviour of the prime driver, which in turn could give you some practical insights in troubled times. However, this is not always enough. Think of the quant funds during the 07-08 meltdown: they were fully aware of the driver behind their equities stat arb strategies, but they still got trapped in order flows and forced liquidations. Another example could well be the blow-up of LTCM.

The moral of the story is that there could always be an additional layer of complexity not being considered, so that (partly) understanding our alpha’s driver might not offer any additional upside.
Therefore, although it is nice to have, I don’t deem it necessary to understand the real driver behind our alpha, provided that our statistical analysis gives us enough confidence to trade our strategy.

Schwager’s Market Wizards series presents supporters of both sides, in the persons of D.E. Shaw and Jaffray Woodriff. You can read more about their views in William Hua’s post on Adaptive Trader: Ensemble Methods with Jaffray Woodriff, or have a look at this QUSMA post for a more in-depth example of Woodriff’s approach: Doing the Jaffray Woodriff Thing (Kinda)


Posted in On backtesting, Trading Strategies Design