Milktrader

Iterating Until Convergence

Sunday, November 8, 2009

Evaluating the Fitness of a Fitness Function

The walk-forward process is the final test of a system before real capital is allocated. It validates the system on out-sample data, that is, data that hasn't been peeked at during development. It's not very complicated, really. You optimize your system on a range of data, choose the best parameter set, trade it in the following out-sample period, then roll both windows forward and repeat. Finally, you observe the results and decide whether the system warrants any capital. As with most things in system trading, you have to decide how to pick the best parameter set. Who or what decides what is best? The arbiter is called a fitness function, because it determines which parameter set is best suited for future trading. Not all fitness functions are created equal, though. Let's look at three of them and see how their results differ.
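The rolling walk-forward loop described above can be sketched in Python. This is a minimal illustration, not TradersStudio's implementation: `backtest` (assumed to return per-trade P&L for a parameter set over a bar range) and `fitness` are hypothetical callables you would supply, and both windows roll forward by one trading window per step.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Params = Tuple[int, ...]
# Hypothetical signature: backtest(params, start_bar, end_bar) -> per-trade P&L list
Backtest = Callable[[Params, int, int], List[float]]
Fitness = Callable[[List[float]], float]


def walk_forward(
    n_bars: int,
    opt_len: int,
    trade_len: int,
    param_grid: Sequence[Params],
    backtest: Backtest,
    fitness: Fitness,
) -> List[Dict]:
    """Rolling walk-forward: optimize on opt_len bars, trade the next trade_len bars."""
    results = []
    start = 0
    while start + opt_len + trade_len <= n_bars:
        opt_end = start + opt_len
        # In-sample: let the fitness function pick the "best" parameter set.
        best = max(param_grid, key=lambda p: fitness(backtest(p, start, opt_end)))
        # Out-of-sample: trade that set on bars it was never optimized on.
        oos_trades = backtest(best, opt_end, opt_end + trade_len)
        results.append({"params": best, "oos_start": opt_end, "oos_pnl": sum(oos_trades)})
        # Roll both windows forward by one trading window.
        start += trade_len
    return results
```

With `opt_len=400` and `trade_len=200`, a 2,400-bar history yields exactly 10 out-of-sample periods; swapping the `fitness` argument is all it takes to compare selection rules on the same data.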

I'm using data from a walk-forward of the White Bumblebee system on the DX futures contract (US Dollar Index). The out-sample data is over 10 years in length, and there are 10 walk-forward periods in total, as I've optimized on 400 days and traded on the following 200 days. The three fitness functions are Net Profit, Net Profit/Drawdown, and Pessimistic Return on Margin (PROM). For Net Profit, the system picks the parameter set that had the highest Net Profit during the optimization period and then uses those settings for the trading window. The same logic applies to the other two fitness functions.
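Here is one way to express the three fitness functions over a list of closed-trade P&Ls. Net Profit and Net Profit/Drawdown are straightforward; the PROM function follows Robert Pardo's commonly cited formulation (average win times `wins - sqrt(wins)`, minus average loss times `losses + sqrt(losses)`, divided by margin) and leaves out any annualization. Measuring drawdown on closed-trade equity rather than daily marked-to-market equity is an assumption of this sketch, and TradersStudio's exact definitions may differ.

```python
import math
from typing import Sequence


def net_profit(trades: Sequence[float]) -> float:
    return sum(trades)


def max_drawdown(trades: Sequence[float]) -> float:
    """Largest peak-to-trough drop in closed-trade equity (assumption: no intraday marks)."""
    equity = peak = dd = 0.0
    for pnl in trades:
        equity += pnl
        peak = max(peak, equity)
        dd = max(dd, peak - equity)
    return dd


def np_over_dd(trades: Sequence[float]) -> float:
    dd = max_drawdown(trades)
    if dd == 0:
        # Equity never fell below a prior peak, so net profit is >= 0.
        return math.inf if net_profit(trades) > 0 else 0.0
    return net_profit(trades) / dd


def prom(trades: Sequence[float], margin: float) -> float:
    """Pessimistic Return on Margin: shrink the wins, inflate the losses, divide by margin."""
    wins = [t for t in trades if t > 0]
    losses = [-t for t in trades if t < 0]
    adj_gross_profit = 0.0
    if wins:
        n = len(wins)
        adj_gross_profit = (sum(wins) / n) * (n - math.sqrt(n))
    adj_gross_loss = 0.0
    if losses:
        n = len(losses)
        adj_gross_loss = (sum(losses) / n) * (n + math.sqrt(n))
    return (adj_gross_profit - adj_gross_loss) / margin
```

For the trades `[100, -50, 100, -50]`, net profit is 100 and Net Profit/Drawdown is 2.0, yet PROM comes out negative: with only two wins and two losses, the square-root adjustment penalizes small samples heavily, which is exactly the pessimism the name promises.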

I've broken down the analysis into two sections: Profit Metrics, and Drawdown Metrics. In each category, I'm interested in the total, range, max, min, mean and standard deviation. My backtesting software is TradersStudio and the statistical software that paints the graphs is R.
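The per-metric summary is easy to reproduce. This sketch assumes you already have one number per walk-forward period (for example, out-sample profit or drawdown for each of the 10 periods); it uses the sample standard deviation (n - 1 denominator), which is also what R's `sd()` computes.

```python
import statistics
from typing import Dict, Sequence


def summarize(per_period: Sequence[float]) -> Dict[str, float]:
    """Total, range, max, min, mean and sample SD of per-period results."""
    hi, lo = max(per_period), min(per_period)
    return {
        "total": sum(per_period),
        "range": hi - lo,
        "max": hi,
        "min": lo,
        "mean": statistics.mean(per_period),
        "sd": statistics.stdev(per_period),  # sample SD (n - 1), like R's sd()
    }
```

Calling `summarize` once per fitness function on profits, and again on drawdowns, gives the two groups of metrics compared below.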

First let's look at the Profit Metrics.


In the end, all three produced about the same total profit over 10 years. Net Profit registered the largest profit in a single period, but also the widest range. All three fitness functions had losing periods, but it is interesting to note that Net Profit/Drawdown had the mildest worst period: when it lost, it lost small.

Next, let's look at the Drawdown Metrics.



Here again, the Net Profit/DD fitness function yields the most attractive results. It had the smallest single-period drawdown and also the least total drawdown over the 10-year testing period.

Keep in mind that the sample size is fairly small here, with only 10 walk-forward periods being analyzed. You could easily argue that these results are not statistically valid, but they at least show that not all fitness functions are created equal. Finding the best one may take some time and brain cells on the system trader's part, but hopefully it's time well spent.
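One way to put a number on "not statistically valid" is a paired sign-flip permutation test on the per-period differences between two fitness functions. This is a sketch of a technique not used in the post: if the two functions were interchangeable, each period's difference would be equally likely to carry either sign. With only 10 periods there are 2^10 = 1024 sign patterns, so the test can be run exactly, and the smallest achievable two-sided p-value is 2/1024, about 0.002.

```python
import itertools
from statistics import mean
from typing import Sequence


def paired_sign_flip_pvalue(a: Sequence[float], b: Sequence[float]) -> float:
    """Exact two-sided sign-flip test on the mean of per-period differences a - b."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty series")
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(mean(diffs))
    hits = total = 0
    # Enumerate every assignment of +/- signs to the paired differences.
    for signs in itertools.product((1, -1), repeat=len(diffs)):
        flipped = mean(s * d for s, d in zip(signs, diffs))
        if abs(flipped) >= observed - 1e-12:  # tolerance for float round-off
            hits += 1
        total += 1
    return hits / total
```

Feeding it the 10 per-period profits of, say, Net Profit/Drawdown and Net Profit would show whether their gap is distinguishable from noise at this sample size; the enumeration grows as 2^n, so for many more periods a random sample of sign patterns would be the usual substitute.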


7 comments:

Jez Liberty said...

Hi Milk,

I see you've made great inroads in R - well done. Great charts and interesting analysis results. It seems to possibly validate what you inferred (ie that Profit/DD is the best measure). Good luck on using these brain cells to find the best fitness function! How long did the walk-forward tests take to run in TradersStudio by the way?

I was actually planning a post on the bliss function myself (following our discussion in the walk-forward post comments) to clarify my thoughts and expand on it (albeit from a measurement angle rather than a predictive one).

PS: I managed to get up early this morning to work on the e-ratio code in Amibroker. It looks much faster than TradersStudio (ie minutes vs hours). Felt good to squeeze in a bit of work at the weekend.

-Jez

Milktrader said...

The walk forward didn't take long at all, about 20 minutes.

I'd be interested in knowing how Amibroker is so much faster than TradersStudio. Are the results the same from each program?

This sketch of a fitness function comparison is by no means conclusive, and I'd like to see how the comparison plays out across multiple markets.

Jez Liberty said...

I am going to run a speed comparison test (hopefully tonight) and post the results on the blog. Obviously I will try to get the results as close as possible on each platform (with the same data and system parameters) - I completely agree that the test is only relevant if trades and calculations are the same (or close enough).

From reading the Amibroker user manuals I got the impression that Amibroker was built from the start to optimise speed - and I can believe that this could have a (big) impact on performance compared to a platform built with only functionality in mind. In my "real job" I have cut the running times of some batches by a factor of 10 to 100 just by rewriting them with performance as the main objective (ie bulk operations vs. individual loops, data caching, etc.). There are usually lots of places where code can be optimised for performance if you put your mind to it.

The downside - for Amibroker - is that it might have limitations in the functionality available (ie I get a feeling TradersStudio, with its dedicated Trade Plan functionality, allows for more flexible money management/position sizing/portfolio allocation).

So it might be that Amibroker as a quick and fast prototyping tool combined with TradersStudio as a more "in-depth" full strategy testing is a great combo. This is definitely an impression I got from reading some other user comments (on elitetrader, etc.).

Milktrader said...

I'm looking forward to the results. Please list the number of permutations being tested. Once you get into four parameters, the number of permutations in an optimization run can be astounding, so brute force is typically abandoned in favor of more sophisticated search methods such as genetic algorithms, particle swarm optimization and simulated annealing.
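To see how fast brute force blows up, and what one of the alternatives named above looks like, here is a small sketch. The step counts are hypothetical, and `anneal` is a minimal simulated-annealing search over a discrete parameter grid (random single-parameter nudges, Boltzmann acceptance of worse moves, linear cooling), not the search method of any particular platform. `score` stands in for running a backtest and applying a fitness function.

```python
import math
import random
from typing import Callable, Sequence, Tuple

Index = Tuple[int, ...]


def grid_size(steps_per_param: Sequence[int]) -> int:
    """Number of backtests a brute-force optimization would need."""
    return math.prod(steps_per_param)


def anneal(
    score: Callable[[Index], float],
    steps: Sequence[int],
    iters: int = 2000,
    t0: float = 1.0,
    seed: int = 0,
) -> Tuple[Index, float]:
    """Maximize score over grid indices with simulated annealing."""
    rng = random.Random(seed)
    current = tuple(rng.randrange(n) for n in steps)
    cur_score = score(current)
    best, best_score = current, cur_score
    for i in range(iters):
        temp = t0 * (1 - i / iters) + 1e-9  # linear cooling, kept strictly positive
        # Neighbour: nudge one parameter index by +/-1, clamped to the grid.
        k = rng.randrange(len(steps))
        cand = list(current)
        cand[k] = min(max(cand[k] + rng.choice((-1, 1)), 0), steps[k] - 1)
        cand_t = tuple(cand)
        cand_score = score(cand_t)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if cand_score >= cur_score or rng.random() < math.exp((cand_score - cur_score) / temp):
            current, cur_score = cand_t, cand_score
            if cur_score > best_score:
                best, best_score = current, cur_score
    return best, best_score
```

Four parameters with 20 steps each already mean 160,000 backtests for brute force, whereas the annealing loop evaluates only `iters + 1` parameter sets (some possibly repeated), at the cost of no guarantee that it finds the global optimum on a rugged fitness surface.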

Jez Liberty said...

Milk - you might have seen it but I did put the results on the blog:
http://bit.ly/3SPEV4

This was for a simple e-ratio calc going through 50 steps and there is no optimisation logic there as they all need to be run.

AmiBroker was 25x faster!
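For readers unfamiliar with the e-ratio discussed in this thread: a common definition (Curtis Faith's edge ratio from Way of the Turtle) averages the maximum favorable excursion (MFE) and maximum adverse excursion (MAE) over a fixed number of bars after each entry signal, each normalized by ATR at entry, and divides one average by the other. The sketch below assumes long-only signals, entry at the signal bar's close, and a precomputed ATR series; reading "50 steps" as horizons 1 through 50 is also an assumption.

```python
import math
from typing import List, Sequence


def e_ratio(
    closes: Sequence[float],
    highs: Sequence[float],
    lows: Sequence[float],
    atr: Sequence[float],
    entries: Sequence[int],
    horizon: int,
) -> float:
    """Long-only e-ratio: mean ATR-normalized MFE / mean ATR-normalized MAE."""
    mfe_total = mae_total = 0.0
    count = 0
    for i in entries:
        # Skip signals without `horizon` bars after them or without a usable ATR.
        if i + horizon >= len(closes) or atr[i] <= 0:
            continue
        entry = closes[i]
        window = slice(i + 1, i + horizon + 1)
        mfe_total += max(max(highs[window]) - entry, 0.0) / atr[i]
        mae_total += max(entry - min(lows[window]), 0.0) / atr[i]
        count += 1
    if count == 0:
        raise ValueError("no usable entry signals for this horizon")
    # The 1/count factors cancel, so the ratio of means equals the ratio of sums.
    return math.inf if mae_total == 0 else mfe_total / mae_total


def e_ratio_curve(closes, highs, lows, atr, entries, max_horizon: int = 50) -> List[float]:
    """E-ratio for every horizon from 1 to max_horizon bars."""
    return [e_ratio(closes, highs, lows, atr, entries, h) for h in range(1, max_horizon + 1)]
```

A value above 1 at a given horizon suggests that, on average, the signal's favorable excursions exceed its adverse ones over that many bars. Every horizon must be computed, which is why there is no optimization shortcut in a speed test like this one.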

rara avis said...

It's a free country, and fitness centers abound, not to mention fitness functions (why the ugly name? Common mortals call them objective functions, or just metrics). But any function that takes the min or max of a random quantity usually has very bad statistical properties (e.g., see quantile estimators in statistics). Different fitness functions will have different values on identical data (obvious). The issue is which functions make economic sense and have good statistical properties in a backtest.

Last remark. Fitting strategies on a training set and selecting the best based on an out-of-sample test is just another form of data mining and will result in overfitting. Backtesting is not very complicated to perform, but it is actually hard to do right.
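The point about min and max being noisy statistics can be checked with a quick simulation. This sketch assumes, purely for illustration, that per-period results are independent standard normal draws; it compares how much the sample mean and the sample maximum of 10 periods vary from one simulated walk-forward to the next.

```python
import random
import statistics
from typing import Tuple


def estimator_spread(n_periods: int = 10, trials: int = 5000, seed: int = 1) -> Tuple[float, float]:
    """Standard deviation of the sample mean and of the sample max across simulated runs."""
    rng = random.Random(seed)
    means, maxes = [], []
    for _ in range(trials):
        # Hypothetical per-period results: i.i.d. standard normal draws.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n_periods)]
        means.append(statistics.mean(sample))
        maxes.append(max(sample))
    return statistics.stdev(means), statistics.stdev(maxes)
```

Under that assumption the maximum of 10 draws is noticeably more variable than the mean (theory puts the standard deviation of the mean at 1/sqrt(10), about 0.32), which is one reason a best-single-period statistic is a shaky basis for comparing fitness functions.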

Milk Trader said...

I opted for the phrase 'fitness' function because I just bought the Perfect Pushup and had 'fitness' on the brain.

Good point about selecting the fitness function that yields the best results, even in out-of-sample testing. I'm not advocating cherry-picking a fitness function based on results, but rather highlighting the fact that there is a wide range of metrics one can use as the discriminating function.
