Thursday, January 14, 2010

A custom Optimization Profile using R

While the backtest gives us a historical perspective on the viability of a trade system, the optimization process illumines how stout, sturdy, strapping and hardy (yes, I know how to research synonyms for robust) our trade system is across a range of parameter sets. What does this all mean? Well, if a 10/30 moving average crossover system yields spectacular returns, then we'd like to rest assured that neighboring parameter sets such as 9/30 or 10/31 produce fairly similar results. This is a test to weed out lucky systems that have absolutely no predictive value or money-making potential.

Most backtesting systems will come with a handy little optimizer. The art of choosing what range to test over is a topic for another day, so let's keep it simple for now and pick simple ranges with about an equal number of variations per parameter. Using our moving average crossover system example, let's test the fast moving average (10-day) between the values of 3 and 18 in steps of 3 for a total of six possible values, and the slow moving average (30-day) between the values of 20 and 45 in steps of 5 for a total of six possible values. To make things really exciting, let's add a third parameter. How about a Bollinger Band around the slow moving average that confirms the crossover signal? Great. Vary that between .3 and .9 standard deviations in steps of .2 for a total of four possible values.
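Here is a minimal sketch of those three ranges in R. The object names (fast.ma, slow.ma, band.sd, param.grid) are made up for illustration and are not part of any optimizer's output.

```r
# Parameter ranges as described in the text; all object names are illustrative.
fast.ma <- seq(3, 18, by = 3)        # fast moving average lengths
slow.ma <- seq(20, 45, by = 5)       # slow moving average lengths
band.sd <- seq(0.3, 0.9, by = 0.2)   # Bollinger Band width in standard deviations

# How many values does each parameter take?
length(fast.ma)
length(slow.ma)
length(band.sd)

# One row per combination of the three parameters.
param.grid <- expand.grid(fast = fast.ma, slow = slow.ma, band = band.sd)
head(param.grid)
```

expand.grid() is base R, so nothing needs to be installed to try this.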



This is statistics 101 review time. How many permutations are we talking about here? The method of calculating this figure is to multiply the number of possible values for each parameter together. Mathematically, 6 x 6 x 4 = 144 possible parameter sets. Pretty tame, but we're trying to keep it simple for now. We are also using a small number to illustrate a problem we are now faced with. During the backtest, our data took up a little over 3,000 rows, which corresponds to one trade per row. This is for a single parameter set. Once we expand our survey to include 144 parameter sets, we have increased the data file to an almost unmanageable length. Though every parameter set is not going to trigger the same number of trades, we are well on our way to half a million lines of data with just a simple 144 permutation optimization. This is not going to work for us, so we need to make a compromise. Instead of taking a granular view of every trade, we are going to make some generalizations. How did the trades within a particular parameter set within a particular market perform? This gives us one line of data per parameter set for a total of 144. Multiply that by 47 markets and we are at a manageable 6,768 lines of data.
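The row-count arithmetic is easy to check at the R prompt. This is only a back-of-the-envelope sketch: the 3,000 figure is the rough per-parameter-set trade count mentioned above, and the object names are invented for the example.

```r
n.sets       <- 6 * 6 * 4   # fast values x slow values x band values
rows.per.set <- 3000        # roughly one backtest's worth of trades (approximate)
n.markets    <- 47

n.sets                  # number of parameter sets
n.sets * rows.per.set   # trade-level rows if every set traded about as often
n.sets * n.markets      # summary rows: one per parameter set per market
```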

I use TradersStudio to generate the optimization data, and for each market it spits out certain columns of data, including the definitive parameter set column. Instead of trying to combine all 47 markets into a single file, I've opted to read in (the R term for loading data) 47 separate files. The code looks like this:

################### Read Data
################### mark f

AN <- read.csv ("C:/R/BUMBL.WHITE.OPTIX/BUMBL.WHITE.OPTIX.AN.csv", skip=3)
BO <- read.csv ("C:/R/BUMBL.WHITE.OPTIX/BUMBL.WHITE.OPTIX.BO.csv", skip=3)
BN <- read.csv ("C:/R/BUMBL.WHITE.OPTIX/BUMBL.WHITE.OPTIX.BN.csv", skip=3)
.
.
.
ZZ <- read.csv ("C:/R/BUMBL.WHITE.OPTIX/BUMBL.WHITE.OPTIX.ZZ.csv", skip=3)

That's a lot of typing, but they invented this thing called copy and paste, which makes the process a lot easier. May I also recommend looking into a good text editor if you're going to pursue this. I've chosen Vim, and though it has a steep learning curve, it's really cool.
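If copy and paste gets old, a loop over the market codes is another option. What follows is a sketch and not the approach used in this post: it keeps the data frames in one named list instead of 47 separate objects, and the folder, the DEMO file names and the ParameterSet column are made up so the example can run anywhere. Only skip=3 and the NetProfit column name come from the code in this post.

```r
# Demo setup: write two small stand-in files to a temporary folder.
# Real optimization files would already exist on disk.
demo.dir <- file.path(tempdir(), "optix.demo")
dir.create(demo.dir, showWarnings = FALSE)
markets <- c("AN", "BO")   # a subset of market codes, for illustration only
for (m in markets) {
  demo.lines <- c("junk header 1", "junk header 2", "junk header 3",
                  "ParameterSet,NetProfit",   # ParameterSet is a hypothetical column
                  "3/20/0.3,1500",
                  "6/25/0.5,-200")
  writeLines(demo.lines, file.path(demo.dir, paste0("DEMO.", m, ".csv")))
}

# The actual idea: build each file name from the market code and read it,
# skipping the first three lines as in the read.csv calls above.
optix <- lapply(markets, function(m) {
  read.csv(file.path(demo.dir, paste0("DEMO.", m, ".csv")), skip = 3)
})
names(optix) <- markets

optix$AN$NetProfit   # plays the role of AN$NetProfit in the one-object-per-market style
```

The trade-off is that everything downstream has to index into the list (optix$AN) rather than use a bare object name (AN).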

One view of this data is to see how many of the total 144 parameter sets were actually profitable. Did only one show a profit? Not good. If 121 showed a profit, then now we're talking. Your spreadsheet (data frame in R) is not going to look the same as mine unless you're using TradersStudio so I offer the next piece of code as a general guideline. I'm extracting a column (vector in R) of data from the whole file and creating an object that I can then manipulate.

################## Define Variables from existing vectors
################## the convention is MARKET.CAPS mark a


AN.NP   <-  AN$NetProfit
BO.NP   <-  BO$NetProfit
BN.NP   <-  BN$NetProfit
.
.
.
ZZ.NP   <-  ZZ$NetProfit

Now we further manipulate this simple vector of data to derive a percentage of profitability. I'm only going to include one market since by now you get the idea that there are actually 47 markets.

AN.npro    <- AN.NP[AN.NP > 0]     # the elements of AN.NP that are greater than zero
AN.nprolen <- length(AN.npro)      # how long is this subset?
AN.len     <- length(AN.NP)        # how long is the original vector (we know this is 144 in our case)
AN.pp      <- AN.nprolen / AN.len  # the fraction of parameter sets that were profitable

pp <- c(AN.pp*100, ... ZZ.pp*100)  # a vector of each market's percentage of profitable sets; the "..." stands for the markets in between


hist (pp, ylab="Number of Markets", col="whitesmoke",breaks=4,main="Optimization of White Bumblebee",xlab="What Percentage of All Parameter Sets Were Profitable?")  # plot the histogram
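The four-step calculation for a single market can also be collapsed into one line, because R treats TRUE as 1 and FALSE as 0 when averaging. The sketch below uses randomly generated stand-in numbers rather than real optimization output, and holds the NetProfit vectors in a named list, which is an assumption made only for this example.

```r
set.seed(1)

# Made-up NetProfit figures for three markets, 144 parameter sets each.
net.profit <- list(
  AN = rnorm(144, mean = 500,  sd = 1000),
  BO = rnorm(144, mean = 0,    sd = 1000),
  BN = rnorm(144, mean = -300, sd = 1000)
)

# x > 0 gives TRUE/FALSE per parameter set; mean() of that is the fraction TRUE.
pp <- sapply(net.profit, function(x) 100 * mean(x > 0))
pp

# Cross-check one market against the subset-and-length method.
long.form <- 100 * length(net.profit$AN[net.profit$AN > 0]) / length(net.profit$AN)
all.equal(unname(pp["AN"]), long.form)
```

The resulting pp is a named numeric vector and can be handed to hist() exactly as before.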

I've also added some text to the graph with the following piece of code:


text (9,7,less20, cex=.4)  # x axis location =9, y =7, object name and magnification
text (29,4,less40, cex=.4)
text (49,4,less60, cex=.4)
text (69,4,less80, cex=.4)
text (89,12,less99, cex=.4)


I have not included the code for the object 'less20', but it follows the same variable-defining method. If you really want to see it, let me know and I'll send it to you. But for most of us, we've had more than enough code, so let's see the final product, please.
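For readers who want something to experiment with in the meantime, here is one way such bucket labels could be built. To be clear, this is not the code behind 'less20': it assumes the label is simply the list of market codes whose percentage falls in a given 20-point bucket, and the percentages are invented.

```r
# Invented percent-profitable figures for four markets.
pp <- c(AN = 12, BO = 35, BN = 91, ZZ = 95)

# Assign each market to a 20-point bucket: [0,20], (20,40], ... (80,100].
bucket <- cut(pp, breaks = c(0, 20, 40, 60, 80, 100), include.lowest = TRUE)

# One label per bucket: the market codes in it, one per line.
# Buckets with no markets get an empty string.
bucket.labels <- sapply(split(names(pp), bucket), paste, collapse = "\n")

bucket.labels[["[0,20]"]]
bucket.labels[["(80,100]"]]
```

A label built this way could then be passed to text() in place of an object like less20, for example text(9, 7, bucket.labels[["[0,20]"]], cex=.4), once a histogram is on screen.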


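Here it is:

[Image: histogram titled "Optimization of White Bumblebee"; x-axis "What Percentage of All Parameter Sets Were Profitable?", y-axis "Number of Markets"]
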
This is only one view of five I'm planning for Optimization Profile 1.0. Next up is profit expectancy in place of net profit, and then some other metrics. But in the meantime, we have the basic framework for a working, cool graphics engine.

1 comment:

Jez Liberty said...

Wow - pretty cool use of R to make sense of all this backtest and optimisation data.
I am personally getting a bit tired of doing it semi-manually in Excel... :(
