So far, I’ve been building models one month at a time. A month may have a couple of thousand observations, so with 10-fold cross-validation we build 10 models, each using about 1,800 observations to predict the 200 left out. One purpose of CV is to estimate the accuracy of one’s predictions on out-of-sample data. The flaw in my original method is that the cross-validation is done with data from the same month. (By “the same month” I mean the X data, a.k.a. features, are from the same month and the Y data are from subsequent months.) Within a given month, certain factors may be in favor, such as a particular sector. It’s then easy for the model to identify stocks of that sector in the held-out fold that are likely to do well, which inflates the apparent accuracy.
It would seem an improvement to use data from N months (let’s say 12) and cross-validate by building a model on N-1 months and predicting the held-out month. On one fold we build a model with Jan-Nov and predict Dec. Fine. But suppose the hold-out month is Jan; then we’d be training on Feb-Nov. On the surface this seems horrible: the Feb data might include, as a feature, the one-month prior return, which would be the Jan return. So we’d be using Jan returns, at least in part, to predict Jan returns. I don’t think this is actually a problem, because we never use Jan data to predict Jan returns: the Jan return may appear as a feature in the Feb observations, but there it is used to predict March returns.
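The leave-one-month-out scheme described above can be sketched as follows. This is only an illustration with made-up labels — the post itself uses R’s caret for the actual cross-validation:

```python
# Sketch of leave-one-month-out cross-validation: each fold trains on
# N-1 months and predicts the held-out month.
# (Illustrative only; the modeling in this post is done in R with caret.)

def month_folds(months):
    """Yield (held_out_month, train_idx, test_idx) for each month label."""
    for held_out in sorted(set(months)):
        test_idx = [i for i, m in enumerate(months) if m == held_out]
        train_idx = [i for i, m in enumerate(months) if m != held_out]
        yield held_out, train_idx, test_idx

# Toy example: six observations spread over three months.
labels = ["Jan", "Jan", "Feb", "Feb", "Mar", "Mar"]
for month, train, test in month_folds(labels):
    print(month, "train:", train, "test:", test)
```

With 12 months this yields 12 folds, each fold’s test set being exactly one month’s observations.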
So, I’ve created a dataset with 24,503 observations from 12 months. Instead of 10-fold validation, we are using 12-fold, with each fold representing one month’s observations. I’ve tested four models: random forest (RF), support vector machine (SVM), gradient boosting (GBM), and multivariate adaptive regression splines (MARS).
The following results come from the cross-validation process of the caret package in R. For an explanation of the meaning of LongEx, ShortEx and Hedge, see A first look at 12 forests (011.0). In this case we have 12-fold CV. Each row of the results shows one setting of the parameter or parameters tuned in the CV process: for RF it was mtry; for SVM, sigma and C; and so forth.
RF
## mtry LongEx ShortEX Hedge R2 RMSE LongExSD
## 1 2 1.141314 -1.342558 -0.20124361 0.002092556 11.18143 2.316107
## 2 40 1.484456 -1.382043 0.10241272 0.002238694 11.45352 2.948969
## 3 807 1.777955 -1.875619 -0.09766457 0.002179788 11.60491 3.565247
## ShortEXSD HedgeSD R2SD RMSESD
## 1 2.612861 2.473641 0.001885054 0.1746272
## 2 3.030051 3.273976 0.001971684 0.2620603
## 3 2.455914 4.179275 0.002426408 0.2666756
SVM
## sigma C LongEx ShortEX Hedge R2 RMSE
## 1 0.0008541638 0.25 0.7562042 0.63117314 1.3873774 0.002501806 11.15582
## 2 0.0008541638 0.50 1.1884288 0.05036088 1.2387897 0.002130477 11.32222
## 3 0.0008541638 1.00 0.9547220 -1.40792876 -0.4532068 0.001797831 11.55554
## LongExSD ShortEXSD HedgeSD R2SD RMSESD
## 1 2.779977 2.317431 4.692584 0.002110341 0.2175823
## 2 2.225350 2.540525 4.442606 0.001785876 0.2289691
## 3 1.726498 2.787355 3.594226 0.001464053 0.2398939
GBM
## shrinkage interaction.depth n.minobsinnode n.trees LongEx ShortEX
## 1 0.1 1 10 50 1.746685 -1.4222948
## 4 0.1 2 10 50 2.288911 -1.3838885
## 7 0.1 3 10 50 1.601154 -1.4642436
## 2 0.1 1 10 100 2.037417 -1.5966308
## 5 0.1 2 10 100 1.744456 -2.0851169
## 8 0.1 3 10 100 1.626590 -0.5796921
## 3 0.1 1 10 150 2.514393 -1.1923629
## 6 0.1 2 10 150 1.578869 -1.4178275
## 9 0.1 3 10 150 1.571405 -1.9220012
## Hedge R2 RMSE Univ LongExSD ShortEXSD HedgeSD
## 1 0.3243900 0.003956021 11.09884 2.49564e-17 3.173278 3.187041 4.836315
## 4 0.9050224 0.003373404 11.25843 2.49564e-17 2.235598 3.362059 4.157458
## 7 0.1369100 0.002832907 11.35919 2.49564e-17 2.171321 3.118587 4.167112
## 2 0.4407858 0.003752674 11.26066 2.49564e-17 2.874733 2.997295 4.803741
## 5 -0.3406605 0.002743086 11.46796 2.49564e-17 2.121899 2.788519 3.561567
## 8 1.0468981 0.002313846 11.62327 2.49564e-17 2.150218 3.411398 4.621269
## 3 1.3220299 0.003351896 11.38118 2.49564e-17 3.376122 3.210034 4.818545
## 6 0.1610410 0.002360889 11.62840 2.49564e-17 2.010223 3.264411 4.031547
## 9 -0.3505964 0.002103674 11.81194 2.49564e-17 2.295809 2.181663 3.397723
## R2SD RMSESD UnivSD
## 1 0.002958894 0.2546977 2.417891e-17
## 4 0.002756307 0.2883435 2.417891e-17
## 7 0.002264026 0.3309934 2.417891e-17
## 2 0.002670553 0.3029848 2.417891e-17
## 5 0.002154382 0.3072906 2.417891e-17
## 8 0.002014233 0.3299410 2.417891e-17
## 3 0.002627641 0.2970865 2.417891e-17
## 6 0.001708097 0.3146898 2.417891e-17
## 9 0.001713968 0.3264113 2.417891e-17
MARS
## degree nprune LongEx ShortEX Hedge R2 RMSE
## 1 1 2 3.182824 -0.3611753 2.821649 0.0022206360 11.02002
## 2 1 9 2.731354 -1.4765764 1.254778 0.0018012048 47.28738
## 3 1 17 2.645173 -1.3171756 1.327998 0.0009557406 52.17719
## Univ LongExSD ShortEXSD HedgeSD R2SD RMSESD
## 1 2.49564e-17 4.967208 1.625688 4.724085 0.002195756 0.1699836
## 2 2.49564e-17 3.103665 3.005855 5.228706 0.001884462 87.9279017
## 3 2.49564e-17 3.223307 2.686878 4.956471 0.001516772 84.2746371
## UnivSD
## 1 2.417891e-17
## 2 2.417891e-17
## 3 2.417891e-17
Observations and questions
We shouldn’t select a model based on one year of data given that we have 15 years. But these results can help us figure out how to deal with more data.
So what’s the best model? It depends on your evaluation metric. Ranking the models on each of the five metrics:
LongEx: MARS 3.18; GBM 2.51; RF 1.77; SVM 1.19
ShortEx: SVM 0.63; MARS -0.36; GBM -0.58; RF -1.34
Hedge: MARS 2.82; SVM 1.39; GBM 1.32; RF 0.10
R2: GBM 0.0040; SVM 0.0025; RF 0.0022; MARS 0.0022
RMSE: MARS 11.02; GBM 11.10; SVM 11.16; RF 11.18
On this basis, MARS is one of the top performers on every metric except R2. Interestingly, RF is near the bottom. Common machine learning practice would not endorse simply picking the model with the best metric; it suggests picking the simplest model whose metric is within one standard deviation of the best. However, here the simplest model produced the best value for each of the five metrics above.
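That one-standard-deviation rule can be sketched as follows. This is a minimal illustration, not caret’s actual selection code, applied to the RF RMSE column from the table above (lower is better, candidates listed simplest first):

```python
# Minimal sketch of the "simplest model within one SD of the best" rule.
# Not caret's implementation; candidates must be ordered simplest first.

def one_sd_pick(candidates):
    """candidates: list of (name, metric, sd); lower metric is better."""
    best_metric = min(m for _, m, _ in candidates)
    best_sd = next(s for _, m, s in candidates if m == best_metric)
    threshold = best_metric + best_sd
    for name, metric, _ in candidates:        # simplest first
        if metric <= threshold:
            return name

# RF RMSE results from the table above: (name, RMSE, RMSESD).
rf_rmse = [("mtry=2",   11.18143, 0.1746272),
           ("mtry=40",  11.45352, 0.2620603),
           ("mtry=807", 11.60491, 0.2666756)]
print(one_sd_pick(rf_rmse))   # here the simplest setting is also the best
```

When the simplest setting is also the best, as with RF’s RMSE here, the rule and the naive “pick the best” choice coincide.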
Are these results significant? In the case of LongEx, outperforming the average stock by 3% per month would be outstanding. But its standard deviation across folds is about 5%.
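A rough back-of-the-envelope check, with the caveat that overlapping training sets mean the 12 fold estimates aren’t truly independent: the standard error of the fold mean is roughly SD/√12.

```python
import math

# Back-of-the-envelope check on the MARS LongEx figure quoted above:
# mean ~3.18 with per-fold SD ~4.97, over 12 monthly folds.
# Assumption: fold estimates treated as independent, which overlapping
# training sets make only approximately true.
mean_ex, sd_ex, n_folds = 3.18, 4.97, 12
se = sd_ex / math.sqrt(n_folds)    # standard error of the fold mean
t_stat = mean_ex / se              # crude t-like statistic
print(f"SE = {se:.2f}, t = {t_stat:.2f}")
```

This works out to t ≈ 2.2, which with only 12 folds is suggestive rather than conclusive.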
These results also make me wonder whether we should use separate models for the long and short sides to produce the Hedge. On the one hand, separate models would seem to produce better results (3.18 + 0.63 > 2.82). It bothers me that a single model doesn’t produce the best long, short, and hedge.