Tuesday, January 5, 2016

More Cross Validation (14.0)

Here are the cross-validation results for 18 models spanning various combinations of 3 algorithms, 2 time periods (months 1 to 12 and 13 to 24), 2 Y variables (1-month and 12-month return) and whether PCA was used in pre-processing. The main point is to illustrate the lack of consistency from one year to the next.
The variations and their values:
Model type: GBM (Stochastic Gradient Boosting), SVM, MARS (Multivariate Adaptive Regression Spline)
Time period: Months 1-12 (essentially 2003) or months 13-24 (2004), each cross-validated
Y variable: 1-month return or 12-month return
Pre-processing: With or without PCA
Since PCA did not help much, I only performed it for the 1-month Y variable. Below there are 5 panels. LongEx is the average return of the 50 stocks with the highest predicted return minus the average of all stocks. ShortEX is the average of all stocks minus the average return of the 50 stocks with the lowest predicted return, so in both cases positive values are desirable. Hedge is LongEx + ShortEX. R2 and RMSE are the r-squared and root mean square error. I think of LongEx, ShortEX and Hedge as measures of how well the models predict the right, left and both tails. R2 and RMSE measure how well all observations are predicted.
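The three tail metrics can be sketched as follows. This is a Python illustration of the definitions (the actual analysis is done in R with caret); the function and variable names are mine, and the sign convention follows the tables, where positive ShortEX values are desirable.

```python
import numpy as np

def tail_metrics(pred, actual, n=50):
    """LongEx:  mean actual return of the n stocks with the highest
                predicted return, minus the mean of all stocks.
       ShortEX: mean of all stocks minus the mean actual return of
                the n stocks with the lowest predicted return
                (positive means the shorts underperformed, which is good).
       Hedge:   LongEx + ShortEX (long the top n, short the bottom n)."""
    pred = np.asarray(pred, dtype=float)
    actual = np.asarray(actual, dtype=float)
    order = np.argsort(pred)              # ascending by predicted return
    univ = actual.mean()                  # average of all stocks
    long_ex = actual[order[-n:]].mean() - univ
    short_ex = univ - actual[order[:n]].mean()
    return long_ex, short_ex, long_ex + short_ex
```

With a perfect model (predictions equal to realized returns) LongEx and ShortEX are both at their maximum, and Hedge is their sum.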
Most importantly, the models produce good LongEx results for the first 12-month period and poor results for the second. That is reversed for ShortEX. GBM produces OK results across the board.
## $LongEx
##          Y1M_Mos1_12 Y1M_Mos13_24 Y12M_Mos1_12 Y12M_Mos13_24
## GBM             2.51         0.29         2.26          0.53
## MARS            3.18        -0.67         3.29         -1.19
## SVM             1.19         0.07         1.57          0.85
## GBMwPCA         1.48         0.99           NA            NA
## MARSwPCA        1.36         0.69           NA            NA
## SVMwPCA         1.21         0.14           NA            NA
## 
## $ShortEX
##          Y1M_Mos1_12 Y1M_Mos13_24 Y12M_Mos1_12 Y12M_Mos13_24
## GBM            -0.58         1.88        -0.84          1.89
## MARS           -0.36         1.70         9.57          4.10
## SVM             0.63         0.02        -0.15         -0.43
## GBMwPCA        -1.22         0.64           NA            NA
## MARSwPCA       -0.56        -0.30           NA            NA
## SVMwPCA         0.27        -0.26           NA            NA
## 
## $Hedge
##          Y1M_Mos1_12 Y1M_Mos13_24 Y12M_Mos1_12 Y12M_Mos13_24
## GBM             1.32         2.06         1.20          1.58
## MARS            2.82         0.19         8.59          0.11
## SVM             1.39        -0.37         1.04          0.13
## GBMwPCA        -0.02         1.27           NA            NA
## MARSwPCA        0.64         0.40           NA            NA
## SVMwPCA         0.49        -0.12           NA            NA
## 
## $R2
##          Y1M_Mos1_12 Y1M_Mos13_24 Y12M_Mos1_12 Y12M_Mos13_24
## GBM             0.40         0.16         0.31          0.18
## MARS            0.14         0.04         0.13          0.41
## SVM             0.25         0.13         0.15          0.12
## GBMwPCA         0.23         0.08           NA            NA
## MARSwPCA        0.00         0.00           NA            NA
## SVMwPCA         0.18         0.09           NA            NA
## 
## $RMSE
##          Y1M_Mos1_12 Y1M_Mos13_24 Y12M_Mos1_12 Y12M_Mos13_24
## GBM            11.81        11.04        20.07         12.85
## MARS           52.18        11.46        63.97         32.60
## SVM            11.56        10.87        19.90         12.75
## GBMwPCA        11.73        10.98           NA            NA
## MARSwPCA       36.65        92.58           NA            NA
## SVMwPCA        11.48        10.80           NA            NA

Sunday, January 3, 2016

Four Models Cross-Validated (13.0) Rex Macey

So far, I’ve been building models one month at a time. A month may have a couple of thousand observations. When cross-validating, we build 10 models, each using about 1,800 observations to predict the 200 left out. One purpose of CV is to estimate the accuracy of one’s predictions on out-of-sample data. The flaw in my original method is that the cross-validation is done with data from the same month. (When I say the same month, I mean the X data, aka features, are from the same month and the Y data are from the subsequent month.) Within a month, certain factors may be in favor, such as a particular sector. It’s easy for the model to identify stocks of that sector within the held-out sample which are likely to do well.
It would seem to be an improvement to have data from N months (let’s say 12). We cross-validate by using data from N-1 months to build a model and use that to predict the held-out month. On one cross-validation we build a model with Jan-Nov and predict Dec. Fine. But let’s say the holdout month is Jan. We’d be using Feb-Dec. On the surface this seems horrible. The Feb data might have as a feature the one-month prior return, which would be Jan. So we’d be using Jan returns to predict Jan returns, at least in part. I don’t think this is actually a problem because we never use Jan data to predict Jan returns. We might have Jan returns as a feature in the Feb data, but that is used to predict March returns.
So, I’ve created a dataset with 24,503 observations from 12 months. Instead of 10-fold cross-validation, we are using 12-fold, with each fold representing the observations for one month. I’ve tested 4 models: random forest (RF), support vector machine (SVM), gradient boosting (GBM) and multivariate adaptive regression splines (MARS).
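The one-month-per-fold scheme amounts to leave-one-group-out cross-validation with the month as the group. A minimal Python sketch of the fold construction (the post's models are fit with caret in R, which accepts custom resampling indices; the `months` vector here is a toy stand-in):

```python
import numpy as np

def month_folds(months):
    """Leave-one-month-out folds: for each distinct month label,
    train on every other month's rows and test on that month's rows."""
    months = np.asarray(months)
    for m in np.unique(months):
        yield m, np.where(months != m)[0], np.where(months == m)[0]

# Toy example: 12 months, 3 observations each -> 12 folds
months = np.repeat(np.arange(1, 13), 3)
folds = list(month_folds(months))
```

Each fold's test set is exactly one month, so no observation from the held-out month ever appears in the training rows for that fold.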
The following results are from the cross-validation process of the caret package in R. For an explanation of the meaning of LongEx, ShortEx and Hedge, see A first look at 12 forests (011.0). In this case we have 12-fold CV. Each row of the results shows one or more parameters which were tuned in the CV process. For RF it was mtry. For SVM it was sigma and C. And so forth.

RF

##   mtry   LongEx   ShortEX       Hedge          R2     RMSE LongExSD
## 1    2 1.141314 -1.342558 -0.20124361 0.002092556 11.18143 2.316107
## 2   40 1.484456 -1.382043  0.10241272 0.002238694 11.45352 2.948969
## 3  807 1.777955 -1.875619 -0.09766457 0.002179788 11.60491 3.565247
##   ShortEXSD  HedgeSD        R2SD    RMSESD
## 1  2.612861 2.473641 0.001885054 0.1746272
## 2  3.030051 3.273976 0.001971684 0.2620603
## 3  2.455914 4.179275 0.002426408 0.2666756

SVM

##          sigma    C    LongEx     ShortEX      Hedge          R2     RMSE
## 1 0.0008541638 0.25 0.7562042  0.63117314  1.3873774 0.002501806 11.15582
## 2 0.0008541638 0.50 1.1884288  0.05036088  1.2387897 0.002130477 11.32222
## 3 0.0008541638 1.00 0.9547220 -1.40792876 -0.4532068 0.001797831 11.55554
##   LongExSD ShortEXSD  HedgeSD        R2SD    RMSESD
## 1 2.779977  2.317431 4.692584 0.002110341 0.2175823
## 2 2.225350  2.540525 4.442606 0.001785876 0.2289691
## 3 1.726498  2.787355 3.594226 0.001464053 0.2398939

GBM

##   shrinkage interaction.depth n.minobsinnode n.trees   LongEx    ShortEX
## 1       0.1                 1             10      50 1.746685 -1.4222948
## 4       0.1                 2             10      50 2.288911 -1.3838885
## 7       0.1                 3             10      50 1.601154 -1.4642436
## 2       0.1                 1             10     100 2.037417 -1.5966308
## 5       0.1                 2             10     100 1.744456 -2.0851169
## 8       0.1                 3             10     100 1.626590 -0.5796921
## 3       0.1                 1             10     150 2.514393 -1.1923629
## 6       0.1                 2             10     150 1.578869 -1.4178275
## 9       0.1                 3             10     150 1.571405 -1.9220012
##        Hedge          R2     RMSE        Univ LongExSD ShortEXSD  HedgeSD
## 1  0.3243900 0.003956021 11.09884 2.49564e-17 3.173278  3.187041 4.836315
## 4  0.9050224 0.003373404 11.25843 2.49564e-17 2.235598  3.362059 4.157458
## 7  0.1369100 0.002832907 11.35919 2.49564e-17 2.171321  3.118587 4.167112
## 2  0.4407858 0.003752674 11.26066 2.49564e-17 2.874733  2.997295 4.803741
## 5 -0.3406605 0.002743086 11.46796 2.49564e-17 2.121899  2.788519 3.561567
## 8  1.0468981 0.002313846 11.62327 2.49564e-17 2.150218  3.411398 4.621269
## 3  1.3220299 0.003351896 11.38118 2.49564e-17 3.376122  3.210034 4.818545
## 6  0.1610410 0.002360889 11.62840 2.49564e-17 2.010223  3.264411 4.031547
## 9 -0.3505964 0.002103674 11.81194 2.49564e-17 2.295809  2.181663 3.397723
##          R2SD    RMSESD       UnivSD
## 1 0.002958894 0.2546977 2.417891e-17
## 4 0.002756307 0.2883435 2.417891e-17
## 7 0.002264026 0.3309934 2.417891e-17
## 2 0.002670553 0.3029848 2.417891e-17
## 5 0.002154382 0.3072906 2.417891e-17
## 8 0.002014233 0.3299410 2.417891e-17
## 3 0.002627641 0.2970865 2.417891e-17
## 6 0.001708097 0.3146898 2.417891e-17
## 9 0.001713968 0.3264113 2.417891e-17

MARS

##   degree nprune   LongEx    ShortEX    Hedge           R2     RMSE
## 1      1      2 3.182824 -0.3611753 2.821649 0.0022206360 11.02002
## 2      1      9 2.731354 -1.4765764 1.254778 0.0018012048 47.28738
## 3      1     17 2.645173 -1.3171756 1.327998 0.0009557406 52.17719
##          Univ LongExSD ShortEXSD  HedgeSD        R2SD     RMSESD
## 1 2.49564e-17 4.967208  1.625688 4.724085 0.002195756  0.1699836
## 2 2.49564e-17 3.103665  3.005855 5.228706 0.001884462 87.9279017
## 3 2.49564e-17 3.223307  2.686878 4.956471 0.001516772 84.2746371
##         UnivSD
## 1 2.417891e-17
## 2 2.417891e-17
## 3 2.417891e-17

Observations and questions

We shouldn’t select a model based on 1 year of data given that we have 15 years. But these results can help us figure out how to deal with more data.
So what’s the best model? It depends on your evaluation metric. Using the five metrics themselves:
LongEx: MARS 3.18; GBM 2.51; RF 1.77; SVM 1.19
ShortEX: SVM 0.63; MARS -0.36; GBM -0.58; RF -1.34
Hedge: MARS 2.82; SVM 1.39; GBM 1.32; RF 0.10
R2: GBM 0.0040; SVM 0.0025; RF 0.0022; MARS 0.0022
RMSE: MARS 11.02; GBM 11.10; SVM 11.16; RF 11.18
On this basis, MARS is one of the top performers except for the R2. Interestingly, RF is near the bottom. Common machine learning practice would not endorse picking the model with the best metric; it suggests picking the parameters of the simplest model within one standard deviation of the best metric’s value. However, the simplest model produced the best metric for each of the five above.
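That selection rule (caret implements a similar idea as its "oneSE" selection function) can be sketched as follows, using a lower-is-better metric like RMSE. The example numbers are the RF RMSE rows from the results above, with RMSESD standing in for the standard error purely for illustration:

```python
def one_se_pick(results):
    """results: list of (params, mean_rmse, se_rmse) tuples, ordered
    from simplest to most complex model.  Return the params of the
    simplest model whose mean RMSE is within one standard error of
    the best (lowest) mean RMSE."""
    best = min(results, key=lambda r: r[1])
    threshold = best[1] + best[2]
    for params, mean, se in results:      # simplest first
        if mean <= threshold:
            return params

# RF rows from above: mtry, mean RMSE, spread (illustrative)
pick = one_se_pick([(2, 11.18, 0.17), (40, 11.45, 0.26), (807, 11.60, 0.27)])
```

Here mtry=2 is both simplest and best, so the rule and the best-metric rule agree, which is exactly the situation noted above.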
Are these results significant? In the case of LongEx, outperforming the average stock by 3% per month would be outstanding. But its standard deviation is 5%.
These results also make me wonder if we should use separate models for Long and Short to produce Hedge. On one hand, separate models would seem to produce better results (3.18 + 0.63 > 2.82). On the other, it’s bothersome to me that one model doesn’t produce the best long, short and hedge.

Friday, January 1, 2016

SVM Results (012.4)

In this post, we look at out-of-sample performance for some support vector machine (SVM) models. The purpose here is not an exhaustive analysis. We are only hoping to get an indication that our efforts might bear fruit.
Each model we build uses predictor (x) data from one month (e.g. Dec 2002) and response (y) data from the subsequent month (e.g. Jan 2003). We refer to a model here by xmonth/ymonth (e.g., Dec02/Jan03) to indicate the months of the x and y data. The y variable is a company’s excess return, defined as the return of the company for the month less the average return of all companies for that month.
The first model is Dec02/Jan03. The first set of predictions was for Jan 2004, created by feeding the Dec03 x data into the Dec02/Jan03 through Nov03/Dec03 models and averaging the 12 predicted returns for each company. The last prediction is for Oct 2015. We have 142 months of predicted returns, a bit shy of 12 years.
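The prediction-averaging step can be sketched as follows. This is a Python illustration with made-up names; `models` stands in for the 12 most recent monthly models (represented as callables here purely for illustration) and `x` for the newest month's predictor data:

```python
import numpy as np

def ensemble_predict(models, x):
    """Feed the newest month's x data to each monthly model and
    average the per-company predictions across the models."""
    return np.column_stack([m(x) for m in models]).mean(axis=1)
```

Each month the oldest model drops out of the ensemble and the newest one joins, so every prediction is an average over a rolling 12-model window.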
For each month with a predicted return, we calculate three values: LongEx, ShortEx and Hedge. Respectively, these roughly represent buying the 50 stocks with the top predicted returns, shorting the bottom 50, and doing both, which can be thought of as long, short and hedged portfolios. However, since our y variable is an excess return over the average stock, the LongEx values are the returns over the average, the ShortEx values are the returns under the average, and Hedge is the long minus the short. In all cases, positive values are desirable.
Below is a summary of the 142 observations for each.
##      LongEx            ShortEX            Hedge        
##  Min.   :-22.5302   Min.   :-32.275   Min.   :-46.755  
##  1st Qu.: -1.6911   1st Qu.: -1.435   1st Qu.: -2.763  
##  Median :  0.8873   Median :  1.280   Median :  2.070  
##  Mean   :  0.6054   Mean   :  1.349   Mean   :  1.954  
##  3rd Qu.:  2.8920   3rd Qu.:  4.568   3rd Qu.:  7.004  
##  Max.   : 19.3902   Max.   : 21.453   Max.   : 38.014
The means and medians are all positive, but there is substantial variation. Below we look to see if the means are significantly different from zero. The standard error is the standard deviation divided by the square root of 142; that is the standard deviation of the sample mean, which is used to determine the significance of a sample mean. The z score is the mean divided by the standard error.
##           LongEx   ShortEX      Hedge
## Mean   0.6053754 1.3486556  1.9540310
## SD     5.0889217 6.5421246 10.2368647
## StdErr 0.4270528 0.5490029  0.8590586
## z      1.4175657 2.4565545  2.2746190
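The standard error and z score in the table above follow directly from the definitions; a minimal Python check (function name is mine):

```python
import math

def mean_significance(mean, sd, n):
    """Standard error of the sample mean (sd / sqrt(n)) and the
    z score (mean / standard error)."""
    std_err = sd / math.sqrt(n)
    return std_err, mean / std_err

# Hedge column of the table above, n = 142 months
std_err, z = mean_significance(1.9540310, 10.2368647, 142)
```

This reproduces the Hedge row's StdErr of about 0.859 and z of about 2.27.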
It appears that the model is better at finding short candidates than long ones. For fun, the average annual return for the hedged portfolio (without transaction costs or taxes) would have been 18.15%, which sounds nice, but it had an annualized standard deviation of 35.46%, much higher than that of a stock index.
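The annualized standard deviation can be checked with the usual square-root-of-time scaling of the monthly SD; the annual return comes from compounding monthly returns. A sketch, assuming i.i.d. monthly returns (the 18.15% figure depends on the actual monthly Hedge series, which is not recomputed here):

```python
import math

# Annualized volatility: monthly SD (from the table above) times sqrt(12)
ann_sd = 10.2368647 * math.sqrt(12)   # roughly the 35.46 quoted above

def compound_annual(monthly_pct):
    """Compound a monthly percentage return to an annual percentage."""
    return ((1.0 + monthly_pct / 100.0) ** 12 - 1.0) * 100.0
```

For example, compounding 1% per month gives about 12.68% per year, not 12%, which is why the annual figure is not simply 12 times the monthly mean.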
The data in graphical and tabular forms follows.
##               LongEx       ShortEX       Hedge           R2      RMSE
## 20040102   4.6985561   2.694902012   7.3934581 1.379280e-02 11.580118
## 20040130  -3.2886064  -0.529383004  -3.8179894 1.540982e-02  9.832418
## 20040227   1.1691026   0.453155712   1.6222583 3.306673e-05 10.107648
## 20040402  -9.4197351  -5.162604176 -14.5823392 4.068867e-02 11.785238
## 20040430  -2.4486500  -2.728383848  -5.1770338 1.016575e-02 11.720396
## 20040604   2.0121839   1.505728151   3.5179121 1.343413e-02  9.551383
## 20040702   8.0205122   6.147703657  14.1682158 2.246450e-02 10.503473
## 20040730  -3.9090475   1.550578868  -2.3584686 3.820769e-03 10.185049
## 20040903   7.7911959  -5.382921111   2.4082748 9.784061e-04  9.686064
## 20040930  -1.1508347  -5.031083623  -6.1819183 1.032160e-02 10.649651
## 20041029  -5.3518646  -4.301093109  -9.6529577 1.363384e-01  4.523042
## 20041203  -6.2159879  -1.086511442  -7.3024994 5.084027e-03 10.082328
## 20041231   4.9204895  11.510903842  16.4313933 8.262045e-02  9.426351
## 20050131  10.1772604   5.080044931  15.2573053 4.480866e-02  9.867702
## 20050228  -1.1684236   5.278319163   4.1098956 2.176766e-03  9.475941
## 20050331  -0.5282040   2.610445860   2.0822419 1.266045e-02  9.545485
## 20050429   0.6247247  -2.256541253  -1.6318166 2.394808e-02 11.260544
## 20050531   5.0352092  -0.913236628   4.1219726 3.004012e-03  9.146220
## 20050630   2.7664134  -2.203862367   0.5625510 6.369541e-03  9.789016
## 20050729   5.0818389   0.397479009   5.4793179 9.274058e-03  9.841564
## 20050831   5.9381303   1.452395046   7.3905253 2.970105e-02 10.021611
## 20050930  -5.7560095   2.696531632  -3.0594778 3.815124e-03  9.652464
## 20051031  -2.0949916  -0.039789787  -2.1347814 6.963174e-04  9.563784
## 20051130   3.4136096   4.647954537   8.0615641 9.050108e-03  8.172802
## 20051230   8.4483295  -0.381620938   8.0667085 2.714860e-02 11.340995
## 20060131 -12.1742304   2.631561766  -9.5426687 1.421208e-02  9.780551
## 20060228   2.1173905   5.101000525   7.2183910 1.119775e-02  9.763087
## 20060331   4.8802301   1.889904626   6.7701347 1.327410e-02  9.601770
## 20060428  -1.2667677  -3.126810393  -4.3935781 2.321707e-02  9.831850
## 20060531   3.2954483   1.719109982   5.0145583 6.174438e-03  8.873465
## 20060630  -3.0001241   2.121893309  -0.8782308 1.204514e-03  9.915648
## 20060731  -2.9519613  -2.424451377  -5.3764127 1.623641e-02 10.395490
## 20060831  -5.8639552   0.963745152  -4.9002101 1.228122e-03  8.582910
## 20060929   0.7407814   0.885725566   1.6265070 2.870203e-04 10.038357
## 20061031   2.3024105  -1.300622627   1.0017878 3.010293e-04  9.440758
## 20061130   1.0249685   1.466223683   2.4911922 9.155474e-03  8.116759
## 20061229   2.5375217   0.005573975   2.5430956 7.218869e-03  8.558668
## 20070131  -0.6019387   3.151324637   2.5493860 1.399973e-04  8.643099
## 20070228   2.4756421   4.120796129   6.5964382 6.368336e-03  8.774652
## 20070330   2.8922754   2.497992920   5.3902683 2.757660e-03  8.303947
## 20070430   1.0951503   5.437779536   6.5329298 2.968128e-03  9.952062
## 20070531   2.7275424   1.457198392   4.1847408 7.304716e-04  8.325163
## 20070629   2.7571032   5.163734031   7.9208372 2.878221e-02 10.235410
## 20070731  -0.8270457  -5.128787655  -5.9558334 2.054421e-03 10.876710
## 20070831   9.9080912   8.096688020  18.0047792 4.729174e-02  9.946572
## 20070928  12.2245405   7.519328688  19.7438692 6.453732e-02 12.212728
## 20071031   0.8746712   5.989696081   6.8643673 1.260344e-02 11.005084
## 20071130   3.0809504   3.500391305   6.5813417 2.679592e-02 11.351225
## 20071231  -3.4125767 -16.088007679 -19.5005843 8.960132e-02 13.364583
## 20080131  14.4102077   6.581377521  20.9915852 6.625561e-02 12.387180
## 20080229  -1.4101853   6.552416407   5.1422311 3.064980e-04 11.772409
## 20080331   5.1874578   3.972513329   9.1599711 4.541956e-03 12.856980
## 20080430   8.4525058  10.209135146  18.6616410 2.898336e-02 11.754873
## 20080530  19.3902135  18.623526192  38.0137397 1.985966e-01 12.599652
## 20080630 -22.5302026  -6.662501689 -29.1927043 7.068278e-02 15.545567
## 20080731  -4.4282657  -3.324941758  -7.7532075 3.306981e-02 12.125835
## 20080829 -11.7722720   0.263123137 -11.5091489 2.532377e-03 14.980266
## 20080930   6.4162058  11.366816239  17.7830220 7.446795e-02 16.508013
## 20081031   6.5469029  10.902521769  17.4494247 6.309035e-02 16.765770
## 20081128  -4.2175975  -5.877812827 -10.0954103 2.277161e-02 16.406279
## 20081231  -6.7720346   7.445946414   0.6739118 5.081219e-04 15.932738
## 20090130   0.4117665  14.864446576  15.2762131 4.632900e-02 13.488844
## 20090227  -4.3974121  -4.918795213  -9.3162073 2.335671e-02 13.990093
## 20090331 -14.4797172 -32.275294291 -46.7550115 2.234701e-01 21.347236
## 20090430  -4.7057675  -9.717338123 -14.4231056 6.171574e-02 15.892092
## 20090529   4.6972006   2.818316135   7.5155168 2.968280e-02 11.827252
## 20090630  -0.1121372  -1.993525191  -2.1056624 8.520777e-03 12.328718
## 20090731   3.1016299  -4.183537480  -1.0819076 3.564415e-03 12.362953
## 20090831   0.1524758  -2.029120718  -1.8766449 2.133177e-03 10.820495
## 20090930  -2.7514725   0.686879790  -2.0645927 2.335912e-05  9.813227
## 20091030   0.7103931  -0.428367589   0.2820255 1.162558e-04 10.777253
## 20091130   1.9437377   2.305396547   4.2491342 1.612888e-02  9.549518
## 20091231  -0.7367724  -1.217666948  -1.9544394 9.613093e-06  9.421133
## 20100129   0.1045668   1.952638201   2.0572050 6.257510e-03  9.237443
## 20100226   4.3245460   2.726498883   7.0510448 1.817045e-02 10.484150
## 20100331   2.7296818   1.662756620   4.3924384 2.719764e-03 10.063540
## 20100430  -3.8015262  -0.452613701  -4.2541399 6.562637e-03  8.420164
## 20100528  -7.2575387  -0.911258113  -8.1687968 3.653815e-02  9.435504
## 20100630   1.6027876   2.679733175   4.2825207 6.390563e-03  9.809369
## 20100730  -2.7221437   1.621680830  -1.1004628 1.388381e-03 10.516280
## 20100831  -0.6089243   0.432357443  -0.1765669 1.076919e-06  9.545242
## 20100930   0.9420235   0.129855493   1.0718790 3.161409e-03  8.932311
## 20101029   0.8097786   4.326887549   5.1366661 1.251836e-02  9.596541
## 20101130  -2.8945491  -1.777852727  -4.6724018 6.277160e-03  9.693821
## 20101231  -0.2060415   2.475000124   2.2689586 1.200716e-03  9.093803
## 20110131   6.6128069   0.591296109   7.2041030 1.262807e-02  9.385216
## 20110228   3.7905835   6.240848831  10.0314323 1.710657e-02  9.171467
## 20110331  -2.3158950   3.987703908   1.6718089 3.226784e-04  9.182035
## 20110429  -0.9262357   0.115318542  -0.8109172 2.168133e-06  8.487364
## 20110531   1.5646763  -0.537999759   1.0266766 1.002818e-02  7.642411
## 20110630   0.9411761  -0.626953795   0.3142223 9.817661e-05  9.037727
## 20110729  -2.9199226   3.488792014   0.5688694 4.493216e-07 10.814098
## 20110831  -3.1865255   3.563159650   0.3766341 5.535636e-03 10.539890
## 20110930  -7.5664193 -12.528642700 -20.0950620 1.259601e-01 12.653926
## 20111031   1.9263643  10.756965392  12.6833297 4.244968e-02 10.356395
## 20111130   0.8999014   5.184810862   6.0847123 1.128724e-02  8.384039
## 20111230  -6.7221313  -9.691540645 -16.4136719 1.240757e-01 11.469275
## 20120131   0.2687574  -1.479615073  -1.2108576 2.920551e-03  9.030588
## 20120229   2.4354056   7.816017713  10.2514233 3.065025e-02  8.175265
## 20120330   3.6834006   0.680030588   4.3634312 2.224765e-02  8.490556
## 20120430   4.1638102  12.253976739  16.4177870 1.114420e-01  9.606406
## 20120531  -0.4048295   3.228558339   2.8237288 7.092628e-04  8.509756
## 20120629   1.6456021  -0.482056819   1.1635453 5.121104e-03 10.150198
## 20120731  -1.7659512  -0.316094792  -2.0820460 9.635983e-03  9.902787
## 20120831   0.6468250  -4.337629552  -3.6908045 6.746121e-03  7.296174
## 20120928   2.5129265  -2.047623802   0.4653027 1.488669e-04  9.060010
## 20121031   0.2621510   1.669420998   1.9315720 2.253398e-03  8.367610
## 20121130   2.7323326  -0.931846104   1.8004865 3.679373e-03  6.968976
## 20121231   2.6522951  -1.142367414   1.5099276 5.552424e-03  8.066738
## 20130131  -1.1875919   3.770420963   2.5828291 1.182282e-02  8.174054
## 20130228   4.0820997   0.755573156   4.8376729 1.802635e-02  7.285747
## 20130329   0.4781656   5.345692993   5.8238586 8.559810e-04  8.581097
## 20130430  -0.7071254   4.850713604   4.1435882 5.453313e-04 10.534415
## 20130531   0.2197082   6.878535056   7.0982432 1.310865e-03  8.509980
## 20130628   4.2246477   3.844475843   8.0691236 5.207451e-03  9.641167
## 20130731   0.1271500  -3.061753216  -2.9346032 3.533501e-04  8.744177
## 20130830   1.5263991   7.970015378   9.4964145 1.807064e-02  9.166737
## 20130930   2.4794686   1.106906480   3.5863751 3.344829e-04 10.090434
## 20131031   2.5477359   8.175574683  10.7233105 3.481279e-02  9.558321
## 20131129  -0.6981850   3.909423640   3.2112387 6.510337e-03  9.428769
## 20131231  -0.1352843  -5.486022703  -5.6213070 1.089779e-02 12.359731
## 20140131   2.6096398  -0.342498973   2.2671408 4.779181e-03  9.792065
## 20140228  -4.0514968   0.211724646  -3.8397721 1.127459e-02  8.945963
## 20140331  -0.5357733  -4.585307983  -5.1210813 2.257639e-02  9.503867
## 20140430   2.1111778   0.800018464   2.9111963 2.909139e-03  8.540904
## 20140530  -1.4664814  -3.362621251  -4.8291026 1.476260e-02  9.937716
## 20140630   1.3885452  -6.075509073  -4.6869639 2.689076e-04  9.304842
## 20140731   1.2757093  -1.584550258  -0.3088410 9.702190e-05  8.754481
## 20140829   0.9920803  -0.051360438   0.9407199 1.415707e-03  9.050361
## 20140930  -1.0384572  -1.755297434  -2.7937546 2.408312e-03 11.661520
## 20141031   5.7042581   7.826509440  13.5307676 4.991654e-02  9.954761
## 20141128   1.3398425   7.233493338   8.5733358 2.048029e-02 10.208432
## 20141231   2.8913374   3.170057740   6.0613951 2.774182e-02 10.458340
## 20150130  -6.3852028  -5.201137102 -11.5863399 2.696450e-02 10.245243
## 20150227   1.9768171  11.943729396  13.9205465 2.212988e-02  9.219389
## 20150331  -2.3002665 -22.693066046 -24.9933326 1.522786e-01 10.921062
## 20150430   7.2972803  13.304049165  20.6013294 7.528878e-02 10.049901
## 20150529   3.1343957   6.583574212   9.7179699 4.454163e-02  8.842418
## 20150630   6.0152399  21.452527715  27.4677676 1.538402e-01 10.712322
## 20150731  -2.0159471  -0.652881651  -2.6688288 2.088631e-03  9.964481
## 20150831   3.6665013  15.511893059  19.1783944 1.135260e-01 10.511436
## 20150930  -0.2650218  -4.227637311  -4.4926591 1.391680e-02 12.880863

Gradient Boosted Machine Results (012.2)

In this post, we look at out-of-sample performance for some gradient boosted machine (GBM) models. The purpose here is not an exhaustive analysis. We are only hoping to get an indication that our efforts might bear fruit.
Each model we build uses predictor (x) data from one month (e.g. Dec 2002) and response (y) data from the subsequent month (e.g. Jan 2003). We refer to a model here by xmonth/ymonth (e.g., Dec02/Jan03) to indicate the months of the x and y data. The y variable is a company’s excess return, defined as the return of the company for the month less the average return of all companies for that month.
The first model is Dec02/Jan03. The first set of predictions was for Jan 2004, created by feeding the Dec03 x data into the Dec02/Jan03 through Nov03/Dec03 models and averaging the 12 predicted returns for each company. The last prediction is for Oct 2015. We have 142 months of predicted returns, a bit shy of 12 years.
For each month with a predicted return, we calculate three values: LongEx, ShortEx and Hedge. Respectively, these roughly represent buying the 50 stocks with the top predicted returns, shorting the bottom 50, and doing both, which can be thought of as long, short and hedged portfolios. However, since our y variable is an excess return over the average stock, the LongEx values are the returns over the average, the ShortEx values are the returns under the average, and Hedge is the long minus the short. In all cases, positive values are desirable.
Below is a summary of the 142 observations for each.
##      LongEx            ShortEX            Hedge        
##  Min.   :-21.8452   Min.   :-24.769   Min.   :-33.005  
##  1st Qu.: -2.8444   1st Qu.: -1.654   1st Qu.: -2.607  
##  Median :  0.2659   Median :  1.209   Median :  1.648  
##  Mean   :  0.2940   Mean   :  1.421   Mean   :  1.715  
##  3rd Qu.:  2.8928   3rd Qu.:  5.286   3rd Qu.:  6.773  
##  Max.   : 22.8778   Max.   : 23.424   Max.   : 46.302
The means and medians are all positive, but there is substantial variation. Below we look to see if the means are significantly different from zero. The standard error is the standard deviation divided by the square root of 142; that is the standard deviation of the sample mean, which is used to determine the significance of a sample mean. The z score is the mean divided by the standard error.
##           LongEx   ShortEX      Hedge
## Mean   0.2939676 1.4212453  1.7152129
## SD     5.4530391 6.9189390 10.2775924
## StdErr 0.4576089 0.5806245  0.8624764
## z      0.6423993 2.4477875  1.9887071
It appears that the model is better at finding short candidates than long ones. For fun, the average annual return for the hedged portfolio (without transaction costs or taxes) would have been 15.17%, which sounds nice, but it had an annualized standard deviation of 35.6%, much higher than that of a stock index.
The data in graphical and tabular forms follows.
##               LongEx      ShortEX        Hedge           R2      RMSE
## 20040102   4.4804562   1.57539047   6.05584666 2.051306e-02 11.531041
## 20040130  -3.1004470  -1.65188009  -4.75232710 1.750013e-02  9.979721
## 20040227   0.1136727   0.19691053   0.31058323 8.836536e-04 10.200688
## 20040402  -6.8675942  -1.37402545  -8.24161965 4.501892e-02 11.933072
## 20040430  -6.5257645  -6.57552158 -13.10128604 1.370959e-02 11.772875
## 20040604   5.8045868   1.02379413   6.82838097 1.164503e-02  9.544704
## 20040702   1.1833060   7.79095590   8.97426189 4.452512e-03 10.566717
## 20040730  -0.8397685   1.87906727   1.03929881 1.290337e-02 10.118168
## 20040903   5.6043349  -7.24342919  -1.63909426 1.372271e-03  9.702592
## 20040930  -2.2496437   1.34523990  -0.90440375 1.323665e-02 10.682788
## 20041029  -4.8729916  -2.53875630  -7.41174787 1.153225e-01  4.572047
## 20041203  -4.9016458  -3.00009469  -7.90174051 4.306946e-03 10.093777
## 20041231   5.1859436   9.48016279  14.66610640 6.338233e-02  9.439808
## 20050131   8.3554239   3.17228764  11.52771154 3.560239e-02  9.888518
## 20050228   1.1398614  -1.65429137  -0.51443001 6.942819e-04  9.512741
## 20050331  -2.9022806   5.51431793   2.61203732 1.557547e-02  9.531778
## 20050429  -2.9317298  -6.98733369  -9.91906352 2.923010e-02 11.295694
## 20050531   6.6079376  -3.23712871   3.37080888 5.108261e-03  9.130988
## 20050630   4.0869711  -3.54998242   0.53698869 4.182929e-03  9.765563
## 20050729   4.4467331  -2.81801575   1.62871731 3.740829e-03  9.869402
## 20050831   5.9119463  -1.99713549   3.91481085 2.911387e-02 10.014786
## 20050930  -4.3744453   2.80701964  -1.56742571 1.410876e-02  9.707607
## 20051031   0.4936483  -1.91087025  -1.41722193 3.964903e-06  9.534197
## 20051130   1.3152118   3.31415404   4.62936585 4.459394e-03  8.197680
## 20051230  10.4466053  -1.86030232   8.58630295 2.581204e-02 11.349427
## 20060131 -10.7935793   1.08769631  -9.70588296 3.504893e-02  9.868935
## 20060228   0.6297246   0.61394818   1.24367280 7.498170e-03  9.779378
## 20060331  -0.6400524   3.37151326   2.73146090 6.756148e-03  9.633340
## 20060428  -3.6849981  -0.41999246  -4.10499055 2.803800e-02  9.848106
## 20060531  -1.2282337   4.76475783   3.53652412 2.992966e-03  8.893213
## 20060630   2.0611674   4.96603277   7.02720019 9.006035e-06  9.866460
## 20060731  -3.7489384   0.42759653  -3.32134190 8.361648e-03 10.332663
## 20060831  -3.9105259   2.95843656  -0.95208939 5.893924e-05  8.543617
## 20060929   1.3318449  -2.95841884  -1.62657396 4.971125e-03 10.070357
## 20061031   0.6743648  -4.44497331  -3.77060848 9.707332e-04  9.469400
## 20061130  -0.2912007   6.82119372   6.52999305 2.123290e-02  8.075700
## 20061229  -1.1050421   6.72039079   5.61534868 1.220891e-02  8.535747
## 20070131   0.2813190   1.38671605   1.66803509 3.801413e-04  8.637763
## 20070228  -1.4166892   5.34846860   3.93177939 1.002809e-02  8.755420
## 20070330  -1.8134228  -1.61399920  -3.42742196 1.526751e-07  8.352992
## 20070430   2.1966099   5.09304581   7.28965569 3.527596e-03  9.947826
## 20070531  -0.3484850   3.45312623   3.10464119 5.236355e-03  8.290130
## 20070629  -0.6902493   5.65858171   4.96833240 8.614327e-03 10.288178
## 20070731  -1.7350226   3.60422634   1.86920371 2.552182e-05 10.840216
## 20070831   6.6274811   0.70090681   7.32838792 1.817729e-02 10.015743
## 20070928  12.1550218   8.67353742  20.82855921 6.751654e-02 12.192783
## 20071031  -1.5577721   8.04080024   6.48302816 9.717738e-03 11.030412
## 20071130   1.0707887  -0.47628167   0.59450699 1.059295e-02 11.444802
## 20071231  -4.1298808 -16.11316912 -20.24304989 8.802494e-02 13.334368
## 20080131   7.1115468  10.59594501  17.70749184 5.729564e-02 12.426358
## 20080229  -2.7388912  12.99850664  10.25961545 2.594665e-03 11.719874
## 20080331   4.0338069   4.54339825   8.57720515 7.415183e-03 12.818374
## 20080430   7.7046779  12.86765563  20.57233350 4.926652e-02 11.665036
## 20080530  22.8777690  23.42405619  46.30182515 2.185188e-01 12.460203
## 20080630 -21.8452053  -7.14260982 -28.98781510 9.976686e-02 15.854723
## 20080731  -6.3124261  -9.83763975 -16.15006584 4.455588e-02 12.160854
## 20080829 -11.1003861  -5.71285630 -16.81324240 2.105534e-02 15.094456
## 20080930   6.3087282   6.35733551  12.66606375 4.373598e-02 16.619119
## 20081031   6.9581293  11.99049505  18.94862438 6.495681e-02 16.787016
## 20081128  -0.9585375 -12.60407976 -13.56261729 3.513564e-02 16.485555
## 20081231  -8.2017433   4.50898824  -3.69275507 1.318709e-03 15.939062
## 20090130   0.4607981  15.80131193  16.26211005 7.545809e-02 13.364931
## 20090227   0.2504831  -3.98860509  -3.73812198 9.366807e-03 13.900018
## 20090331  -8.2352165 -24.76935176 -33.00456824 1.654495e-01 21.181400
## 20090430  -6.6755156 -12.97698002 -19.65249560 2.408758e-02 15.729201
## 20090529  -4.9018633  -0.85202016  -5.75388345 1.215713e-03 12.016162
## 20090630  -0.6200505  -3.92020524  -4.54025572 1.408301e-04 12.230451
## 20090731   8.8289690  -4.88941648   3.93955256 7.934652e-04 12.265497
## 20090831   6.8330282  -1.66075608   5.17227210 1.014569e-02 10.668416
## 20090930  -6.5173888   5.54391933  -0.97346943 8.227045e-04  9.930213
## 20091030  -3.3226677  -1.15454973  -4.47721747 2.390326e-03 10.923646
## 20091130   5.9842343   0.28345390   6.26768818 2.099230e-02  9.577300
## 20091231   0.4246416  -0.76824734  -0.34360573 1.474345e-05  9.493154
## 20100129   1.9497804   1.34379621   3.29357661 5.861863e-03  9.249832
## 20100226   5.8241792   0.78403315   6.60821237 2.003099e-02 10.470405
## 20100331   2.1724846  -0.01338648   2.15909817 9.830716e-03  9.978537
## 20100430  -4.0222670  -0.75430582  -4.77657279 2.159198e-02  8.488533
## 20100528 -10.3988513  -1.37558122 -11.77443255 6.669126e-02  9.593447
## 20100630   7.3897421   1.59886651   8.98860858 7.464327e-03  9.796380
## 20100730  -3.1515735   0.39769321  -2.75388026 7.631506e-03 10.614657
## 20100831   3.5541220  -4.68430002  -1.13017802 3.949245e-04  9.528127
## 20100930   2.5430752   2.28489812   4.82797331 2.966763e-03  8.926510
## 20101029  -0.3747559   2.47588034   2.10112448 5.030222e-03  9.617739
## 20101130   7.1592502  -0.93817542   6.22107477 6.459225e-03  9.558991
## 20101231   0.8093349   6.57756802   7.38690288 4.947966e-03  9.058666
## 20110131   5.4958556   2.83249943   8.32835507 2.547845e-02  9.326603
## 20110228   3.6982796   5.32997404   9.02825366 1.427021e-02  9.161439
## 20110331  -1.5056150   1.54687332   0.04125832 1.506031e-03  9.241698
## 20110429  -0.2284042   1.60536486   1.37696065 5.219065e-05  8.505485
## 20110531  -0.4956597   1.66295489   1.16729518 2.456909e-03  7.685961
## 20110630  -1.4866051   2.38164003   0.89503493 9.434753e-05  9.038818
## 20110729  -3.3768675   7.26962723   3.89275977 3.528697e-05 10.830363
## 20110831  -1.8263393  10.12701630   8.30067698 7.419489e-03 10.525453
## 20110930  -4.8773408  -8.44679081 -13.32413156 1.076053e-01 12.785670
## 20111031   6.0658434   9.99291772  16.05876116 4.857237e-02 10.307887
## 20111130   0.4086228   3.18809485   3.59671767 8.388918e-03  8.401387
## 20111230  -5.2805620  -7.81151194 -13.09207395 9.642607e-02 11.538577
## 20120131   2.6704043   0.09519766   2.76560200 4.044009e-05  9.006189
## 20120229   2.5201274   1.40957042   3.92969781 2.075017e-02  8.198825
## 20120330   0.4768408  -0.56528695  -0.08844614 1.102996e-02  8.532742
## 20120430   4.7082126   7.42593015  12.13414276 7.476680e-02  9.645694
## 20120531   0.2088106   1.03475503   1.24356561 1.411428e-03  8.514347
## 20120629  -0.7890062   5.15371649   4.36471031 1.098969e-02 10.104601
## 20120731   1.3627650  -1.56265771  -0.19989275 2.151484e-03  9.844206
## 20120831   1.2228883  -3.17089554  -1.94800727 6.180135e-03  7.324758
## 20120928   2.2014011  -0.76737442   1.43402664 4.712458e-04  9.054782
## 20121031   0.7662031   5.48627504   6.25247810 1.083755e-02  8.325804
## 20121130  -0.5426604  -0.58835749  -1.13101786 4.419362e-04  6.920809
## 20121231   0.1320228   1.56948250   1.70150529 3.832838e-03  8.079075
## 20130131   4.5509483   8.34282238  12.89377071 2.345268e-02  8.133354
## 20130228   0.2227061  -0.29514265  -0.07243656 1.481332e-02  7.293513
## 20130329   2.0880405   1.85345985   3.94150039 2.764153e-03  8.570177
## 20130430   2.8641943   3.14529514   6.00948944 2.053273e-03 10.492903
## 20130531  -2.2466517   9.49288056   7.24622887 5.608706e-03  8.488379
## 20130628   0.7401412  -1.51166180  -0.77152060 3.211673e-03  9.660531
## 20130731  -1.2150014  -3.14794863  -4.36295004 4.672797e-04  8.802981
## 20130830   1.6582574  10.33943056  11.99768795 3.862723e-02  9.100338
## 20130930  -2.8375587   8.21986907   5.38231039 8.821435e-05 10.140551
## 20131031   2.8991279   7.11798627  10.01711417 2.395812e-02  9.572929
## 20131129  -0.2203424   1.00449063   0.78414821 6.256081e-03  9.434207
## 20131231   8.7929624  -8.51994247   0.27301988 9.756253e-05 12.261133
## 20140131   2.8486403   0.51828385   3.36692411 1.009918e-02  9.763306
## 20140228  -6.9509677   0.55038918  -6.40057848 5.333648e-02  9.201548
## 20140331  -6.1162032   0.65732288  -5.45888033 3.506616e-02  9.591483
## 20140430  -2.8467217   3.68465637   0.83793471 2.831008e-05  8.601572
## 20140530   6.0681905  -7.80803288  -1.73984236 1.610949e-03  9.892745
## 20140630  -2.7482977   1.33120773  -1.41709000 4.559307e-04  9.343817
## 20140731   0.9038226  -3.07087883  -2.16705620 1.215868e-04  8.763249
## 20140829  -3.6656411  -1.09496768  -4.76060881 2.013618e-03  9.127370
## 20140930  -2.3057577   3.64365706   1.33789939 1.021884e-03 11.660196
## 20141031   2.2538410   4.60841098   6.86225196 2.276797e-02  9.990055
## 20141128   2.8736441   5.72015638   8.59380050 1.959293e-02 10.201338
## 20141231   4.8432917   6.30486373  11.14815546 2.524939e-02 10.461886
## 20150130  -0.4686961  -6.90220626  -7.37090240 1.170588e-02 10.204333
## 20150227   1.8615128  12.37942370  14.24093650 3.272233e-02  9.176222
## 20150331  -8.1689649 -24.59796908 -32.76693398 1.988255e-01 11.102598
## 20150430  15.5061029  13.35092502  28.85702793 1.135935e-01  9.943807
## 20150529   2.3726033   6.62318450   8.99578775 4.649454e-02  8.805501
## 20150630   6.2049437  22.99736412  29.20230784 1.456164e-01 10.637548
## 20150731  -8.0587545   0.81176157  -7.24699292 1.179361e-02 10.206703
## 20150831  -9.6637881  17.69056709   8.02677896 5.339833e-02 10.642361
## 20150930  -7.1714385  -2.52125216  -9.69269066 7.668640e-03 12.903508
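The post's analysis is done in R, but the five measures in the monthly panel above can be sketched in Python as well. The function below is a hypothetical illustration, not the post's actual code: `pred` and `actual` stand for one month's per-stock predicted and realized returns, `n=50` matches the post's 50-stock long and short portfolios, and R2 is assumed to be the squared Pearson correlation (the caret package's convention), which the post does not state explicitly.

```python
import numpy as np

def panel_stats(pred, actual, n=50):
    """Compute LongEx, ShortEX, Hedge, R2, and RMSE for one month of
    cross-sectional predictions (hypothetical sketch, names assumed).

    pred, actual : 1-D arrays of predicted and realized returns per stock.
    n            : number of stocks in the long and short portfolios.
    """
    pred = np.asarray(pred, dtype=float)
    actual = np.asarray(actual, dtype=float)
    order = np.argsort(pred)          # ascending by predicted return
    mean_all = actual.mean()          # "average of all stocks"

    # Top-n predicted minus the all-stock average (right tail).
    long_ex = actual[order[-n:]].mean() - mean_all
    # Bottom-n predicted minus the all-stock average (left tail).
    short_ex = actual[order[:n]].mean() - mean_all
    # Hedge is defined in the post as LongEx - ShortEX.
    hedge = long_ex - short_ex

    # R2 as squared Pearson correlation (assumed convention).
    r2 = np.corrcoef(pred, actual)[0, 1] ** 2
    rmse = np.sqrt(np.mean((pred - actual) ** 2))
    return long_ex, short_ex, hedge, r2, rmse
```

With noisy but informative predictions, `long_ex` tends to be positive and `short_ex` negative, so `hedge` captures how well both tails are ranked, while `r2` and `rmse` score the full cross-section, mirroring the description in the post.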