Tuesday, December 15, 2015

Predicting Stock Market Returns - Part III (Occam's Razor)

This is a follow-up to the previous post, “Predicting Stock Market Returns - Part II”. In that post I remarked that it seemed odd for the closing price to be an important variable. I did not think it should be included, because I could not imagine how a price level, which essentially rises through time, would help determine a return in two days. Yet the variable importance measure ranked it as the most important variable. In this post I remove that variable and a few others and compare the results of the two models. The simpler (fewer-variable) model performs almost the same on the out-of-sample data.
The simpler model (rf2) excludes the closing price and the event variables related to the 10-, 20-, 50-, and 200-day SMAs. It also removes the events related to the SMA50 - SMA200 values. In all, the 33 variables are reduced to 22.

RF1 (33 Variables)

## Loading required package: randomForest
## Warning: package 'randomForest' was built under R version 3.1.3
## randomForest 4.6-12
## Type rfNews() to see new features/changes/bug fixes.
##             Min. 1st Qu.   Median     Mean 3rd Qu.   Max. % Neg
## Pred Neg  -8.807 -0.6364 -0.04975 -0.10640  0.4488  5.417  52.5
## Pred Pos -20.470 -0.3246  0.13530  0.15580  0.6528 11.580  41.5
## All      -20.470 -0.4579  0.05660  0.03752  0.5656 11.580  46.5
## [1] "Buy and Hold Growth of $1: $12.74"
## [1] "    Strategy Growth of $1: $830.68"
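For readers wondering how a “Growth of $1” figure like the ones above can arise, here is a minimal sketch in Python. It is not the code behind this post. It assumes the returns are daily percentages and that the strategy simply holds the market on days with a positive prediction and sits in cash (0% return) otherwise; the function names and the toy data are made up for illustration.

```python
from math import prod


def growth_of_one(returns_pct):
    """Compound a sequence of percent returns into the final value of $1."""
    return prod(1.0 + r / 100.0 for r in returns_pct)


def strategy_returns(returns_pct, predicted_positive):
    """Assumed rule: keep the return when the prediction is positive,
    otherwise earn 0% (cash) for that period."""
    return [r if p else 0.0 for r, p in zip(returns_pct, predicted_positive)]


# Toy data for illustration only; these are not values from the post.
returns_pct = [1.0, -2.0, 0.5, -1.0]
predicted_positive = [True, False, True, True]

buy_and_hold = growth_of_one(returns_pct)
strategy = growth_of_one(strategy_returns(returns_pct, predicted_positive))

print(f"Buy and Hold Growth of $1: ${buy_and_hold:.2f}")
print(f"    Strategy Growth of $1: ${strategy:.2f}")
```

Because the returns compound multiplicatively, avoiding even a modest share of the negative days over a long test period can produce a very large gap between the two figures.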

RF2 (22 Variables)

##             Min. 1st Qu.   Median     Mean 3rd Qu.   Max. % Neg
## Pred Neg  -8.807 -0.6386 -0.04952 -0.10090  0.4479  5.417  52.5
## Pred Pos -20.470 -0.3069  0.15540  0.16290  0.6574 11.580  41.0
## All      -20.470 -0.4579  0.05660  0.03752  0.5656 11.580  46.5
## [1] "Buy and Hold Growth of $1: $12.74"
## [1] "    Strategy Growth of $1: $836.85"

As can be seen, there is little difference between the two models' performance on the test data: the strategy grows $1 to $830.68 with rf1 and to $836.85 with rf2, and the summary statistics are nearly identical. It is interesting that “the most important” variable can be eliminated without effect. It should be noted that other variables, such as the SMAs, are highly correlated with one another. In aggregate these are probably more important, but their individual importance is diminished in the presence of the others.
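To give a feel for how strongly moving averages of the same series track each other, here is a small, self-contained Python sketch. It uses a synthetic random-walk price series rather than the data from this post, and it assumes the SMA variables are plain price-level averages; the helper names `sma` and `pearson` are my own.

```python
import random


def sma(values, window):
    """Simple moving average; returns one value per full window."""
    running = sum(values[:window])
    out = [running / window]
    for i in range(window, len(values)):
        running += values[i] - values[i - window]
        out.append(running / window)
    return out


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


# Synthetic prices (an assumption for illustration, not market data).
random.seed(0)
prices = [100.0]
for _ in range(1999):
    prices.append(prices[-1] * (1 + random.gauss(0.0003, 0.01)))

sma10 = sma(prices, 10)
sma20 = sma(prices, 20)

# Align the two series so each pair of values ends on the same day.
sma10_aligned = sma10[-len(sma20):]
corr = pearson(sma10_aligned, sma20)
print(f"Correlation between SMA10 and SMA20: {corr:.4f}")
```

When several predictors carry nearly the same information, a tree can split on any one of them, so the importance that would otherwise go to a single variable tends to be spread across the group.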
