Rex Macey
November 17, 2015
Predictors with zero or near-zero variance (nzv) can cause a model to crash or its fit to be unstable. In this study we identify the numeric variables with nzv. This raises an issue in our case because we fit a separate model each month, and a variable might have nzv in one month but not in another. One solution is to remove such a variable from every month. If we instead remove nzv variables only in the months where they occur, we must track which set of variables is used each month: to predict Y, a model must be called with the same set of variables that was used to fit it.
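To make the idea of flagging nzv variables concrete, here is a minimal Python sketch. The post does not say which criterion was used, so this is an assumption: it borrows the rule and default cutoffs of `nearZeroVar` in R's caret package (frequency ratio above 95/5 and fewer than 10% distinct values). The function name and the toy column names are hypothetical.

```python
import numpy as np
import pandas as pd


def near_zero_var(df, freq_cut=95 / 5, unique_cut=10.0):
    """Return the names of numeric columns flagged as zero or near-zero variance.

    Assumed rule (caret-style, not confirmed by the post): a column is flagged
    when it has a single distinct value, or when both
      (a) count of the most common value / count of the second most common
          value exceeds freq_cut, and
      (b) distinct values as a percentage of rows is below unique_cut.
    """
    flagged = []
    for name in df.select_dtypes(include="number").columns:
        values = df[name].dropna()
        counts = values.value_counts()  # sorted by count, largest first
        if len(counts) <= 1:
            flagged.append(name)  # constant (or empty) column: zero variance
            continue
        freq_ratio = counts.iloc[0] / counts.iloc[1]
        pct_unique = 100.0 * len(counts) / len(values)
        if freq_ratio > freq_cut and pct_unique < unique_cut:
            flagged.append(name)
    return flagged


# Toy data for one month (hypothetical column names).
rng = np.random.default_rng(0)
month = pd.DataFrame({
    "mostly_zero": [0.0] * 98 + [0.02, 0.03],  # dominated by a single value
    "continuous": rng.normal(size=100),        # ordinary numeric predictor
    "constant": [1.0] * 100,                   # zero variance
})
flagged = near_zero_var(month)
```

With this toy data, `flagged` contains `mostly_zero` and `constant` but not `continuous`. Different cutoffs, or a different definition of nzv, would flag a different set of variables.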
The following list shows the percentage of months in which each variable is considered to have nzv. One of the more interesting cases is YIELD, which is flagged in 95.5% of months. This is likely because so many companies pay no dividend (nearly 50% of our 3,000-stock universe).
nzv_freq (% of months)
RPAYOUT_12      100.0
RPAYOUT_Y1      100.0
RPAYOUT_Y4      100.0
RPAYOUT_Y5      100.0
RPAYOUT_Y3       98.7
RPAYOUT_Y2       98.1
YIELD            95.5
RDPS_G1F         74.7
RDPS_G1T         40.9
IW_SGB           40.3
EPSDMP_EY2       35.1
INS_PR_SHR       24.0
EPSDMP_EQ1       22.1
RDM_12M          18.2
TL_TA_Q1_Y1      18.2
LTD_TC_Q1_Y1     16.9
RDPS_G3F         16.2
EPSDMP_EQ0       13.0
EPSDMP_EY0       11.7
GPM_12M_Y1       11.7
EPSDMP_EY1        9.7
EPSDM_EY2         3.2
RDPS_G5F          3.2
CURR_Q1_Y1        2.6
EPSUM_EG5         2.6
EPSUM_EY2         2.6
LTD_EQ_Q1_Y1      2.6
RYIELD_1T         2.6
EPSDM_EG5         1.3
QUICK_Q1_Y1       1.3
PAYOUT_12M_Y1     0.6
RYIELD            0.6
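A frequency list like the one above could be built roughly as follows. This is only a sketch: it assumes the nzv test has already produced, for each month, the names of the flagged variables. The function name, month labels and variable names are hypothetical.

```python
from collections import Counter


def nzv_frequency(flags_by_month):
    """Percentage of months in which each variable was flagged.

    flags_by_month maps a month label to the names flagged in that month
    (hypothetical input, produced by whatever nzv test is in use).
    Variables never flagged do not appear in the result.
    """
    n_months = len(flags_by_month)
    counts = Counter(
        name for flagged in flags_by_month.values() for name in set(flagged)
    )
    freq = {name: round(100.0 * c / n_months, 1) for name, c in counts.items()}
    # Most frequently flagged first; ties broken alphabetically.
    return dict(sorted(freq.items(), key=lambda kv: (-kv[1], kv[0])))


# Four hypothetical months with made-up variable names.
flags = {
    "2015-01": ["A", "B"],
    "2015-02": ["A"],
    "2015-03": ["A", "C"],
    "2015-04": ["A", "B"],
}
nzv_freq = nzv_frequency(flags)
```

Here `A` is flagged in every month, `B` in half of them and `C` in one of four, so `nzv_freq` orders them from 100.0 down to 25.0.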
At this stage, I plan to exclude variables with nzv month by month. This means the set of variables in a model can change from one month to the next. While this adds complexity to the programming, it also gives us the flexibility to add variables in the future.
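One way to keep the bookkeeping manageable is to store each month's variable list next to its fitted model, so prediction always selects the same columns in the same order. The sketch below illustrates this with ordinary least squares as a stand-in, since the post does not say which kind of model is fitted. The function names, the month label and the toy data are all hypothetical.

```python
import numpy as np
import pandas as pd


def fit_month(X, y, drop):
    """Fit least squares on the columns of X not listed in drop.

    Returns the kept column names together with the coefficients so the
    same columns, in the same order, can be selected at prediction time.
    """
    dropped = set(drop)
    columns = [c for c in X.columns if c not in dropped]
    design = np.column_stack([np.ones(len(X)), X[columns].to_numpy(dtype=float)])
    coef, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return {"columns": columns, "coef": coef}


def predict_month(model, X_new):
    """Predict using only the columns the model was fitted on."""
    design = np.column_stack(
        [np.ones(len(X_new)), X_new[model["columns"]].to_numpy(dtype=float)]
    )
    return design @ model["coef"]


# Toy month: column "b" is constant, so it is dropped for this month only.
X = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [0.0, 0.0, 0.0, 0.0],
    "c": [2.0, 1.0, 4.0, 3.0],
})
y = 1.0 + 2.0 * X["a"] - X["c"]

models = {"2015-10": fit_month(X, y, drop=["b"])}  # one entry per month
pred = predict_month(models["2015-10"], X)
```

Because the column list travels with the model, a month in which `b` is not dropped simply stores a longer list, and `predict_month` needs no change.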