- What to do about missing values (NAs)? I have a lot of them. Currently I replace them with median values. I briefly experimented with imputation using the MICE package and others, but the computation time appears to be enormous.
- Parallel processing on a 4-core Windows 7 machine. I haven't gotten this to work yet, but it would help speed up the process.
- Variable importance and feature selection. I have a lot of variables. I spent some time thinking about these, but I probably missed some. I'm also thinking that I should use the variable importance reporting from the random forests to eliminate variables, but I'm not sure how to do this.
- I'm using price appreciation, not total return, for the Y variable. Each month Stock Investor Pro provides the last 120 monthly prices for a stock along with the dividends for the last 8 fiscal quarters. It might be OK the way it is, but it would be better to have total returns. I'll post more on this.
- RStudio and GitHub (testing before merging). One collaborator has submitted code changes. If I understand the process, I need to fork those changes, test them, and then do something to merge them back.
- Right now I equally weight the stock forecasts from the models built on each of the previous 12 months. Should I be using 12? Should I equally weight? If I don't, how should I weight? Let's say X(t) represents the features at a point in time. Perhaps I should find the X data prior to t that is most like X(t). That sounds like a nearest-neighbor approach. While the columns in X will represent the same features, the rows will vary as companies enter and leave.
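On the price appreciation versus total return item above, the difference comes down to whether dividends paid during the holding period are added to the price change. Here is a minimal sketch in Python (the arithmetic carries over directly to R). The function names and the example numbers are hypothetical, and it assumes the dividends have already been matched to the holding period and are not reinvested. It says nothing about how Stock Investor Pro lays out its fields.

```python
def price_appreciation(start_price, end_price):
    """Return from the price change alone."""
    return (end_price - start_price) / start_price


def total_return(start_price, end_price, dividends):
    """Price change plus dividends paid during the period (no reinvestment)."""
    return (end_price - start_price + sum(dividends)) / start_price


# Hypothetical stock: bought at 50, now at 55, two dividends of 0.50 each.
pa = price_appreciation(50.0, 55.0)
tr = total_return(50.0, 55.0, [0.50, 0.50])
print(round(pa, 4), round(tr, 4))
```

For a stock with a meaningful dividend yield, the gap between the two numbers grows with the length of the holding period, so the choice of Y variable matters more for longer forecast horizons.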
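On the forecast-weighting item above, one possible way to act on the nearest-neighbor idea is sketched below in Python. This is only an illustration under loud assumptions, not a tested method: each month's cross-section is collapsed to a vector of per-feature medians (so months with different sets of companies can still be compared), months are compared by Euclidean distance, and weights are proportional to inverse distance. All function names and numbers are hypothetical, and in practice the features would probably need to be put on a common scale first.

```python
import math
from statistics import median


def month_profile(rows):
    """Collapse one month's rows (one feature list per company) to per-feature medians."""
    return [median(col) for col in zip(*rows)]


def similarity_weights(current_rows, past_months):
    """Weight each past month by inverse distance between its profile and the current one."""
    target = month_profile(current_rows)
    dists = [math.dist(target, month_profile(rows)) for rows in past_months]
    raw = [1.0 / (d + 1e-9) for d in dists]  # small constant avoids division by zero
    total = sum(raw)
    return [w / total for w in raw]


def combine_forecasts(forecasts, weights):
    """Weighted average of the per-model forecasts for one stock."""
    return sum(f * w for f, w in zip(forecasts, weights))


# Hypothetical data: two features; the number of companies differs by month.
current = [[1.0, 10.0], [3.0, 12.0]]
past = [
    [[1.0, 11.0], [3.0, 13.0], [2.0, 12.0]],  # profile close to the current month
    [[2.0, 14.0]],                            # profile farther away
]
weights = similarity_weights(current, past)
blended = combine_forecasts([0.02, 0.06], weights)
print([round(w, 3) for w in weights], round(blended, 4))
```

Equal weighting is the special case where every past month is treated as equally similar, so comparing the two schemes on held-out months would show whether the extra machinery earns its keep.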
Thursday, November 5, 2015
Issues to Ponder
For those who have offered to collaborate or review, thank you. Here are some areas where I have run into questions, or tasks I need to do soon. I'm probably missing some.