Skip to content

Latest commit

 

History

History
51 lines (33 loc) · 2.29 KB

README.md

File metadata and controls

51 lines (33 loc) · 2.29 KB

Predicting Stock Returns Through CNN

This repository explores whether stock returns can be accurately predicted based on CNN image classification.

Code location

All code is located in the /stock_return_cnn folder.

Necessary packages

The following packages are required to run all the code from this repository:

  • pandas
  • os
  • PIL
  • matplotlib
  • random
  • datetime
  • gc
  • pyts (to construct the GADF images)
  • numpy
  • optuna (to perform the hpo)
  • torch (the models are trained using the Pytorch framework)
  • torchvision
  • pickle (to save intermediare results)
  • sklearn (for the benchmark model)

Data

The data used for this project is made available under CC0 and can be found at: https://www.kaggle.com/datasets/borismarjanovic/price-volume-data-for-all-us-stocks-etfs

To ensure that the project runs correctly, download the data from the above link and populate the stock_return_cnn/Data folder so that it contains the ETFs and Stocks folders from the downloaded data.

Sample Data

The notebooks are numbered in the order that they should be executed. Notebooks 0 and 1 have to do with turning the raw data into image files and performing some basic EDA. As this is a step that takes a lot of time (depending on the machine a few hours to a full day), a folder with already processed images is provided.

  • The folder "ModelData", contains the .csv files that outline where the train, test, & validation data can be found as well as the labels.
  • The folder "stock_return_cnn/SampleData.zip", contains all the area chart and GADF image files. If you decide to skip over running notebooks 0 & 1 than you simply have to UNZIP AND THEN RENAME /SampleData to /Charts and then you should be able to run notebooks 2 and beyond without issue.

Hyperparameter Tuning

Notebooks 2_a and 2_b contain the code for the hyperparameter tuning. Again, this code takes a long time to run (On my machine it took around 12 hours). Therefore, these notebooks can be skipped as the result of the hyperparameter tuning is already incorporated into notebooks 3_a and 3_b.

Model Training

Notebooks 3_a and 3_b contain the code to train the final area chart and GADF image models and perform some analysis on the trianing and results.

Benchmark Model Training

Lastly, notebook 4 contains the code to train the benchmark model.