Multiple supervised machine learning classifiers are used and tested to enhance trading signals' accuracy and trading bot's ability to adapt to new data.

YanjunLin-Andrie/machine_learning_trading_bot

Machine Learning Trading Bot

Multiple supervised machine learning classifiers are trained and tested; input features and parameters are adjusted during the analysis to improve the accuracy of the trading signals and the trading bot's ability to adapt to new data.

Analysis includes:


Technologies

This project leverages Python 3.7 with the following packages:

  • pandas - Reads, calculates, analyzes, and visualizes data

  • pathlib - Provides file and directory paths (part of the Python standard library)

  • matplotlib - Creates data visualizations

  • numpy - Performs mathematical calculations

  • sklearn - Predictive data analysis


Installation Guide

Before running the Jupyter notebook file, first, install the following dependencies in Terminal or Bash under the dev environment.

  pip install pandas
  pip install matplotlib
  pip install -U scikit-learn
  pip install numpy

General Information

First, import all libraries and dependencies.

Establish a Baseline Performance

-- After importing the original dataframe, calculate 'Actual Returns' based on closing price.

signals_df["Actual Returns"] = signals_df["close"].pct_change()
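As a small illustration of what `pct_change()` produces, here is a minimal sketch on made-up prices. The closing prices and dates are hypothetical; the notebook itself works on the project's market data.

```python
import pandas as pd

# Hypothetical closing prices, used only to illustrate the calculation.
signals_df = pd.DataFrame(
    {"close": [100.0, 102.0, 99.96, 99.96]},
    index=pd.date_range("2020-01-01", periods=4, freq="D"),
)

# pct_change() computes close_t / close_(t-1) - 1 for each row.
signals_df["Actual Returns"] = signals_df["close"].pct_change()

# The first row has no previous price, so its return is NaN and is dropped.
signals_df = signals_df.dropna()
print(signals_df)
```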

-- Generate the short- and long-window SMA features, then set the trading signal from the sign of the actual returns (1 for buy, -1 for sell).

signals_df['Signal'] = 0.0
signals_df.loc[(signals_df['Actual Returns'] >= 0), 'Signal'] = 1
signals_df.loc[(signals_df['Actual Returns'] < 0), 'Signal'] = -1
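The `SMA_Fast` and `SMA_Slow` columns used as features are not built in the snippet above. A minimal sketch of one way to create them, assuming simple rolling means of the closing price, could look like this; the window sizes (2 and 3) and the prices are hypothetical and chosen only to keep the example small.

```python
import pandas as pd

# Hypothetical closing prices for illustration only.
signals_df = pd.DataFrame(
    {"close": [10.0, 11.0, 12.0, 11.0, 13.0, 12.0]},
    index=pd.date_range("2020-01-01", periods=6, freq="D"),
)
signals_df["Actual Returns"] = signals_df["close"].pct_change()

# Assumed window sizes; the project's actual values are not stated here.
short_window = 2
long_window = 3

# Simple moving averages of the closing price.
signals_df["SMA_Fast"] = signals_df["close"].rolling(window=short_window).mean()
signals_df["SMA_Slow"] = signals_df["close"].rolling(window=long_window).mean()
signals_df = signals_df.dropna()

# Label each row: 1 (buy) for a non-negative return, -1 (sell) otherwise.
signals_df["Signal"] = 0.0
signals_df.loc[signals_df["Actual Returns"] >= 0, "Signal"] = 1
signals_df.loc[signals_df["Actual Returns"] < 0, "Signal"] = -1
print(signals_df)
```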

-- Calculate the strategy returns and plot the original strategy returns

signals_df['Strategy Returns'] = signals_df['Actual Returns'] * signals_df['Signal'].shift()
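The `shift()` in the line above applies each period's signal to the following period's return, so a signal is never paired with the return that produced it. A minimal sketch with made-up numbers:

```python
import pandas as pd

index = pd.date_range("2020-01-01", periods=3, freq="D")

# Hypothetical returns and signals for illustration only.
actual_returns = pd.Series([0.10, -0.05, 0.02], index=index)
signal = pd.Series([1.0, -1.0, 1.0], index=index)

# shift() moves every signal one row forward in time:
# the signal from day 1 is multiplied by the return of day 2, and so on.
strategy_returns = (actual_returns * signal.shift()).dropna()
print(strategy_returns)
```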

-- Feature data

X = signals_df[['SMA_Fast', 'SMA_Slow']].shift().dropna()
y = signals_df['Signal']

-- Split data into training and testing datasets by dates

from pandas.tseries.offsets import DateOffset

# Train on the first three months of data
training_begin = X.index.min()
training_end = training_begin + DateOffset(months=3)

X_train = X.loc[training_begin:training_end]
y_train = y.loc[training_begin:training_end]

X_test = X.loc[training_end+DateOffset(hours=1):]
y_test = y.loc[training_end+DateOffset(hours=1):]
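The date-based split above can be tried on synthetic data. This is a minimal sketch, assuming a daily index (so the test set starts one day, rather than one hour, after the training window); the feature values are placeholders.

```python
import numpy as np
import pandas as pd
from pandas.tseries.offsets import DateOffset

# Synthetic daily data standing in for the real feature and signal columns.
index = pd.date_range("2020-01-01", periods=200, freq="D")
X = pd.DataFrame(
    {"SMA_Fast": np.arange(200.0), "SMA_Slow": np.arange(200.0)}, index=index
)
y = pd.Series(np.where(np.arange(200) % 2 == 0, 1.0, -1.0), index=index)

training_begin = X.index.min()
training_end = training_begin + DateOffset(months=3)

# .loc slicing on a DatetimeIndex includes both endpoints.
X_train = X.loc[training_begin:training_end]
y_train = y.loc[training_begin:training_end]

# Start the test set one period later so the two sets do not overlap.
test_begin = training_end + DateOffset(days=1)
X_test = X.loc[test_begin:]
y_test = y.loc[test_begin:]
print(len(X_train), len(X_test))
```

Because label-based slicing is inclusive, the small offset on the test start is what keeps the boundary row out of both sets.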

-- Scale the feature datasets
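The scaling code is not shown here. A minimal sketch, assuming scikit-learn's `StandardScaler` is the scaler used (an assumption, since the scaler is not named), fits on the training features only and then transforms both sets:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Tiny placeholder feature matrices for illustration only.
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_test = np.array([[4.0, 40.0]])

# Fit the scaler on the training data only, so no test-set information leaks in.
scaler = StandardScaler()
X_scaler = scaler.fit(X_train)

# Apply the same training mean and standard deviation to both datasets.
X_train_scaled = X_scaler.transform(X_train)
X_test_scaled = X_scaler.transform(X_test)
print(X_train_scaled)
print(X_test_scaled)
```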

-- Use the SVC classifier model from SKLearn's support vector machine (SVM) learning method to fit the training data and make predictions based on the testing data

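from sklearn import svm

# Fit the SVC model on the scaled training data, then predict on the scaled test data
svm_model = svm.SVC()
svm_model = svm_model.fit(X_train_scaled, y_train)
svm_pred = svm_model.predict(X_test_scaled)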

-- Generate a classification report with the SVC model predictions
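The report step can be sketched with scikit-learn's `classification_report`. The labels and predictions below are made up to show how per-class recall is read; they are not the project's results.

```python
from sklearn.metrics import classification_report

# Hypothetical true signals and predictions (1 = buy, -1 = sell).
y_test = [1, 1, 1, -1, -1]
svm_pred = [1, 1, 1, 1, -1]

# Text report, as typically printed in a notebook.
print(classification_report(y_test, svm_pred))

# The same report as a dictionary, convenient for comparing models in code.
report = classification_report(y_test, svm_pred, output_dict=True)
sell_recall = report["-1"]["recall"]
buy_recall = report["1"]["recall"]
accuracy = report["accuracy"]
```

Recall for a class is the share of true members of that class the model found, which is why a model that predicts "buy" almost everywhere shows high buy recall and low sell recall.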

-- Create a predictions dataframe

predictions_df = pd.DataFrame(index=X_test.index)
predictions_df['Predicted'] = svm_pred
predictions_df['Actual Returns'] = signals_df['Actual Returns']
predictions_df['Strategy Returns'] = predictions_df['Actual Returns'] * predictions_df['Predicted']

-- Plot the actual returns versus the strategy returns of the SVM model
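The curves being compared are cumulative returns. A minimal sketch of that calculation, assuming the cumulative product of `1 + return` is what gets plotted and using made-up numbers:

```python
import pandas as pd

# Hypothetical returns and predictions for illustration only.
predictions_df = pd.DataFrame(
    {"Actual Returns": [0.01, -0.02, 0.03], "Predicted": [1.0, -1.0, 1.0]},
    index=pd.date_range("2020-01-01", periods=3, freq="D"),
)
predictions_df["Strategy Returns"] = (
    predictions_df["Actual Returns"] * predictions_df["Predicted"]
)

# Growth of one unit of capital over time for each column.
cumulative = (1 + predictions_df[["Actual Returns", "Strategy Returns"]]).cumprod()
print(cumulative)

# With matplotlib installed, cumulative.plot() would draw both curves on one chart.
```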

-- Conclusions about the performance of the baseline trading algorithm:

Based on the testing report and the SVM cumulative return plot, the SVM model performed well from the beginning of the period until mid-2018, when the actual and strategy returns began to differ slightly. The gap kept growing until the beginning of 2020, after which the two drifted apart again. The testing report indicates that the model predicts buy signals well, with a recall of 0.96, but predicts sell signals poorly, with a recall of 0.04.

According to the plot, the SVM model made trading decisions that outperformed the actual returns in the second half of the market data. Overall, despite the volatility, the SVM model's trading strategy produced a higher cumulative return than the original.

Tune the Baseline Trading Algorithm

-- Tune the training algorithm by adjusting the size of the training dataset

What impact resulted from increasing or decreasing the training window?

Increasing the training dataset to 20 months improved the prediction accuracy to 0.57; in particular, the recall score for buy signals rose to 1. The plot also shows the strategy returns greatly outperforming the actual returns. Decreasing the training dataset to 1 month lowered the accuracy score, and the cumulative returns of the two strategies diverged further.
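One way to compare training windows systematically is to wrap the split, scale, fit, and score steps in a function and loop over window lengths. This is a sketch on random synthetic data, so its scores are meaningless; the helper name `accuracy_for_window` and the window lengths are hypothetical.

```python
import numpy as np
import pandas as pd
from pandas.tseries.offsets import DateOffset
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Random placeholder features and signals on a daily index.
rng = np.random.default_rng(0)
index = pd.date_range("2020-01-01", periods=400, freq="D")
X = pd.DataFrame(
    rng.normal(size=(400, 2)), columns=["SMA_Fast", "SMA_Slow"], index=index
)
y = pd.Series(np.where(rng.normal(size=400) >= 0, 1.0, -1.0), index=index)


def accuracy_for_window(months):
    """Train an SVC on the first `months` months and score it on the rest."""
    training_end = X.index.min() + DateOffset(months=months)
    test_begin = training_end + DateOffset(days=1)
    X_train, y_train = X.loc[:training_end], y.loc[:training_end]
    X_test, y_test = X.loc[test_begin:], y.loc[test_begin:]

    scaler = StandardScaler().fit(X_train)
    model = SVC().fit(scaler.transform(X_train), y_train)
    return accuracy_score(y_test, model.predict(scaler.transform(X_test)))


# Hypothetical window lengths to compare.
results = {months: accuracy_for_window(months) for months in (1, 3, 6)}
print(results)
```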

-- Tune the trading algorithm by adjusting the SMA input features.

[Image: table of accuracy scores and cumulative returns for each SMA window setting]

What impact resulted from increasing or decreasing either or both of the SMA windows?

As shown in the table above, increasing the short window raised the accuracy score slightly but left the cumulative return unchanged. Increasing the long window dropped the accuracy score sharply to 0.45, and the bot's cumulative return greatly underperformed the original trading strategy. Finally, changing both windows produced a subtle increase in the accuracy score, while the bot slightly underperformed the original strategy.
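To rerun the experiment with different SMA windows, the feature construction can be parameterized. The sketch below uses a hypothetical helper, `make_features`, and made-up prices and window pairs; it also shows that a longer slow window leaves fewer usable rows.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices: 100, 101, ..., 149.
close = pd.Series(
    np.linspace(100.0, 149.0, 50),
    index=pd.date_range("2020-01-01", periods=50, freq="D"),
)


def make_features(close, short_window, long_window):
    """Build lagged SMA features for the given window sizes."""
    features = pd.DataFrame(
        {
            "SMA_Fast": close.rolling(window=short_window).mean(),
            "SMA_Slow": close.rolling(window=long_window).mean(),
        }
    )
    # shift() lags the features by one period; dropna() removes incomplete rows.
    return features.shift().dropna()


# Hypothetical (short, long) window pairs to compare.
feature_sets = {
    (short, long): make_features(close, short, long)
    for short, long in [(2, 5), (4, 10)]
}
for windows, features in feature_sets.items():
    print(windows, len(features))
```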

Optimize cumulative return on the baseline trading algorithm

-- Choose the set of parameters that best improved the trading algorithm returns

The current window values produce the highest cumulative return among the SMA settings tested. However, increasing the training dataset to 20 months delivered a cumulative return as high as 1.8, which is higher than the return from modifying any other parameter.

Evaluate a new machine learning classifier

Use the original parameters that the starter code provided, but evaluate the performance of a second machine learning model.

-- Use classifier - AdaBoost

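from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import classification_report

# Fit AdaBoost on the scaled training data and evaluate it on the test data
abc = AdaBoostClassifier()
abc_model = abc.fit(X_train_scaled, y_train)
abc_pred = abc_model.predict(X_test_scaled)
abc_testing_report = classification_report(y_test, abc_pred)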

-- Use classifier - DecisionTreeClassifier

-- Use classifier - LogisticRegression
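The code for the other two classifiers is not shown. Since scikit-learn classifiers share the same `fit` and `predict` interface, all three can be compared in one loop. This is a sketch on synthetic data with default model settings, which is an assumption about how the models were configured.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic, already-scaled features; the label depends on the first feature only.
rng = np.random.default_rng(1)
X_train_scaled = rng.normal(size=(100, 2))
y_train = np.where(X_train_scaled[:, 0] >= 0, 1, -1)
X_test_scaled = rng.normal(size=(100, 2))
y_test = np.where(X_test_scaled[:, 0] >= 0, 1, -1)

models = {
    "AdaBoost": AdaBoostClassifier(),
    "DecisionTree": DecisionTreeClassifier(random_state=0),
    "LogisticRegression": LogisticRegression(),
}

# Fit each model on the same training data and score it on the same test data.
scores = {}
for name, model in models.items():
    predictions = model.fit(X_train_scaled, y_train).predict(X_test_scaled)
    scores[name] = accuracy_score(y_test, predictions)
print(scores)
```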

The best-performing classifier among the three new machine learning classifiers is the AdaBoostClassifier.

Did this new model perform better or worse than the provided baseline model?

The AdaBoost model performs slightly better than the SVM model: it delivers a higher cumulative return, while the accuracy score is unchanged.

Did this new model perform better or worse than your tuned trading algorithm?

The AdaBoost model performs slightly worse than the tuned SVM trading algorithm, since the latter delivers a cumulative return of 1.8.

Evaluation Report

In conclusion, four machine learning classifiers from scikit-learn were used to enhance the accuracy of the trading signals and to achieve the optimal cumulative return: support vector machine (SVM), AdaBoost, DecisionTreeClassifier, and LogisticRegression.

During the SVM analysis, adjusting the training dataset size and the long- and short-window parameters improved the accuracy score by 0.01 and raised the cumulative return to 1.8, as shown below.

[Image: results of the tuned SVM model]

Then, three new models were implemented using the original SVM parameters. The best-performing classifier of the three was the AdaBoost classifier. (Left to right: AdaBoost, DecisionTree, LogisticRegression)

[Image: results of the AdaBoost, DecisionTree, and LogisticRegression models]

The preceding image indicates that the AdaBoost classifier performed better than the original SVM given the same parameters, but delivered a slightly worse return than the tuned SVM trading algorithm.


Contributors

UC Berkeley Extension

Brought to you by Yanjun Lin Andrie


License

MIT
