This project explores the idea of using AI to predict stock trends over different time frames. It combines methods to download and explore data for different tickers in order to get an idea of how a stock price will perform in the future, and it uses multiple ML models to conduct the predictions.
The program consists of multiple Python classes and a Jupyter notebook, which is divided into three main sections:
- Prepare Data
(Downloads and cleans the data and calculates the features)
- Analysis Datasets
(Provides statistical numbers and visualizations to explore the dataset)
- Prediction
(Performs the predictions via AI)
The notebook provides widgets that enable the user to set a list of ticker symbols, statistical values, a range for the historical data load and the time frames for which the prediction should happen.
During the implementation of this program, I intended to use the yfinance API, which provided daily historical data for a wide range of tickers. Unfortunately, the API has been facing some issues since the 2nd of July 2021. Thus, I decided to download CSV files for the tickers Nvidia and Daimler AG, ranging from 22.02.2010 until the 5th of July 2021.
Once the API is back online, the notebook should automatically make use of it again.
As a standard setting, the program provides forecasts for 7, 14 and 30 days.
Based on the given architecture, the classes can be used to build a web or mobile app later.
To fetch data, the program uses the “yfinance API”, which queries stock information from Yahoo. To start, the user adds ticker symbols into the form field and specifies the time range for the historical data load.
The API returns a data frame, which consists of the date as the index, the “open price”, “min price”, “max price”, “close price” and the “volume” of traded stocks.
Additionally, the user can set up multiple forecast windows and adjust some statistical rolling indicators, which are part of the feature columns the AI learns from.
During the data processing, the program provides seven different indicators in total:
- Rolling standard deviation
- Simple moving averages (two)
- Upper and lower Bollinger band
- Daily returns
- Cumulative returns
- Relative Strength Index (RSI)
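All of these indicators can be derived from the close price with pandas rolling operations. The sketch below uses a toy price series and assumed window sizes of 20 and 50 days; the real window sizes are user-configurable in the notebook:

```python
import numpy as np
import pandas as pd

# Toy close-price series standing in for the downloaded stock data
close = pd.Series(100 + 5 * np.sin(np.arange(120) / 3.0))
window = 20  # illustrative window size

df = pd.DataFrame({"close": close})
df["std"] = close.rolling(window).std()               # rolling standard deviation
df["sma20"] = close.rolling(window).mean()            # simple moving average (short)
df["sma50"] = close.rolling(50).mean()                # simple moving average (long)
df["bb_upper"] = df["sma20"] + 2 * df["std"]          # upper Bollinger band
df["bb_lower"] = df["sma20"] - 2 * df["std"]          # lower Bollinger band
df["daily_ret"] = close.pct_change()                  # daily returns
df["cum_ret"] = (1 + df["daily_ret"]).cumprod() - 1   # cumulative returns

# RSI: ratio of average gain to average loss over the window
delta = close.diff()
gain = delta.clip(lower=0).rolling(window).mean()
loss = (-delta.clip(upper=0)).rolling(window).mean()
df["rsi"] = 100 - 100 / (1 + gain / loss)
```

The resulting columns become the feature matrix the models are trained on.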
However, before calculating the features above, I perform some data-cleansing tasks to provide a dataset that is better suited for analysis and visualization.
Usually, the API only provides data for trading days. That means a visualization of the stock trend would contain gaps. Because of that, I pick the first and the last date index of the downloaded dataset and create a new data frame with the entire date range, so weekends and public holidays are also included. Next, I join the given dataset with the new data frame and use the forward-fill method to provide continuous data.
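In pandas, this gap-filling step is essentially a reindex followed by a forward fill. A minimal sketch with a toy three-day frame (the dates and prices are made up for illustration):

```python
import pandas as pd

# Toy frame standing in for the downloaded dataset: trading days only,
# so the weekend of 9/10 January 2021 is missing.
prices = pd.DataFrame(
    {"close": [100.0, 101.5, 99.8]},
    index=pd.to_datetime(["2021-01-07", "2021-01-08", "2021-01-11"]),
)

# Build the full calendar range from the first to the last date index ...
full_range = pd.date_range(prices.index.min(), prices.index.max(), freq="D")

# ... reindex onto it and forward-fill, so weekends and public holidays
# carry the last known trading price instead of leaving gaps.
prices = prices.reindex(full_range).ffill()

print(prices.loc["2021-01-09", "close"])  # 101.5, carried over from Friday
```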
Data Exploration & Visualization:
To get some insight into the given datasets, the section “Analysis Datasets” provides global statistics like the mean of the daily return, the cumulative return (which describes the total return of a stock since the beginning of the record), the standard deviation and the Sharpe ratio.
A lower standard deviation indicates that the stock carries less risk of high variability; in other words, it measures the volatility of a stock.
The Sharpe ratio indicates whether a stock could generate a higher return compared to a risk-free market investment, like putting money into a bank account with some interest. In conclusion: the higher the Sharpe ratio, the better the returns over time.
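As a sketch, the annualized Sharpe ratio can be computed from the daily returns like this, assuming a risk-free rate of zero and 252 trading days per year (the notebook's exact formula may differ):

```python
import numpy as np
import pandas as pd

# Toy daily returns; in the project these come from the daily-returns column
daily_ret = pd.Series([0.010, -0.005, 0.007, 0.002, -0.001])

# Sharpe ratio = mean excess return / volatility; risk-free rate assumed 0
sharpe = np.sqrt(252) * daily_ret.mean() / daily_ret.std()
print(sharpe)
```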
For each stock, I provide global statistics for a one-year period and for the entire dataset. The one-year observation is interesting to see how a stock has been performing lately. The totals are more important for the prediction, which I will explain below.
The program also provides a set of visualizations for the data analysis part. The first visual shows the trend of the current stock combined with additional indicators.
I used Plotly to be able to drill into the figure. A closer look shows how the actual price moves between the Bollinger bands. A common trading strategy says: buy if the stock price is below the lower Bollinger band, and sell if the stock price is above the upper Bollinger band.
(An automatic notification could actually be a nice extension for this program.)
However, we can see that the stock trend seems to be volatile over the years. To get a closer look, I created a histogram of the distribution of the daily returns.
Here, we can see that the stock mostly shows a slight increase. However, there is also a high number of ups and downs, which explains the volatility of the stock price. So why is this of interest?
My assumption here: the more volatile a stock, the harder it is to predict.
I also want to show the stock trend of Nvidia. The figure shows that for a long period the dataset looks linear, but at the end there is a sharp increase. Since the training dataset mostly has low volatility, it will be interesting to see how the algorithms perform with that data.
The last figure I provide in the analysis section is a correlation matrix. I use this plot to check whether the features I selected have an influence on the stock price or not.
We can see that most features have a very strong positive correlation with the price. Only the RSI and the daily return seem to have no influence at all. I could observe the same with the Nvidia stock.
However, for this project, I decided to keep both features within the dataset; for the next development iteration, a feature selection based on the correlation values would be an important addition.
Further data preparation:
Before I can start with the prediction of the stock price, the dataset needs to be prepared further.
Currently, there is no y_train/y_test target a supervised model can be trained on.
To create one, I take the “price” column and shift it by the number of days I want to forecast.
For instance, if I want to predict the price in five days, I make a copy of the price column and shift it by five, as shown in the figure below. Afterwards, I adapt the index column; otherwise, I would have a misleading date index.
Now I am able to perform the split into training and test datasets. In this project, the ratio between training and test is 75% to 25%.
It is also important to understand that for each prediction window, the dataset must be adapted first.
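The target construction and the chronological 75/25 split could look roughly like this (toy data; the column names are illustrative):

```python
import numpy as np
import pandas as pd

forecast_days = 5  # predict the closing price five days ahead

df = pd.DataFrame({"price": np.arange(100.0, 140.0)})  # toy price column

# Copy the price column and shift it back by the forecast window, so row i
# pairs today's features with the price forecast_days later.
df["target"] = df["price"].shift(-forecast_days)
df = df.dropna()  # the last forecast_days rows have no target yet

X = df[["price"]].values
y = df["target"].values

# 75% / 25% chronological split -- no shuffling for time-series data
split = int(len(df) * 0.75)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]
```

For each forecast window (7, 14, 30 days), this shift is repeated with a different `forecast_days`, which is why the dataset must be adapted per window.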
To benchmark the results of every ML algorithm, it is necessary to define some statistical criteria. For this project, I chose four performance indicators, which I want to describe a bit closer in this section.
Mean square error (MSE):
The first indicator I want to talk about is the mean square error (MSE). This value describes the variance of the predicted values. For example, imagine a two-dimensional plane with two dots: the first dot is the expected value and the second dot is the predicted value. The MSE describes how closely the predicted dot ranges around the expected dot. The smaller the value, the better the predictions.
Root mean square error (RMSE):
The second indicator I take into consideration is the root mean square error (RMSE). As the name says, the RMSE is the root of the MSE. It is effectively the standard deviation of the differences between the predicted and the actual values. Since we want to know the distance between the predicted and the actual stock price, the RMSE describes it in the same unit as the price. As with the MSE: the smaller, the better.
Mean absolute error (MAE):
The third indicator is the mean absolute error (MAE). It describes the accuracy of a prediction. To use this metric, the values to compare must have the same dimension, which is given in my project. Because of that, I will also consider this value.
Coefficient of determination (R2):
The coefficient of determination (R2), or r-squared, is a statistical metric which describes the distance between the predicted values and the regression line in percent, whereby 0% means the distance from the regression line is huge (hence the quality of the model is bad) and 100% indicates that the predicted values are close to the regression line (thus the model is good). It is one of the most important metrics to judge whether an ML model is performing well.
Since all of these metrics are useful to describe the results of my ML models, I will consider all four of them.
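All four metrics are available in scikit-learn, which the project already uses; a minimal sketch with toy values:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([10.0, 12.0, 11.0, 13.0])  # actual prices
y_pred = np.array([10.5, 11.5, 11.2, 12.8])  # predicted prices

mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)                      # RMSE is just the root of the MSE
mae = mean_absolute_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)            # 1.0 would be a perfect fit
```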
Algorithm Techniques and Evaluation:
For the experiment, I decided to use three different machine-learning (ML) algorithms, which all belong to the class of supervised machine learning:
- Linear Regression
- Multi-layer Perceptron
- Long short-term memory (LSTM)
The idea is to create a model with each ML technique and compare the results based on the:
- coefficient of determination (R2),
- mean square error (MSE),
- mean absolute error (MAE).
In the notebook, I also added an accuracy ratio, but since there are a lot of rounding errors, the expressiveness of this number is low.
The linear regression model is an approach that tries to explain an observed variable by the use of different independent variables. In other words, it tries to identify the relationships between the target and the dependent variables, whereby the target is a linear combination of the regression coefficients. As input, I take the entire feature list.
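With scikit-learn, fitting such a model is a one-liner; the sketch below uses a single toy feature instead of the full feature list:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy feature standing in for the full feature list (SMA, Bollinger bands, ...)
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.0, 4.0, 6.0, 8.0])  # target: the shifted price column

lr = LinearRegression().fit(X, y)
pred = lr.predict(np.array([[5.0]]))  # the toy data is exactly linear, so ~10.0
```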
The multi-layer perceptron (MLP) belongs to the class of artificial neural networks (ANNs) and consists of at least three layers of nodes (input layer, hidden layer and output layer). All nodes except those in the input layer are neurons that use a nonlinear activation function.
Source: Lineare Regression, https://de.wikipedia.org/wiki/Lineare_Regression
Source: Multilayer perceptron, https://en.wikipedia.org/wiki/Multilayer_perceptron
Since neural networks are more effective with normalized values, I normalize the entire dataset to a range between 0 and 1.
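This normalization can be done with scikit-learn's MinMaxScaler, for example:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([[10.0], [20.0], [30.0], [40.0]])  # toy feature column

# Scale each column into the range [0, 1]; in practice, fit the scaler on
# the training portion only and reuse it for the test set to avoid leakage.
scaler = MinMaxScaler(feature_range=(0, 1))
X_scaled = scaler.fit_transform(X)
```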
For the MLP model, I set the maximum number of iterations to 1000 and the hidden layer size to 100 neurons.
max_iter = 1000
hls = 100
MLP = MLPRegressor(random_state=0, max_iter=max_iter, hidden_layer_sizes=(hls,))
By setting the hidden layer size to 100 I could gain very good results, but it was also necessary to increase the maximum number of iterations to 1000; otherwise the algorithm ran into a convergence error because it was not able to finish training on the entire dataset.
The term LSTM is the abbreviation for “long short-term memory”, which belongs to the class of artificial neural networks as well. The main difference to other neural networks is the memory function, realized by three types of gates: the input gate, the remember/forget gate and the output gate.
As with the MLP model, I first perform a normalization on the dataset. Next, I set up a model with three LSTM layers using 100 units each and one output layer.
The batch size is set to 4 and the number of epochs (iterations) to 6.
model = Sequential()
# add first layer
model.add(LSTM(units=100, return_sequences=True, input_shape=(x_train_data.shape[1], 1)))
# add second layer
model.add(LSTM(units=100, return_sequences=True))
# add third layer
model.add(LSTM(units=100))
model.add(Dropout(0.2))
# add output layer
model.add(Dense(units=1))
model.fit(x_train_data, y_train_data, batch_size=4, epochs=6, verbose=0)
I chose the settings for MLP and LSTM based on the results of several test runs, whereby these settings led to the best performance yet.
Source: LSTM, https://de.wikipedia.org/wiki/Long_short-term_memory
In the section above, I already provided the final parameter settings of each ML model. In this section, however, I want to show how difficult it is to set the right parameters with respect to performance. As an example, I take the LSTM, since with this algorithm the parameters seem to have the biggest influence.
As described above, I created the model with three LSTM layers and one output layer. Since the output layer must have size one (because of the single target column), it is possible to vary the number of units per LSTM layer, the batch size, and the number of epochs.
LSTM R2: 0.8343208888037233
LSTM MSE: 24.18871589943589
LSTM RMSE: 4.918202506956773
LSTM MAE: 3.6720203234698316
Test loss: 0.004514301661401987
Test accuracy: 0.004514301661401987
LSTM R2: 0.8529816884815744
LSTM MSE: 21.464288066592907
LSTM RMSE: 4.632956730490034
LSTM MAE: 3.1555438136686225
Test loss: 0.004005847033113241
Test accuracy: 0.004005847033113241
LSTM R2: 0.9393601858190183
LSTM MSE: 8.853253900430836
LSTM RMSE: 2.975441799200723
LSTM MAE: 2.27944727267998
Test loss: 0.0016522685764357448
Test accuracy: 0.0016522685764357448
LSTM R2: 0.9373253905196233
LSTM MSE: 9.15032867983537
LSTM RMSE: 3.0249510210638735
LSTM MAE: 2.350033736063246
Test loss: 0.0017077106749638915
Test accuracy: 0.0017077106749638915
LSTM R2: 0.9441181986701426
LSTM MSE: 8.158596497510764
LSTM RMSE: 2.856325698779949
LSTM MAE: 2.2533935607262094
Test loss: 0.0015226254472509027
Test accuracy: 0.0015226254472509027
LSTM R2: 0.9518972432497814
LSTM MSE: 7.02287638199774
LSTM RMSE: 2.650071014519751
LSTM MAE: 2.052696560230034
Test loss: 0.001310668420046568
Test accuracy: 0.001310668420046568
LSTM R2: 0.9573160407850155
LSTM MSE: 6.231746147474334
LSTM RMSE: 2.4963465599700565
LSTM MAE: 1.9181539341105456
Test loss: 0.0011630207300186157
Test accuracy: 0.0011630207300186157
LSTM R2: 0.9551101251093278
LSTM MSE: 6.553804053217859
LSTM RMSE: 2.5600398538338927
LSTM MAE: 1.987840666826167
Test loss: 0.0012231270084157586
Test accuracy: 0.0012231270084157586
As can be seen from the result list, by increasing the number of units and decreasing the batch size I was able to raise the R2 value from 0.834 to 0.944. But decreasing the batch size, which is the number of tuples per iteration, results in a much longer training time. The same applies to the epochs parameter, which indicates the number of training repetitions.
Because of that, I decided to take units=100, batch=4 and epochs=6 as my final setting, since this gave the best trade-off between quality and time.
Benchmark & Results:
Now let’s take a look at the results after running the last section of the notebook. For the first observation, we look at the Daimler stock with a seven-day forecast.
7 days out:------------
Linear Regression R2: 0.9599340051810165
Linear Regression MSE: 5.849530208769271
Linear Regression RMSE: 2.418580205155345
Linear Regression MAE: 1.8381742201585387
With the linear regression model (LR) and the MLP model, the coefficient of determination (R2) is at 0.96. The mean square error is at 5.8, the RMSE at 2.4 and the mean absolute error at 1.8.
Multi-layer Perceptron R2: 0.9597540943486109
Multi-layer Perceptron MSE: 5.875796718656164
Multi-layer Perceptron RMSE: 2.4240042736464313
Multi-layer Perceptron MAE: 1.842234816874342
With LSTM, the R2 value is at 0.95, the MSE at 7.02, the RMSE at 2.65 and the MAE at 2.05, which are all pretty good values.
LSTM R2: 0.9518996750746772
LSTM MSE: 7.022521341938096
LSTM RMSE: 2.6500040267777134
LSTM MAE: 2.050466546018151
Test loss: 0.0013106020633131266
Test accuracy: 0.0013106020633131266
A first look at the trend chart confirms the numbers. LR and MLP are close to the actual trend, whereas LSTM has more variance.
Drilling into the chart, the figure shows that LR follows the stock trend but has a right shift on the date axis. The same is observed with MLP as well.
With LSTM, there is also a shift on the x-axis, but additionally the predictions follow the trend of the actual price correctly.
Let’s also take a look at the correlation scatter plots of LR and LSTM. The predicted/actual dots of LR are closer to the line. With LSTM, the arrangement of the dots looks nearly identical.
However, with both ML algorithms, some dots are scattered at the tails of the line, which indicates a higher variance because of the volatility of the stock.
Let’s take a look at the following results with a higher number of days to predict.
As expected, with all ML algorithms the accuracy drops as the number of days increases.
14 days out:------------
Linear Regression R2: 0.9210534027157851
Linear Regression MSE: 11.205277229153344
Linear Regression RMSE: 3.34742845019178
Linear Regression MAE: 2.480697728061077
Accuracy: 0.02131782945736434
Multi-layer Perceptron R2: 0.9208279486779142
Multi-layer Perceptron MSE: 11.237277025011299
Multi-layer Perceptron RMSE: 3.3522048005769722
Multi-layer Perceptron MAE: 2.4830014230203714
LSTM R2: 0.8293106867363559
LSTM MSE: 24.22677025948721
LSTM RMSE: 4.9220697129853015
LSTM MAE: 3.9718806714789814
Test loss: 0.004521401599049568
Test accuracy: 0.004521401599049568
However, even with 30 days, the R2 of LR and MLP is still at 0.8, whereas LSTM drops to 0.69; but the error rates of all three metrics increase roughly quadratically.
30 days out:------------
Linear Regression R2: 0.8023495414619239
Linear Regression MSE: 26.06617012652934
Linear Regression RMSE: 5.105503905250621
Linear Regression MAE: 3.840367674747195
Accuracy: 0.014634146341463415
Multi-layer Perceptron R2: 0.8056262562367411
Multi-layer Perceptron MSE: 25.634036523560532
Multi-layer Perceptron RMSE: 5.06300666833064
Multi-layer Perceptron MAE: 3.8282281762928236
LSTM R2: 0.794942949055266
LSTM MSE: 27.042952569422663
LSTM RMSE: 5.200283893156475
LSTM MAE: 3.942713516533084
Test loss: 0.005046984646469355
Test accuracy: 0.005046984646469355
Now, I want to take a quick look at the results of the Nvidia prediction. From our previous observation of the trend data, we know that Nvidia is less volatile than the Daimler stock.
7 days out:
Linear Regression R2: 0.9840980476812651
Linear Regression MSE: 477.8649860102321
Linear Regression RMSE: 21.860123192933568
Linear Regression MAE: 15.751104514724204
Multi-layer Perceptron R2: 0.982952112618063
Multi-layer Perceptron MSE: 512.3011503232476
Multi-layer Perceptron RMSE: 22.634070564599014
Multi-layer Perceptron MAE: 16.427553295187025
LSTM R2: 0.9526400502028002
LSTM MSE: 1423.200201689741
LSTM RMSE: 37.72532573338156
LSTM MAE: 28.309211559516577
Test loss: 0.0021751392632722855
Test accuracy: 0.0021751392632722855
LSTM R2: 0.7715236662166166
LSTM MSE: 6865.876457096025
LSTM RMSE: 82.86058446026087
LSTM MAE: 59.134434449773956
Test loss: 0.010493420995771885
Test accuracy: 0.010493420995771885
It seems that the lower volatility is also reflected in the R2 values, which are 0.98 for LR and MLP. Even with LSTM, the R2 is at 0.92 for a seven-day prediction. With the forecast up to 30 days, there is a decrease of ~0.06 for all ML models. The biggest difference compared to the first observation with Daimler is that in all results, MSE, RMSE and MAE are much higher.
With LSTM, I played around a bit more. For me it was astonishing: LSTM seems to perform differently with the given parameters depending on the volatility of a stock. As we can see from the observations for seven and 30 days, the results with a batch size of 100 and 10 epochs are much better than with a batch size of 4 and 6 epochs.
30 days out:
Linear Regression R2: 0.9260988131636365
Linear Regression MSE: 1967.4077523867877
Linear Regression RMSE: 44.355470377246455
Linear Regression MAE: 32.34013531018811
Muli-layer Perceptron R2: 0.9314364892804224
Muli-layer Perceptron MSE: 1825.3073907897913
Muli-layer Perceptron RMSE: 42.72361631217319
Muli-layer Perceptron MAE: 31.672436833157796
LSTM R2: 0.8819063513993067
LSTM MSE: 3143.9056625586086
LSTM RMSE: 56.070541842919695
LSTM MAE: 41.608872088176454
Test loss: 0.006525109056383371
Test accuracy: 0.006525109056383371
LSTM R2: 0.6263458145324079
LSTM MSE: 9947.47408899509
LSTM RMSE: 99.73702466484094
LSTM MAE: 72.85733414812786
Test loss: 0.020645776763558388
Test accuracy: 0.020645776763558388
Looking at the next two charts, we can see a sharp increase of the stock price within the test dataset.
This development could explain why the error rates have increased that much. But for now, it doesn’t really explain why LSTM performs so differently with respect to the parameter settings.
All three models seem to perform very effectively when it comes to stock prediction. With the given Daimler stock, all algorithms reach a very good R2 of ~95% and an average RMSE of ~2.5. Because of that, and considering that I only used technical indicators as features, the setup of the algorithms seems quite good.
The circumstance that LSTM suddenly performs differently when the dataset changes also indicates that the model settings depend on the given dataset. However, we cannot foresee how a stock will suddenly behave (for example, the rise of the Nvidia stock). Because of that, and because of the difficulty of setting up a well-performing LSTM, I would rather tend to use LR or MLP for their robustness.
An additional important factor is time. The training and the prediction with LSTM took at least four times longer than with LR or MLP. With respect to the results, it is hard to justify why LSTM should be the better algorithm.
With this project, I conducted predictions for two different stocks using three different AI models over certain time windows. During the observation of the results, it seemed that LR and MLP always performed a bit better than LSTM. If I also take the time of training and prediction into account, I would tend to use LR or MLP.
By using two different datasets, I can also conclude that the results depend on the volatility of the stock trend. With LSTM, the volatility of a stock also seems to matter for the setup of the model parameters.
To predict a stock price, I only used the price trend of past days and some statistical numbers. Given that, all results look good.
However, if we take a very close look at the predicted data for a seven-day prediction, we can see that there is a trend shift between the actual and the predicted data of exactly seven days. I explain this behavior by the dataset the algorithms are trained with: it seems that all algorithms are simply following the actual stock price given in the dataset.
In other words, the results I got should not be used to make decisions when it comes to stock trading. For that, an additional set of features is mandatory. Also, a more advanced deep learning approach could bring better results. For now, the predictions are too unspecific.
Although the performance of each model seemed to be quite impressive, I have not yet achieved a real prediction. Thus, a new set of features is necessary. The yfinance API could help a lot here, since it provides much more market data/metadata and information about the companies, which can be used to define new features. With neural networks like MLP and LSTM, I see a lot of potential to improve the results by changing the model parameters.
During the creation of the LSTM model, I could detect a dramatic change in performance while playing with the number of units or the batch size. Thus, a grid search or another randomized approach could help a lot here.
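As a sketch of that idea, scikit-learn's GridSearchCV can sweep the MLP parameters automatically (toy data and a small illustrative grid; tuning the Keras LSTM would need a wrapper or a manual loop):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPRegressor

# Toy regression data standing in for the scaled feature matrix and target
rng = np.random.default_rng(0)
X = rng.random((80, 3))
y = X @ np.array([0.5, -0.2, 0.3]) + 0.01 * rng.standard_normal(80)

# Small illustrative grid; a real search would cover the settings
# discussed above (units, batch size, epochs / iterations).
param_grid = {
    "hidden_layer_sizes": [(10,), (50,)],
    "max_iter": [500, 1000],
}
search = GridSearchCV(MLPRegressor(random_state=0), param_grid, cv=3)
search.fit(X, y)
print(search.best_params_)  # best combination found by cross-validation
```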
I hope my project makes a good contribution, and I would be happy to get some feedback.
Thanks a lot!
If you are interested in my code, you can find it on GitHub.