# Project Overview:

This project explores the idea of using AI to predict stock trends over different time frames. It combines methods to download and explore data for different tickers in order to estimate how a stock price may perform in the future. To do so, it uses multiple ML models to make the predictions.
The program consists of multiple Python classes and a Jupyter notebook, which is divided into three main sections, including:

• Analysis Datasets (provides statistics and visualizations to explore the dataset)
• Forecasting (performs the predictions via AI)

# Data Preprocessing:

To fetch data, the program uses the “yfinance API”, which queries stock information from Yahoo Finance. To start, the user enters ticker symbols in the form field and specifies the time range for loading the historical data.
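A minimal sketch of this loading step, assuming the form field delivers a comma-separated string of tickers and the time range as ISO dates. `parse_tickers` and `load_prices` are hypothetical helper names, not the project's actual classes; only `yfinance.download` is a real library call.

```python
from datetime import date


def parse_tickers(form_value: str) -> list[str]:
    """Turn a form entry such as 'aapl, msft' into ['AAPL', 'MSFT']."""
    tickers = [t.strip().upper() for t in form_value.split(",")]
    # Drop empty entries and duplicates while keeping the user's order.
    return list(dict.fromkeys(t for t in tickers if t))


def load_prices(form_value: str, start: str, end: str):
    """Download daily historical data for all tickers via yfinance (hypothetical helper)."""
    tickers = parse_tickers(form_value)
    if not tickers:
        raise ValueError("no ticker symbols given")
    if date.fromisoformat(start) >= date.fromisoformat(end):
        raise ValueError("start date must be before end date")
    import yfinance as yf  # imported lazily so the input checks work without it
    return yf.download(tickers, start=start, end=end)
```

Validating the input before calling the API keeps obviously wrong requests (empty ticker list, reversed date range) from reaching Yahoo at all.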

# Data Exploration & Visualization:

To get some insight into the given datasets, the section “Analysis Datasets” provides global statistics such as the mean daily return, the cumulative return (the total return of a stock since the beginning of the record), the standard deviation, and the Sharpe ratio, which relates the return to its volatility.
A lower standard deviation of the returns indicates lower volatility, i.e. a lower risk of large price fluctuations.
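A minimal pandas sketch of these statistics. It assumes a Series of daily closing prices, 252 trading days per year for annualization, and a risk-free rate of zero for the Sharpe ratio; the notebook's exact formulas are not shown, so `summary_stats` is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def summary_stats(prices: pd.Series, periods_per_year: int = 252) -> dict:
    """Global statistics of a daily price series."""
    daily_returns = prices.pct_change().dropna()
    mean_daily = daily_returns.mean()
    std_daily = daily_returns.std()  # sample standard deviation (ddof=1)
    return {
        "mean_daily_return": mean_daily,
        # Total return since the first record.
        "cumulative_return": prices.iloc[-1] / prices.iloc[0] - 1,
        "std_daily_return": std_daily,
        # Annualized Sharpe ratio, assuming a risk-free rate of zero.
        "sharpe_ratio": np.sqrt(periods_per_year) * mean_daily / std_daily,
    }
```

For example, `summary_stats(pd.Series([100.0, 110.0, 99.0, 108.9]))` yields a cumulative return of 0.089 (8.9%) and a mean daily return of about 0.033.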

# Further data preparation:

Before I can start predicting the stock price, the dataset needs further preparation.
At this point there is no target (y_train / y_test) on which a supervised model could be trained.
To create one, I take the “price” column and shift it by the number of days I want to forecast.
For instance, to predict the price five days ahead, I copy the price column and shift it by five rows, as shown in the figure below. Afterwards I adjust the index; otherwise the date index would be misleading.
Now I can split the data into a training and a test set. In this project, the ratio of training to test data is 75% to 25%.
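The shift and the split could look like the following pandas sketch. It assumes a DataFrame with a date index and a “price” column, uses the current price as the only feature, and splits chronologically (no shuffling) so that the test period lies after the training period; `make_supervised` is a hypothetical name.

```python
import pandas as pd


def make_supervised(df: pd.DataFrame, horizon: int, train_ratio: float = 0.75):
    """Build feature/target pairs for a `horizon`-day forecast and split them chronologically."""
    data = df[["price"]].copy()
    # shift(-horizon) moves the price from `horizon` rows later onto today's row,
    # so each row keeps the date of its feature.
    data["target"] = data["price"].shift(-horizon)
    # The last `horizon` rows have no future price, so they cannot be used.
    data = data.dropna()
    split = int(len(data) * train_ratio)
    x, y = data[["price"]], data["target"]
    return x.iloc[:split], x.iloc[split:], y.iloc[:split], y.iloc[split:]
```

With 20 daily prices and a five-day horizon, 15 usable rows remain, of which 11 go into training and 4 into testing.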

# Metrics:

To benchmark the results of every ML algorithm, it is necessary to define statistical criteria. For this project I chose three performance indicators, which I describe in more detail in this section.

Mean square error (MSE):
The first indicator I want to discuss is the mean square error (MSE). It is the average of the squared differences between the predicted and the expected values. Imagine a two-dimensional plane with two dots: the first dot is the expected value and the second is the predicted value. The squared distance between them is the error of this single prediction, and the MSE averages it over all predictions. The smaller the value, the closer the predictions lie to the expected values.
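As a sketch, the indicators that appear in the results (MSE, RMSE, MAE and R2) can be computed with NumPy as follows. In formula form, MSE = (1/n) · Σ (yᵢ − ŷᵢ)². scikit-learn's `mean_squared_error`, `mean_absolute_error` and `r2_score` should return the same values; `regression_metrics` is a hypothetical helper name.

```python
import numpy as np


def regression_metrics(y_true, y_pred) -> dict:
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_true - y_pred
    mse = np.mean(errors ** 2)                      # mean of the squared errors
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation of the targets
    return {
        "MSE": mse,
        "RMSE": np.sqrt(mse),                       # same unit as the price
        "MAE": np.mean(np.abs(errors)),             # mean absolute error
        "R2": 1.0 - np.sum(errors ** 2) / ss_tot,   # share of variance explained
    }
```

Because MSE, RMSE and MAE depend on the price scale, they are only comparable within one stock; R2 is scale-free.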

# Algorithm Techniques and Evaluation:

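For the experiment, I decided to use three different machine-learning (ML) algorithms, all of which belong to the class of supervised learning algorithms:

• Linear Regression (LR)
• Multi-layer Perceptron (MLP)
• LSTM

The models are evaluated using, among other metrics, the mean square error (MSE) and the mean absolute error (MAE). The final parameter settings of the MLP and the LSTM are: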
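```python
from sklearn.neural_network import MLPRegressor

max_iter = 1000
hls = 100  # neurons in the single hidden layer
# learning_rate='adaptive' only takes effect with solver='sgd' (default solver: 'adam')
MLP = MLPRegressor(random_state=0, max_iter=max_iter, hidden_layer_sizes=(hls,),
                   activation='identity',
                   learning_rate='adaptive').fit(x_train_scaled, y_train)
```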
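```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout

model = Sequential()
# add first layer (input: x_train_data.shape[1] time steps with 1 feature each)
model.add(LSTM(units=100, return_sequences=True, input_shape=(x_train_data.shape[1], 1)))
model.add(Dropout(0.2))
# add second layer
model.add(LSTM(units=100, return_sequences=True))
model.add(Dropout(0.2))
# add third layer
model.add(LSTM(units=100, return_sequences=False))
model.add(Dropout(0.2))
# output layer and compile step (assumed; required before fit but not shown in the original)
model.add(Dense(units=1))
model.compile(optimizer='adam', loss='mean_squared_error')
model.fit(x_train_data, y_train_data, batch_size=4, epochs=6, verbose=0)
```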
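Keras LSTM layers expect 3-D input of shape (samples, time steps, features), which is what `input_shape=(x_train_data.shape[1], 1)` refers to. The notebook's exact reshaping is not shown; the following NumPy sketch assumes sliding windows of past prices with a single feature, and `make_windows` is a hypothetical helper.

```python
import numpy as np


def make_windows(series, window: int, horizon: int = 1):
    """Return x of shape (samples, window, 1) and y of shape (samples,)."""
    series = np.asarray(series, dtype=float)
    n_samples = len(series) - window - horizon + 1
    if n_samples <= 0:
        raise ValueError("series too short for this window and horizon")
    # Each sample holds `window` consecutive prices ...
    x = np.stack([series[i:i + window] for i in range(n_samples)])
    # ... and its target is the price `horizon` steps after the window's last value.
    y = series[window + horizon - 1:]
    return x.reshape(n_samples, window, 1), y
```

With arrays built this way, `x_train_data.shape[1]` equals the window length.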

## Refinement:

In the section above, I have already provided the final parameter settings of each ML model. In this section, however, I want to show how difficult it is to find parameters that perform well. As an example I take the LSTM, since its parameters seem to have the biggest influence on the results.

LSTM results for the tested settings (values rounded to four decimal places, test loss to six; in every run the logged test accuracy equals the test loss):

| units | batch size | epochs | R2 | MSE | RMSE | MAE | Test loss / test accuracy | Accuracy |
|---|---|---|---|---|---|---|---|---|
| 50 | 20 | 4 | 0.8343 | 24.1887 | 4.9182 | 3.6720 | 0.004514 | 0.45% |
| 50 | 10 | 2 | 0.8530 | 21.4643 | 4.6330 | 3.1555 | 0.004006 | 0.40% |
| 50 | 1 | 4 | 0.9394 | 8.8533 | 2.9754 | 2.2794 | 0.001652 | 0.17% |
| 100 | 8 | 8 | 0.9373 | 9.1503 | 3.0250 | 2.3500 | 0.001708 | 0.17% |
| 100 | 2 | 4 | 0.9441 | 8.1586 | 2.8563 | 2.2534 | 0.001523 | 0.15% |
| 100 | 4 | 6 | 0.9519 | 7.0229 | 2.6501 | 2.0527 | 0.001311 | 0.13% |
| 100 | 4 | 8 | 0.9573 | 6.2317 | 2.4963 | 1.9182 | 0.001163 | 0.12% |
| 100 | 4 | 10 | 0.9551 | 6.5538 | 2.5600 | 1.9878 | 0.001223 | 0.12% |
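The settings could also be searched systematically. The sketch below is a plain grid search; `score_fn` is a hypothetical callback that would train the LSTM with the given units, batch size and epochs and return the test MSE.

```python
from itertools import product
from typing import Callable


def grid_search(score_fn: Callable[..., float], units_options, batch_options, epoch_options):
    """Evaluate every combination and return the one with the lowest score."""
    results = []
    for units, batch_size, epochs in product(units_options, batch_options, epoch_options):
        mse = score_fn(units=units, batch_size=batch_size, epochs=epochs)
        results.append(((units, batch_size, epochs), mse))
    best = min(results, key=lambda item: item[1])
    return best, results
```

Each combination requires a full training run, so the cost grows multiplicatively with the number of options per parameter. Among the eight settings measured above, the lowest MSE (6.2317) belongs to units=100, batch=4, epochs=8.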

# Benchmark & Results:

Now let’s look at the results after running the last section of the notebook. For the first observation, we look at the Daimler stock on a seven-day forecast, followed by 14- and 30-day forecasts.

Daimler stock (values rounded to four decimal places, test loss to six; for the LSTM the logged test accuracy equals the test loss):

| Horizon | Model | R2 | MSE | RMSE | MAE | Test loss / test accuracy | Accuracy |
|---|---|---|---|---|---|---|---|
| 7 days | Linear Regression | 0.9599 | 5.8495 | 2.4186 | 1.8382 | n/a | 0.0261 |
| 7 days | Multi-layer Perceptron | 0.9598 | 5.8758 | 2.4240 | 1.8422 | n/a | 0.0193 |
| 7 days | LSTM | 0.9519 | 7.0225 | 2.6500 | 2.0505 | 0.001311 | 0.13% |
| 14 days | Linear Regression | 0.9211 | 11.2053 | 3.3474 | 2.4807 | n/a | 0.0213 |
| 14 days | Multi-layer Perceptron | 0.9208 | 11.2373 | 3.3522 | 2.4830 | n/a | 0.0252 |
| 14 days | LSTM | 0.8293 | 24.2268 | 4.9221 | 3.9719 | 0.004521 | 0.45% |
| 30 days | Linear Regression | 0.8023 | 26.0662 | 5.1055 | 3.8404 | n/a | 0.0146 |
| 30 days | Multi-layer Perceptron | 0.8056 | 25.6340 | 5.0630 | 3.8282 | n/a | 0.0078 |
| 30 days | LSTM | 0.7949 | 27.0430 | 5.2003 | 3.9427 | 0.005047 | 0.50% |

Second stock, with two LSTM settings per horizon:

| Horizon | Model | R2 | MSE | RMSE | MAE | Test loss / test accuracy | Accuracy |
|---|---|---|---|---|---|---|---|
| 7 days | Linear Regression | 0.9841 | 477.8650 | 21.8601 | 15.7511 | n/a | 0.0019 |
| 7 days | Multi-layer Perceptron | 0.9830 | 512.3012 | 22.6341 | 16.4276 | n/a | 0.0019 |
| 7 days | LSTM (units=100, batch=100, epochs=10) | 0.9526 | 1423.2002 | 37.7253 | 28.3092 | 0.002175 | 0.22% |
| 7 days | LSTM (units=100, batch=4, epochs=6) | 0.7715 | 6865.8765 | 82.8606 | 59.1344 | 0.010493 | 1.05% |
| 30 days | Linear Regression | 0.9261 | 1967.4078 | 44.3555 | 32.3401 | n/a | 0.0029 |
| 30 days | Multi-layer Perceptron | 0.9314 | 1825.3074 | 42.7236 | 31.6724 | n/a | 0.0020 |
| 30 days | LSTM (units=100, batch=100, epochs=10) | 0.8819 | 3143.9057 | 56.0705 | 41.6089 | 0.006525 | 0.65% |
| 30 days | LSTM (units=100, batch=4, epochs=6) | 0.6263 | 9947.4741 | 99.7370 | 72.8573 | 0.020646 | 2.06% |
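The Linear Regression code itself is not shown in this report. As an illustration only, here is a NumPy sketch of an ordinary least-squares regression that maps today's price to the price h days later and reports the test RMSE for the 7-, 14- and 30-day horizons used above. The chronological 75/25 split and the helper names `linear_fit_predict` and `benchmark_horizons` are assumptions, not the notebook's code.

```python
import numpy as np


def linear_fit_predict(x_train, y_train, x_test):
    """Ordinary least squares with an intercept, fitted via np.linalg.lstsq."""
    a_train = np.column_stack([np.ones(len(x_train)), x_train])
    coef, *_ = np.linalg.lstsq(a_train, y_train, rcond=None)
    a_test = np.column_stack([np.ones(len(x_test)), x_test])
    return a_test @ coef


def benchmark_horizons(prices, horizons=(7, 14, 30), train_ratio=0.75):
    """Test RMSE of the linear baseline for each forecast horizon (in rows/days)."""
    prices = np.asarray(prices, dtype=float)
    results = {}
    for h in horizons:
        x, y = prices[:-h], prices[h:]  # today's price -> price h days later
        split = int(len(x) * train_ratio)  # chronological split, no shuffling
        y_pred = linear_fit_predict(x[:split], y[:split], x[split:])
        results[h] = float(np.sqrt(np.mean((y[split:] - y_pred) ** 2)))
    return results
```

On a perfectly linear price series the RMSE is numerically zero for every horizon; on a random walk it typically grows with the horizon, similar to the trend in the results above.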

# Justification:

All three models seem to perform well at stock prediction. For the Daimler stock on a seven-day horizon, all algorithms reach a very good R2 of about 0.95 and an RMSE of about 2.5. Considering that I only used technical indicators as features, the setup of the algorithms seems to work quite well.
The fact that the LSTM suddenly performs differently when the dataset changes also indicates that its parameter settings depend on the given dataset: on the second stock, the setting used for Daimler (units=100, batch=4, epochs=6) reaches an R2 of only about 0.77 for seven days. However, we cannot foresee how a stock will suddenly behave (for example, the rise of the Nvidia stock). For this reason, and because of the difficulty of setting up a well-performing LSTM, I would rather use LR or MLP, since they are more robust on average.
Another important factor is time: training and prediction with the LSTM took at least four times longer than with LR or MLP. Given these results, it is hard to justify choosing the LSTM as the better algorithm.

# Reflection:

In this project, I predicted the prices of two different stocks with three different AI models over several time windows. Looking at the results, LR and MLP consistently performed slightly better than LSTM. Taking the training and prediction time into account as well, I would tend to use LR or MLP.
From using two different datasets, I also conclude that the results depend on the volatility of the stock trend. For the LSTM in particular, the volatility of a stock also seems to matter when choosing the model parameters.
To predict a stock price, I only used the price trend of past days and a few statistical figures. Considering that, all results look good.