import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
from net import ConvProbNet
from simulate_data import generate_data
from parse_input import plot_input_data, transform_input_data, plot_transformed_data, make_train_test
We are given input data
At each time
$\mu_t = y_t$ $\sigma^2_t = (\mu_t - y_t)^2$
That is a pretty simple excercise to solve. For example see here
In order to incentivize our network to produce maximum likelihood parameters we can use the following mean squared error loss:
Where
- Obviously the model minimizes its loss function by producing
$\mu_t = y_t$ , as desired, and by producing$\sigma^2_t = (\mu_t - y_t)^2$ , as desired. - If
$\mu_t = y_t$ then the network is rewarded for setting$\sigma^2_t = 0$ . In other words, if our network is very confident in its predictions then it should set the variance to zero. Another way to say this is that if$y$ is deterministic or not very noisy then our network should be closer to a simple regression - If
$\mu_t \neq y_t$ then the network is rewarded for correctly expressing its uncertainty through the variance of the output distribution. Therefore our output distribution's variance expresses both the model's uncertainty in its prediction, and the inherent uncertainty in the target signal (it tells us something about$\eta$ ).
We're expecting that
Let batch size be
- Input layer (B, N, 3)
- Conv 1D X2 (B, 64, N-2, 3)
- Dropout + max pooling 1D (B, 64, (N-2)/2, 3)
- Dense (B, 100)
- Output (B, 2)
There are two convolution layers with 64 filters and a dense layer with output size 100. The final output is mu and sigma^2
The output of the conv 1d layers includes N-2 because we use a convolutional filter size 3, and then we max pool with pool size = 2 so we get down do (N-2)/2. The width 100, 64 filters, and batch size just come from intuition and practice, and in reality they will become hyperparameters that we fiddle with later when we're trying to get the best model (we can set up a script to automatically do a hyperparameter search).
The I built and defined the actual model in net.py and I simply import it and use it here. I'm calling it ConvProbNet
In the following, I'll run three experiments on the network.
-
Test that the model can produce accurate
$\mu$ : Feed in random$X$ along with a deterministic, constant valued$y = y_0$ . The idea is to make sure that the network does what we want it to do, and that it has enough capacity to dramatically overtrain on simple data. (It is better to start with a network that is too large/deep and then worry about regularizing than to be stuck with a network that is too shallow to represent the underlying data patterns). In this experiment we want to see the model converge to always output$\mu_t = y_0$ ,$\sigma^2_t = 0$ . -
Test that the model can produce accurate
$\sigma^2$ : Feed in random$X$ along with stationary$y$ drawn from a gaussian$\mathcal{N}(\mu_0, \sigma^2_0)$ . We want to see the model converge to always outputting$\mu_t = y_0$ ,$\sigma^2_t = \sigma^2_0$ . -
Simulate autocorrelated stationary signals
$X$ and use a known function$y = f(X)$ (for example a simple exponential weighted moving average). Train the network for a while and see that it's performing reasonably well. We'll look at loss and some plots to evaluate the network. For this experiment I simulated data in simulate_data.py and wrote it out to a csv input_data.csv. It "looks" real in the sense that it's a csv with datestamps that needs to be read in, normalized, transformed, and prepped to be fed to the network. I define all the prep work in parse_input.py and just use the functions here. -
Eventual (not implemented) grab actual signals from the wild and feed them to the network. Evaluate by using a train/test split, metric could be something like average
$\frac{|\mu - y|}{\sigma}$ , something along those lines.
#produce a random mu and sigma from the network before training, just to make sure everything
#is hooked up correctly
f = ConvProbNet().model
X = np.random.random((1, 14, 3)); y = np.random.random(1)
output = f([X[:1], y[:1]])
mu, sig = (output[0].numpy()[0][0], output[1].numpy()[0][0])
print(f'mu = {mu}, sigma squared = {sig}')
mu = 0.014715185388922691, sigma squared = 0.02229337766766548
We're going to generate a constant vector y, and we want the network to always output
#train
X = np.random.random((10000, 14, 3)); y = np.ones(10000)
f.fit([X,y], y, epochs=10)
#output mu and sigma^2, expect 1.0, 0.0
X = np.random.random((1, 14, 3)); y = np.random.random(1)
output = f([X[:1], y[:1]])
mu, sig = (output[0].numpy()[0][0], output[1].numpy()[0][0])
Epoch 1/10
313/313 [==============================] - 1s 2ms/step - loss: 0.0766 - dense_1_accuracy: 0.9675 - dense_2_accuracy: 0.0035
Epoch 2/10
313/313 [==============================] - 1s 2ms/step - loss: 0.0073 - dense_1_accuracy: 1.0000 - dense_2_accuracy: 0.0000e+00
Epoch 3/10
313/313 [==============================] - 1s 2ms/step - loss: 0.0038 - dense_1_accuracy: 1.0000 - dense_2_accuracy: 0.0000e+00
Epoch 4/10
313/313 [==============================] - 1s 2ms/step - loss: 0.0022 - dense_1_accuracy: 1.0000 - dense_2_accuracy: 0.0000e+00
Epoch 5/10
313/313 [==============================] - 1s 2ms/step - loss: 0.0013 - dense_1_accuracy: 1.0000 - dense_2_accuracy: 0.0000e+00
Epoch 6/10
313/313 [==============================] - 1s 3ms/step - loss: 7.4012e-04 - dense_1_accuracy: 1.0000 - dense_2_accuracy: 0.0000e+00
Epoch 7/10
313/313 [==============================] - 1s 3ms/step - loss: 4.5538e-04 - dense_1_accuracy: 1.0000 - dense_2_accuracy: 0.0000e+00
Epoch 8/10
313/313 [==============================] - 1s 2ms/step - loss: 2.2553e-04 - dense_1_accuracy: 1.0000 - dense_2_accuracy: 0.0000e+00
Epoch 9/10
313/313 [==============================] - 1s 2ms/step - loss: 1.0551e-04 - dense_1_accuracy: 1.0000 - dense_2_accuracy: 0.0000e+00
Epoch 10/10
313/313 [==============================] - 1s 2ms/step - loss: 4.0912e-05 - dense_1_accuracy: 1.0000 - dense_2_accuracy: 0.0000e+00
print(f'mu = {mu}, sigma squared = {sig}')
mu = 0.9917384386062622, sigma squared = 0.0016207885928452015
Not too bad! Now let's see if it can learn to output the parameters of the distribution of a stationary y. In other words, we are going to draw y from N(mu, sigma^2) and want the network to learn that and always produce the same mu and sigma^2
#train
X = np.random.random((10000, 14, 3)); y = np.random.normal(1,.5,size=10000)
f.fit([X,y], y, epochs=10)
#output mu and sigma^2, expect 1, 0.25
X = np.random.random((1, 14, 3)); y = np.random.random(1)
output = f([X[:1], y[:1]])
mu, sig = (output[0].numpy()[0][0], output[1].numpy()[0][0])
Epoch 1/10
313/313 [==============================] - 1s 3ms/step - loss: 0.3884 - dense_1_accuracy: 0.0000e+00 - dense_2_accuracy: 0.0000e+00
Epoch 2/10
313/313 [==============================] - 1s 2ms/step - loss: 0.3821 - dense_1_accuracy: 0.0000e+00 - dense_2_accuracy: 0.0000e+00
Epoch 3/10
313/313 [==============================] - 1s 3ms/step - loss: 0.3822 - dense_1_accuracy: 0.0000e+00 - dense_2_accuracy: 0.0000e+00
Epoch 4/10
313/313 [==============================] - 1s 3ms/step - loss: 0.3811 - dense_1_accuracy: 0.0000e+00 - dense_2_accuracy: 0.0000e+00
Epoch 5/10
313/313 [==============================] - 1s 3ms/step - loss: 0.3822 - dense_1_accuracy: 0.0000e+00 - dense_2_accuracy: 0.0000e+00
Epoch 6/10
313/313 [==============================] - 1s 3ms/step - loss: 0.3822 - dense_1_accuracy: 0.0000e+00 - dense_2_accuracy: 0.0000e+00
Epoch 7/10
313/313 [==============================] - 1s 3ms/step - loss: 0.3805 - dense_1_accuracy: 0.0000e+00 - dense_2_accuracy: 0.0000e+00
Epoch 8/10
313/313 [==============================] - 1s 3ms/step - loss: 0.3798 - dense_1_accuracy: 0.0000e+00 - dense_2_accuracy: 0.0000e+00
Epoch 9/10
313/313 [==============================] - 1s 3ms/step - loss: 0.3813 - dense_1_accuracy: 0.0000e+00 - dense_2_accuracy: 0.0000e+00
Epoch 10/10
313/313 [==============================] - 1s 3ms/step - loss: 0.3807 - dense_1_accuracy: 0.0000e+00 - dense_2_accuracy: 0.0000e+00A: 0s - loss: 0.3777 - dense_1_accuracy: 0.0000e+00 - dense_2_accuracy: 0
print(f'mu = {mu}, sigma squared = {sig}')
mu = 1.0553873777389526, sigma squared = 0.270428866147995
Again, not too bad! We were looking for 1 and 0.25
But this is easy. Let's see how the network does on some simulated data.
Check out the simulate_data and parse_input scripts, but here's what I did to simulate data.
- x1, x2, and x3 have log daily returns that are gaussian distributed, correlated with each other, and autocorrelated over two days.
- y is an exponentially weighted moving average with a halflife of one week running over x1, x2, x3. So data in x1, x2, x3 contain information about future y values.
- the network uses the 3 new values x1(t), x2(t) and x3(t) as well as 13 days worth of older values of x1 x2 x3 in order to predict y(t).
- parse_input just has some plotting functions I wrote, and normalizes the data into log-daily-returns
- finally parse_input has make_train_test which builds up training and test data for the network
df = pd.read_csv('input_data.csv')
df.head()
.dataframe tbody tr th {
vertical-align: top;
}
.dataframe thead th {
text-align: right;
}
Unnamed: 0 | x1 | x2 | x3 | target | date | |
---|---|---|---|---|---|---|
0 | 0 | 10.789147 | 27.286684 | 11.775043 | 16.616958 | 2017-01-01 |
1 | 1 | 9.910168 | 28.115604 | 11.836738 | 16.618993 | 2017-01-02 |
2 | 2 | 9.921680 | 25.979636 | 11.181378 | 16.279763 | 2017-01-03 |
3 | 3 | 9.233654 | 23.072619 | 10.225685 | 15.673707 | 2017-01-04 |
4 | 4 | 8.757418 | 23.125527 | 9.104205 | 15.188115 | 2017-01-05 |
%matplotlib inline
plot_input_data(df);
X,y = transform_input_data(df)
plot_transformed_data(X,y);
#we're ready to train!
Xtrain, ytrain, Xtest, ytest = make_train_test(X,y)
f.fit([Xtrain,ytrain], ytrain, epochs=10)
Epoch 1/10
13/13 [==============================] - 0s 3ms/step - loss: 0.2978 - dense_1_accuracy: 0.0000e+00 - dense_2_accuracy: 0.0000e+00
Epoch 2/10
13/13 [==============================] - 0s 3ms/step - loss: 0.0178 - dense_1_accuracy: 0.0000e+00 - dense_2_accuracy: 0.0000e+00
Epoch 3/10
13/13 [==============================] - 0s 3ms/step - loss: 0.0084 - dense_1_accuracy: 0.0000e+00 - dense_2_accuracy: 0.0000e+00
Epoch 4/10
13/13 [==============================] - 0s 3ms/step - loss: 0.0037 - dense_1_accuracy: 0.0000e+00 - dense_2_accuracy: 0.0000e+00
Epoch 5/10
13/13 [==============================] - 0s 3ms/step - loss: 0.0022 - dense_1_accuracy: 0.0000e+00 - dense_2_accuracy: 0.0000e+00
Epoch 6/10
13/13 [==============================] - 0s 3ms/step - loss: 0.0028 - dense_1_accuracy: 0.0000e+00 - dense_2_accuracy: 0.0000e+00
Epoch 7/10
13/13 [==============================] - 0s 3ms/step - loss: 0.0030 - dense_1_accuracy: 0.0000e+00 - dense_2_accuracy: 0.0000e+00
Epoch 8/10
13/13 [==============================] - 0s 3ms/step - loss: 0.0022 - dense_1_accuracy: 0.0000e+00 - dense_2_accuracy: 0.0000e+00
Epoch 9/10
13/13 [==============================] - 0s 3ms/step - loss: 0.0019 - dense_1_accuracy: 0.0000e+00 - dense_2_accuracy: 0.0000e+00
Epoch 10/10
13/13 [==============================] - 0s 3ms/step - loss: 0.0016 - dense_1_accuracy: 0.0000e+00 - dense_2_accuracy: 0.0000e+00
<tensorflow.python.keras.callbacks.History at 0x6427a21d0>
There wasn't very much data in the input file. But let's evaluate our model's performance.
We're going to get mu and sigma for each day in the test period, and then compare that to y for each day in the test period. If our model is doing well, then
deviation =
then we'll make a histogram and look at the deviations
mu_test, sig_test = f([Xtest, np.ones(Xtest.shape[0])])
mu_test = mu_test.numpy().flatten()
sig_test = sig_test.numpy().flatten()
stds_test = np.sqrt(sig_test)
deviations = np.abs(mu_test - ytest)/stds_test
_ = plt.hist(deviations)
plt.xlabel("Standard Deviations from y true")
plt.ylabel("Frequency in test set")
Text(0, 0.5, 'Frequency in test set')
Not excellent! Let's see how it does if we feed it a lot more training data (50,000 examples this time)
X,y = transform_input_data(generate_data(Ndays=50000))
Xtrain, ytrain, Xtest, ytest = make_train_test(X,y)
f.fit([Xtrain, ytrain], ytrain, epochs=10)
Epoch 1/10
1250/1250 [==============================] - 3s 3ms/step - loss: 1.0864e-04 - dense_4_accuracy: 0.0000e+00 - dense_5_accuracy: 0.0000e+00
Epoch 2/10
1250/1250 [==============================] - 3s 3ms/step - loss: 8.2467e-05 - dense_4_accuracy: 0.0000e+00 - dense_5_accuracy: 0.0000e+00
Epoch 3/10
1250/1250 [==============================] - 3s 3ms/step - loss: 8.0153e-05 - dense_4_accuracy: 0.0000e+00 - dense_5_accuracy: 0.0000e+00
Epoch 4/10
1250/1250 [==============================] - 3s 2ms/step - loss: 7.9234e-05 - dense_4_accuracy: 0.0000e+00 - dense_5_accuracy: 0.0000e+00
Epoch 5/10
1250/1250 [==============================] - 3s 2ms/step - loss: 7.8800e-05 - dense_4_accuracy: 0.0000e+00 - dense_5_accuracy: 0.0000e+00
Epoch 6/10
1250/1250 [==============================] - 3s 2ms/step - loss: 7.8930e-05 - dense_4_accuracy: 0.0000e+00 - dense_5_accuracy: 0.0000e+00
Epoch 7/10
1250/1250 [==============================] - 4s 3ms/step - loss: 7.8200e-05 - dense_4_accuracy: 0.0000e+00 - dense_5_accuracy: 0.0000e+00
Epoch 8/10
1250/1250 [==============================] - 3s 3ms/step - loss: 7.7725e-05 - dense_4_accuracy: 0.0000e+00 - dense_5_accuracy: 0.0000e+00
Epoch 9/10
1250/1250 [==============================] - 3s 3ms/step - loss: 7.7075e-05 - dense_4_accuracy: 0.0000e+00 - dense_5_accuracy: 0.0000e+00
Epoch 10/10
1250/1250 [==============================] - 3s 3ms/step - loss: 7.7144e-05 - dense_4_accuracy: 0.0000e+00 - dense_5_accuracy: 0.0000e+00
<tensorflow.python.keras.callbacks.History at 0x652f2c358>
mu_test, sig_test = f([Xtest, np.ones(Xtest.shape[0])])
mu_test = mu_test.numpy().flatten()
sig_test = sig_test.numpy().flatten()
stds_test = np.sqrt(sig_test)
deviations = (mu_test - ytest)/stds_test
_ = plt.hist(deviations, bins=1000)
plt.xlabel("Standard Deviations from y true")
plt.ylabel("Frequency in test set")
plt.xlim(-4,4)
/Users/dyanni3/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:4: RuntimeWarning: invalid value encountered in sqrt
after removing the cwd from sys.path.
(-4, 4)
Much better! This now looks roughly like half a normal distribution. Most of the time our model is right, but rare outliers occur!