Add trainable theta and euler as discretizer #41

Merged 3 commits on Aug 6, 2021
Changes from all commits
Binary file added .ci/secret.tar.enc
Binary file not shown.
22 changes: 20 additions & 2 deletions .nengobones.yml
@@ -70,8 +70,8 @@ travis_yml:
env:
TF_VERSION: tensorflow==2.1.0
python: 3.6
- script: docs
- script: examples
- script: remote-docs
- script: remote-examples
pypi_user: __token__
slack_notifications: "wZ7l/X7cVeetmwfup7vCeN74pqFGMC5eaJfy/aqRwVGCbY3aHQKoqJaBBrVef\
c+DsJwPPM9HIOGs7jkPY+Y1pFbklAhWCDCvmc+f3fL4/yPWK1u7r8IIHhM3O0YvYrEHfFfZn+V1nAomx1\
@@ -94,6 +94,24 @@ ci_scripts:
coverage: true
pip_install:
- $TF_VERSION
- template: remote-script
remote_script: docs
output_name: remote-docs
host: azure-docs
travis_var_key: 2895d60e3414
azure_name: nengo-dl-docs
azure_group: nengo-ci
remote_setup:
- conda install -y -c conda-forge cudatoolkit=11.2 cudnn=8.1
- template: remote-script
remote_script: examples
output_name: remote-examples
host: azure-examples
travis_var_key: 2895d60e3414
azure_name: nengo-dl-examples
azure_group: nengo-ci
remote_setup:
- conda install -y -c conda-forge cudatoolkit=11.2 cudnn=8.1
- template: deploy

codecov_yml: {}
9 changes: 2 additions & 7 deletions .travis.yml
@@ -39,15 +39,10 @@ jobs:
python: 3.6
-
env:
SCRIPT="docs"
addons:
apt:
packages:
- pandoc
SCRIPT="remote-docs"
-
env:
SCRIPT="examples"
services: ['xvfb']
SCRIPT="remote-examples"
- stage: deploy
if: branch =~ ^release-candidate-* OR tag =~ ^v[0-9]*
env: SCRIPT="deploy"
15 changes: 14 additions & 1 deletion CHANGES.rst
@@ -19,7 +19,7 @@ Release history
- Removed
- Fixed

0.3.2 (unreleased)
0.4.0 (unreleased)
==================

**Added**
@@ -29,8 +29,21 @@ Release history
uses this implementation for all values of ``memory_d`` when feedforward conditions
are satisfied (no hidden-to-memory or memory-to-memory connections,
and the sequence length is not ``None``). (`#40`_)
- Added ``trainable_theta`` option, which will allow the ``theta`` parameter to be
learned during training. (`#41`_)
- Added ``discretizer`` option, which controls the method used to solve for the ``A``
and ``B`` LMU matrices. This is mainly useful in combination with
``trainable_theta=True``, where setting ``discretizer="euler"`` may improve the
training speed (possibly at the cost of some accuracy). (`#41`_)

**Changed**

- The ``A`` and ``B`` matrices are now stored as constants instead of non-trainable
variables. This can improve the training/inference speed, but it means that saved
weights from previous versions will be incompatible. (`#41`_)

.. _#40: https://github.com/nengo/keras-lmu/pull/40
.. _#41: https://github.com/nengo/keras-lmu/pull/41

0.3.1 (November 16, 2020)
=========================
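For context, a minimal sketch of how the two new options described in the changelog might be used together. The `trainable_theta` and `discretizer` names come from the entries above; the remaining arguments mirror the psMNIST example further down in this diff and are illustrative only, not part of the change itself.

```python
import tensorflow as tf
import keras_lmu

# Sketch of the options added in this PR (#41); values other than
# trainable_theta/discretizer are illustrative only.
lmu_layer = keras_lmu.LMU(
    memory_d=1,
    order=256,
    theta=784,  # initial theta, refined during training when trainable_theta=True
    hidden_cell=tf.keras.layers.SimpleRNNCell(212),
    trainable_theta=True,  # allow theta to be learned during training
    discretizer="euler",   # cheaper approximate solve for the A/B matrices
)
```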
Binary file modified docs/examples/psMNIST-weights.hdf5
Binary file not shown.
64 changes: 30 additions & 34 deletions docs/examples/psMNIST.ipynb
@@ -24,13 +24,13 @@
"been shown. The psMNIST task adds more complexity to the input by applying a fixed\n",
"permutation to all of the pixel sequences. This is done to ensure that the information\n",
"contained in the image is distributed evenly throughout the sequence, so that in order\n",
"to perform the task successfully, the network needs to process information across the\n",
"to perform the task successfully the network needs to process information across the\n",
"whole length of the input sequence.\n",
"\n",
"The following notebook uses a single KerasLMU layer inside a simple TensorFlow model to\n",
"showcase the accuracy and efficiency of performing the psMNIST task using these novel\n",
"memory cells. Using the LMU for this task currently produces state-of-the-art results\n",
"this task ([see\n",
"([see\n",
"paper](https://papers.nips.cc/paper/9689-legendre-memory-units-continuous-time-representation-in-recurrent-neural-networks.pdf))."
]
},
@@ -62,7 +62,7 @@
"metadata": {},
"source": [
"First we set a seed to ensure that the results in this example are reproducible. A\n",
"random number generator state (`rng`) is also created, and this will later be used to\n",
"random number generator (`rng`) is also created, and this will later be used to\n",
"generate the fixed permutation to be applied to the image data."
]
},
@@ -131,13 +131,12 @@
"method on the images. The first dimension of the reshaped output size represents the\n",
"number of samples our dataset has, which we keep the same. We want to transform each\n",
"sample into a column vector, and to do so we make the second and third dimensions -1 and\n",
"1, respectively, leveraging a standard NumPy trick specifically used for converting\n",
"multi-dimensional data into column vectors.\n",
"1, respectively.\n",
"\n",
"The image displayed below shows the result of this flattening process, and is an example\n",
"of the type of data that is used in the Sequential MNIST task. Note that even though the\n",
"image has been reshaped into an 98 x 8 image (so that it can fit on the screen), there\n",
"is still a fair amount of structure observable in the image."
"image has been flattened, there is still a fair amount of structure observable in the\n",
"image."
]
},
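A minimal sketch of the flattening described above (the actual notebook cell is collapsed in this diff; the array names are assumed from the earlier loading cells):

```python
# Reshape each 28x28 image into a 784x1 column-vector sequence:
# (n_samples, 28, 28) -> (n_samples, 784, 1)
train_images = train_images.reshape((train_images.shape[0], -1, 1))
test_images = test_images.reshape((test_images.shape[0], -1, 1))
```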
{
@@ -162,14 +161,15 @@
"metadata": {},
"source": [
"Finally, we apply a fixed permutation on the images in both the training and testing\n",
"datasets. This essentially shuffles the pixels of the image sequences in a consistent\n",
"datasets. This shuffles the pixels of the image sequences in a consistent\n",
"way, allowing for images of the same digit to still be similar, but removing the\n",
"convenience of edges and contours that the network can use for easy digit inference.\n",
"\n",
"We can see, from the image below, that the fixed permutation applied to the image\n",
"creates an even distribute of pixels across the entire sequence. This makes the task\n",
"much more difficult as it makes it necessary for the network to process the entire input\n",
"sequence to accurately predict what the digit is. We now have our data for the Permuted\n",
"much more difficult, as it makes it necessary for the network to process the entire\n",
"input\n",
"sequence to accurately classify the digit. We now have our data for the Permuted\n",
"Sequential MNIST (psMNIST) task."
]
},
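A rough sketch of applying that fixed permutation (assuming the `rng` created earlier; not necessarily the notebook's exact code):

```python
# Shuffle the pixel positions the same way for every image in both sets
perm = rng.permutation(train_images.shape[1])
train_images = train_images[:, perm]
test_images = test_images[:, perm]
```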
@@ -205,11 +205,11 @@
"metadata": {},
"outputs": [],
"source": [
"X_train = train_images[0:50000]\n",
"X_train = train_images[:50000]\n",
"X_valid = train_images[50000:]\n",
"X_test = test_images\n",
"\n",
"Y_train = train_labels[0:50000]\n",
"Y_train = train_labels[:50000]\n",
"Y_valid = train_labels[50000:]\n",
"Y_test = test_labels\n",
"\n",
@@ -235,7 +235,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Our model uses a single LMU layer configured with 212 `units` and an `order` of 256\n",
"Our model uses a single LMU layer configured with 212 hidden `units` and an `order` of\n",
"256\n",
"dimensions for the memory, maintaining `units` + `order` = 468 variables in memory\n",
"between time-steps. These numbers were chosen primarily to have a comparable number of\n",
"internal variables to the models that were being compared against in the\n",
@@ -256,17 +257,15 @@
"source": [
"n_pixels = X_train.shape[1]\n",
"\n",
"lmu_layer = tf.keras.layers.RNN(\n",
" keras_lmu.LMUCell(\n",
" memory_d=1,\n",
" order=256,\n",
" theta=n_pixels,\n",
" hidden_cell=tf.keras.layers.SimpleRNNCell(212),\n",
" hidden_to_memory=False,\n",
" memory_to_memory=False,\n",
" input_to_hidden=True,\n",
" kernel_initializer=\"ones\",\n",
" )\n",
"lmu_layer = keras_lmu.LMU(\n",
" memory_d=1,\n",
" order=256,\n",
" theta=n_pixels,\n",
" hidden_cell=tf.keras.layers.SimpleRNNCell(212),\n",
" hidden_to_memory=False,\n",
" memory_to_memory=False,\n",
" input_to_hidden=True,\n",
" kernel_initializer=\"ones\",\n",
")\n",
"\n",
"# TensorFlow layer definition\n",
@@ -295,14 +294,14 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"To train our model, we use a `batch_size` of 100, and train for 10 `epochs`, which is a\n",
"To train our model we use a `batch_size` of 100 and train for 10 `epochs`, which is\n",
"far less than most other solutions to the psMNIST task. We could train for more epochs\n",
"if we wished to fine-tune performance, but that is not necessary for the purposes of\n",
"this example. We also create a `ModelCheckpoint` callback that saves the weights of the\n",
"model to a file after each epoch.\n",
"\n",
"The time required for this to run is tracked using the `time` library. Training may take\n",
"a long time to complete, and to save time, this notebook defaults to using pre-trained\n",
"Training may take\n",
"a long time to complete, and to save time this notebook defaults to using pre-trained\n",
"weights. To train the model from scratch, simply change the `do_training` variable to\n",
"`True` before running the cell below."
]
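The training cell itself is collapsed in this diff; roughly, the step described above could look like the following sketch (assuming the imports, `model`, and data splits from earlier cells; the checkpoint filename and monitored metric are assumptions):

```python
batch_size = 100
epochs = 10
saved_weights_fname = "./psMNIST-weights.hdf5"  # assumed; matches the file shipped in docs/examples

callbacks = [
    tf.keras.callbacks.ModelCheckpoint(
        filepath=saved_weights_fname,
        monitor="val_accuracy",  # assumed metric name
        save_best_only=True,     # keep only the best weights, per the text above
        save_weights_only=True,
    ),
]

do_training = False  # set to True to train from scratch
if do_training:
    history = model.fit(
        X_train,
        Y_train,
        epochs=epochs,
        batch_size=batch_size,
        validation_data=(X_valid, Y_valid),
        callbacks=callbacks,
    )
```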
@@ -339,11 +338,8 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The progression of the training process is shown below. Here we plot the accuracy for\n",
"the training and validation for each epoch.\n",
"\n",
"Note that if this notebook has been configured to use trained weights, instead of using\n",
"live data, a saved image of a previous training run will be displayed."
"The progression of the training process is shown below, plotting the\n",
"training and validation accuracy."
]
},
{
Expand All @@ -359,7 +355,7 @@
" plt.legend()\n",
" plt.xlabel(\"Epoch\")\n",
" plt.ylabel(\"Accuracy\")\n",
" plt.title(\"Post-epoch Training Accuracies\")\n",
" plt.title(\"Post-epoch training accuracies\")\n",
" plt.xticks(np.arange(epochs), np.arange(1, epochs + 1))\n",
" plt.ylim((0.85, 1.0)) # Restrict range of y axis to (0.85, 1) for readability\n",
" plt.savefig(\"psMNIST-training.png\")\n",
@@ -386,7 +382,7 @@
"metadata": {},
"source": [
"With the training complete, let's use the trained weights to test the model. Since the\n",
"weights are saved to file after every epoch, we can simply load the saved weights, then\n",
"best weights are saved to file, we can simply load the saved weights, then\n",
"test it against the permuted sequences in the test set."
]
},
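The evaluation cell is likewise collapsed; a minimal sketch of the step described in the final cell above (assuming `model`, `X_test`, and `Y_test` from earlier cells, and that the model was compiled with an accuracy metric):

```python
model.load_weights("./psMNIST-weights.hdf5")  # weights file shipped in docs/examples
_, test_accuracy = model.evaluate(X_test, Y_test, verbose=0)
print(f"Test accuracy: {100 * test_accuracy:.2f}%")
```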