diff --git a/docs/examples/psMNIST-weights.hdf5 b/docs/examples/psMNIST-weights.hdf5
index fafae812..a8b9a566 100644
Binary files a/docs/examples/psMNIST-weights.hdf5 and b/docs/examples/psMNIST-weights.hdf5 differ
diff --git a/docs/examples/psMNIST.ipynb b/docs/examples/psMNIST.ipynb
index 11fde84e..5c3b0f4b 100644
--- a/docs/examples/psMNIST.ipynb
+++ b/docs/examples/psMNIST.ipynb
@@ -24,13 +24,13 @@
     "been shown. The psMNIST task adds more complexity to the input by applying a fixed\n",
     "permutation to all of the pixel sequences. This is done to ensure that the information\n",
     "contained in the image is distributed evenly throughout the sequence, so that in order\n",
-    "to perform the task successfully, the network needs to process information across the\n",
+    "to perform the task successfully the network needs to process information across the\n",
     "whole length of the input sequence.\n",
     "\n",
     "The following notebook uses a single KerasLMU layer inside a simple TensorFlow model to\n",
     "showcase the accuracy and efficiency of performing the psMNIST task using these novel\n",
     "memory cells. Using the LMU for this task currently produces state-of-the-art results\n",
-    "this task ([see\n",
+    "([see\n",
     "paper](https://papers.nips.cc/paper/9689-legendre-memory-units-continuous-time-representation-in-recurrent-neural-networks.pdf))."
    ]
   },
@@ -62,7 +62,7 @@
    "metadata": {},
    "source": [
     "First we set a seed to ensure that the results in this example are reproducible. A\n",
-    "random number generator state (`rng`) is also created, and this will later be used to\n",
+    "random number generator (`rng`) is also created, and this will later be used to\n",
     "generate the fixed permutation to be applied to the image data."
    ]
   },
@@ -131,13 +131,12 @@
     "method on the images. The first dimension of the reshaped output size represents the\n",
     "number of samples our dataset has, which we keep the same. We want to transform each\n",
     "sample into a column vector, and to do so we make the second and third dimensions -1 and\n",
-    "1, respectively, leveraging a standard NumPy trick specifically used for converting\n",
-    "multi-dimensional data into column vectors.\n",
+    "1, respectively.\n",
     "\n",
     "The image displayed below shows the result of this flattening process, and is an example\n",
     "of the type of data that is used in the Sequential MNIST task. Note that even though the\n",
-    "image has been reshaped into an 98 x 8 image (so that it can fit on the screen), there\n",
-    "is still a fair amount of structure observable in the image."
+    "image has been flattened, there is still a fair amount of structure observable in the\n",
+    "image."
    ]
   },
   {
@@ -162,14 +161,15 @@
    "metadata": {},
    "source": [
     "Finally, we apply a fixed permutation on the images in both the training and testing\n",
-    "datasets. This essentially shuffles the pixels of the image sequences in a consistent\n",
+    "datasets. This shuffles the pixels of the image sequences in a consistent\n",
     "way, allowing for images of the same digit to still be similar, but removing the\n",
     "convenience of edges and contours that the network can use for easy digit inference.\n",
     "\n",
     "We can see, from the image below, that the fixed permutation applied to the image\n",
     "creates an even distribute of pixels across the entire sequence. This makes the task\n",
-    "much more difficult as it makes it necessary for the network to process the entire input\n",
-    "sequence to accurately predict what the digit is. We now have our data for the Permuted\n",
+    "much more difficult, as it makes it necessary for the network to process the entire\n",
+    "input\n",
+    "sequence to accurately classify the digit. We now have our data for the Permuted\n",
     "Sequential MNIST (psMNIST) task."
    ]
   },
@@ -205,11 +205,11 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "X_train = train_images[0:50000]\n",
+    "X_train = train_images[:50000]\n",
     "X_valid = train_images[50000:]\n",
     "X_test = test_images\n",
     "\n",
-    "Y_train = train_labels[0:50000]\n",
+    "Y_train = train_labels[:50000]\n",
     "Y_valid = train_labels[50000:]\n",
     "Y_test = test_labels\n",
     "\n",
@@ -235,7 +235,8 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Our model uses a single LMU layer configured with 212 `units` and an `order` of 256\n",
+    "Our model uses a single LMU layer configured with 212 hidden `units` and an `order` of\n",
+    "256\n",
     "dimensions for the memory, maintaining `units` + `order` = 468 variables in memory\n",
     "between time-steps. These numbers were chosen primarily to have a comparable number of\n",
     "internal variables to the models that were being compared against in the\n",
@@ -256,17 +257,15 @@
    "source": [
     "n_pixels = X_train.shape[1]\n",
     "\n",
-    "lmu_layer = tf.keras.layers.RNN(\n",
-    "    keras_lmu.LMUCell(\n",
-    "        memory_d=1,\n",
-    "        order=256,\n",
-    "        theta=n_pixels,\n",
-    "        hidden_cell=tf.keras.layers.SimpleRNNCell(212),\n",
-    "        hidden_to_memory=False,\n",
-    "        memory_to_memory=False,\n",
-    "        input_to_hidden=True,\n",
-    "        kernel_initializer=\"ones\",\n",
-    "    )\n",
+    "lmu_layer = keras_lmu.LMU(\n",
+    "    memory_d=1,\n",
+    "    order=256,\n",
+    "    theta=n_pixels,\n",
+    "    hidden_cell=tf.keras.layers.SimpleRNNCell(212),\n",
+    "    hidden_to_memory=False,\n",
+    "    memory_to_memory=False,\n",
+    "    input_to_hidden=True,\n",
+    "    kernel_initializer=\"ones\",\n",
     ")\n",
     "\n",
     "# TensorFlow layer definition\n",
@@ -295,14 +294,14 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "To train our model, we use a `batch_size` of 100, and train for 10 `epochs`, which is a\n",
+    "To train our model we use a `batch_size` of 100 and train for 10 `epochs`, which is\n",
     "far less than most other solutions to the psMNIST task. We could train for more epochs\n",
     "if we wished to fine-tune performance, but that is not necessary for the purposes of\n",
     "this example. We also create a `ModelCheckpoint` callback that saves the weights of the\n",
     "model to a file after each epoch.\n",
     "\n",
-    "The time required for this to run is tracked using the `time` library. Training may take\n",
-    "a long time to complete, and to save time, this notebook defaults to using pre-trained\n",
+    "Training may take\n",
+    "a long time to complete, and to save time this notebook defaults to using pre-trained\n",
     "weights. To train the model from scratch, simply change the `do_training` variable to\n",
     "`True` before running the cell below."
    ]
@@ -339,11 +338,8 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "The progression of the training process is shown below. Here we plot the accuracy for\n",
-    "the training and validation for each epoch.\n",
-    "\n",
-    "Note that if this notebook has been configured to use trained weights, instead of using\n",
-    "live data, a saved image of a previous training run will be displayed."
+    "The progression of the training process is shown below, plotting the\n",
+    "training and validation accuracy."
    ]
   },
   {
@@ -359,7 +355,7 @@
     "    plt.legend()\n",
     "    plt.xlabel(\"Epoch\")\n",
     "    plt.ylabel(\"Accuracy\")\n",
-    "    plt.title(\"Post-epoch Training Accuracies\")\n",
+    "    plt.title(\"Post-epoch training accuracies\")\n",
     "    plt.xticks(np.arange(epochs), np.arange(1, epochs + 1))\n",
     "    plt.ylim((0.85, 1.0))  # Restrict range of y axis to (0.85, 1) for readability\n",
     "    plt.savefig(\"psMNIST-training.png\")\n",
@@ -386,7 +382,7 @@
    "metadata": {},
    "source": [
     "With the training complete, let's use the trained weights to test the model. Since the\n",
-    "weights are saved to file after every epoch, we can simply load the saved weights, then\n",
+    "best weights are saved to file, we can simply load the saved weights, then\n",
     "test it against the permuted sequences in the test set."
    ]
   },
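
The one functional change in this diff is the layer definition: instead of wrapping a `keras_lmu.LMUCell` in a generic `tf.keras.layers.RNN` layer, the notebook now constructs the equivalent `keras_lmu.LMU` layer directly, with identical arguments. Below is a minimal before/after sketch; the imports and the concrete `n_pixels` value (784 for a flattened 28 x 28 MNIST image), as well as the `Input`/`Dense` model wiring at the end, come from notebook cells outside this diff and are filled in here for illustration only.

```python
# Sketch comparing the old and new layer definitions from the diff above.
# Assumptions: TensorFlow and keras-lmu are installed; n_pixels is defined
# in an earlier notebook cell (784 for flattened 28 x 28 MNIST images).
import tensorflow as tf
import keras_lmu

n_pixels = 28 * 28

# Old form: wrap an LMUCell in a generic Keras RNN layer
lmu_old = tf.keras.layers.RNN(
    keras_lmu.LMUCell(
        memory_d=1,
        order=256,
        theta=n_pixels,
        hidden_cell=tf.keras.layers.SimpleRNNCell(212),
        hidden_to_memory=False,
        memory_to_memory=False,
        input_to_hidden=True,
        kernel_initializer="ones",
    )
)

# New form: keras_lmu.LMU takes the same arguments and handles the
# recurrent wrapping internally
lmu_new = keras_lmu.LMU(
    memory_d=1,
    order=256,
    theta=n_pixels,
    hidden_cell=tf.keras.layers.SimpleRNNCell(212),
    hidden_to_memory=False,
    memory_to_memory=False,
    input_to_hidden=True,
    kernel_initializer="ones",
)

# Either layer drops into the same model skeleton; this wiring is
# illustrative, not part of the diff
inputs = tf.keras.Input((n_pixels, 1))
outputs = tf.keras.layers.Dense(10)(lmu_new(inputs))
model = tf.keras.Model(inputs=inputs, outputs=outputs)
```

The two constructions build the same recurrent computation; per the KerasLMU documentation, the `LMU` wrapper can additionally swap in a faster feedforward implementation when the memory component has no recurrent dependencies (`hidden_to_memory=False` and `memory_to_memory=False`, with a known sequence length), which is the configuration used here.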