Updates LoRA Training Notebook for Kaggle (NVIDIA#209)

update notebook for kaggle

Signed-off-by: Oleg S <97077423+RobotSail@users.noreply.github.com>
Co-authored-by: Máirín Duffy <duffy@redhat.com>
dcurran90 · Mar 5, 2024 · 0d7ff52
1 parent 2146dc6

Showing 14 changed files with 115 additions and 5 deletions.
diff --git a/notebooks/README.md b/notebooks/README.md
@@ -1,18 +1,78 @@
 # Training
 
-You're now at the training phase. So far you have hand crafted some prompts and responses, and used `lab generate` to synthesize those prompt/response pairs into a new data set. Using a [Google Colab notebook](./Training_a_LoRA_With_Instruct_Lab.ipynb) and the NVidia T4 provided in the free tier, we will fine tune a LoRA. 
+You're now at the training phase. So far you have hand-crafted some prompts and responses, and used `lab generate` to synthesize those prompt/response pairs into a new dataset.
+
+Next, you'll fine-tune a model with LoRA (Low-Rank Adaptation of Large Language Models), using a Jupyter notebook on either Kaggle or Google Colab.
+
+We've laid out the steps to get started with either platform below.
+
+## Setting up the notebook
+
+### Kaggle
+
+Using a [Kaggle Notebook](./Training_a_LoRA_With_Labrador.ipynb) and the NVIDIA P100 GPU provided in Kaggle's free tier, we will fine-tune a LoRA.
+
+#### Prerequisites
+
+1. You'll need a Kaggle account, which you can create by visiting [Kaggle's Signup Page](https://www.kaggle.com/account/login?phase=startRegisterTab&returnUrl=%2F).
+1. To use Kaggle's accelerators, you'll have to verify your account with a phone number. Visit the [account settings page](https://www.kaggle.com/settings) and select "Phone Verification".
+
+
+**NOTE: At present, you'll need to download the notebook and upload it to Kaggle. The following steps walk you through uploading the notebook. Once this repository is open-sourced, we will add an 'Open in Kaggle' button.**
+
+
+#### Uploading the notebook
+
+Once your Kaggle account is set up, follow these steps to import and configure the notebook:
+
+1. At the top-left of the Kaggle page, click the "Create" button.
+
+![create-notebook](images/kaggle/create.png)
+
+
+2. Then, select "Notebook" from the dropdown menu.
+
+![create-new-notebook](images/kaggle/create-new-nb.png)
+
+
+3. This creates a new notebook that already contains some example content. From here, select "File" in the top-left corner.
+
+![new-notebook-file-click](images/kaggle/file-click.png)
+
+4. Then, select "Import notebook". This will prompt you to upload a file from a local disk (you can also use GitHub).
+
+![import-new-notebook](images/kaggle/import-nb.png)
+
+
+5. With the notebook uploaded, click the three vertical dots at the top right to open the accelerator options.
+
+![select-an-accelerator](images/kaggle/select-accelerator.png)
+
+6. Then select the **P100 GPU Accelerator**. The other accelerator options will not work (yet).
+
+![selecting-the-p100-gpu](images/kaggle/select-accelerator-p100.png)
+
+
+7. Finally, make sure to click "Restart & Clear Cell Outputs" before you run the notebook. ***Kaggle will not let you run notebooks larger than 1 megabyte.***
+
+![restart-and-clear-cell-outputs](images/kaggle/clear-outputs.png)
+
+### Google Colab
 
 Prerequisites:
 * [Google Colab](https://research.google.com/colaboratory/faq.html)
 * A Gmail account that you're logged into. This allows you to use Google Colab, whose free tier gives you access to an NVIDIA T4 GPU with 15 GB of memory.
 
-
 **NOTE: At present, you'll need to download the notebook and upload it to Google Colab. To upload a notebook, go to [Google Colab](https://colab.research.google.com) and you will be prompted to upload a notebook. Once this repository is open-sourced, we will add an 'Open in Colab' button.**
 
-[The notebook](./Training_a_LoRA_With_Instruct_Lab.ipynb) in this folder will walk you through:
+
+## Running the notebook
+
+
+[The notebook](./Training_a_LoRA_With_Labrador.ipynb) in this folder will walk you through:
 1. Uploading the output of `lab generate` (a synthetic dataset created from your hand-written prompts/responses).
 2. Checking the base model before training.
 3. Setting up and training a Low-Rank Adapter (LoRA). LoRA is a parameter-efficient fine-tuning (PEFT) method: it keeps the base model's weights frozen and trains only a small number of added low-rank parameters, so fine-tuning takes a fraction of the time and a fraction of the hardware that full fine-tuning would require. The resulting model should handle your queries better than the base model.
 4. Inspecting the output model to make sure the LoRA training had the desired effect (that is, that the output has 'improved').
-   
+
 Once you have finished training and the output looks good, we encourage you to go to the next stage, [Testing the fine-tuned model](../README.md#👩🏽‍🔬-3-testing-the-fine-tuned-model).
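
Step 3 of the README says LoRA trains only a small number of parameters. To make that concrete, here is a minimal Python sketch (not code from the notebook) that counts trainable parameters for a single weight matrix W of shape d × k: full fine-tuning updates all d·k entries, while LoRA freezes W and trains two low-rank factors, B (d × r) and A (r × k), for r·(d + k) parameters. The 4096 × 4096 shape and rank 8 used in the example are assumed values for illustration, not the notebook's actual settings.

```python
def full_params(d: int, k: int) -> int:
    """Trainable parameters when fine-tuning a d x k weight matrix directly."""
    return d * k


def lora_params(d: int, k: int, r: int) -> int:
    """Trainable parameters for LoRA on a d x k matrix: B is d x r, A is r x k."""
    if r <= 0 or r > min(d, k):
        raise ValueError("rank r must be in 1..min(d, k)")
    return r * (d + k)


def lora_fraction(d: int, k: int, r: int) -> float:
    """Share of the full matrix's parameter count that LoRA actually trains."""
    return lora_params(d, k, r) / full_params(d, k)


if __name__ == "__main__":
    # Hypothetical example: one 4096 x 4096 projection matrix, LoRA rank 8.
    d, k, r = 4096, 4096, 8
    print(full_params(d, k))                # 16777216
    print(lora_params(d, k, r))             # 65536
    print(f"{lora_fraction(d, k, r):.2%}")  # 0.39%
```

The rank r is the main knob: the trainable count grows linearly with r, while the frozen W is never updated.
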
diff --git a/notebooks/Training_a_LoRA_With_Instruct_Lab.ipynb b/notebooks/Training_a_LoRA_With_Instruct_Lab.ipynb
@@ -35,7 +35,26 @@
         "\n",
         "Finally, it will give you a chance to interact with your model in two ways: first, in this notebook (using a GPU supplied at low or no cost by Kaggle or Google), and second, by converting your adapter to a format that you can download and use with `llama.cpp` on your laptop.\n",
         "\n",
-        "IMPORTANT: make sure your notebook uses GPUs. In your notebook, click Runtime --> Change runtime type. Select *T4 GPU* and click Save.  If you miss this step you'll see errors at the Loading model step.\n"
+        "***IMPORTANT***: make sure your notebook uses GPUs.\n",
+        "\n",
+        "**Google Colab**: In your notebook, click Runtime --> Change runtime type, select *T4 GPU*, and click Save.\n",
+        "\n",
+        "**Kaggle**: Click on \"More settings\" (3 vertical dots at the top-right) --> Accelerator, and select *P100 GPU*.\n",
+        "\n",
+        "\n",
+        "![kaggle-more-settings](./images/kaggle/select-accelerator.png)\n",
+        "\nIf you miss this step, you'll see errors at the Loading model step.\n"
+      ]
+    },
+    {
+      "cell_type": "markdown",
+      "metadata": {},
+      "source": [
+        "## How to run this notebook\n",
+        "\n",
+        "Unless you have a spare GPU with 16GB+ of VRAM lying around the house,\n",
+        "you'll need to run this notebook on an external platform such as\n",
+        "[Kaggle](https://www.kaggle.com) or [Google Colab](https://colab.research.google.com/)."
       ]
     },
     {
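
The notebook cells above ask you to switch the runtime to a GPU and note that the notebook needs roughly 16 GB of VRAM. One quick way to confirm that the accelerator setting took effect is to ask `nvidia-smi` what it sees. The sketch below assumes the NVIDIA driver's `nvidia-smi` tool is on the PATH (it usually is on GPU runtimes) and uses its `--query-gpu` and `--format` options; the helper names and the 15000 MiB cut-off are assumptions for illustration, not values from the notebook.

```python
import subprocess
from typing import List, Tuple

# Ask nvidia-smi for one CSV line per GPU: "<name>, <total memory in MiB>".
QUERY = [
    "nvidia-smi",
    "--query-gpu=name,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_csv(text: str) -> List[Tuple[str, int]]:
    """Turn lines like 'Tesla T4, 15360' into [('Tesla T4', 15360)]."""
    gpus = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split on the last comma so the memory field is always isolated.
        name, mem = line.rsplit(",", 1)
        gpus.append((name.strip(), int(mem.strip())))
    return gpus


def query_gpus() -> List[Tuple[str, int]]:
    """Return GPUs reported by nvidia-smi, or [] if the tool is missing or fails."""
    try:
        result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return []  # no driver available, or no GPU runtime selected
    return parse_gpu_csv(result.stdout)


def has_enough_vram(gpus: List[Tuple[str, int]], min_mib: int) -> bool:
    """True if at least one GPU has min_mib MiB or more of total memory."""
    return any(mem >= min_mib for _, mem in gpus)


if __name__ == "__main__":
    gpus = query_gpus()
    if not gpus:
        print("No GPU found: check the accelerator / runtime type setting.")
    for name, mem in gpus:
        print(f"{name}: {mem} MiB")
    # Assumed cut-off: a 15 GB T4 or a 16 GB P100 should both clear it.
    print("Enough VRAM:", has_enough_vram(gpus, min_mib=15000))
```

Returning an empty list instead of raising keeps the check usable as a first cell: it prints a hint rather than failing when the runtime has no GPU attached.
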
@@ -75,10 +94,41 @@
       },
       "source": [
         "\n",
+        "## Uploading Generated Data\n",
         "From your local machine, run the `lab generate` command per the [instructions on GitHub](https://github.com/instruct-lab/cli/blob/main/README.md).\n",
         "\n",
         "Next, upload your data.\n",
         "\n",
+        "### Uploading data on Kaggle\n",
+        "\n",
+        "1. Expand the Input tab on the right side of the screen.\n",
+        "\n",
+        "![input](./images/kaggle/input.png)\n",
+        "\n",
+        "\n",
+        "2. Click on the \"Upload\" button, then select \"New Dataset\".\n",
+        "\n",
+        "Upload button:\n",
+        "\n",
+        "![input-upload](./images/kaggle/input-upload.png)\n",
+        "\n",
+        "New Dataset:\n",
+        "\n",
+        "![input-new-dataset](./images/kaggle/new-dataset.png)\n",
+        "\n",
+        "3. From here, you'll be prompted to upload your local files. Select all of the files generated by `lab generate`.\n",
+        "\n",
+        "![upload-file](./images/kaggle/input-drop-files.png)\n",
+        "\n",
+        "4. Navigate to the generated _training_ file, right-click it, then select 'Copy Path'.\n",
+        "\n",
+        "![input-files-copy-path](./images/kaggle/copy-file-path.png)\n",
+        "\n",
+        "5. Paste the copied path into the cell below.\n",
+        "\n",
+        "\n",
+        "### Uploading data in Google Colab\n",
+        "\n",
         "To upload data in Google Colab,\n",
         "\n",
         "1. Click on the folder icon on the left of the screen.\n",
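
Kaggle attaches uploaded datasets read-only under `/kaggle/input/<dataset-name>/`, which is where the path copied in step 4 of the Kaggle upload instructions points. If you would rather locate the training file from code than copy the path by hand, the sketch below searches that directory. The `train*.jsonl` filename pattern and the helper names are assumptions for illustration: check them against the files `lab generate` actually produced for you.

```python
from pathlib import Path
from typing import List

# Kaggle mounts attached datasets here; the filename pattern is an assumption.
DEFAULT_ROOT = "/kaggle/input"
DEFAULT_PATTERN = "train*.jsonl"


def find_training_files(root: str = DEFAULT_ROOT, pattern: str = DEFAULT_PATTERN) -> List[Path]:
    """Recursively list files under `root` whose names match `pattern`, sorted by path."""
    base = Path(root)
    if not base.is_dir():
        raise FileNotFoundError(f"{root} not found: is a dataset attached to this notebook?")
    return sorted(p for p in base.rglob(pattern) if p.is_file())


def pick_training_file(root: str = DEFAULT_ROOT, pattern: str = DEFAULT_PATTERN) -> Path:
    """Return the single matching file, or raise if there are zero or several."""
    matches = find_training_files(root, pattern)
    if not matches:
        raise FileNotFoundError(f"no file matching {pattern!r} under {root}")
    if len(matches) > 1:
        listing = ", ".join(str(p) for p in matches)
        raise ValueError(f"several candidates, paste the exact path instead: {listing}")
    return matches[0]
```

In a Kaggle cell you might then write `training_file = str(pick_training_file())`. Refusing to guess when several files match keeps you from silently training on the wrong dataset.
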

diff --git a/notebooks/images/kaggle/clear-outputs.png b/notebooks/images/kaggle/clear-outputs.png
diff --git a/notebooks/images/kaggle/copy-file-path.png b/notebooks/images/kaggle/copy-file-path.png
diff --git a/notebooks/images/kaggle/create-new-nb.png b/notebooks/images/kaggle/create-new-nb.png
diff --git a/notebooks/images/kaggle/create.png b/notebooks/images/kaggle/create.png
diff --git a/notebooks/images/kaggle/file-click.png b/notebooks/images/kaggle/file-click.png
diff --git a/notebooks/images/kaggle/import-nb.png b/notebooks/images/kaggle/import-nb.png
diff --git a/notebooks/images/kaggle/input-drop-files.png b/notebooks/images/kaggle/input-drop-files.png
diff --git a/notebooks/images/kaggle/input-upload.png b/notebooks/images/kaggle/input-upload.png
diff --git a/notebooks/images/kaggle/input.png b/notebooks/images/kaggle/input.png
diff --git a/notebooks/images/kaggle/new-dataset.png b/notebooks/images/kaggle/new-dataset.png
diff --git a/notebooks/images/kaggle/select-accelerator-p100.png b/notebooks/images/kaggle/select-accelerator-p100.png
diff --git a/notebooks/images/kaggle/select-accelerator.png b/notebooks/images/kaggle/select-accelerator.png
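
Step 7 of the Kaggle instructions in the README warns that Kaggle will not run notebooks over 1 megabyte, which is why cell outputs have to be cleared. If you would rather strip outputs before uploading, here is a minimal standard-library sketch. It assumes the nbformat 4 layout, which matches the cell structure visible in the notebook diff above: a top-level `cells` list whose code cells carry `outputs` and `execution_count`. It also treats 1 MB as 1,000,000 bytes, which is an assumption about how Kaggle measures the limit. The helper names are hypothetical.

```python
import json
from pathlib import Path

# Assumption: treat Kaggle's "1 megabyte" limit as 1,000,000 bytes (the stricter reading).
KAGGLE_LIMIT_BYTES = 1_000_000


def clear_outputs(nb: dict) -> dict:
    """Empty the outputs and execution counts of every code cell (nbformat 4 layout)."""
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return nb


def strip_notebook(src: str, dst: str) -> int:
    """Write an output-free copy of `src` to `dst` and return the new file size in bytes."""
    nb = json.loads(Path(src).read_text(encoding="utf-8"))
    clear_outputs(nb)
    Path(dst).write_text(json.dumps(nb, indent=1, ensure_ascii=False) + "\n", encoding="utf-8")
    return Path(dst).stat().st_size
```

For example, `strip_notebook("Training_a_LoRA_With_Instruct_Lab.ipynb", "clean.ipynb") <= KAGGLE_LIMIT_BYTES` tells you whether the cleaned copy fits. Markdown cells, including any images embedded in them, are left untouched, so they still count toward the size.
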