diff --git a/Makefile b/Makefile index dee7067210..0f9035b0fa 100644 --- a/Makefile +++ b/Makefile @@ -36,6 +36,7 @@ NOTEBOOKS_TO_RUN += src/super_gradients/examples/model_export/models_export_pose NOTEBOOKS_TO_RUN += notebooks/what_are_recipes_and_how_to_use.ipynb NOTEBOOKS_TO_RUN += notebooks/transfer_learning_classification.ipynb NOTEBOOKS_TO_RUN += notebooks/how_to_use_knowledge_distillation_for_classification.ipynb +NOTEBOOKS_TO_RUN += notebooks/detection_how_to_connect_custom_dataset.ipynb NOTEBOOKS_TO_RUN += notebooks/PTQ_and_QAT_for_classification.ipynb NOTEBOOKS_TO_RUN += notebooks/quickstart_segmentation.ipynb NOTEBOOKS_TO_RUN += notebooks/segmentation_connect_custom_dataset.ipynb diff --git a/README.md b/README.md index 7deab15cbe..b1270cf3b9 100644 --- a/README.md +++ b/README.md @@ -224,7 +224,7 @@ model = models.get("model-name", pretrained_weights="pretrained-model-name") ### Object Detection * [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://bit.ly/3SkMohx) [Object Detection Transfer Learning](https://bit.ly/3SkMohx) -* [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://bit.ly/3dqDlg3) [How to Connect Custom Dataset](https://bit.ly/3SkMohx) +* [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Deci-AI/super-gradients/blob/master/notebooks/detection_how_to_connect_custom_dataset.ipynb) [How to Connect Custom Dataset](https://colab.research.google.com/github/Deci-AI/super-gradients/blob/master/notebooks/detection_how_to_connect_custom_dataset.ipynb) ### How to Predict Using Pre-trained Model diff --git a/notebooks/detection_how_to_connect_custom_dataset.ipynb b/notebooks/detection_how_to_connect_custom_dataset.ipynb new file mode 100644 index 0000000000..f95fedf165 --- /dev/null +++ b/notebooks/detection_how_to_connect_custom_dataset.ipynb @@ -0,0 +1,1461 @@ +{ + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { 
+ "colab": { + "provenance": [] + }, + "kernelspec": { + "name": "python3", + "display_name": "Python 3" + }, + "language_info": { + "name": "python" + }, + "accelerator": "GPU" + }, + "cells": [ + { + "cell_type": "markdown", + "source": [ + "![SG - Horizontal.png]()" + ], + "metadata": { + "id": "sh6t_y7KzqBH", + "pycharm": { + "name": "#%% md\n" + } + } + }, + { + "cell_type": "markdown", + "source": [ + "# SuperGradients Object Detection How to Connect Custom Dataset\n", + "\n", + "In this tutorial we will explore how you can connect your custom object detection dataset to SG.\n", + "\n", + "Since the SG trainer is fully compatible with PyTorch data loaders, we will demonstrate how to build one and use it.\n", + "\n", + "The notebook is divided into 5 sections:\n", + "1. Experiment setup\n", + "2. Dataset definition: create a proxy dataset and create a dataloader\n", + "3. Architecture definition: pre-trained YoloX on COCO\n", + "4. Training setup\n", + "5. Training and Evaluation\n" + ], + "metadata": { + "id": "5aISf1B-AGDQ", + "pycharm": { + "name": "#%% md\n" + } + } + }, + { + "cell_type": "markdown", + "source": [ + "# Install SG" + ], + "metadata": { + "id": "-1nPOPmc1lGp", + "pycharm": { + "name": "#%% md\n" + } + } + }, + { + "cell_type": "markdown", + "source": [ + "The cell below installs **super_gradients**, which automatically pulls in all of its dependencies. The imports in later cells confirm that everything installed successfully." + ], + "metadata": { + "id": "VAssbjJw7Yt1", + "pycharm": { + "name": "#%% md\n" + } + } + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "JKce1SM6voVH", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "98939239-1891-4714-efad-4dd90df9766a", + "pycharm": { + "name": "#%%\n", + "is_executing": true + } + }, + "outputs": [], + "source": [ + "! pip install -q super_gradients==3.4.1" + ] + }, + { + "cell_type": "markdown", + "source": [ + "# 1. 
Experiment setup" + ], + "metadata": { + "id": "njthhNJR1pJm", + "pycharm": { + "name": "#%% md\n" + } + } + }, + { + "cell_type": "markdown", + "metadata": { + "id": "YPym4wvpOcOJ", + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "We will first initialize our **trainer**, which will be in charge of everything: training, evaluation, saving checkpoints, plotting, etc.\n", + "\n", + "The **experiment name** argument is important because all checkpoints, logs and TensorBoard files are saved in a directory with the same name. This directory will be created as a sub-directory of **ckpt_root_dir** as follows:\n", + "\n", + "```\n", + "ckpt_root_dir\n", + "|─── experiment_name_1\n", + "│ ckpt_best.pth # Model checkpoint on best epoch\n", + "│ ckpt_latest.pth # Model checkpoint on last epoch\n", + "│ average_model.pth # Model checkpoint averaged over epochs\n", + "│ events.out.tfevents.1659878383... # Tensorflow artifacts of a specific run\n", + "│ log_Aug07_11_52_48.txt # Trainer logs of a specific run\n", + "└─── experiment_name_2\n", + " ...\n", + "```\n", + "In this notebook multi-GPU training is set to `OFF`; for distributed training, multi_gpu can be set to\n", + " `MultiGPUMode.DISTRIBUTED_DATA_PARALLEL` or `MultiGPUMode.DATA_PARALLEL`.\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "source": [ + "Let's define **ckpt_root_dir** inside the Colab; later we can use it to start TensorBoard and monitor the run." + ], + "metadata": { + "id": "A2PlnTWpimnH", + "pycharm": { + "name": "#%% md\n" + } + } + }, + { + "cell_type": "code", + "source": [ + "from super_gradients.training import Trainer\n", + "\n", + "\n", + "CHECKPOINT_DIR = '/home/notebook_ckpts/'\n", + "trainer = Trainer(experiment_name='transfer_learning_object_detection_yolox', ckpt_root_dir=CHECKPOINT_DIR)" + ], + "metadata": { + "id": "_v1N3kXs3wo1", + "pycharm": { + "name": "#%%\n" + } + }, + "execution_count": 3, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "# 2. 
Dataset definition\n" + ], + "metadata": { + "id": "J9ZaMulSvwhr", + "pycharm": { + "name": "#%% md\n" + } + } + }, + { + "cell_type": "markdown", + "source": [ + "## 2.A Generate Proxy Dataset" + ], + "metadata": { + "id": "_1TXuJKkKzFJ", + "pycharm": { + "name": "#%% md\n" + } + } + }, + { + "cell_type": "markdown", + "source": [ + "\n", + "The proxy dataset generated below serves only to demonstrate an end-to-end training pipeline in this notebook.\n", + "\n", + "The expected data is in COCO format, where each label is `class_id X_center Y_center H W`.\n" + ], + "metadata": { + "id": "Y7us7VHRig7M", + "pycharm": { + "name": "#%% md\n" + } + } + }, + { + "cell_type": "code", + "source": [ + "from super_gradients.training.utils.ssd_utils import SSDPostPredictCallback, DefaultBoxes\n", + "from super_gradients.training.metrics import DetectionMetrics\n", + "from PIL import Image\n", + "import os\n", + "import numpy as np\n", + "import torch\n", + "\n", + "\n", + "# Anchors\n", + "def dboxes():\n", + " figsize = 96\n", + " feat_size = [6, 3, 2, 1]\n", + " scales = [9, 36, 64, 91, 96]\n", + " aspect_ratios = [[2], [2], [2], [2]]\n", + " return DefaultBoxes(figsize, feat_size, scales, aspect_ratios)\n", + "\n", + "def base_detection_collate_fn(batch):\n", + " \"\"\"\n", + " Simple aggregation function to batch the data\n", + " \"\"\"\n", + "\n", + " images_batch, labels_batch = list(zip(*batch))\n", + " for i, labels in enumerate(labels_batch):\n", + " # ADD TARGET IMAGE INDEX\n", + " labels[:, 0] = i\n", + "\n", + " return torch.stack(images_batch, 0), torch.cat(labels_batch, 0)\n", + "\n", + "# Create a proxy dataset to demonstrate usage\n", + "def generate_proxy_dataset(write_path):\n", + " # Create training files and text\n", + " os.makedirs(os.path.join(write_path, 'images', 'train'), exist_ok=True)\n", + " os.makedirs(os.path.join(write_path, 'images', 'val'), exist_ok=True)\n", + " os.makedirs(os.path.join(write_path, 'labels', 'train'), exist_ok=True)\n", 
+ " os.makedirs(os.path.join(write_path, 'labels', 'val'), exist_ok=True)\n", + " train_fp = open(os.path.join(write_path, 'train.txt'), 'w')\n", + " val_fp = open(os.path.join(write_path, 'val.txt'), 'w')\n", + "\n", + " for n in range(10):\n", + " a = np.random.rand(96, 96, 3) * 255\n", + " im_out = Image.fromarray(a.astype('uint8')).convert('RGB')\n", + " im_string = '%000d.jpg' % n\n", + " im_out.save(os.path.join(write_path, 'images', 'train', im_string))\n", + " im_out.save(os.path.join(write_path, 'images', 'val', im_string))\n", + " train_fp.write((os.path.join(write_path,'images', 'train', im_string)) + '\\n')\n", + " val_fp.write((os.path.join(write_path, 'images', 'val', im_string)) + '\\n')\n", + "\n", + " # Create label files\n", + " train_label_fp = open(os.path.join(write_path, 'labels', 'train', im_string.replace('.jpg','.txt')), 'w')\n", + " val_label_fp = open(os.path.join(write_path, 'labels', 'val', im_string.replace('.jpg','.txt')), 'w')\n", + " for b in range(5):\n", + " cls = np.random.randint(0, 7)\n", + " loc = np.random.uniform(0.25, 0.5)\n", + " train_label_fp.write(f'{cls} {loc - 0.1} {loc + 0.1} {loc - 0.2} {loc + 0.2}' + '\\n')\n", + " val_label_fp.write(f'{cls} {loc - 0.1} {loc + 0.1} {loc - 0.2} {loc + 0.2}' + '\\n')\n" + ], + "metadata": { + "id": "wbdVYnIyjgv-", + "pycharm": { + "name": "#%%\n" + } + }, + "execution_count": 4, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "generate_proxy_dataset('/content/example_data')" + ], + "metadata": { + "id": "DXu4yfuZoiv0", + "pycharm": { + "name": "#%%\n" + } + }, + "execution_count": 5, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "## 2.B Create Torch Dataset" + ], + "metadata": { + "id": "MDksFYrIqClt", + "pycharm": { + "name": "#%% md\n" + } + } + }, + { + "cell_type": "code", + "source": [ + "import torch\n", + "from torch.utils.data import Dataset\n", + "import json\n", + "import os\n", + "from PIL import Image\n", + "from torchvision import 
transforms, utils\n", + "import numpy as np\n", + "\n", + "\n", + "class CustomDataset(Dataset):\n", + " \"\"\"\n", + " A PyTorch Dataset class to be used in a PyTorch DataLoader to create batches.\n", + " \"\"\"\n", + "\n", + " def __init__(self, data_folder, split, keep_difficult=False):\n", + " \"\"\"\n", + " :param data_folder: folder where data files are stored\n", + " :param split: split, one of 'train' or 'val' (case-insensitive)\n", + " :param keep_difficult: keep or discard objects that are considered difficult to detect?\n", + " \"\"\"\n", + " self.split = split.lower()\n", + "\n", + " assert self.split in {'train', 'val'}\n", + "\n", + " self.data_folder = data_folder\n", + " self.keep_difficult = keep_difficult\n", + "\n", + " # Read data files\n", + " with open(os.path.join(data_folder, self.split + '.txt'), 'r') as j:\n", + " self.images = j.readlines()\n", + "\n", + " def __getitem__(self, i):\n", + " # Read image and label\n", + " image = Image.open(self.images[i].replace(\"\\n\",\"\"), mode='r').resize((320, 320))\n", + " image_tensor = torch.tensor(np.array(image)).permute(2, 0, 1).float()\n", + " labels = np.loadtxt(self.images[i].replace(\"jpg\\n\",\"txt\").replace(\"images\", \"labels\"))\n", + " return image_tensor, labels\n", + "\n", + "\n", + " def __len__(self):\n", + " return len(self.images)\n" + ], + "metadata": { + "id": "AGziBKSIqaUu", + "pycharm": { + "name": "#%%\n" + } + }, + "execution_count": 6, + "outputs": [] + }, + { + "cell_type": "code", + "source": [ + "train_dataset = CustomDataset(\"/content/example_data\", split=\"train\")\n", + "val_dataset = CustomDataset(\"/content/example_data\", split=\"val\")" + ], + "metadata": { + "id": "2B0hlas_1Rh-", + "pycharm": { + "name": "#%%\n" + } + }, + "execution_count": 7, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "Let's have a look at an image from the training set:" + ], + "metadata": { + "id": "eIG5tsiuor9E", + "pycharm": { + "name": "#%% md\n" + } + } + }, + { + "cell_type": "code", + 
"source": [ + "train_dataset[1][0]" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "ZsHqcq1jpN0F", + "outputId": "e9ac3efa-75a9-4ea6-e46c-277ab418a66f", + "pycharm": { + "name": "#%%\n" + } + }, + "execution_count": 8, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "tensor([[[135., 136., 135., ..., 1., 0., 0.],\n", + " [130., 131., 132., ..., 10., 0., 0.],\n", + " [117., 119., 123., ..., 31., 20., 19.],\n", + " ...,\n", + " [189., 185., 179., ..., 172., 176., 178.],\n", + " [215., 209., 196., ..., 174., 171., 169.],\n", + " [226., 218., 203., ..., 176., 169., 167.]],\n", + "\n", + " [[ 80., 82., 85., ..., 58., 45., 38.],\n", + " [ 76., 78., 83., ..., 65., 53., 47.],\n", + " [ 64., 67., 75., ..., 85., 73., 68.],\n", + " ...,\n", + " [ 96., 95., 91., ..., 194., 203., 208.],\n", + " [122., 118., 107., ..., 198., 200., 202.],\n", + " [132., 127., 113., ..., 200., 200., 200.]],\n", + "\n", + " [[195., 193., 190., ..., 47., 30., 21.],\n", + " [189., 188., 185., ..., 56., 39., 30.],\n", + " [173., 173., 174., ..., 77., 61., 54.],\n", + " ...,\n", + " [132., 130., 124., ..., 97., 96., 97.],\n", + " [156., 150., 137., ..., 95., 86., 84.],\n", + " [165., 158., 143., ..., 94., 82., 79.]]])" + ] + }, + "metadata": {}, + "execution_count": 8 + } + ] + }, + { + "cell_type": "code", + "source": [ + "train_dataset[1][0].shape" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "SWhwpP0H5_3z", + "outputId": "5b12d11d-dc76-421c-e635-37c9d5fce8ed", + "pycharm": { + "name": "#%%\n" + } + }, + "execution_count": 9, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "torch.Size([3, 320, 320])" + ] + }, + "metadata": {}, + "execution_count": 9 + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "Let's have a look at the first label" + ], + "metadata": { + "id": "8y2rKSxTo4HR", + "pycharm": { + "name": "#%% md\n" + } + } + }, + { + 
"cell_type": "code", + "source": [ + "train_dataset[1][1]" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "X4qdL8KGpIZJ", + "outputId": "c3b8c530-9f6b-40dc-eca7-72e310750fb3", + "pycharm": { + "name": "#%%\n" + } + }, + "execution_count": 10, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "array([[0. , 0.18314663, 0.38314663, 0.08314663, 0.48314663],\n", + " [4. , 0.2662562 , 0.4662562 , 0.1662562 , 0.5662562 ],\n", + " [0. , 0.24051457, 0.44051457, 0.14051457, 0.54051457],\n", + " [6. , 0.30498596, 0.50498596, 0.20498596, 0.60498596],\n", + " [2. , 0.2363268 , 0.4363268 , 0.1363268 , 0.5363268 ]])" + ] + }, + "metadata": {}, + "execution_count": 10 + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "This target has 5 boxes, each from a different class.\n", + "\n", + "Each label include the following [img_id, class, X_center, Y_center, H, W ]\n" + ], + "metadata": { + "id": "6BC44LrJpSpM", + "pycharm": { + "name": "#%% md\n" + } + } + }, + { + "cell_type": "code", + "source": [ + "train_dataset[1][1].shape" + ], + "metadata": { + "id": "MEE-VNBdrl96", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "433f8150-dd5b-4771-ce98-38869de9205a", + "pycharm": { + "name": "#%%\n" + } + }, + "execution_count": 11, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "(5, 5)" + ] + }, + "metadata": {}, + "execution_count": 11 + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "## 2.C Create Torch Dataloader" + ], + "metadata": { + "id": "aWfFrYLzo9j8", + "pycharm": { + "name": "#%% md\n" + } + } + }, + { + "cell_type": "markdown", + "source": [ + "As each target may include a diffrent number of boxes we define Dataloader.collate_fn to be DetectionCollateFN. 
More information regarding this function can be found [here](https://github.com/Deci-AI/super-gradients/blob/a47fa1d9b6689df9228df0e56fe73600565d2a32/src/super_gradients/training/utils/detection_utils.py)" + ], + "metadata": { + "id": "D3ThxDIopDDB", + "pycharm": { + "name": "#%% md\n" + } + } + }, + { + "cell_type": "code", + "source": [ + "from torch.utils.data import Dataset, DataLoader\n", + "from super_gradients.training.utils.collate_fn.detection_collate_fn import DetectionCollateFN\n", + "\n", + "train_dataloader = DataLoader(train_dataset, batch_size=4, shuffle=True, num_workers=2, collate_fn=DetectionCollateFN())\n", + "val_dataloader = DataLoader(val_dataset, batch_size=4, shuffle=False, num_workers=2, collate_fn=DetectionCollateFN())" + ], + "metadata": { + "id": "XrWjWfjXnw_r", + "pycharm": { + "name": "#%%\n" + } + }, + "execution_count": 12, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "Let's have a look at the first batch:\n", + "\n", + "We have 4 images in each batch, each including 5 labels.\n", + "\n", + "Each label includes the following: [img_id, class, X_center, Y_center, H, W]\n" + ], + "metadata": { + "id": "vB1sGPO8qwZJ", + "pycharm": { + "name": "#%% md\n" + } + } + }, + { + "cell_type": "code", + "source": [ + "next(iter(train_dataloader))[1]" + ], + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "O-KuZQ3XBduM", + "outputId": "44cdcf3b-4a2a-49db-ae7f-cd1ffac07c27", + "pycharm": { + "name": "#%%\n" + } + }, + "execution_count": 13, + "outputs": [ + { + "output_type": "stream", + "name": "stderr", + "text": [ + "/usr/local/lib/python3.10/dist-packages/super_gradients/training/utils/collate_fn/detection_collate_fn.py:29: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).\n", + " images_batch = [torch.tensor(img) for img in images_batch]\n", 
+ "/usr/local/lib/python3.10/dist-packages/super_gradients/training/utils/collate_fn/detection_collate_fn.py:29: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).\n", + " images_batch = [torch.tensor(img) for img in images_batch]\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "tensor([[0.0000, 6.0000, 0.2528, 0.4528, 0.1528, 0.5528],\n", + " [0.0000, 6.0000, 0.2613, 0.4613, 0.1613, 0.5613],\n", + " [0.0000, 3.0000, 0.3278, 0.5278, 0.2278, 0.6278],\n", + " [0.0000, 4.0000, 0.3212, 0.5212, 0.2212, 0.6212],\n", + " [0.0000, 0.0000, 0.3695, 0.5695, 0.2695, 0.6695],\n", + " [1.0000, 5.0000, 0.2007, 0.4007, 0.1007, 0.5007],\n", + " [1.0000, 2.0000, 0.1661, 0.3661, 0.0661, 0.4661],\n", + " [1.0000, 5.0000, 0.2157, 0.4157, 0.1157, 0.5157],\n", + " [1.0000, 4.0000, 0.2274, 0.4274, 0.1274, 0.5274],\n", + " [1.0000, 6.0000, 0.3697, 0.5697, 0.2697, 0.6697],\n", + " [2.0000, 1.0000, 0.1861, 0.3861, 0.0861, 0.4861],\n", + " [2.0000, 5.0000, 0.1755, 0.3755, 0.0755, 0.4755],\n", + " [2.0000, 6.0000, 0.3389, 0.5389, 0.2389, 0.6389],\n", + " [2.0000, 1.0000, 0.2743, 0.4743, 0.1743, 0.5743],\n", + " [2.0000, 2.0000, 0.2375, 0.4375, 0.1375, 0.5375],\n", + " [3.0000, 4.0000, 0.2116, 0.4116, 0.1116, 0.5116],\n", + " [3.0000, 4.0000, 0.2414, 0.4414, 0.1414, 0.5414],\n", + " [3.0000, 5.0000, 0.2835, 0.4835, 0.1835, 0.5835],\n", + " [3.0000, 6.0000, 0.2687, 0.4687, 0.1687, 0.5687],\n", + " [3.0000, 3.0000, 0.2546, 0.4546, 0.1546, 0.5546]], dtype=torch.float64)" + ] + }, + "metadata": {}, + "execution_count": 13 + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "\n", + "# 3. 
Architecture definition" + ], + "metadata": { + "id": "fFfvyMHU32QF", + "pycharm": { + "name": "#%% md\n" + } + } + }, + { + "cell_type": "markdown", + "source": [ + "SG includes implementations of many different architectures for object detection tasks that can be found [here](https://github.com/Deci-AI/super-gradients#implemented-model-architectures)." + ], + "metadata": { + "id": "EpqgjQjl4awr", + "pycharm": { + "name": "#%% md\n" + } + } + }, + { + "cell_type": "markdown", + "source": [ + "As mentioned earlier, the architecture that we'll use is based on [YOLOX: Exceeding YOLO Series in 2021](https://arxiv.org/pdf/2107.08430.pdf) and its performance can be viewed [here](https://github.com/Deci-AI/super-gradients#implemented-model-architectures).\n", + "This architecture was pretrained on the COCO2017 dataset.\n", + "\n", + "There are 80 categories in COCO detection, but we want only 8 for our proxy dataset, which means that the prediction layers of the pre-trained model don't work for us. There is a bit of magic for this use case that SuperGradients does for you behind the scenes: you can specify **num_classes** during model construction and it'll automatically replace the model's head with a suitable one. As for the pre-trained weights, they will be used for all layers except for those that are replaced." 
+ ], + "metadata": { + "id": "GNM64JAa4sbF", + "pycharm": { + "name": "#%% md\n" + } + } + }, + { + "cell_type": "code", + "source": [ + "from super_gradients.training import models\n", + "\n", + "\n", + "model = models.get(\"yolox_n\", pretrained_weights=\"coco\", num_classes=8)\n", + "model.num_classes" + ], + "metadata": { + "id": "YDK4btf04Gbu", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "5191cbb9-e010-4875-c0d7-fde22f30b44a", + "pycharm": { + "name": "#%%\n" + } + }, + "execution_count": 14, + "outputs": [ + { + "output_type": "stream", + "name": "stderr", + "text": [ + "Downloading: \"https://sghub.deci.ai/models/yolox_n_coco.pth\" to /root/.cache/torch/hub/checkpoints/yolox_n_coco.pth\n", + "100%|██████████| 11.1M/11.1M [00:00<00:00, 15.6MB/s]\n", + "[2023-10-30 15:09:51] INFO - checkpoint_utils.py - Successfully loaded pretrained weights for architecture yolox_n\n" + ] + }, + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "8" + ] + }, + "metadata": {}, + "execution_count": 14 + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "That being said, SG allows you to use one of SG's implemented architectures or your own custom architecture, as long as it inherits from torch.nn.Module." + ], + "metadata": { + "id": "40UcYJ3u5JyF", + "pycharm": { + "name": "#%% md\n" + } + } + }, + { + "cell_type": "markdown", + "source": [ + "# 4. Training setup\n" + ], + "metadata": { + "id": "LYPVR-XM4GsZ", + "pycharm": { + "name": "#%% md\n" + } + } + }, + { + "cell_type": "markdown", + "source": [ + "With a model and data in hand we have almost everything to start training; only the recipe is yet to be defined. For this use case we can start with the training recipe that was used for pre-training our model. Let's load it and take a look at the training parameters it gives us." 
+ ], + "metadata": { + "id": "6K_56lDV8azX", + "pycharm": { + "name": "#%% md\n" + } + } + }, + { + "cell_type": "code", + "source": [ + "from super_gradients.training import training_hyperparams\n", + "\n", + "train_params = training_hyperparams.get('coco2017_yolox')\n", + "train_params" + ], + "metadata": { + "id": "3eRe0hBz4G1n", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "87f9bbf1-157b-4f2d-a3cd-410377d60f10", + "pycharm": { + "name": "#%%\n" + } + }, + "execution_count": 21, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "{'resume': None,\n", + " 'run_id': None,\n", + " 'resume_path': None,\n", + " 'resume_from_remote_sg_logger': False,\n", + " 'ckpt_name': 'ckpt_latest.pth',\n", + " 'lr_mode': 'CosineLRScheduler',\n", + " 'lr_schedule_function': None,\n", + " 'lr_warmup_epochs': 5,\n", + " 'lr_warmup_steps': 0,\n", + " 'lr_cooldown_epochs': 15,\n", + " 'warmup_initial_lr': None,\n", + " 'step_lr_update_freq': None,\n", + " 'cosine_final_lr_ratio': 0.05,\n", + " 'warmup_mode': 'LinearEpochLRWarmup',\n", + " 'lr_updates': [],\n", + " 'pre_prediction_callback': None,\n", + " 'optimizer': 'SGD',\n", + " 'optimizer_params': {'momentum': 0.9, 'weight_decay': 0.0005, 'nesterov': True},\n", + " 'load_opt_params': True,\n", + " 'zero_weight_decay_on_bias_and_bn': True,\n", + " 'loss': 'YoloXDetectionLoss',\n", + " 'criterion_params': {'strides': [8, 16, 32], 'num_classes': 80},\n", + " 'ema': True,\n", + " 'ema_params': {'decay': 0.9999, 'decay_type': 'exp', 'beta': 15},\n", + " 'train_metrics_list': [],\n", + " 'valid_metrics_list': [{'DetectionMetrics': {'normalize_targets': True, 'post_prediction_callback': YoloXPostPredictionCallback(), 'num_cls': 80}}],\n", + " 'metric_to_watch': 'mAP@0.50:0.95',\n", + " 'greater_metric_to_watch_is_better': True,\n", + " 'launch_tensorboard': False,\n", + " 'tensorboard_port': None,\n", + " 'tb_files_user_prompt': False,\n", + " 'save_tensorboard_to_s3': 
False,\n", + " 'precise_bn': False,\n", + " 'precise_bn_batch_size': None,\n", + " 'sync_bn': False,\n", + " 'silent_mode': False,\n", + " 'mixed_precision': True,\n", + " 'save_ckpt_epoch_list': [285],\n", + " 'average_best_models': True,\n", + " 'dataset_statistics': False,\n", + " 'batch_accumulate': 1,\n", + " 'run_validation_freq': 1,\n", + " 'run_test_freq': 1,\n", + " 'save_model': True,\n", + " 'seed': 42,\n", + " 'phase_callbacks': [{'YoloXTrainingStageSwitchCallback': {'next_stage_start_epoch': 285}}],\n", + " 'log_installed_packages': True,\n", + " 'clip_grad_norm': None,\n", + " 'ckpt_best_name': 'ckpt_best.pth',\n", + " 'max_train_batches': None,\n", + " 'max_valid_batches': None,\n", + " 'sg_logger': 'base_sg_logger',\n", + " 'sg_logger_params': {'tb_files_user_prompt': False, 'launch_tensorboard': False, 'tensorboard_port': None, 'save_checkpoints_remote': False, 'save_tensorboard_remote': False, 'save_logs_remote': False, 'monitor_system': True},\n", + " 'torch_compile': False,\n", + " 'torch_compile_loss': False,\n", + " 'torch_compile_options': {'mode': 'reduce-overhead', 'fullgraph': False, 'dynamic': False, 'backend': 'inductor', 'options': None, 'disable': False},\n", + " '_convert_': 'all',\n", + " 'max_epochs': 300,\n", + " 'initial_lr': 0.02}" + ] + }, + "metadata": {}, + "execution_count": 21 + } + ] + }, + { + "cell_type": "markdown", + "source": [ + "If it looks like a daunting number of parameters, remember that they define everything necessary for SG to know how to train your model: metrics, an optimizer with its parameters, a loss, EMA parameters, various callbacks, etc. If you wish, take a moment to go over them and get acquainted; for the most part, the structure is shared between all the recipes in SuperGradients.\n", + "\n", + "For the sake of this tutorial, we'll change a few of the parameters above to values that make more sense for quick transfer learning. 
We'll set a small number of epochs, disable warmup and cooldown, pass the correct number of classes into the loss (criterion) and set a much smaller learning rate so as not to alter the model's weights too much too fast. Mixed precision is also disabled because it is not supported when training on a CPU in Colab." + ], + "metadata": { + "id": "65MTR4hd5Fp1", + "pycharm": { + "name": "#%% md\n" + } + } + }, + { + "cell_type": "code", + "source": [ + "train_params['max_epochs'] = 5\n", + "train_params['lr_warmup_epochs'] = 0\n", + "train_params['lr_cooldown_epochs'] = 0\n", + "train_params['criterion_params']['num_classes'] = 8\n", + "train_params['average_best_models'] = False\n", + "train_params['initial_lr'] = 0.0005\n", + "train_params['cosine_final_lr_ratio'] = 0.9\n", + "train_params['mixed_precision'] = False" + ], + "metadata": { + "id": "HRZhFnEk8XzL", + "pycharm": { + "name": "#%%\n" + } + }, + "execution_count": 16, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "# 5. Training and evaluation\n" + ], + "metadata": { + "id": "D3tVVUhy4OqP", + "pycharm": { + "name": "#%% md\n" + } + } + }, + { + "cell_type": "markdown", + "source": [ + "## 5.A. Connect TensorBoard" + ], + "metadata": { + "id": "T0k_F-7C8-j6", + "pycharm": { + "name": "#%% md\n" + } + } + }, + { + "cell_type": "markdown", + "source": [ + "You can launch TensorBoard to monitor the run live.\n", + "\n", + "It is commented out because its output cannot be saved in the notebook; just uncomment the following code if you want to start TensorBoard." + ], + "metadata": { + "id": "se-cC5sK9Cfz", + "pycharm": { + "name": "#%% md\n" + } + } + }, + { + "cell_type": "code", + "source": [ + "# %load_ext tensorboard\n", + "# %tensorboard --logdir $CHECKPOINT_DIR --bind_all" + ], + "metadata": { + "id": "ExGJWUHE4O1c", + "pycharm": { + "name": "#%%\n" + } + }, + "execution_count": 17, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "## 5.B. 
Start Training" + ], + "metadata": { + "id": "hqk9fUXt9Ift", + "pycharm": { + "name": "#%% md\n" + } + } + }, + { + "cell_type": "markdown", + "source": [ + "The logs and the checkpoint for the latest epoch will be kept in your experiment folder.\n", + "\n", + "To start training we'll call train(...) and provide it with the objects we constructed above: the model, the training parameters and the data loaders.\n", + "\n", + "**Note:** While training, don't forget to refresh TensorBoard with the arrow on the top right." + ], + "metadata": { + "id": "8tKUuxbe9NlQ", + "pycharm": { + "name": "#%% md\n" + } + } + }, + { + "cell_type": "code", + "source": [ + "trainer.train(model=model, training_params=train_params, train_loader=train_dataloader, valid_loader=val_dataloader)" + ], + "metadata": { + "id": "-Ojnc1bk9L3s", + "colab": { + "base_uri": "https://localhost:8080/" + }, + "outputId": "6d9b0e33-98c2-4377-9913-cc402c9732c0", + "pycharm": { + "name": "#%%\n" + } + }, + "execution_count": 18, + "outputs": [ + { + "output_type": "stream", + "name": "stderr", + "text": [ + "[2023-10-30 15:09:51] WARNING - sg_trainer.py - Train dataset size % batch_size != 0 and drop_last=False, this might result in smaller last batch.\n", + "[2023-10-30 15:09:51] INFO - sg_trainer.py - Starting a new run with `run_id=RUN_20231030_150951_846432`\n", + "[2023-10-30 15:09:51] INFO - sg_trainer.py - Checkpoints directory: /home/notebook_ckpts/transfer_learning_object_detection_yolox/RUN_20231030_150951_846432\n" + ] + }, + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Training parameters:\n", + "{\n", + " \"resume\" : null,\n", + " \"run_id\" : null,\n", + " \"resume_path\" : null,\n", + " \"resume_from_remote_sg_logger\" : false,\n", + " \"ckpt_name\" : \"ckpt_latest.pth\",\n", + " \"lr_mode\" : \"CosineLRScheduler\",\n", + " \"lr_schedule_function\" : null,\n", + " \"lr_warmup_epochs\" : 5,\n", + " \"lr_warmup_steps\" : 0,\n", + " \"lr_cooldown_epochs\": 15,\n", + " 
\"warmup_initial_lr\" : null,\n", + " \"step_lr_update_freq\" : null,\n", + " \"cosine_final_lr_ratio\" : 0.05,\n", + " \"warmup_mode\" : \"LinearEpochLRWarmup\",\n", + " \"lr_updates\": [],\n", + " \"pre_prediction_callback\" : null,\n", + " \"optimizer\" : \"SGD\",\n", + " \"optimizer_params\" : {\"momentum\": 0.9, \"weight_decay\": 0.0005, \"nesterov\": true},\n", + " \"load_opt_params\" : true,\n", + " \"zero_weight_decay_on_bias_and_bn\" : true,\n", + " \"loss\" : \"YoloXDetectionLoss\",\n", + " \"criterion_params\" : {\"strides\": [8, 16, 32], \"num_classes\": 80},\n", + " \"ema\" : true,\n", + " \"ema_params\": {\"decay\": 0.9999, \"decay_type\": \"exp\", \"beta\": 15},\n", + " \"train_metrics_list\": [],\n", + " \"valid_metrics_list\":\n", + " [\n", + " {\n", + " \"DetectionMetrics\":\n", + " {\"normalize_targets\": true, \"post_prediction_callback\": YoloXPostPredictionCallback(), \"num_cls\": 80}\n", + " }\n", + " ],\n", + " \"metric_to_watch\" : \"mAP@0.50:0.95\",\n", + " \"greater_metric_to_watch_is_better\" : true,\n", + " \"launch_tensorboard\": false,\n", + " \"tensorboard_port\" : null,\n", + " \"tb_files_user_prompt\" : false,\n", + " \"save_tensorboard_to_s3\": false,\n", + " \"precise_bn\": false,\n", + " \"precise_bn_batch_size\" : null,\n", + " \"sync_bn\" : false,\n", + " \"silent_mode\" : false,\n", + " \"mixed_precision\" : true,\n", + " \"save_ckpt_epoch_list\" : [285],\n", + " \"average_best_models\" : true,\n", + " \"dataset_statistics\": false,\n", + " \"batch_accumulate\" : 1,\n", + " \"run_validation_freq\" : 1,\n", + " \"run_test_freq\" : 1,\n", + " \"save_model\": true,\n", + " \"seed\" : 42,\n", + " \"phase_callbacks\" : [{\"YoloXTrainingStageSwitchCallback\": {\"next_stage_start_epoch\": 285}}],\n", + " \"log_installed_packages\": true,\n", + " \"clip_grad_norm\" : null,\n", + " \"ckpt_best_name\" : \"ckpt_best.pth\",\n", + " \"max_train_batches\" : null,\n", + " \"max_valid_batches\" : null,\n", + " \"sg_logger\" : 
\"base_sg_logger\",\n", + " \"sg_logger_params\":\n", + " {\n", + " \"tb_files_user_prompt\" : false,\n", + " \"launch_tensorboard\": false,\n", + " \"tensorboard_port\" : null,\n", + " \"save_checkpoints_remote\" : false,\n", + " \"save_tensorboard_remote\" : false,\n", + " \"save_logs_remote\" : false,\n", + " \"monitor_system\" : true\n", + " },\n", + " \"torch_compile\" : false,\n", + " \"torch_compile_loss\": false,\n", + " \"torch_compile_options\":\n", + " {\n", + " \"mode\" : \"reduce-overhead\",\n", + " \"fullgraph\" : false,\n", + " \"dynamic\" : false,\n", + " \"backend\" : \"inductor\",\n", + " \"options\" : null,\n", + " \"disable\" : false\n", + " },\n", + " \"_convert_\" : \"all\",\n", + " \"max_epochs\": 300,\n", + " \"initial_lr\": 0.02\n", + "}\n", + "The console stream is now moved to /home/notebook_ckpts/transfer_learning_object_detection_yolox/RUN_20231030_150951_846432/console_Oct30_15_09_51.txt\n" + ] + }, + { + "output_type": "stream", + "name": "stderr", + "text": [ + "[2023-10-30 15:09:51] INFO - sg_trainer.py - Using EMA with params {'decay': 0.9999, 'decay_type': 'exp', 'beta': 15}\n", + "[2023-10-30 15:09:51] INFO - sg_trainer.py - Using EMA with params {'decay': 0.9999, 'decay_type': 'exp', 'beta': 15}\n", + "/usr/local/lib/python3.10/dist-packages/super_gradients/training/utils/collate_fn/detection_collate_fn.py:29: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).\n", + " images_batch = [torch.tensor(img) for img in images_batch]\n", + "[2023-10-30 15:09:51] INFO - sg_trainer.py - Using EMA with params {'decay': 0.9999, 'decay_type': 'exp', 'beta': 15}\n", + "/usr/local/lib/python3.10/dist-packages/super_gradients/training/utils/collate_fn/detection_collate_fn.py:29: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or 
sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).\n", + " images_batch = [torch.tensor(img) for img in images_batch]\n", + "[2023-10-30 15:09:52] INFO - sg_trainer_utils.py - TRAINING PARAMETERS:\n", + " - Mode: Single GPU\n", + " - Number of GPUs: 0 (0 available on the machine)\n", + " - Full dataset size: 10 (len(train_set))\n", + " - Batch size per GPU: 4 (batch_size)\n", + " - Batch Accumulate: 1 (batch_accumulate)\n", + " - Total batch size: 4 (num_gpus * batch_size)\n", + " - Effective Batch size: 4 (num_gpus * batch_size * batch_accumulate)\n", + " - Iterations per epoch: 3 (len(train_loader))\n", + " - Gradient updates per epoch: 3 (len(train_loader) / batch_accumulate)\n", + "\n", + "[2023-10-30 15:09:52] INFO - sg_trainer.py - Started training for 5 epochs (0/4)\n", + "\n", + "Train epoch 0: 0%| | 0/3 [00:00