Filter computes based on computes allow list (#2397)
* making the changes for qna notebook

* updating the qna notebook with computes allow list check

* correcting the text

* notebook changes for summarization

* update text classification notebook with computes allow list

* updating the token classification notebook with computes allow list filter

* updating notebooks for computes allow list for translation

* fixing formatting issues for qna

* fixing formatting issues for summarization

* fixing formatting issues for text cls

* fixing formatting issues for token cls

* fixing formatting issues for translation
jpmann committed Jun 26, 2023
1 parent d9aceba commit e2b0ca4
Showing 5 changed files with 665 additions and 300 deletions.
@@ -97,56 +97,6 @@
"\n",
"experiment_name = \"question-answering-extractive-qna\"\n",
"\n",
"# If you already have a gpu cluster, mention it here. Else will create a new one with the name 'gpu-cluster-big'\n",
"compute_cluster = \"gpu-cluster-big\"\n",
"try:\n",
" compute = workspace_ml_client.compute.get(compute_cluster)\n",
"except Exception as ex:\n",
" compute = AmlCompute(\n",
" name=compute_cluster,\n",
" size=\"Standard_NC24rs_v3\",\n",
" max_instances=2, # For multi node training set this to an integer value more than 1\n",
" )\n",
" workspace_ml_client.compute.begin_create_or_update(compute).wait()\n",
"\n",
"# This is the number of GPUs in a single node of the selected 'vm_size' compute.\n",
"# Setting this to less than the number of GPUs will result in underutilized GPUs, taking longer to train.\n",
"# Setting this to more than the number of GPUs will result in an error.\n",
"gpu_count_found = False\n",
"workspace_compute_sku_list = workspace_ml_client.compute.list_sizes()\n",
"available_sku_sizes = []\n",
"for compute_sku in workspace_compute_sku_list:\n",
" available_sku_sizes.append(compute_sku.name)\n",
" if compute_sku.name.lower() == compute.size.lower():\n",
" gpus_per_node = compute_sku.gpus\n",
" gpu_count_found = True\n",
"# if gpu_count_found not found, then print an error\n",
"if gpu_count_found:\n",
" print(f\"Number of GPU's in compute {compute.size}: {gpus_per_node}\")\n",
"else:\n",
" raise ValueError(\n",
" f\"Number of GPU's in compute {compute.size} not found. Available skus are: {available_sku_sizes}.\"\n",
" f\"This should not happen. Please check the selected compute cluster: {compute_cluster} and try again.\"\n",
" )\n",
"# CPU based finetune works only for single-node single-process\n",
"if gpus_per_node == 0:\n",
" print(\n",
" \"WARNING! Selected compute doesn't have GPU. CPU based finetune is experimental and works on a single process in a single node\"\n",
" )\n",
" gpus_per_node = 1\n",
"\n",
"# Computes with K80 GPUs are not supported\n",
"unsupported_gpu_vm_list = [\n",
" \"standard_nc6\",\n",
" \"standard_nc12\",\n",
" \"standard_nc24\",\n",
" \"standard_nc24r\",\n",
"]\n",
"if compute.size.lower() in unsupported_gpu_vm_list:\n",
" raise ValueError(\n",
" f\"VM size {compute.size} is currently not supported for finetuning\"\n",
" )\n",
"\n",
"# genrating a unique timestamp that can be used for names and versions that need to be unique\n",
"timestamp = str(int(time.time()))"
]
@@ -170,8 +120,7 @@
"outputs": [],
"source": [
"model_name = \"bert-base-uncased\"\n",
"model_version = \"3\"\n",
"foundation_model = registry_ml_client.models.get(model_name, model_version)\n",
"foundation_model = registry_ml_client.models.get(model_name, label=\"latest\")\n",
"print(\n",
" \"\\n\\nUsing model name: {0}, version: {1}, id: {2} for fine tuning\".format(\n",
" foundation_model.name, foundation_model.version, foundation_model.id\n",
@@ -184,7 +133,128 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### 3. Pick the dataset for fine-tuning the model\n",
"### 3. Create a compute to be used with the job\n",
"\n",
"The finetune job works `ONLY` with `GPU` compute. The size of the compute depends on how big the model is and in most cases it becomes tricky to identify the right compute for the job. In this cell, we guide the user to select the right compute for the job.\n",
"\n",
"`NOTE1` The computes listed below work with the most optimized configuration. Any changes to the configuration might lead to Cuda Out Of Memory error. In such cases, try to upgrade the compute to a bigger compute size.\n",
"\n",
"`NOTE2` While selecting the compute_cluster_size below, make sure the compute is available in your resource group. If a particular compute is not available you can make a request to get access to the compute resources."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import ast\n",
"\n",
"if \"computes_allow_list\" in foundation_model.tags:\n",
" computes_allow_list = ast.literal_eval(\n",
" foundation_model.tags[\"computes_allow_list\"]\n",
" ) # convert string to python list\n",
" print(f\"Please create a compute from the above list - {computes_allow_list}\")\n",
"else:\n",
" computes_allow_list = None\n",
" print(\"Computes allow list is not part of model tags\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# If you have a specific compute size to work with change it here. By default we use the 1 x V100 compute from the above list\n",
"compute_cluster_size = \"Standard_NC6s_v3\"\n",
"\n",
"# If you already have a gpu cluster, mention it here. Else will create a new one with the name 'gpu-cluster-big'\n",
"compute_cluster = \"gpu-cluster-big\"\n",
"\n",
"try:\n",
" compute = workspace_ml_client.compute.get(compute_cluster)\n",
" print(\"The compute cluster already exists! Reusing it for the current run\")\n",
"except Exception as ex:\n",
" print(\n",
" f\"Looks like the compute cluster doesn't exist. Creating a new one with compute size {compute_cluster_size}!\"\n",
" )\n",
" try:\n",
" print(\"Attempt #1 - Trying to create a dedicated compute\")\n",
" compute = AmlCompute(\n",
" name=compute_cluster,\n",
" size=compute_cluster_size,\n",
" tier=\"Dedicated\",\n",
" max_instances=2, # For multi node training set this to an integer value more than 1\n",
" )\n",
" workspace_ml_client.compute.begin_create_or_update(compute).wait()\n",
" except Exception as e:\n",
" try:\n",
" print(\n",
" \"Attempt #2 - Trying to create a low priority compute. Since this is a low priority compute, the job could get pre-empted before completion.\"\n",
" )\n",
" compute = AmlCompute(\n",
" name=compute_cluster,\n",
" size=compute_cluster_size,\n",
" tier=\"LowPriority\",\n",
" max_instances=2, # For multi node training set this to an integer value more than 1\n",
" )\n",
" workspace_ml_client.compute.begin_create_or_update(compute).wait()\n",
" except Exception as e:\n",
" print(e)\n",
" raise ValueError(\n",
" f\"WARNING! Compute size {compute_cluster_size} not available in workspace\"\n",
" )\n",
"\n",
"\n",
"# Sanity check on the created compute\n",
"if computes_allow_list is not None:\n",
" computes_allow_list_lower_case = [x.lower() for x in computes_allow_list]\n",
" if compute.size.lower() not in computes_allow_list_lower_case:\n",
" raise ValueError(\n",
" f\"VM size {compute.size} is not in the allow-listed computes for finetuning\"\n",
" )\n",
"else:\n",
" # Computes with K80 GPUs are not supported\n",
" unsupported_gpu_vm_list = [\n",
" \"standard_nc6\",\n",
" \"standard_nc12\",\n",
" \"standard_nc24\",\n",
" \"standard_nc24r\",\n",
" ]\n",
" if compute.size.lower() in unsupported_gpu_vm_list:\n",
" raise ValueError(\n",
" f\"VM size {compute.size} is currently not supported for finetuning\"\n",
" )\n",
"\n",
"\n",
"# This is the number of GPUs in a single node of the selected 'vm_size' compute.\n",
"# Setting this to less than the number of GPUs will result in underutilized GPUs, taking longer to train.\n",
"# Setting this to more than the number of GPUs will result in an error.\n",
"gpu_count_found = False\n",
"workspace_compute_sku_list = workspace_ml_client.compute.list_sizes()\n",
"available_sku_sizes = []\n",
"for compute_sku in workspace_compute_sku_list:\n",
" available_sku_sizes.append(compute_sku.name)\n",
" if compute_sku.name.lower() == compute.size.lower():\n",
" gpus_per_node = compute_sku.gpus\n",
" gpu_count_found = True\n",
"# if gpu_count_found not found, then print an error\n",
"if gpu_count_found:\n",
" print(f\"Number of GPU's in compute {compute.size}: {gpus_per_node}\")\n",
"else:\n",
" raise ValueError(\n",
" f\"Number of GPU's in compute {compute.size} not found. Available skus are: {available_sku_sizes}.\"\n",
" f\"This should not happen. Please check the selected compute cluster: {compute_cluster} and try again.\"\n",
" )"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4. Pick the dataset for fine-tuning the model\n",
"\n",
"We use the [SQUAD](https://huggingface.co/datasets/squad) dataset. The next few cells show basic data preparation for fine tuning:\n",
"* Visualize some data rows. Take note of the dataset fields: `question`, `context`, `answers`, `id` and `title`. The `answers` field has `start_key` and `text` fields in json format inside the `answers` field . The keys `question` and `context`, `answers`, `answer_start` and `text` are the relevant fields that need to be mapped to the parameters of the fine tuning pipeline.\n",
@@ -262,9 +332,9 @@
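For readers skimming the diff, here is a minimal sketch (not part of this commit) of inspecting the SQUAD fields called out above, assuming the Hugging Face `datasets` library is installed:

```python
# Sketch only: peek at the SQUAD fields referenced in the notebook text.
from datasets import load_dataset

squad = load_dataset("squad", split="train[:5]")  # a handful of rows
for row in squad:
    print(row["id"], "|", row["title"])
    print("question:", row["question"])
    print("context:", row["context"][:80], "...")
    # `answers` holds parallel lists: `text` and `answer_start`.
    print("answers:", row["answers"]["text"], row["answers"]["answer_start"])
```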
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4. Submit the fine tuning job using the the model and data as inputs\n",
"### 5. Submit the fine tuning job using the the model and data as inputs\n",
" \n",
"Create the job that uses the `question-answering` pipeline component. [Learn more]() about all the parameters supported for fine tuning."
"Create the job that uses the `question-answering` pipeline component. [Learn more](https://github.com/Azure/azureml-assets/blob/main/training/finetune_acft_hf_nlp/components/pipeline_components/question_answering/README.md) about all the parameters supported for fine tuning."
]
},
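The collapsed cells below build and submit the pipeline job. As a rough orientation, here is a hedged sketch of how such a job is typically assembled with the `azure-ai-ml` SDK; the component name `question_answering_pipeline`, the input and output names, and the data paths are assumptions, not confirmed by this diff:

```python
from azure.ai.ml import Input
from azure.ai.ml.dsl import pipeline

# Assumption: the registry exposes a QnA pipeline component under this name.
pipeline_component_func = registry_ml_client.components.get(
    name="question_answering_pipeline", label="latest"
)

@pipeline()
def create_pipeline():
    finetuning_job = pipeline_component_func(
        mlflow_model_path=foundation_model.id,
        compute_finetune=compute_cluster,
        number_of_gpu_to_use_finetuning=gpus_per_node,
        # Hypothetical local data files; the real notebook prepares these.
        train_file_path=Input(type="uri_file", path="./train.jsonl"),
        validation_file_path=Input(type="uri_file", path="./validation.jsonl"),
        per_device_eval_batch_size=1,
        learning_rate=2e-5,
        metric_for_best_model="exact",
        apply_lora="true",
        apply_deepspeed="true",
        apply_ort="true",
    )
    # Map the component output so the fine-tuned model is easy to register.
    return {"trained_model": finetuning_job.outputs.mlflow_model_folder}

pipeline_job = workspace_ml_client.jobs.create_or_update(
    create_pipeline(), experiment_name=experiment_name
)
```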
{
@@ -328,6 +398,9 @@
" per_device_eval_batch_size=1,\n",
" learning_rate=2e-5,\n",
" metric_for_best_model=\"exact\",\n",
" apply_lora=\"true\",\n",
" apply_deepspeed=\"true\",\n",
" apply_ort=\"true\",\n",
" )\n",
" return {\n",
" # map the output of the fine tuning job to the output of the pipeline job so that we can easily register the fine tuned model\n",
@@ -372,7 +445,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### 5. Review training and evaluation metrics\n",
"### 6. Review training and evaluation metrics\n",
"Viewing the job in AzureML studio is the best way to analyze logs, metrics and outputs of jobs. You can create custom charts and compare metics across different jobs. See https://learn.microsoft.com/en-us/azure/machine-learning/how-to-log-view-metrics?tabs=interactive#view-jobsruns-information-in-the-studio to learn more. \n",
"\n",
"However, we may need to access and review metrics programmatically for which we will use MLflow, which is the recommended client for logging and querying metrics."
@@ -439,7 +512,7 @@
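A hedged sketch of that programmatic route, assuming the `azureml-mlflow` plugin is installed and `pipeline_job` is the submitted fine-tuning job:

```python
import mlflow

# Point MLflow at the workspace's tracking server.
tracking_uri = workspace_ml_client.workspaces.get(
    workspace_ml_client.workspace_name
).mlflow_tracking_uri
mlflow.set_tracking_uri(tracking_uri)

# List runs under this experiment and print their logged metrics.
for run in mlflow.search_runs(
    experiment_names=[experiment_name], output_format="list"
):
    print(run.info.run_name, run.data.metrics)
```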
"cell_type": "markdown",
"metadata": {},
"source": [
"### 6. Register the fine tuned model with the workspace\n",
"### 7. Register the fine tuned model with the workspace\n",
"\n",
"We will register the model from the output of the fine tuning job. This will track lineage between the fine tuned model and the fine tuning job. The fine tuning job, further, tracks lineage to the foundation model, data and training code."
]
@@ -484,7 +557,7 @@
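The registration cell itself is collapsed above; here is a sketch of the usual pattern, where the output name `trained_model` mirrors the pipeline sketch earlier and the model name is illustrative:

```python
from azure.ai.ml.entities import Model
from azure.ai.ml.constants import AssetTypes

# Assumption: the pipeline exposed the fine-tuned model as "trained_model".
registered_model = workspace_ml_client.models.create_or_update(
    Model(
        path=f"azureml://jobs/{pipeline_job.name}/outputs/trained_model",
        type=AssetTypes.MLFLOW_MODEL,
        name="bert-base-uncased-extractive-qna",  # illustrative name
        description="Fine-tuned bert-base-uncased for extractive QnA",
    )
)
print(registered_model.id)
```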
"cell_type": "markdown",
"metadata": {},
"source": [
"### 7. Deploy the fine tuned model to an online endpoint\n",
"### 8. Deploy the fine tuned model to an online endpoint\n",
"Online endpoints give a durable REST API that can be used to integrate with applications that need to use the model."
]
},
@@ -543,7 +616,7 @@
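A hedged sketch of the deployment flow with managed online endpoints; the endpoint name, deployment name, and instance type are illustrative assumptions:

```python
import time
from azure.ai.ml.entities import ManagedOnlineEndpoint, ManagedOnlineDeployment

endpoint_name = f"qna-endpoint-{int(time.time())}"  # illustrative, unique-ish

workspace_ml_client.online_endpoints.begin_create_or_update(
    ManagedOnlineEndpoint(name=endpoint_name, auth_mode="key")
).wait()

workspace_ml_client.online_deployments.begin_create_or_update(
    ManagedOnlineDeployment(
        name="default",
        endpoint_name=endpoint_name,
        model=registered_model.id,
        instance_type="Standard_DS3_v2",  # assumption; pick a supported SKU
        instance_count=1,
    )
).wait()
```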
"cell_type": "markdown",
"metadata": {},
"source": [
"### 8. Test the endpoint with sample data\n",
"### 9. Test the endpoint with sample data\n",
"\n",
"We will fetch some sample data from the test dataset and submit to online endpoint for inference. We will then show the display the scored labels alongside the ground truth labels"
]
@@ -625,7 +698,7 @@
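A sketch of scoring one hand-written sample; the request schema shown here (question/context lists under `input_data`) is an assumption for this task type, so check the deployed model's signature:

```python
import json

# Assumed request schema for extractive QnA scoring.
sample = {
    "input_data": {
        "question": ["What is the capital of France?"],
        "context": ["Paris is the capital and largest city of France."],
    }
}
with open("sample_score.json", "w") as f:
    json.dump(sample, f)

response = workspace_ml_client.online_endpoints.invoke(
    endpoint_name=endpoint_name,
    deployment_name="default",
    request_file="sample_score.json",
)
print(response)
```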
"cell_type": "markdown",
"metadata": {},
"source": [
"### 9. Delete the online endpoint\n",
"### 10. Delete the online endpoint\n",
"Don't forget to delete the online endpoint, else you will leave the billing meter running for the compute used by the endpoint"
]
},
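The cleanup itself is a one-liner; a minimal sketch assuming the endpoint name from the deployment sketch above:

```python
# Deleting the endpoint stops billing for its compute.
workspace_ml_client.online_endpoints.begin_delete(name=endpoint_name).wait()
```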