202307-notebooks Template amends (#14683)

Co-authored-by: writer-jill <jill.osborne@imply.io>
apache · Aug 15, 2023 · e33d2db
1 parent 2fdf5b1
commit e33d2db
Showing 1 changed file with 216 additions and 26 deletions.
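Reviewer note: the initialization cells in this template read the Druid and Kafka hosts from the `DRUID_HOST` and `KAFKA_HOST` environment variables, and the Kafka cell falls back to localhost when the variable is unset. A minimal sketch of that pattern, factored into one function so it can be checked without a running cluster, might look like the following. `resolve_host` is a hypothetical helper introduced here for illustration; it is not part of `druidapi` or of the template. The sketch assumes the Druid cell uses the same localhost fallback as the Kafka cell, and that Kafka clients take a plain `host:port` string, so it adds a URL scheme only for Druid.

```python
import os


def resolve_host(env_var, port, scheme="", default_host="localhost"):
    """Build a host string from an environment variable, with a default host."""
    # Hypothetical helper: not part of druidapi or the notebook template.
    host = os.environ.get(env_var, default_host)
    return f"{scheme}{host}:{port}"


# Druid router: the template uses port 8888 and an http:// prefix.
druid_host = resolve_host("DRUID_HOST", 8888, scheme="http://")

# Kafka broker: port 9092 as in the template; no scheme (assumption noted above).
kafka_host = resolve_host("KAFKA_HOST", 9092)

print(f"Druid: {druid_host}")
print(f"Kafka: {kafka_host}")
```

Keeping the lookup in one place would let a notebook author change the default host or port once instead of in each cell, at the cost of one more definition in the setup cell.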
diff --git a/examples/quickstart/jupyter-notebooks/notebooks/99-contributing/notebook-template.ipynb b/examples/quickstart/jupyter-notebooks/notebooks/99-contributing/notebook-template.ipynb
@@ -5,7 +5,7 @@
    "id": "0cb3b009-ebde-4d56-9d59-a028d66d8309",
    "metadata": {},
    "source": [
-    "# Title\n",
+    "# (Result) by (action) using (feature)\n",
     "<!--\n",
     "  ~ Licensed to the Apache Software Foundation (ASF) under one\n",
     "  ~ or more contributor license agreements.  See the NOTICE file\n",
@@ -24,48 +24,72 @@
     "  ~ specific language governing permissions and limitations\n",
     "  ~ under the License.\n",
     "  -->\n",
-    "Introduction to Notebook\n",
-    "Lorem Ipsum"
+    "\n",
+    "Introductory paragraph - for example:\n",
+    "\n",
+    "This tutorial demonstrates how to work with [feature](link to feature doc). In this tutorial you perform the following tasks:\n",
+    "\n",
+    "- Task 1\n",
+    "- Task 2\n",
+    "- Task 3\n",
+    "- etc\n",
+    "\n"
    ]
   },
   {
    "cell_type": "markdown",
-   "id": "bbdbf6ad-ca7b-40f5-8ca3-1070f4a3ee42",
+   "id": "b74aa63d-3d21-472d-8ade-8573ef3c50cf",
    "metadata": {},
    "source": [
-    "## Prerequisites\n",
-    "\n",
-    "This tutorial works with Druid XX.0.0 or later.\n",
+    "## Table of contents\n",
     "\n",
-    "Launch this tutorial and all prerequisites using the `all-services` profile of the Docker Compose file for Jupyter-based Druid tutorials. For more information, see [Docker for Jupyter Notebook tutorials](https://druid.apache.org/docs/latest/tutorials/tutorial-jupyter-docker.html).\n"
+    "- [Prerequisites](#Prerequisites)\n",
+    "- [Initialization](#Initialization)\n",
+    "- [Next section](#Next-section)\n",
+    "- etc"
    ]
   },
   {
    "cell_type": "markdown",
-   "id": "7ee6aef8-a11d-48d5-bcdc-e6231ba594b7",
+   "id": "bbdbf6ad-ca7b-40f5-8ca3-1070f4a3ee42",
    "metadata": {},
    "source": [
-    "<details><summary>    \n",
-    "<b>Run without Docker Compose</b>    \n",
-    "</summary>\n",
+    "## Prerequisites\n",
+    "\n",
+    "This tutorial works with Druid XX.0.0 or later.\n",
+    "\n",
+    "#### Run with Docker\n",
+    "\n",
+    "<!-- Profiles are:\n",
+    "`druid-jupyter` - just Jupyter and Druid\n",
+    "`all-services` - includes Jupyter, Druid, and Kafka\n",
+    " -->\n",
+    "\n",
+    "Launch this tutorial and all prerequisites using the ....... profile of the Docker Compose file for Jupyter-based Druid tutorials. For more information, see [Docker for Jupyter Notebook tutorials](https://druid.apache.org/docs/latest/tutorials/tutorial-jupyter-docker.html).\n",
+    "   \n",
+    "#### Run without Docker\n",
     "\n",
-    "In order to run this notebook you will need:\n",
+    "If you do not use the Docker Compose environment, you need the following:\n",
     "\n",
-    "<b>Required Services</b>\n",
-    "* <!-- include list of components needed for notebook, i.e. kafka, druid instance, etc. -->\n",
+    "* A running Apache Druid instance, with a `DRUID_HOST` local environment variable containing the server name of your Druid router\n",
+    "* [druidapi](https://github.com/apache/druid/blob/master/examples/quickstart/jupyter-notebooks/druidapi/README.md), a Python client for Apache Druid. Follow the instructions in the Install section of the README file.\n",
     "\n",
-    "<b>Python packages</b>\n",
-    "* druidapi, a [Python client for Apache Druid](https://github.com/apache/druid/blob/master/examples/quickstart/jupyter-notebooks/druidapi/README.md)\n",
-    "*  <!-- include any python package dependencies -->\n",
-    "</details>"
+    " <!-- Remove as needed -->\n",
+    "* A running Apache Kafka instance, with a `KAFKA_HOST` local environment variable containing the broker server name.\n",
+    "* [matplotlib](https://matplotlib.org/), a library for creating visualizations in Python.\n",
+    "* [pandas](https://pandas.pydata.org/), a data analysis and manipulation tool."
    ]
   },
   {
    "cell_type": "markdown",
    "id": "5007a243-b81a-4601-8f57-5b14940abbff",
    "metadata": {},
    "source": [
-    "### Initialization"
+    "### Initialization\n",
+    "\n",
+    "Run the next cell to set up the Druid Python client's connection to Apache Druid.\n",
+    "\n",
+    "If the connection succeeds, the output shows the Druid version number."
    ]
   },
   {
@@ -84,7 +108,23 @@
     "    druid_host=f\"http://{os.environ['DRUID_HOST']}:8888\"\n",
     "    \n",
     "print(f\"Opening a connection to {druid_host}.\")\n",
-    "druid = druidapi.jupyter_client(druid_host)"
+    "druid = druidapi.jupyter_client(druid_host)\n",
+    "\n",
+    "display = druid.display\n",
+    "sql_client = druid.sql\n",
+    "status_client = druid.status\n",
+    "\n",
+    "status_client.version"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "2efdbee0-62da-4fd3-84e1-f66b8c0150b3",
+   "metadata": {},
+   "source": [
+    "<!-- Include these cells if your notebook uses Kafka. -->\n",
+    "\n",
+    "Run the next cell to set up the connection to Apache Kafka."
    ]
   },
   {
@@ -94,15 +134,165 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "# INCLUDE THIS CELL IF YOUR NOTEBOOK USES KAFKA  \n",
-    "# Use kafka_host variable when connecting to kafka \n",
-    "import os\n",
-    "\n",
     "if 'KAFKA_HOST' not in os.environ.keys():\n",
     "   kafka_host=f\"http://localhost:9092\"\n",
     "else:\n",
     "    kafka_host=f\"{os.environ['KAFKA_HOST']}:9092\""
    ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "472589e4-1026-4b3b-bb79-eedabb2b44c4",
+   "metadata": {},
+   "source": [
+    "<!-- Include these cells if you're relying on someone ingesting example data through the console -->\n",
+    "\n",
+    "### Load example data\n",
+    "\n",
+    "Once your Druid environment is up and running, ingest the sample data for this tutorial.\n",
+    "\n",
+    "Run the following cell to create a table called `example-dataset-notebook`. Notice {the use of X as a timestamp | only required columns are ingested | WHERE / expressions / GROUP BY are front-loaded | partitions on X period and clusters by Y}.\n",
+    "\n",
+    "When completed, you'll see a description of the final table.\n",
+    "\n",
+    "<!--\n",
+    "\n",
+    "Replace `example-dataset-notebook` with a unique table name for this notebook.\n",
+    "\n",
+    "- Always prefix your table name with `example-`\n",
+    "- If using the standard example datasets, use the following standard values for `dataset`:\n",
+    "\n",
+    "    wikipedia       wikipedia\n",
+    "    koalas          KoalasToTheMax one day\n",
+    "    koalanest       KoalasToTheMax one day (nested)\n",
+    "    nyctaxi3        NYC Taxi cabs (3 files)\n",
+    "    nyctaxi         NYC Taxi cabs (all files)\n",
+    "    flights         FlightCarrierOnTime (1 month)\n",
+    "\n",
+    "-->\n",
+    "\n",
+    "Monitor the progress of the ingestion task in the Druid console."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "f52a94fb-d2e4-403f-ab10-84d3af7bf2c8",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Replace `example-dataset-notebook` with your table name here.\n",
+    "# Remember to apply good data modelling practice to your INSERT / REPLACE.\n",
+    "\n",
+    "sql='''\n",
+    "'''\n",
+    "\n",
+    "sql_client.run_task(sql)\n",
+    "sql_client.wait_until_ready('example-dataset-notebook')\n",
+    "display.table('example-dataset-notebook')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "9c3d6b39-6551-4b2a-bdfb-9606aa92c853",
+   "metadata": {},
+   "source": [
+    "<!-- Include these cells if you need additional Python modules -->\n",
+    "\n",
+    "Finally, run the following cell to import additional Python modules that you will use to X, Y, Z."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "dc4c2524-0eba-4bc6-84ed-da3a25aa5fbe",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Add your modules here, remembering to align this with the prerequisites section\n",
+    "\n",
+    "import json\n",
+    "import matplotlib\n",
+    "import matplotlib.pyplot as plt\n",
+    "import pandas as pd"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "1b6c9b88-837d-4c80-a28d-36184ba63355",
+   "metadata": {},
+   "source": [
+    "## Awesome!\n",
+    "\n",
+    "The main body of your notebook goes here!\n",
+    "\n",
+    "### This is a step\n",
+    "\n",
+    "Here things get done\n",
+    "\n",
+    "### And so is this!\n",
+    "\n",
+    "Wow! Awesome!"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "54b8d5fe-ba85-4b5b-9669-0dd47dfbccd1",
+   "metadata": {},
+   "source": [
+    "## Summary\n",
+    "\n",
+    "* You learned this\n",
+    "* Remember this\n",
+    "\n",
+    "## Go further\n",
+    "\n",
+    "* Try this out on your own data\n",
+    "* Solve for problem X that isn't covered here\n",
+    "\n",
+    "## Learn more\n",
+    "\n",
+    "* Read docs pages\n",
+    "* Watch or read something cool from the community\n",
+    "* Do some exploratory stuff on your own"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ca4d3362-b1a4-47a4-a782-9773c216b3ba",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# STANDARD CODE BLOCKS\n",
+    "\n",
+    "# When you just want to display some SQL results\n",
+    "display.sql(sql)\n",
+    "\n",
+    "# When ingesting data:\n",
+    "sql_client.run_task(sql)\n",
+    "sql_client.wait_until_ready('wikipedia-en')\n",
+    "display.table('wikipedia-en')\n",
+    "\n",
+    "# When you want to make an EXPLAIN look pretty\n",
+    "print(json.dumps(json.loads(sql_client.explain_sql(sql)['PLAN']), indent=2))\n",
+    "\n",
+    "# When you want a simple plot\n",
+    "df = pd.DataFrame(sql_client.sql(sql))\n",
+    "df.plot(x='Tail_Number', y='Flights', marker='o')\n",
+    "plt.xticks(rotation=45, ha='right')\n",
+    "plt.gca().get_legend().remove()\n",
+    "plt.show()\n",
+    "\n",
+    "# When you want to add some query context parameters\n",
+    "req = sql_client.sql_request(sql)\n",
+    "req.add_context(\"useApproximateTopN\", \"false\")\n",
+    "resp = sql_client.sql_query(req)\n",
+    "\n",
+    "# When you want to compare two different sets of results\n",
+    "df3 = df1.compare(df2, keep_equal=True)\n",
+    "df3"
+   ]
   }
  ],
  "metadata": {
@@ -121,7 +311,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.11.4"
+   "version": "3.10.3"
   }
  },
  "nbformat": 4,