202307-notebooks Template amends (#14683)
Co-authored-by: writer-jill <jill.osborne@imply.io>
petermarshallio and writer-jill authored Aug 15, 2023
1 parent 2fdf5b1 commit e33d2db
Showing 1 changed file with 216 additions and 26 deletions.
@@ -5,7 +5,7 @@
"id": "0cb3b009-ebde-4d56-9d59-a028d66d8309",
"metadata": {},
"source": [
"# Title\n",
"# (Result) by (action) using (feature)\n",
"<!--\n",
" ~ Licensed to the Apache Software Foundation (ASF) under one\n",
" ~ or more contributor license agreements. See the NOTICE file\n",
@@ -24,48 +24,72 @@
" ~ specific language governing permissions and limitations\n",
" ~ under the License.\n",
" -->\n",
"Introduction to Notebook\n",
"Lorem Ipsum"
"\n",
"Introductory paragraph - for example:\n",
"\n",
"This tutorial demonstrates how to work with [feature](link to feature doc). In this tutorial you perform the following tasks:\n",
"\n",
"- Task 1\n",
"- Task 2\n",
"- Task 3\n",
"- etc\n",
"\n"
]
},
{
"cell_type": "markdown",
"id": "bbdbf6ad-ca7b-40f5-8ca3-1070f4a3ee42",
"id": "b74aa63d-3d21-472d-8ade-8573ef3c50cf",
"metadata": {},
"source": [
"## Prerequisites\n",
"\n",
"This tutorial works with Druid XX.0.0 or later.\n",
"## Table of contents\n",
"\n",
"Launch this tutorial and all prerequisites using the `all-services` profile of the Docker Compose file for Jupyter-based Druid tutorials. For more information, see [Docker for Jupyter Notebook tutorials](https://druid.apache.org/docs/latest/tutorials/tutorial-jupyter-docker.html).\n"
"- [Prerequisites](#Prerequisites)\n",
"- [Initalization](#Initalization)\n",
"- [Next section](#Nextsection)\n",
"- etc"
]
},
{
"cell_type": "markdown",
"id": "7ee6aef8-a11d-48d5-bcdc-e6231ba594b7",
"id": "bbdbf6ad-ca7b-40f5-8ca3-1070f4a3ee42",
"metadata": {},
"source": [
"<details><summary> \n",
"<b>Run without Docker Compose</b> \n",
"</summary>\n",
"## Prerequisites\n",
"\n",
"This tutorial works with Druid XX.0.0 or later.\n",
"\n",
"#### Run with Docker\n",
"\n",
"<!-- Profiles are:\n",
"`druid-jupyter` - just Jupyter and Druid\n",
"`all-services` - includes Jupyter, Druid, and Kafka\n",
" -->\n",
"\n",
"Launch this tutorial and all prerequisites using the ....... profile of the Docker Compose file for Jupyter-based Druid tutorials. For more information, see [Docker for Jupyter Notebook tutorials](https://druid.apache.org/docs/latest/tutorials/tutorial-jupyter-docker.html).\n",
" \n",
"#### Run without Docker\n",
"\n",
"In order to run this notebook you will need:\n",
"If you do not use the Docker Compose environment, you need the following:\n",
"\n",
"<b>Required Services</b>\n",
"* <!-- include list of components needed for notebook, i.e. kafka, druid instance, etc. -->\n",
"* A running Apache Druid instance, with a `DRUID_HOST` local environment variable containing the server name of your Druid router\n",
"* [druidapi](https://github.com/apache/druid/blob/master/examples/quickstart/jupyter-notebooks/druidapi/README.md), a Python client for Apache Druid. Follow the instructions in the Install section of the README file.\n",
"\n",
"<b>Python packages</b>\n",
"* druidapi, a [Python client for Apache Druid](https://github.com/apache/druid/blob/master/examples/quickstart/jupyter-notebooks/druidapi/README.md)\n",
"* <!-- include any python package dependencies -->\n",
"</details>"
" <!-- Remove as needed -->\n",
"* A running Apache Kafka instance, with a `KAFKA_HOST` local environment variable containing the broker server name.\n",
"* [matplotlib](https://matplotlib.org/), a library for creating visualizations in Python.\n",
"* [pandas](https://pandas.pydata.org/), a data analysis and manipulation tool."
]
},
{
"cell_type": "markdown",
"id": "5007a243-b81a-4601-8f57-5b14940abbff",
"metadata": {},
"source": [
"### Initialization"
"### Initialization\n",
"\n",
"Run the next cell to set up the Druid Python client's connection to Apache Druid.\n",
"\n",
"If successful, the Druid version number will be shown in the output."
]
},
{
@@ -84,7 +108,23 @@
" druid_host=f\"http://{os.environ['DRUID_HOST']}:8888\"\n",
" \n",
"print(f\"Opening a connection to {druid_host}.\")\n",
"druid = druidapi.jupyter_client(druid_host)"
"druid = druidapi.jupyter_client(druid_host)\n",
"\n",
"display = druid.display\n",
"sql_client = druid.sql\n",
"status_client = druid.status\n",
"\n",
"status_client.version"
]
},
{
"cell_type": "markdown",
"id": "2efdbee0-62da-4fd3-84e1-f66b8c0150b3",
"metadata": {},
"source": [
"<!-- Include these cells if your notebook uses Kafka. -->\n",
"\n",
"Run the next cell to set up the connection to Apache Kafka."
]
},
{
@@ -94,15 +134,165 @@
"metadata": {},
"outputs": [],
"source": [
"# INCLUDE THIS CELL IF YOUR NOTEBOOK USES KAFKA \n",
"# Use kafka_host variable when connecting to kafka \n",
"import os\n",
"\n",
"if 'KAFKA_HOST' not in os.environ.keys():\n",
" kafka_host=f\"http://localhost:9092\"\n",
"else:\n",
" kafka_host=f\"{os.environ['KAFKA_HOST']}:9092\""
]
},
{
"cell_type": "markdown",
"id": "472589e4-1026-4b3b-bb79-eedabb2b44c4",
"metadata": {},
"source": [
"<!-- Include these cells if you're relying on someone ingesting example data through the console -->\n",
"\n",
"### Load example data\n",
"\n",
"Once your Druid environment is up and running, ingest the sample data for this tutorial.\n",
"\n",
"Run the following cell to create a table called `example-dataset-notebook`. Notice {the use of X as a timestamp | only required columns are ingested | WHERE / expressions / GROUP BY are front-loaded | partitions on X period and clusters by Y}.\n",
"\n",
"When completed, you'll see a description of the final table.\n",
"\n",
"<!--\n",
"\n",
"Replace `example-dataset-notebook` with a unique table name for this notebook.\n",
"\n",
"- Always prefix your table name with `example-`\n",
"- If using the standard example datasets, use the following standard values for `dataset`:\n",
"\n",
" wikipedia wikipedia\n",
" koalas KoalasToTheMax one day\n",
" koalanest KoalasToTheMax one day (nested)\n",
" nyctaxi3 NYC Taxi cabs (3 files)\n",
" nyctaxi NYC Taxi cabs (all files)\n",
" flights FlightCarrierOnTime (1 month)\n",
"\n",
"-->\n",
"\n",
"Monitor the ingestion task process in the Druid console."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f52a94fb-d2e4-403f-ab10-84d3af7bf2c8",
"metadata": {},
"outputs": [],
"source": [
"# Replace `example-dataset-notebook` with your table name here.\n",
"# Remember to apply good data modelling practice to your INSERT / REPLACE.\n",
"\n",
"sql='''\n",
"'''\n",
"\n",
"sql_client.run_task(sql)\n",
"sql_client.wait_until_ready('example-dataset-notebook')\n",
"display.table('example-dataset-notebook')"
]
},
{
"cell_type": "markdown",
"id": "9c3d6b39-6551-4b2a-bdfb-9606aa92c853",
"metadata": {},
"source": [
"<!-- Include these cells if you need additional Python modules -->\n",
"\n",
"Finally, run the following cell to import additional Python modules that you will use to X, Y, Z."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "dc4c2524-0eba-4bc6-84ed-da3a25aa5fbe",
"metadata": {},
"outputs": [],
"source": [
"# Add your modules here, remembering to align this with the prerequisites section\n",
"\n",
"import json\n",
"import matplotlib\n",
"import matplotlib.pyplot as plt\n",
"import pandas as pd"
]
},
{
"cell_type": "markdown",
"id": "1b6c9b88-837d-4c80-a28d-36184ba63355",
"metadata": {},
"source": [
"## Awesome!\n",
"\n",
"The main body of your notebook goes here!\n",
"\n",
"### This is a step\n",
"\n",
"Here things get done\n",
"\n",
"### And so is this!\n",
"\n",
"Wow! Awesome!"
]
},
{
"cell_type": "markdown",
"id": "54b8d5fe-ba85-4b5b-9669-0dd47dfbccd1",
"metadata": {},
"source": [
"## Summary\n",
"\n",
"* You learned this\n",
"* Remember this\n",
"\n",
"## Go further\n",
"\n",
"* Try this out on your own data\n",
"* Solve for problem X that is't covered here\n",
"\n",
"## Learn more\n",
"\n",
"* Read docs pages\n",
"* Watch or read something cool from the community\n",
"* Do some exploratory stuff on your own"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ca4d3362-b1a4-47a4-a782-9773c216b3ba",
"metadata": {},
"outputs": [],
"source": [
"# STANDARD CODE BLOCKS\n",
"\n",
"# When just wanting to display some SQL results\n",
"display.sql(sql)\n",
"\n",
"# When ingesting data:\n",
"sql_client.run_task(sql)\n",
"sql_client.wait_until_ready('wikipedia-en')\n",
"display.table('wikipedia-en')\n",
"\n",
"# When you want to make an EXPLAIN look pretty\n",
"print(json.dumps(json.loads(sql_client.explain_sql(sql)['PLAN']), indent=2))\n",
"\n",
"# When you want a simple plot\n",
"df = pd.DataFrame(sql_client.sql(sql))\n",
"df.plot(x='Tail_Number', y='Flights', marker='o')\n",
"plt.xticks(rotation=45, ha='right')\n",
"plt.gca().get_legend().remove()\n",
"plt.show()\n",
"\n",
"# When you want to add some query context parameters\n",
"req = sql_client.sql_request(sql)\n",
"req.add_context(\"useApproximateTopN\", \"false\")\n",
"resp = sql_client.sql_query(req)\n",
"\n",
"# When you want to compare two different sets of results\n",
"df3 = df1.compare(df2, keep_equal=True)\n",
"df3"
]
}
],
"metadata": {
@@ -121,7 +311,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.4"
"version": "3.10.3"
}
},
"nbformat": 4,