Re-structure/expand Trino catalog documentation (#291)

# Description

- Adds a concept page for catalog usage
- Splits the usage page into a series of pages under "Usage guide"
- Attempts to make each cluster/TLS scenario self-contained and runnable

Closes #274.
stackabletech · Sep 22, 2022 · 203b6d2
1 parent a19fc38
commit 203b6d2
Showing 19 changed files with 1,629 additions and 667 deletions.
diff --git a/docs/antora.yml b/docs/antora.yml
@@ -4,4 +4,5 @@ title: Stackable Operator for Trino
 nav:
   - modules/getting_started/nav.adoc
   - modules/ROOT/nav.adoc
+  - modules/usage_guide/nav.adoc
 prerelease: true
diff --git a/docs/modules/ROOT/nav.adoc b/docs/modules/ROOT/nav.adoc
@@ -1,2 +1,2 @@
 * xref:configuration.adoc[]
-* xref:usage.adoc[]
+* xref:concepts.adoc[]
diff --git a/docs/modules/ROOT/pages/concepts.adoc b/docs/modules/ROOT/pages/concepts.adoc
@@ -0,0 +1,34 @@
+= Concepts
+
+== Connectors
+
+https://trino.io/docs/current/overview/use-cases.html#what-trino-is[Trino] is a tool designed to efficiently query vast amounts of data using distributed queries. It is not a database with its own store but rather interacts with many types of store. Trino connects to these stores - or data sources - via https://trino.io/docs/current/connector.html[connectors].
+Each connector enables access to a specific underlying data source such as a Hive warehouse, a PostgreSQL database or a Druid instance.
+
+A Trino cluster comprises two roles: the Coordinator, responsible for managing and monitoring workloads, and the Worker, which is responsible for executing specific tasks that together make up a workload. The workers fetch data from the connectors, execute tasks and share intermediate results. The coordinator collects and consolidates these results for the end-user.
+
+== Catalogs
+
+An instance of a connector is called a catalog.
+Think of a setup containing a large Hive warehouse running on HDFS.
+Two different catalogs, e.g. `warehouse_1` and `warehouse_2`, may exist, each specifying the same `hive` connector.
+
+Currently, the following connectors are supported:
+
+* https://trino.io/docs/current/connector/hive.html[Hive]
+* https://trino.io/docs/current/connector/iceberg.html[Iceberg]
+* https://trino.io/docs/current/connector/tpcds.html[TPCDS]
+* https://trino.io/docs/current/connector/tpch.html[TPCH]
+
+== Catalog references
+
+Within Stackable a `TrinoCatalog` consists of one or more (mandatory or optional) components which are specific to that catalog. A catalog should be re-usable within multiple Trino clusters. Catalogs are referenced by Trino clusters with labels and label selectors: this is consistent with the Kubernetes paradigm and keeps the definitions simple and flexible.
+
+The following diagram illustrates this. Two Trino catalogs - each an instance of a particular connector - are declared with labels that are used to match them to a Trino cluster:
+
+[excalidraw,trino-catalog-overview,svg,width=70%]
+----
+include::partial$diagrams/TrinoCatalogs.excalidraw[]
+----
+
+A complete example of this is shown here: xref:usage_guide:catalogs.adoc[].
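Reviewer note (not part of the commit): the label-based matching described under "Catalog references" can be sketched as below. This is only an illustration of equality-based label selection in the Kubernetes style, not the operator's implementation. The `Catalog` class, the helper functions, and the label `trino: simple-trino` are hypothetical names chosen for the example; only `warehouse_1`, `warehouse_2` and the `hive` and `tpch` connectors come from the page itself.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Catalog:
    """Hypothetical stand-in for a TrinoCatalog: a name, a connector and labels."""
    name: str
    connector: str
    labels: Dict[str, str] = field(default_factory=dict)


def matches(match_labels: Dict[str, str], labels: Dict[str, str]) -> bool:
    """Return True if every key/value pair of the selector is present in the labels."""
    return all(labels.get(key) == value for key, value in match_labels.items())


def select_catalogs(match_labels: Dict[str, str], catalogs: List[Catalog]) -> List[str]:
    """Return the names of all catalogs whose labels satisfy the selector."""
    return [c.name for c in catalogs if matches(match_labels, c.labels)]


# Two catalogs use the same connector and carry the label the cluster selects on;
# a third catalog carries a different label value (all label values are assumptions).
catalogs = [
    Catalog("warehouse_1", "hive", {"trino": "simple-trino"}),
    Catalog("warehouse_2", "hive", {"trino": "simple-trino"}),
    Catalog("benchmark", "tpch", {"trino": "other-trino"}),
]

# The selector a hypothetical cluster would declare.
selected = select_catalogs({"trino": "simple-trino"}, catalogs)
print(selected)  # ['warehouse_1', 'warehouse_2']
```

Because the match depends only on labels, the same catalog definition can be picked up by several clusters without being changed, which is the re-use the concept page aims for.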