Re-structure/expand Trino catalog documentation (#291)
# Description

- Adds a concept page for catalog usage
- Splits the usage page into a series of pages under "Usage guide"
- Attempts to make each cluster/TLS scenario self-contained and runnable

Closes #274.
adwk67 committed Sep 22, 2022
1 parent a19fc38 commit 203b6d2
Showing 19 changed files with 1,629 additions and 667 deletions.
1 change: 1 addition & 0 deletions docs/antora.yml
@@ -4,4 +4,5 @@ title: Stackable Operator for Trino
nav:
- modules/getting_started/nav.adoc
- modules/ROOT/nav.adoc
- modules/usage_guide/nav.adoc
prerelease: true
2 changes: 1 addition & 1 deletion docs/modules/ROOT/nav.adoc
@@ -1,2 +1,2 @@
* xref:configuration.adoc[]
* xref:usage.adoc[]
* xref:concepts.adoc[]
34 changes: 34 additions & 0 deletions docs/modules/ROOT/pages/concepts.adoc
@@ -0,0 +1,34 @@
= Concepts

== Connectors

https://trino.io/docs/current/overview/use-cases.html#what-trino-is[Trino] is a tool designed to efficiently query vast amounts of data using distributed queries. It is not a database with its own storage; instead, it interacts with many types of external data store. Trino connects to these stores, or data sources, via https://trino.io/docs/current/connector.html[connectors].
Each connector enables access to a specific underlying data source such as a Hive warehouse, a PostgreSQL database or a Druid instance.

A Trino cluster comprises two roles: the Coordinator, which is responsible for managing and monitoring workloads, and the Worker, which is responsible for executing the specific tasks that together make up a workload. The workers fetch data via the connectors, execute tasks and share intermediate results. The coordinator collects and consolidates these results for the end user.

== Catalogs

An instance of a connector is called a catalog.
For example, consider a setup containing a large Hive warehouse running on HDFS.
Two different catalogs, e.g. `warehouse_1` and `warehouse_2`, may exist, each specifying the same `hive` connector.
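
The relationship between connector and catalog can be sketched in a few lines of Python. This is an illustration only, not operator code: the `Catalog` class and the metastore addresses are hypothetical, and the output merely imitates the `key=value` style of a plain Trino catalog properties file (`connector.name` and `hive.metastore.uri` are standard Trino property names). The operator derives the actual configuration from `TrinoCatalog` resources.

```python
from dataclasses import dataclass, field


@dataclass
class Catalog:
    """A named instance of a connector plus its connector-specific settings."""
    name: str
    connector: str
    properties: dict = field(default_factory=dict)

    def render(self) -> str:
        """Render in the key=value style of a Trino catalog properties file."""
        lines = [f"connector.name={self.connector}"]
        lines += [f"{key}={value}" for key, value in sorted(self.properties.items())]
        return "\n".join(lines)


# Hypothetical metastore addresses, for illustration only.
warehouse_1 = Catalog("warehouse_1", "hive", {"hive.metastore.uri": "thrift://metastore-1:9083"})
warehouse_2 = Catalog("warehouse_2", "hive", {"hive.metastore.uri": "thrift://metastore-2:9083"})

# Two distinct catalogs, both instances of the same connector.
for catalog in (warehouse_1, warehouse_2):
    print(f"# {catalog.name}.properties")
    print(catalog.render())
```

Both catalogs name the same connector and differ only in their name and settings, which is the distinction the paragraph above draws.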

Currently, the following connectors are supported:

* https://trino.io/docs/current/connector/hive.html[Hive]
* https://trino.io/docs/current/connector/iceberg.html[Iceberg]
* https://trino.io/docs/current/connector/tpcds.html[TPCDS]
* https://trino.io/docs/current/connector/tpch.html[TPCH]

== Catalog references

Within Stackable, a `TrinoCatalog` consists of one or more (mandatory or optional) components which are specific to that catalog. A catalog should be reusable across multiple Trino clusters. Trino clusters reference catalogs using labels and label selectors: this is consistent with the Kubernetes paradigm and keeps the definitions simple and flexible.

The following diagram illustrates this. Two Trino catalogs, each an instance of a particular connector, are declared with labels that are used to match them to a Trino cluster:

[excalidraw,trino-catalog-overview,svg,width=70%]
----
include::partial$diagrams/TrinoCatalogs.excalidraw[]
----

A complete example of this is shown here: xref:usage_guide:catalogs.adoc[].
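
The matching idea itself can be sketched in Python. This is a minimal sketch assuming Kubernetes-style equality-based matching, where a selector matches a resource if every key/value pair in the selector is present in the resource's labels. The function names, catalog names and label values are hypothetical and do not reflect the operator's implementation.

```python
def selector_matches(match_labels: dict, labels: dict) -> bool:
    """Return True if every selector key/value pair is present in the labels.

    An empty selector matches everything, mirroring Kubernetes semantics.
    """
    return all(labels.get(key) == value for key, value in match_labels.items())


def catalogs_for_cluster(match_labels: dict, catalogs: dict) -> list:
    """Return the sorted names of all catalogs whose labels satisfy the selector."""
    return sorted(
        name for name, labels in catalogs.items()
        if selector_matches(match_labels, labels)
    )


# Hypothetical catalogs, keyed by name, with the labels declared on each.
catalogs = {
    "warehouse_1": {"trino": "cluster-a"},
    "warehouse_2": {"trino": "cluster-a", "tier": "archive"},
    "tpch": {"trino": "cluster-b"},
}

# A cluster selecting on trino=cluster-a picks up both warehouse catalogs.
print(catalogs_for_cluster({"trino": "cluster-a"}, catalogs))
```

Because the selector only lists the labels it requires, extra labels on a catalog (such as `tier` above) do not prevent a match, and the same catalog can be picked up by several clusters whose selectors it satisfies.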