From 203b6d2d2bfd08ab26c77e4264255a828f9dc1da Mon Sep 17 00:00:00 2001 From: Andrew Kenworthy Date: Thu, 22 Sep 2022 08:38:15 +0000 Subject: [PATCH] Re-structure/expand Trino catalog documentation (#291) # Description - Adds a concept page for catalog usage - splits the usage page into a series of pages under "Usage guide" - attempts to make each cluster/TLS scenario self-enclosed and runnable Closes #274. --- docs/antora.yml | 1 + docs/modules/ROOT/nav.adoc | 2 +- docs/modules/ROOT/pages/concepts.adoc | 34 + docs/modules/ROOT/pages/usage.adoc | 664 ----------------- .../diagrams/TrinoCatalogs.excalidraw | 673 ++++++++++++++++++ .../getting_started/pages/first_steps.adoc | 4 +- .../getting_started/pages/installation.adoc | 30 + .../examples/code/trino-insecure.yaml | 47 ++ .../code/trino-secure-internal-tls.yaml | 77 ++ .../examples/code/trino-secure-tls-only.yaml | 63 ++ .../examples/code/trino-secure-tls.yaml | 77 ++ docs/modules/usage_guide/nav.adoc | 7 + docs/modules/usage_guide/pages/catalogs.adoc | 76 ++ docs/modules/usage_guide/pages/cluster.adoc | 221 ++++++ .../usage_guide/pages/configuration.adoc | 166 +++++ docs/modules/usage_guide/pages/index.adoc | 14 + .../modules/usage_guide/pages/monitoring.adoc | 5 + docs/modules/usage_guide/pages/query.adoc | 70 ++ docs/modules/usage_guide/pages/security.adoc | 65 ++ 19 files changed, 1629 insertions(+), 667 deletions(-) create mode 100644 docs/modules/ROOT/pages/concepts.adoc delete mode 100644 docs/modules/ROOT/pages/usage.adoc create mode 100644 docs/modules/ROOT/partials/diagrams/TrinoCatalogs.excalidraw create mode 100644 docs/modules/usage_guide/examples/code/trino-insecure.yaml create mode 100644 docs/modules/usage_guide/examples/code/trino-secure-internal-tls.yaml create mode 100644 docs/modules/usage_guide/examples/code/trino-secure-tls-only.yaml create mode 100644 docs/modules/usage_guide/examples/code/trino-secure-tls.yaml create mode 100644 docs/modules/usage_guide/nav.adoc create mode 100644 
docs/modules/usage_guide/pages/catalogs.adoc create mode 100644 docs/modules/usage_guide/pages/cluster.adoc create mode 100644 docs/modules/usage_guide/pages/configuration.adoc create mode 100644 docs/modules/usage_guide/pages/index.adoc create mode 100644 docs/modules/usage_guide/pages/monitoring.adoc create mode 100644 docs/modules/usage_guide/pages/query.adoc create mode 100644 docs/modules/usage_guide/pages/security.adoc diff --git a/docs/antora.yml b/docs/antora.yml index f85369c4..99a22096 100644 --- a/docs/antora.yml +++ b/docs/antora.yml @@ -4,4 +4,5 @@ title: Stackable Operator for Trino nav: - modules/getting_started/nav.adoc - modules/ROOT/nav.adoc + - modules/usage_guide/nav.adoc prerelease: true diff --git a/docs/modules/ROOT/nav.adoc b/docs/modules/ROOT/nav.adoc index 57c8d3f6..82da5c89 100644 --- a/docs/modules/ROOT/nav.adoc +++ b/docs/modules/ROOT/nav.adoc @@ -1,2 +1,2 @@ * xref:configuration.adoc[] -* xref:usage.adoc[] +* xref:concepts.adoc[] \ No newline at end of file diff --git a/docs/modules/ROOT/pages/concepts.adoc b/docs/modules/ROOT/pages/concepts.adoc new file mode 100644 index 00000000..b1c2f43e --- /dev/null +++ b/docs/modules/ROOT/pages/concepts.adoc @@ -0,0 +1,34 @@ += Concepts + +== Connectors + +https://trino.io/docs/current/overview/use-cases.html#what-trino-is[Trino] is a tool designed to efficiently query vast amounts of data using distributed queries. It is not a database with its own storage but rather interacts with many types of data store. Trino connects to these stores - or data sources - via https://trino.io/docs/current/connector.html[connectors]. +Each connector enables access to a specific underlying datasource such as a Hive warehouse, a PostgreSQL database or a Druid instance. +A Trino cluster comprises two roles: the Coordinator, responsible for managing and monitoring workloads, and the Worker, which is responsible for executing the specific tasks that together make up a workload. 
The workers fetch data from the connectors, execute tasks and share intermediate results. The coordinator collects and consolidates these results for the end-user. + +== Catalogs + +An instance of a connector is called a catalog. +Think of a setup containing a large Hive warehouse running on HDFS. +There may, for example, exist two different catalogs called `warehouse_1` and `warehouse_2`, each specifying the same `hive` connector. + +Currently, the following connectors are supported: + +* https://trino.io/docs/current/connector/hive.html[Hive] +* https://trino.io/docs/current/connector/iceberg.html[Iceberg] +* https://trino.io/docs/current/connector/tpcds.html[TPCDS] +* https://trino.io/docs/current/connector/tpch.html[TPCH] + +== Catalog references + +Within Stackable a `TrinoCatalog` consists of one or more (mandatory or optional) components which are specific to that catalog. A catalog should be re-usable within multiple Trino clusters. Catalogs are referenced by Trino clusters with labels and label selectors: this is consistent with the Kubernetes paradigm and keeps the definitions simple and flexible. + +The following diagram illustrates this. Two Trino catalogs - each an instance of a particular connector - are declared with labels that are used to match them to a Trino cluster: + +[excalidraw,trino-catalog-overview,svg,width=70%] +---- +include::partial$diagrams/TrinoCatalogs.excalidraw[] +---- + +A complete example of this is shown here: xref:usage_guide:catalogs.adoc[]. \ No newline at end of file diff --git a/docs/modules/ROOT/pages/usage.adoc b/docs/modules/ROOT/pages/usage.adoc deleted file mode 100644 index c58665ca..00000000 --- a/docs/modules/ROOT/pages/usage.adoc +++ /dev/null @@ -1,664 +0,0 @@ -= Usage - -Trino works together with the Apache Hive metastore and an S3 bucket. 
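The label-based matching described under "Catalog references" can be sketched with two minimal resources. This is only a sketch: all names are placeholders, and the empty `tpch` connector configuration is an assumption (TPCH is listed as a supported connector, but its exact schema is not shown in this patch; the `hive` examples later show a fully configured connector):

```yaml
# Sketch: a catalog carries a label ...
apiVersion: trino.stackable.tech/v1alpha1
kind: TrinoCatalog
metadata:
  name: tpch            # becomes the catalog name inside Trino
  labels:
    trino: simple-trino # matched by the cluster's catalogLabelSelector below
spec:
  connector:
    tpch: {}            # assumed: TPCH needs no further configuration
---
# ... and the cluster selects its catalogs by label.
apiVersion: trino.stackable.tech/v1alpha1
kind: TrinoCluster
metadata:
  name: simple-trino
spec:
  version: 396-stackable0.1.0
  catalogLabelSelector:
    matchLabels:
      trino: simple-trino
  coordinators:
    roleGroups:
      default:
        replicas: 1
  workers:
    roleGroups:
      default:
        replicas: 1
```

Adding another catalog later only requires creating a new `TrinoCatalog` with a matching label; the cluster definition itself does not need to be edited.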
- -== Prerequisites - -* Deployed Stackable Apache Hive metastore -* Accessible S3 Bucket - ** Endpoint, access-key and secret-key - ** Data in the Bucket (we use the https://archive.ics.uci.edu/ml/datasets/iris[Iris] dataset here) -* Optional deployed Stackable xref:secret-operator::index.adoc[Secret Operator] for certificates when deploying for TLS -* Optional deployed Stackable xref:commons-operator::index.adoc[Commons Operator] for certificates when deploying for TLS authentication -* Optional for authorization: Deployed Stackable xref:opa::index.adoc[OPA Operator] -* Optional https://repo.stackable.tech/#browse/browse:packages:trino-cli%2Ftrino-cli-363-executable.jar[Trino CLI] to test SQL queries - -== Installation - -In the following sections we explain or link to the required installation steps. - -=== S3 bucket - -Please refer to the S3 provider. - -=== Hive operator - -Please refer to the xref:hive::index.adoc[Hive Operator] docs. - -Both Hive and Trino need the same S3 authentication. - -=== OPA operator - -Please refer to the xref:opa::index.adoc[OPA Operator] docs. - -=== Authentication - -We provide user authentication via a secret that can be referenced in the custom resource: -[source,yaml] ----- -authentication: - method: - multiUser: - userCredentialsSecret: - namespace: default - name: simple-trino-users-secret ----- - -These secrets need to be created manually before startup. The secret may look like the following snippet: -[source,yaml] ----- -apiVersion: v1 -kind: Secret -metadata: - name: simple-trino-users-secret -type: Opaque -stringData: - admin: $2y$10$89xReovvDLacVzRGpjOyAOONnayOgDAyIS2nW9bs5DJT98q17Dy5i - alice: $2y$10$HcCa4k9v2DRrD/g7e5vEz.Bk.1xg00YTEHOZjPX7oK3KqMSt2xT8W - bob: $2y$10$xVRXtYZnYuQu66SmruijPO8WHFM/UK5QPHTr.Nzf4JMcZSqt3W.2. ----- - -The user and password-hash combinations are provided in the `stringData` field. The hashes are created using bcrypt with 10 rounds or more. 
-[source] ---- -htpasswd -nbBC 10 admin admin ---- - -=== Authorization - -In order to authorize Trino via OPA, a `ConfigMap` containing Rego rules for Trino has to be applied. The following example is an allow-all Rego rule for testing with the user `admin`. Do not use it in production! - -[source,yaml] ----- -apiVersion: v1 -kind: ConfigMap -metadata: - name: opa-bundle-trino - labels: - opa.stackable.tech/bundle: "trino" -data: - trino.rego: | - package trino - - import future.keywords.in - - default allow = false - - allow { - is_admin - } - - is_admin { - input.context.identity.user == "admin" - } ----- - -Users should write their own Rego rules for more complex OPA authorization. - -=== Trino - -With the prerequisites fulfilled, the CRD for this operator must be created: -[source] ---- -kubectl apply -f /etc/stackable/trino-operator/crd/trinocluster.crd.yaml ---- - -==== Insecure for testing - -Create an insecure single-node Trino cluster for testing. You will access the UI/CLI via HTTP; no user/password or authorization is required. 
Please adapt the `s3` settings with your credentials (check `examples/simple-trino-cluster.yaml` for an example setting up Hive and Trino): - -[source,yaml] ----- -apiVersion: trino.stackable.tech/v1alpha1 -kind: TrinoCluster -metadata: - name: simple-trino -spec: - version: 396-stackable0.1.0 - catalogLabelSelector: - matchLabels: - trino: simple-trino - coordinators: - roleGroups: - default: - replicas: 1 - workers: - roleGroups: - default: - replicas: 1 --- -apiVersion: trino.stackable.tech/v1alpha1 -kind: TrinoCatalog -metadata: - name: hive - labels: - trino: simple-trino -spec: - connector: - hive: - metastore: - configMap: simple-hive-derby - s3: - inline: - host: test-minio - port: 9000 - accessStyle: Path - credentials: - secretClass: minio-credentials ----- - -To access the CLI please execute: -[source] ---- -./trino-cli-396-executable.jar --debug --server http://<host>:<port> --user=admin ---- - -==== Secure for production - -There are multiple steps that must be taken to secure a Trino cluster: - -1. Enable authentication -2. Enable TLS between the clients and coordinator -3. Enable internal TLS for communications between coordinators and workers - -For testing purposes we use the https://trino.io/docs/current/installation/cli.html[Trino CLI]. - -===== Via authentication - -If authentication is enabled, https://trino.io/docs/current/security/tls.html[TLS] for the coordinator as well as a shared secret for https://trino.io/docs/current/security/internal-communication.html[internal communications] (this is base64-encoded, not encrypted) must be configured. - -Securing the Trino cluster will disable all HTTP ports, including the web interface served on the HTTP port. - -[source,yaml] ----- -apiVersion: trino.stackable.tech/v1alpha1 -kind: TrinoCluster -metadata: - name: simple-trino -spec: - version: 396-stackable0.1.0 - config: - tls: - secretClass: trino-tls - authentication: - method: - multiUser: - userCredentialsSecret: - name: simple-trino-users-secret -[..] 
---- - -If no `config.tls.secretClass` is provided but authentication is enabled, it will default to `tls` provided by the xref:secret-operator::index.adoc[Secret Operator]. - -[source] ---- -./trino-cli-396-executable.jar --debug --server https://<host>:<port> --user=admin --keystore-path=keystore.p12 --keystore-password=changeit ---- -or - -[source] ---- -./trino-cli-396-executable.jar --debug --server https://<host>:<port> --user=admin --insecure ---- - -===== Via TLS only - -This will disable the HTTP port and UI access and encrypt client-server communications. - -[source,yaml] ----- -apiVersion: trino.stackable.tech/v1alpha1 -kind: TrinoCluster -metadata: - name: simple-trino -spec: - version: 396-stackable0.1.0 - config: - tls: - secretClass: trino-tls -[..] ----- - -[source] ---- -./trino-cli-396-executable.jar --debug --server https://<host>:<port> --user=admin --keystore-path=keystore.p12 --keystore-password=changeit ---- - -===== Via internal TLS - -Internal TLS is for encrypted and authenticated communications between coordinators and workers. Since this applies to all data sent and processed between the processes, it may reduce performance significantly. - -[source,yaml] ----- -apiVersion: trino.stackable.tech/v1alpha1 -kind: TrinoCluster -metadata: - name: simple-trino -spec: - version: 396-stackable0.1.0 - config: - internalTls: - secretClass: trino-internal-tls -[..] ----- - -Since Trino has internal and external communications running over a single port, this will enable the HTTPS port but not expose it. Cluster access is only possible via HTTP. - -[source] ---- -./trino-cli-396-executable.jar --debug --server http://<host>:<port> --user=admin ---- - -==== S3 connection specification - -You can specify S3 connection details directly inside the `TrinoCatalog` specification -or by referring to an external `S3Connection` custom resource. 
- -To specify S3 connection details directly as part of the `TrinoCatalog` resource, you -add an inline connection configuration as shown below: - -[source,yaml] ----- -s3: # <1> - inline: - host: test-minio # <2> - port: 9000 # <3> - pathStyleAccess: true # <4> - secretClass: minio-credentials # <5> - tls: - verification: - server: - caCert: - secretClass: minio-tls-certificates #<6> ----- -<1> Entry point for the connection configuration -<2> Connection host -<3> Optional connection port -<4> Optional flag if path-style URLs should be used; this defaults to `false`, which means virtual hosted-style URLs are used. -<5> Name of the `Secret` object expected to contain the following keys: - `accessKey` and `secretKey` -<6> Optional TLS settings for encrypted traffic. The `secretClass` can be provided by the Secret Operator or by yourself. - -A self-provided S3 TLS secret can be specified like this: -[source,yaml] ----- -apiVersion: secrets.stackable.tech/v1alpha1 -kind: SecretClass -metadata: - name: minio-tls-certificates -spec: - backend: - k8sSearch: - searchNamespace: - pod: {} --- -apiVersion: v1 -kind: Secret -metadata: - name: minio-tls-certificates - labels: - secrets.stackable.tech/class: minio-tls-certificates -data: - ca.crt: - tls.crt: - tls.key: ----- - -It is also possible to configure the bucket connection details as a separate -Kubernetes resource and only refer to that object from the `TrinoCatalog` specification -like this: - -[source,yaml] ----- -s3: - reference: my-connection-resource # <1> ----- -<1> Name of the connection resource with connection details - -The resource named `my-connection-resource` is then defined as shown below: - -[source,yaml] ----- ---- -apiVersion: s3.stackable.tech/v1alpha1 -kind: S3Connection -metadata: - name: my-connection-resource -spec: - host: test-minio - port: 9000 - accessStyle: Path - credentials: - secretClass: minio-credentials ----- - -This has the advantage that the connection configuration can be shared 
across -applications and reduces the cost of updating these details. - -=== Test Trino with Hive and S3 - -Create a schema and a table for the Iris data located in S3 and query the data. This assumes the Iris data set is available in the S3 bucket in `PARQUET` format; it can be downloaded https://www.kaggle.com/gpreda/iris-dataset/version/2?select=iris.parquet[here]. - -==== Create schema -[source,sql] ----- -CREATE SCHEMA IF NOT EXISTS hive.iris -WITH (location = 's3a://iris/'); ----- -which should return: ----- -CREATE SCHEMA ----- - -==== Create table -[source,sql] ----- -CREATE TABLE IF NOT EXISTS hive.iris.iris_parquet ( - sepal_length DOUBLE, - sepal_width DOUBLE, - petal_length DOUBLE, - petal_width DOUBLE, - class VARCHAR -) -WITH ( - external_location = 's3a://iris/parq', - format = 'PARQUET' -); ----- -which should return: ----- -CREATE TABLE ----- - -==== Query data -[source,sql] ----- -SELECT - sepal_length, - class -FROM hive.iris.iris_parquet -LIMIT 10; ----- - -which should return something like this: ----- - sepal_length | class ---------------+------------- - 5.1 | Iris-setosa - 4.9 | Iris-setosa - 4.7 | Iris-setosa - 4.6 | Iris-setosa - 5.0 | Iris-setosa - 5.4 | Iris-setosa - 4.6 | Iris-setosa - 5.0 | Iris-setosa - 4.4 | Iris-setosa - 4.9 | Iris-setosa -(10 rows) - -Query 20220210_161615_00000_a8nka, FINISHED, 1 node -https://172.18.0.5:30299/ui/query.html?20220210_161615_00000_a8nka -Splits: 18 total, 18 done (100.00%) -CPU Time: 0.7s total, 20 rows/s, 11.3KB/s, 74% active -Per Node: 0.3 parallelism, 5 rows/s, 3.02KB/s -Parallelism: 0.3 -Peak Memory: 0B -2.67 [15 rows, 8.08KB] [5 rows/s, 3.02KB/s] ----- - -== Catalogs -=== Create catalog -Trino connects to datasources via https://trino.io/docs/current/connector.html[connectors]. -Each connector enables access to a specific underlying datasource like a Hive warehouse, PostgreSQL or Druid instance. 
- -Currently, the following connectors are supported: - -* https://trino.io/docs/current/connector/hive.html[Hive] -* https://trino.io/docs/current/connector/iceberg.html[Iceberg] -* https://trino.io/docs/current/connector/tpcds.html[TPCDS] -* https://trino.io/docs/current/connector/tpch.html[TPCH] - -An instance of a connector is called a catalog. -Think of a setup containing a large Hive warehouse based on HDFS. -You can then have two catalogs called e.g. `warehouse_1` and `warehouse_2` that both use the `hive` connector. - -You can create a catalog using the `TrinoCatalog` object as follows. - -[source,yaml] ----- -apiVersion: trino.stackable.tech/v1alpha1 -kind: TrinoCatalog -metadata: - name: hive - labels: - trino: simple-trino -spec: - connector: - hive: - metastore: - configMap: simple-hive-derby - s3: - inline: - host: test-minio - port: 9000 - accessStyle: Path - credentials: - secretClass: minio-credentials --- -apiVersion: trino.stackable.tech/v1alpha1 -kind: TrinoCatalog -metadata: - name: iceberg - labels: - trino: simple-trino -spec: - connector: - iceberg: - metastore: - configMap: simple-hive-derby - s3: - inline: - host: test-minio - port: 9000 - accessStyle: Path - credentials: - secretClass: minio-credentials ----- - -The `metadata.name` will be the name of the catalog that shows up in your Trino instance. -The `metadata.labels` will be used to determine which Trino instance imports which `TrinoCatalogs`. -The key under `spec.connector` determines which connector is used. -Each connector supports a different set of attributes. - -=== Add catalog to Trino cluster -You have to specify within your `TrinoCluster` which catalogs it should use as follows: - -[source,yaml] ----- -apiVersion: trino.stackable.tech/v1alpha1 -kind: TrinoCluster -metadata: - name: simple-trino -spec: - version: 396-stackable0.1.0 - catalogLabelSelector: - matchLabels: - trino: simple-trino -# ... 
---- - -The `spec.catalogLabelSelector` is used to fetch the list of `TrinoCatalogs` used for this Trino cluster. -In this case the `hive` and `iceberg` catalogs will be used as they both match the `catalogLabelSelector`. -This mechanism allows you to create a `TrinoCluster` once and then add new catalogs dynamically by creating `TrinoCatalog` objects. -It also allows a `TrinoCatalog` to be reused within multiple `TrinoClusters`. - -== Monitoring - -The managed Trino instances are automatically configured to export Prometheus metrics. See -xref:home:operators:monitoring.adoc[] for more details. - -== Configuration & Environment Overrides - -The cluster definition also supports overriding configuration properties and environment variables, either per role or per role group, where the more specific override (role group) has precedence over the less specific one (role). - -IMPORTANT: Do not override port numbers. This will lead to faulty installations. - -=== Configuration Properties - -For a role or role group, at the same level as `config`, you can specify `configOverrides` for: - -- `config.properties` -- `node.properties` -- `log.properties` -- `password-authenticator.properties` - -For a list of possible configuration properties consult the https://trino.io/docs/current/admin/properties.html[Trino Properties Reference]. - -[source,yaml] ----- -workers: - roleGroups: - default: - config: {} - replicas: 1 - configOverrides: - config.properties: - query.max-memory-per-node: "2GB" ----- - -Just as for the `config`, it is possible to specify this at role level as well: - -[source,yaml] ----- -workers: - configOverrides: - config.properties: - query.max-memory-per-node: "2GB" - roleGroups: - default: - config: {} - replicas: 1 ----- - -All override property values must be strings. The properties will be passed on without any escaping or formatting. - -=== Environment Variables - -Environment variables can be (over)written by adding the `envOverrides` property. 
- -For example per role group: - -[source,yaml] ----- -workers: - roleGroups: - default: - config: {} - replicas: 1 - envOverrides: - JAVA_HOME: "path/to/java" ----- - -or per role: - -[source,yaml] ----- -workers: - envOverrides: - JAVA_HOME: "path/to/java" - roleGroups: - default: - config: {} - replicas: 1 ----- - -Here too, overriding properties such as `http-server.https.port` will lead to broken installations. - -=== Storage for data volumes - -You can mount a volume where data (config and logs of Trino) is stored by specifying https://kubernetes.io/docs/concepts/storage/persistent-volumes[PersistentVolumeClaims] for each individual role or role group: - -[source,yaml] ----- -workers: - config: - resources: - storage: - data: - capacity: 2Gi - roleGroups: - default: - config: - resources: - storage: - data: - capacity: 3Gi ----- - -In the above example, all Trino workers in the default group will store data (the location of the property `--data-dir`) on a `3Gi` volume. Additional role groups not specifying any resources will inherit the config provided on the role level (`2Gi` volume). This works the same for memory or CPU requests. - -By default, in case nothing is configured in the custom resource for a certain role group, each Pod will have a local volume mount of `2Gi` for the data location, containing mainly logs. - -=== Memory requests - -You can request a certain amount of memory for each individual role group as shown below: - -[source,yaml] ----- -workers: - roleGroups: - default: - config: - resources: - memory: - limit: '2Gi' ----- - -In this example, each Trino container in the `default` group will have a maximum of 2 gigabytes of memory. To be more precise, these memory limits apply to the container running Trino but not to any sidecar containers that are part of the pod. - -Setting this property will also automatically set the maximum Java heap size for the corresponding process to 80% of the available memory; for example, a `2Gi` limit results in a maximum heap size of roughly 1.6 GiB. 
Be aware that if the memory constraint is too low, the cluster might fail to start. If pods terminate with an 'OOMKilled' status and the cluster doesn't start, try increasing the memory limit. - -For more details regarding Kubernetes memory requests and limits see: https://kubernetes.io/docs/tasks/configure-pod-container/assign-memory-resource/[Assign Memory Resources to Containers and Pods]. - -=== CPU requests - -Similarly to memory resources, you can also configure CPU limits, as shown below: - -[source,yaml] ----- -workers: - roleGroups: - default: - config: - resources: - cpu: - max: '500m' - min: '250m' ----- - -=== Defaults - -If nothing is specified, the operator will automatically set the following default values for resources: - -[source,yaml] ----- -workers: - roleGroups: - default: - config: - resources: - requests: - cpu: 200m - memory: 2Gi - limits: - cpu: "4" - memory: 2Gi - storage: - data: - capacity: 2Gi ----- - -WARNING: The default values are _most likely_ not sufficient to run a proper cluster in production. Please adapt according to your requirements. - -For more details regarding Kubernetes CPU limits see: https://kubernetes.io/docs/tasks/configure-pod-container/assign-cpu-resource/[Assign CPU Resources to Containers and Pods]. 
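The storage, memory and CPU settings described in the sections above can be combined in a single role-group `resources` block. The following is a sketch with illustrative values only (not tuning recommendations; see the defaults and warning above):

```yaml
workers:
  roleGroups:
    default:
      replicas: 2
      config:
        resources:
          cpu:
            min: '250m'  # request
            max: '1'     # limit
          memory:
            limit: '2Gi' # container limit; heap is derived as ~80% of this
          storage:
            data:
              capacity: 3Gi  # size of the data volume claim
```

Values left unset fall back to the role level and, failing that, to the operator defaults shown above.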
diff --git a/docs/modules/ROOT/partials/diagrams/TrinoCatalogs.excalidraw b/docs/modules/ROOT/partials/diagrams/TrinoCatalogs.excalidraw new file mode 100644 index 00000000..764af67b --- /dev/null +++ b/docs/modules/ROOT/partials/diagrams/TrinoCatalogs.excalidraw @@ -0,0 +1,673 @@ +{ + "type": "excalidraw", + "version": 2, + "source": "https://excalidraw.com", + "elements": [ + { + "type": "rectangle", + "version": 745, + "versionNonce": 1870472121, + "isDeleted": false, + "id": "ShDDUIjfAR4cUs5ZS_Hw7", + "fillStyle": "hachure", + "strokeWidth": 1, + "strokeStyle": "solid", + "roughness": 1, + "opacity": 100, + "angle": 0, + "x": 759, + "y": 749.5, + "strokeColor": "#000000", + "backgroundColor": "transparent", + "width": 153, + "height": 62, + "seed": 236946397, + "groupIds": [], + "strokeSharpness": "sharp", + "boundElements": [ + { + "id": "2ceAyVC8OCwGa-aFQfmsK", + "type": "text" + }, + { + "id": "42AVzcm9HY_DUE7Idt105", + "type": "arrow" + }, + { + "type": "text", + "id": "2ceAyVC8OCwGa-aFQfmsK" + }, + { + "id": "J_BTacHx9FWlVnVUKMYgn", + "type": "arrow" + } + ], + "updated": 1663759431752, + "link": null, + "locked": false + }, + { + "type": "text", + "version": 613, + "versionNonce": 757256279, + "isDeleted": false, + "id": "2ceAyVC8OCwGa-aFQfmsK", + "fillStyle": "hachure", + "strokeWidth": 1, + "strokeStyle": "solid", + "roughness": 1, + "opacity": 100, + "angle": 0, + "x": 764, + "y": 768, + "strokeColor": "#000000", + "backgroundColor": "transparent", + "width": 143, + "height": 25, + "seed": 141620339, + "groupIds": [], + "strokeSharpness": "sharp", + "boundElements": [], + "updated": 1663759431752, + "link": null, + "locked": false, + "fontSize": 20, + "fontFamily": 1, + "text": "TrinoCluster", + "baseline": 18, + "textAlign": "center", + "verticalAlign": "middle", + "containerId": "ShDDUIjfAR4cUs5ZS_Hw7", + "originalText": "TrinoCluster" + }, + { + "type": "rectangle", + "version": 327, + "versionNonce": 1740724691, + "isDeleted": false, + "id": 
"95W9gdcFPFkc3eGCMObeQ", + "fillStyle": "hachure", + "strokeWidth": 1, + "strokeStyle": "solid", + "roughness": 1, + "opacity": 100, + "angle": 0, + "x": 502, + "y": 702.5, + "strokeColor": "#000000", + "backgroundColor": "transparent", + "width": 137, + "height": 45, + "seed": 714932285, + "groupIds": [], + "strokeSharpness": "sharp", + "boundElements": [ + { + "id": "C9qRQEagQJDjUQwANeCpn", + "type": "text" + }, + { + "id": "42AVzcm9HY_DUE7Idt105", + "type": "arrow" + }, + { + "type": "text", + "id": "C9qRQEagQJDjUQwANeCpn" + } + ], + "updated": 1655193671050, + "link": null, + "locked": false + }, + { + "type": "text", + "version": 340, + "versionNonce": 454286777, + "isDeleted": false, + "id": "C9qRQEagQJDjUQwANeCpn", + "fillStyle": "hachure", + "strokeWidth": 1, + "strokeStyle": "solid", + "roughness": 1, + "opacity": 100, + "angle": 0, + "x": 507, + "y": 713, + "strokeColor": "#000000", + "backgroundColor": "transparent", + "width": 127, + "height": 25, + "seed": 434105875, + "groupIds": [], + "strokeSharpness": "sharp", + "boundElements": [], + "updated": 1663758882445, + "link": null, + "locked": false, + "fontSize": 20, + "fontFamily": 1, + "text": "Connector", + "baseline": 18, + "textAlign": "center", + "verticalAlign": "middle", + "containerId": "95W9gdcFPFkc3eGCMObeQ", + "originalText": "Connector" + }, + { + "type": "rectangle", + "version": 141, + "versionNonce": 1891411827, + "isDeleted": false, + "id": "Fl0vBzl87ULvVkSFXEUCj", + "fillStyle": "hachure", + "strokeWidth": 1, + "strokeStyle": "solid", + "roughness": 1, + "opacity": 100, + "angle": 0, + "x": 470, + "y": 644.5, + "strokeColor": "#000000", + "backgroundColor": "transparent", + "width": 179, + "height": 114, + "seed": 1726738589, + "groupIds": [], + "strokeSharpness": "sharp", + "boundElements": [ + { + "id": "42AVzcm9HY_DUE7Idt105", + "type": "arrow" + } + ], + "updated": 1655193671050, + "link": null, + "locked": false + }, + { + "type": "arrow", + "version": 1669, + "versionNonce": 
1713568695, + "isDeleted": false, + "id": "42AVzcm9HY_DUE7Idt105", + "fillStyle": "hachure", + "strokeWidth": 1, + "strokeStyle": "solid", + "roughness": 1, + "opacity": 100, + "angle": 0, + "x": 650.5897435897436, + "y": 750.5597777692162, + "strokeColor": "#000000", + "backgroundColor": "transparent", + "width": 92.51282051282044, + "height": 18.35026562218286, + "seed": 578012499, + "groupIds": [], + "strokeSharpness": "round", + "boundElements": [], + "updated": 1663759431769, + "link": null, + "locked": false, + "startBinding": { + "elementId": "Fl0vBzl87ULvVkSFXEUCj", + "gap": 1.5897435897435899, + "focus": 0.41496824490053114 + }, + "endBinding": { + "elementId": "ShDDUIjfAR4cUs5ZS_Hw7", + "gap": 15.897435897435898, + "focus": -0.14591356439879996 + }, + "lastCommittedPoint": null, + "startArrowhead": null, + "endArrowhead": "arrow", + "points": [ + [ + 0, + 0 + ], + [ + 92.51282051282044, + 18.35026562218286 + ] + ] + }, + { + "type": "text", + "version": 94, + "versionNonce": 396341079, + "isDeleted": false, + "id": "ti1ORFDQZcMZZ4GlOXDXu", + "fillStyle": "hachure", + "strokeWidth": 1, + "strokeStyle": "solid", + "roughness": 1, + "opacity": 100, + "angle": 0, + "x": 483, + "y": 656, + "strokeColor": "#000000", + "backgroundColor": "transparent", + "width": 128, + "height": 25, + "seed": 619993885, + "groupIds": [], + "strokeSharpness": "sharp", + "boundElements": [], + "updated": 1663759250068, + "link": null, + "locked": false, + "fontSize": 20, + "fontFamily": 1, + "text": "TrinoCatalog", + "baseline": 18, + "textAlign": "left", + "verticalAlign": "top", + "containerId": null, + "originalText": "TrinoCatalog" + }, + { + "type": "ellipse", + "version": 143, + "versionNonce": 1395736599, + "isDeleted": false, + "id": "lqBmV6-s9K7xoWZqxPzoi", + "fillStyle": "hachure", + "strokeWidth": 1, + "strokeStyle": "solid", + "roughness": 1, + "opacity": 100, + "angle": 0, + "x": 381, + "y": 672.5, + "strokeColor": "#b80069", + "backgroundColor": "transparent", + 
"width": 56, + "height": 56, + "seed": 557522557, + "groupIds": [], + "strokeSharpness": "sharp", + "boundElements": [ + { + "type": "text", + "id": "JUEMRV3lZNhi4zlDDzKps" + } + ], + "updated": 1663759310177, + "link": null, + "locked": false + }, + { + "type": "text", + "version": 86, + "versionNonce": 596881625, + "isDeleted": false, + "id": "JUEMRV3lZNhi4zlDDzKps", + "fillStyle": "hachure", + "strokeWidth": 1, + "strokeStyle": "solid", + "roughness": 1, + "opacity": 100, + "angle": 0, + "x": 386, + "y": 677.5, + "strokeColor": "#b80069", + "backgroundColor": "transparent", + "width": 46, + "height": 46, + "seed": 1732007283, + "groupIds": [], + "strokeSharpness": "sharp", + "boundElements": [], + "updated": 1663759310177, + "link": null, + "locked": false, + "fontSize": 36, + "fontFamily": 1, + "text": "1", + "baseline": 32, + "textAlign": "center", + "verticalAlign": "middle", + "containerId": "lqBmV6-s9K7xoWZqxPzoi", + "originalText": "1" + }, + { + "type": "ellipse", + "version": 143, + "versionNonce": 1964185337, + "isDeleted": false, + "id": "AXChbBjjY_mocVqscR-RT", + "fillStyle": "hachure", + "strokeWidth": 1, + "strokeStyle": "solid", + "roughness": 1, + "opacity": 100, + "angle": 0, + "x": 380.5, + "y": 868, + "strokeColor": "#b80069", + "backgroundColor": "transparent", + "width": 56, + "height": 56, + "seed": 1848828509, + "groupIds": [], + "strokeSharpness": "sharp", + "boundElements": [ + { + "id": "ykPcOs3qVjC8o0B-jdo36", + "type": "text" + }, + { + "type": "text", + "id": "ykPcOs3qVjC8o0B-jdo36" + } + ], + "updated": 1663759307950, + "link": null, + "locked": false + }, + { + "type": "text", + "version": 86, + "versionNonce": 1787343127, + "isDeleted": false, + "id": "ykPcOs3qVjC8o0B-jdo36", + "fillStyle": "hachure", + "strokeWidth": 1, + "strokeStyle": "solid", + "roughness": 1, + "opacity": 100, + "angle": 0, + "x": 385.5, + "y": 873, + "strokeColor": "#b80069", + "backgroundColor": "transparent", + "width": 46, + "height": 46, + "seed": 
1822998003, + "groupIds": [], + "strokeSharpness": "sharp", + "boundElements": [], + "updated": 1663759307950, + "link": null, + "locked": false, + "fontSize": 36, + "fontFamily": 1, + "text": "2", + "baseline": 32, + "textAlign": "center", + "verticalAlign": "middle", + "containerId": "AXChbBjjY_mocVqscR-RT", + "originalText": "2" + }, + { + "type": "rectangle", + "version": 392, + "versionNonce": 1878755545, + "isDeleted": false, + "id": "dGj6Z-9giTHELMji2U7Ao", + "fillStyle": "hachure", + "strokeWidth": 1, + "strokeStyle": "solid", + "roughness": 1, + "opacity": 100, + "angle": 0, + "x": 502, + "y": 897, + "strokeColor": "#000000", + "backgroundColor": "transparent", + "width": 137, + "height": 45, + "seed": 1053579287, + "groupIds": [], + "strokeSharpness": "sharp", + "boundElements": [ + { + "id": "TBiqZjMFAGOLGVQR7r6-C", + "type": "text" + }, + { + "id": "42AVzcm9HY_DUE7Idt105", + "type": "arrow" + }, + { + "id": "TBiqZjMFAGOLGVQR7r6-C", + "type": "text" + }, + { + "type": "text", + "id": "TBiqZjMFAGOLGVQR7r6-C" + } + ], + "updated": 1663759272406, + "link": null, + "locked": false + }, + { + "type": "text", + "version": 404, + "versionNonce": 1174457655, + "isDeleted": false, + "id": "TBiqZjMFAGOLGVQR7r6-C", + "fillStyle": "hachure", + "strokeWidth": 1, + "strokeStyle": "solid", + "roughness": 1, + "opacity": 100, + "angle": 0, + "x": 507, + "y": 907.5, + "strokeColor": "#000000", + "backgroundColor": "transparent", + "width": 127, + "height": 25, + "seed": 618010841, + "groupIds": [], + "strokeSharpness": "sharp", + "boundElements": [], + "updated": 1663759272406, + "link": null, + "locked": false, + "fontSize": 20, + "fontFamily": 1, + "text": "Connector", + "baseline": 18, + "textAlign": "center", + "verticalAlign": "middle", + "containerId": "dGj6Z-9giTHELMji2U7Ao", + "originalText": "Connector" + }, + { + "type": "rectangle", + "version": 206, + "versionNonce": 1155088153, + "isDeleted": false, + "id": "Nl_hXpCFJOdJsPu6T6iBy", + "fillStyle": "hachure", 
+ "strokeWidth": 1, + "strokeStyle": "solid", + "roughness": 1, + "opacity": 100, + "angle": 0, + "x": 470, + "y": 839, + "strokeColor": "#000000", + "backgroundColor": "transparent", + "width": 179, + "height": 114, + "seed": 1305951543, + "groupIds": [], + "strokeSharpness": "sharp", + "boundElements": [ + { + "id": "42AVzcm9HY_DUE7Idt105", + "type": "arrow" + }, + { + "id": "J_BTacHx9FWlVnVUKMYgn", + "type": "arrow" + } + ], + "updated": 1663759332956, + "link": null, + "locked": false + }, + { + "type": "text", + "version": 157, + "versionNonce": 486542935, + "isDeleted": false, + "id": "ZMGOgTaNxQs0IZQMRXmlt", + "fillStyle": "hachure", + "strokeWidth": 1, + "strokeStyle": "solid", + "roughness": 1, + "opacity": 100, + "angle": 0, + "x": 483, + "y": 850.5, + "strokeColor": "#000000", + "backgroundColor": "transparent", + "width": 128, + "height": 25, + "seed": 204445113, + "groupIds": [], + "strokeSharpness": "sharp", + "boundElements": [], + "updated": 1663759272406, + "link": null, + "locked": false, + "fontSize": 20, + "fontFamily": 1, + "text": "TrinoCatalog", + "baseline": 18, + "textAlign": "left", + "verticalAlign": "top", + "containerId": null, + "originalText": "TrinoCatalog" + }, + { + "id": "J_BTacHx9FWlVnVUKMYgn", + "type": "arrow", + "x": 652.9743589743589, + "y": 865.3013187127165, + "width": 98.87179487179492, + "height": 83.71631491130802, + "angle": 0, + "strokeColor": "#000000", + "backgroundColor": "transparent", + "fillStyle": "hachure", + "strokeWidth": 1, + "strokeStyle": "solid", + "roughness": 1, + "opacity": 100, + "groupIds": [], + "strokeSharpness": "round", + "seed": 1381754743, + "version": 324, + "versionNonce": 98069719, + "isDeleted": false, + "boundElements": null, + "updated": 1663759431769, + "link": null, + "locked": false, + "points": [ + [ + 0, + 0 + ], + [ + 98.87179487179492, + -83.71631491130802 + ] + ], + "lastCommittedPoint": null, + "startBinding": { + "elementId": "Nl_hXpCFJOdJsPu6T6iBy", + "gap": 3.9743589743589745, 
+ "focus": 0.3647185102995803 + }, + "endBinding": { + "elementId": "ShDDUIjfAR4cUs5ZS_Hw7", + "gap": 7.153846153846154, + "focus": 0.7282373747298095 + }, + "startArrowhead": null, + "endArrowhead": "arrow" + }, + { + "id": "K2xF-wPmE8gusevXrNm01", + "type": "text", + "x": 697, + "y": 700, + "width": 136, + "height": 25, + "angle": 0, + "strokeColor": "#000000", + "backgroundColor": "transparent", + "fillStyle": "hachure", + "strokeWidth": 1, + "strokeStyle": "solid", + "roughness": 1, + "opacity": 100, + "groupIds": [], + "strokeSharpness": "sharp", + "seed": 356525145, + "version": 28, + "versionNonce": 2074484537, + "isDeleted": false, + "boundElements": null, + "updated": 1663759407537, + "link": null, + "locked": false, + "text": "matched label", + "fontSize": 20, + "fontFamily": 1, + "textAlign": "left", + "verticalAlign": "top", + "baseline": 18, + "containerId": null, + "originalText": "matched label" + }, + { + "type": "text", + "version": 81, + "versionNonce": 431079577, + "isDeleted": false, + "id": "EjoHPFKuxag7yR6oCIrtu", + "fillStyle": "hachure", + "strokeWidth": 1, + "strokeStyle": "solid", + "roughness": 1, + "opacity": 100, + "angle": 0, + "x": 692.5, + "y": 841, + "strokeColor": "#000000", + "backgroundColor": "transparent", + "width": 136, + "height": 25, + "seed": 227566841, + "groupIds": [], + "strokeSharpness": "sharp", + "boundElements": [], + "updated": 1663759421558, + "link": null, + "locked": false, + "fontSize": 20, + "fontFamily": 1, + "text": "matched label", + "baseline": 18, + "textAlign": "left", + "verticalAlign": "top", + "containerId": null, + "originalText": "matched label" + } + ], + "appState": { + "gridSize": null, + "viewBackgroundColor": "#ffffff" + }, + "files": {} +} \ No newline at end of file diff --git a/docs/modules/getting_started/pages/first_steps.adoc b/docs/modules/getting_started/pages/first_steps.adoc index 1ae52af2..201707ed 100644 --- a/docs/modules/getting_started/pages/first_steps.adoc +++ 
b/docs/modules/getting_started/pages/first_steps.adoc @@ -4,7 +4,7 @@ After going through the xref:installation.adoc[] section and having installed al == Setup Trino -In the simplest form, Trino does not require any other operators (except the commons and secret operator) to work and access the web interface. There are no data sources (e.g. PostgreSQL, Hive or S3) configured and for the tests only internal data is queried. +A working Trino cluster and its web interface require only the commons and secret operators. Simple tests are possible without an external data source (e.g. PostgreSQL, Hive or S3), as internal data can be used. Create a file named `trino.yaml` with the following content: @@ -123,4 +123,4 @@ include::example$code/getting-started.sh[tag=cleanup-trino-cli] == What's next -Have a look at the xref:ROOT:usage.adoc[] page to find out more about the features of the Trino Operator. +Have a look at the xref:usage_guide:cluster.adoc[] to find out more about how to configure a Trino cluster. diff --git a/docs/modules/getting_started/pages/installation.adoc b/docs/modules/getting_started/pages/installation.adoc index e5a07d59..19d209d2 100644 --- a/docs/modules/getting_started/pages/installation.adoc +++ b/docs/modules/getting_started/pages/installation.adoc @@ -48,6 +48,36 @@ include::example$code/getting-started.sh[tag=helm-install-operators] Helm will deploy the operators in a Kubernetes Deployment and apply the CRDs for the Trino service (as well as the CRDs for the required operators). You are now ready to deploy Trino in Kubernetes. +== Optional installation steps + +Some Trino connectors like `hive` or `iceberg` work together with the Apache Hive metastore and S3 buckets. +For these components, extra steps are required. You will need:
+ +* a Stackable Hive metastore +* an accessible S3 bucket +** an endpoint, an access key and a secret key +** data in the bucket (we use the https://archive.ics.uci.edu/ml/datasets/iris[Iris] dataset here) +* the following are optional +** a Stackable xref:secret-operator::index.adoc[Secret Operator] for certificates when deploying for TLS +** a Stackable xref:commons-operator::index.adoc[Commons Operator] for certificates when deploying for TLS authentication +** a Stackable xref:opa::index.adoc[OPA Operator] for authorization +** the https://repo.stackable.tech/#browse/browse:packages:trino-cli%2Ftrino-cli-363-executable.jar[Trino CLI] to test SQL queries + +=== S3 bucket + +Please refer to the documentation of your S3 provider. + +=== Hive operator + +Please refer to the xref:hive::index.adoc[Hive Operator] docs. + +Both Hive and Trino need to use the same S3 credentials. + +=== OPA operator + +Please refer to the xref:opa::index.adoc[OPA Operator] docs. + + +== What's next + +xref:first_steps.adoc[Set up a Trino cluster] and its dependencies.
diff --git a/docs/modules/usage_guide/examples/code/trino-insecure.yaml b/docs/modules/usage_guide/examples/code/trino-insecure.yaml new file mode 100644 index 00000000..c3691761 --- /dev/null +++ b/docs/modules/usage_guide/examples/code/trino-insecure.yaml @@ -0,0 +1,47 @@ +--- +apiVersion: trino.stackable.tech/v1alpha1 +kind: TrinoCatalog +metadata: + name: hive + labels: + trino: simple-trino +spec: + connector: + hive: + metastore: + configMap: simple-hive-derby +--- +apiVersion: trino.stackable.tech/v1alpha1 +kind: TrinoCluster +metadata: + name: simple-trino +spec: + version: 396-stackable0.1.0 + catalogLabelSelector: + matchLabels: + trino: simple-trino + coordinators: + roleGroups: + default: + replicas: 1 + workers: + roleGroups: + default: + replicas: 1 +--- +apiVersion: hive.stackable.tech/v1alpha1 +kind: HiveCluster +metadata: + name: simple-hive-derby +spec: + version: 3.1.3-stackable0.1.0 + metastore: + roleGroups: + default: + replicas: 1 + config: + database: + connString: jdbc:derby:;databaseName=/tmp/metastore_db;create=true + user: APP + password: mine + dbType: derby diff --git a/docs/modules/usage_guide/examples/code/trino-secure-internal-tls.yaml b/docs/modules/usage_guide/examples/code/trino-secure-internal-tls.yaml new file mode 100644 index 00000000..9e3e451c --- /dev/null +++ b/docs/modules/usage_guide/examples/code/trino-secure-internal-tls.yaml @@ -0,0 +1,77 @@ +--- +apiVersion: trino.stackable.tech/v1alpha1 +kind: TrinoCatalog +metadata: + name: hive + labels: + trino: simple-trino +spec: + connector: + hive: + metastore: + configMap: simple-hive-derby +--- +apiVersion: trino.stackable.tech/v1alpha1 +kind: TrinoCluster +metadata: + name: simple-trino +spec: + version: 396-stackable0.1.0 + config: + internalTls: + secretClass: trino-internal-tls # <1> + authentication: + method: + multiUser: + userCredentialsSecret: + name: trino-users # <2> + catalogLabelSelector: + matchLabels: + trino: simple-trino + coordinators: + roleGroups: + 
default: + replicas: 1 + workers: + roleGroups: + default: + replicas: 1 +--- +apiVersion: secrets.stackable.tech/v1alpha1 +kind: SecretClass +metadata: + name: trino-internal-tls # <1> +spec: + backend: + autoTls: # <3> + ca: + secret: + name: secret-provisioner-trino-internal-tls-ca + namespace: default + autoGenerate: true +--- +apiVersion: v1 +kind: Secret +metadata: + name: trino-users # <2> +type: kubernetes.io/opaque +stringData: + # admin:admin + admin: $2y$10$89xReovvDLacVzRGpjOyAOONnayOgDAyIS2nW9bs5DJT98q17Dy5i +--- +apiVersion: hive.stackable.tech/v1alpha1 +kind: HiveCluster +metadata: + name: simple-hive-derby +spec: + version: 3.1.3-stackable0.1.0 + metastore: + roleGroups: + default: + replicas: 1 + config: + database: + connString: jdbc:derby:;databaseName=/tmp/metastore_db;create=true + user: APP + password: mine + dbType: derby diff --git a/docs/modules/usage_guide/examples/code/trino-secure-tls-only.yaml b/docs/modules/usage_guide/examples/code/trino-secure-tls-only.yaml new file mode 100644 index 00000000..91fcfec6 --- /dev/null +++ b/docs/modules/usage_guide/examples/code/trino-secure-tls-only.yaml @@ -0,0 +1,63 @@ +--- +apiVersion: trino.stackable.tech/v1alpha1 +kind: TrinoCatalog +metadata: + name: hive + labels: + trino: simple-trino +spec: + connector: + hive: + metastore: + configMap: simple-hive-derby +--- +apiVersion: trino.stackable.tech/v1alpha1 +kind: TrinoCluster +metadata: + name: simple-trino +spec: + version: 396-stackable0.1.0 + config: + tls: + secretClass: trino-tls # <1> + catalogLabelSelector: + matchLabels: + trino: simple-trino # <2> + coordinators: + roleGroups: + default: + replicas: 1 + workers: + roleGroups: + default: + replicas: 1 +--- +apiVersion: secrets.stackable.tech/v1alpha1 +kind: SecretClass +metadata: + name: trino-tls # <1> +spec: + backend: + autoTls: # <3> + ca: + secret: + name: secret-provisioner-trino-tls-ca + namespace: default + autoGenerate: true +--- +apiVersion: hive.stackable.tech/v1alpha1 +kind: 
HiveCluster +metadata: + name: simple-hive-derby +spec: + version: 3.1.3-stackable0.1.0 + metastore: + roleGroups: + default: + replicas: 1 + config: + database: + connString: jdbc:derby:;databaseName=/tmp/metastore_db;create=true + user: APP + password: mine + dbType: derby diff --git a/docs/modules/usage_guide/examples/code/trino-secure-tls.yaml b/docs/modules/usage_guide/examples/code/trino-secure-tls.yaml new file mode 100644 index 00000000..94e33151 --- /dev/null +++ b/docs/modules/usage_guide/examples/code/trino-secure-tls.yaml @@ -0,0 +1,77 @@ +--- +apiVersion: trino.stackable.tech/v1alpha1 +kind: TrinoCatalog +metadata: + name: hive + labels: + trino: simple-trino +spec: + connector: + hive: + metastore: + configMap: simple-hive-derby +--- +apiVersion: trino.stackable.tech/v1alpha1 +kind: TrinoCluster +metadata: + name: simple-trino +spec: + version: 396-stackable0.1.0 + config: + tls: + secretClass: trino-tls # <1> + authentication: + method: + multiUser: + userCredentialsSecret: + name: trino-users # <2> + catalogLabelSelector: + matchLabels: + trino: simple-trino # <3> + coordinators: + roleGroups: + default: + replicas: 1 + workers: + roleGroups: + default: + replicas: 1 +--- +apiVersion: secrets.stackable.tech/v1alpha1 +kind: SecretClass +metadata: + name: trino-tls # <1> +spec: + backend: + autoTls: # <4> + ca: + secret: + name: secret-provisioner-trino-tls-ca + namespace: default + autoGenerate: true +--- +apiVersion: v1 +kind: Secret +metadata: + name: trino-users # <2> +type: kubernetes.io/opaque +stringData: + # admin:admin + admin: $2y$10$89xReovvDLacVzRGpjOyAOONnayOgDAyIS2nW9bs5DJT98q17Dy5i +--- +apiVersion: hive.stackable.tech/v1alpha1 +kind: HiveCluster +metadata: + name: simple-hive-derby +spec: + version: 3.1.3-stackable0.1.0 + metastore: + roleGroups: + default: + replicas: 1 + config: + database: + connString: jdbc:derby:;databaseName=/tmp/metastore_db;create=true + user: APP + password: mine + dbType: derby diff --git 
a/docs/modules/usage_guide/nav.adoc b/docs/modules/usage_guide/nav.adoc new file mode 100644 index 00000000..a33cb615 --- /dev/null +++ b/docs/modules/usage_guide/nav.adoc @@ -0,0 +1,7 @@ +* xref:index.adoc[] +** xref:security.adoc[] +** xref:catalogs.adoc[] +** xref:cluster.adoc[] +** xref:configuration.adoc[] +** xref:monitoring.adoc[] +** xref:query.adoc[] \ No newline at end of file diff --git a/docs/modules/usage_guide/pages/catalogs.adoc b/docs/modules/usage_guide/pages/catalogs.adoc new file mode 100644 index 00000000..aaee3fe8 --- /dev/null +++ b/docs/modules/usage_guide/pages/catalogs.adoc @@ -0,0 +1,76 @@ += Using Catalogs + +Catalogs are defined in their own resources and referenced from cluster objects. See the xref:ROOT:concepts.adoc[] page for more details. + +== Create a catalog + +You can create a catalog using the `TrinoCatalog` object as follows: + +[source,yaml] +---- +apiVersion: trino.stackable.tech/v1alpha1 +kind: TrinoCatalog +metadata: + name: hive + labels: + trino: simple-trino +spec: + connector: + hive: + metastore: + configMap: simple-hive-derby + s3: + inline: + host: test-minio + port: 9000 + accessStyle: Path + credentials: + secretClass: minio-credentials +--- +apiVersion: trino.stackable.tech/v1alpha1 +kind: TrinoCatalog +metadata: + name: iceberg + labels: + trino: simple-trino +spec: + connector: + iceberg: + metastore: + configMap: simple-hive-derby + s3: + inline: + host: test-minio + port: 9000 + accessStyle: Path + credentials: + secretClass: minio-credentials +---- + +The `metadata.name` will be the name of the catalog that shows up in your Trino instance. +The `metadata.labels` will be used to determine the link between Trino clusters and `TrinoCatalogs`. +The `spec.connector` field determines which connector is used. +Each connector supports a different set of attributes. + +== Add a catalog to a Trino cluster + +It is necessary to specify within the `TrinoCluster` which catalogs it should use.
Here is an example of this: + +[source,yaml] +---- +apiVersion: trino.stackable.tech/v1alpha1 +kind: TrinoCluster +metadata: + name: simple-trino +spec: + version: 396-stackable0.1.0 + catalogLabelSelector: + matchLabels: + trino: simple-trino +# ... +---- + +The `spec.catalogLabelSelector` is used to fetch the list of `TrinoCatalogs` used for this Trino cluster. +In this case the `hive` and `iceberg` catalogs will be used as they both match the `catalogLabelSelector`. + +Once created, a `TrinoCluster` will detect and use any new catalogs that are subsequently created with a matching label. This also means that it is possible to reuse a `TrinoCatalog` within multiple `TrinoClusters`. \ No newline at end of file diff --git a/docs/modules/usage_guide/pages/cluster.adoc b/docs/modules/usage_guide/pages/cluster.adoc new file mode 100644 index 00000000..90ca60b9 --- /dev/null +++ b/docs/modules/usage_guide/pages/cluster.adoc @@ -0,0 +1,221 @@ += Creating a Trino cluster + +== Define an insecure cluster (testing) + +Create an insecure single-node Trino cluster for testing. It can be accessed with the UI/CLI via HTTP, without user/password credentials or authorization. + +For testing purposes we use the https://trino.io/docs/current/installation/cli.html[Trino CLI]. + +First, ensure all necessary operators have been deployed: + +[source] +---- +stackablectl operator install \ + secret commons hive trino +---- + +The Trino cluster can now be deployed: + +[source,yaml] +---- +include::example$code/trino-insecure.yaml[] +---- + +We have defined a single catalog - Hive - which uses an embedded database (Derby).
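If you are not using `stackablectl` to apply manifests, the example can equally be deployed with `kubectl`. This is a sketch only; it assumes the manifest above was saved locally as `trino-insecure.yaml`, and that the pods carry the `app.kubernetes.io/instance` label convention used for Stackable-managed workloads - adjust the selector if your labels differ:

[source,shell]
----
# Apply the catalog, cluster and Hive metastore definitions
kubectl apply -f trino-insecure.yaml

# Wait until the coordinator and worker pods report ready
kubectl wait pod --for=condition=ready \
  --selector=app.kubernetes.io/instance=simple-trino \
  --timeout=300s
----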
+ +To interact with Trino, first obtain the host and port for the Trino coordinator service (in this and the following examples, https://172.18.0.3:31748): + +[source] +---- +stackablectl services list + + PRODUCT NAME NAMESPACE ENDPOINTS EXTRA INFOS + + hive simple-hive-derby default hive 172.18.0.4:32186 + metrics 172.18.0.4:30109 + + trino simple-trino default coordinator-metrics 172.18.0.3:32123 + coordinator-https https://172.18.0.3:31748 +---- + +Next, download the Trino CLI tool (this can be obtained from the Stackable repository, as shown below): + +[source] +---- +curl --output trino.jar https://repo.stackable.tech/repository/packages/trino-cli/trino-cli-396-executable.jar +---- + +Execute some CLI commands to verify operation, such as returning the names of all catalogs. Note that an insecure connection is specified: +[source] +---- +./trino.jar --insecure --debug --server https://172.18.0.3:31748 --user=admin --execute "SHOW CATALOGS" --output-format=CSV_UNQUOTED + +hive +system +---- + +== Define a secure cluster (production) + +For secure connections the following steps must be taken: + +1. Enable authentication +2. Enable TLS between the clients and coordinator +3. Enable internal TLS for communication between coordinators and workers + +=== Via authentication + +If authentication is enabled, https://trino.io/docs/current/security/tls.html[TLS] for the coordinator as well as a shared secret for https://trino.io/docs/current/security/internal-communication.html[internal communications] (this secret is base64-encoded, not encrypted) must be configured. + +Securing the Trino cluster will disable all HTTP ports, including HTTP access to the web interface. In the definition below, authentication is configured to use the `trino-users` secret, and TLS communication uses a certificate signed by the Secret Operator (indicated by `autoTls`).
+ +[source,yaml] +---- +include::example$code/trino-secure-tls.yaml[] +---- + +<1> The name of (and reference to) the `SecretClass` +<2> The name of (and reference to) the `Secret` +<3> `TrinoCatalog` reference +<4> TLS mechanism + +The CLI now requires that a path to the keystore and a password be provided: + +[source] +---- +./trino.jar --debug --server https://172.18.0.3:31748 +--user=admin --keystore-path=<path-to-keystore> --keystore-password=<keystore-password> +---- + +=== Via TLS only + +This will disable the HTTP port and UI access and encrypt client-server communications. + +[source,yaml] +---- +include::example$code/trino-secure-tls-only.yaml[] +---- + +<1> The name of (and reference to) the `SecretClass` +<2> `TrinoCatalog` reference +<3> TLS mechanism + +The CLI call then looks like this: + +[source] +---- +./trino.jar --debug --server https://172.18.0.3:31748 --keystore-path=<path-to-keystore> --keystore-password=<keystore-password> +---- + +=== Via internal TLS + +Internal TLS provides encrypted and authenticated communication between coordinators and workers. Since this applies to all the data sent and processed between the processes, it may reduce performance significantly. + +[source,yaml] +---- +include::example$code/trino-secure-internal-tls.yaml[] +---- + +<1> The name of (and reference to) the `SecretClass` +<2> The name of (and reference to) the `Secret` +<3> TLS mechanism + +Since Trino runs internal and external communications over a single port, this will enable the HTTPS port but not expose it. Cluster access is only possible via HTTP. + +[source] +---- +./trino.jar --debug --server http://172.18.0.3:31748 --user=admin +---- + +== S3 connection specification + +You can specify S3 connection details directly inside the `TrinoCatalog` specification +or by referring to an external `S3Connection` custom resource.
+ +To specify S3 connection details directly as part of the `TrinoCatalog` resource, you +add an inline connection configuration as shown below: + +[source,yaml] +---- +s3: # <1> + inline: + host: test-minio # <2> + port: 9000 # <3> + accessStyle: Path # <4> + credentials: + secretClass: minio-credentials # <5> + tls: + verification: + server: + caCert: + secretClass: minio-tls-certificates #<6> +---- +<1> Entry point for the connection configuration +<2> Connection host +<3> Optional connection port +<4> Optional access style; by default virtual hosted-style URLs are used, + while `Path` selects path-style URLs +<5> Name of the `SecretClass` providing a `Secret` that is expected to contain the following keys: + `accessKey` and `secretKey` +<6> Optional TLS settings for encrypted traffic. The `secretClass` can be provided by the Secret Operator or by yourself. + +A self-provided S3 TLS secret can be specified like this: + +[source,yaml] +---- +--- +apiVersion: secrets.stackable.tech/v1alpha1 +kind: SecretClass +metadata: + name: minio-tls-certificates +spec: + backend: + k8sSearch: + searchNamespace: + pod: {} +--- +apiVersion: v1 +kind: Secret +metadata: + name: minio-tls-certificates + labels: + secrets.stackable.tech/class: minio-tls-certificates +data: + ca.crt: + tls.crt: + tls.key: +---- + +It is also possible to configure the bucket connection details as a separate +Kubernetes resource and only refer to that object from the `TrinoCatalog` specification +like this: + +[source,yaml] +---- +s3: + reference: my-connection-resource # <1> +---- +<1> Name of the connection resource with connection details + +The resource named `my-connection-resource` is then defined as shown below: + +[source,yaml] +---- +--- +apiVersion: s3.stackable.tech/v1alpha1 +kind: S3Connection +metadata: + name: my-connection-resource +spec: + host: test-minio + port: 9000 + accessStyle: Path + credentials: + secretClass: minio-credentials +---- + +This has the advantage that the connection configuration can be
shared across +applications and reduces the cost of updating these details. + + + + + diff --git a/docs/modules/usage_guide/pages/configuration.adoc b/docs/modules/usage_guide/pages/configuration.adoc new file mode 100644 index 00000000..12e40ef0 --- /dev/null +++ b/docs/modules/usage_guide/pages/configuration.adoc @@ -0,0 +1,166 @@ += Configuration + +The cluster definition also supports overriding configuration properties and environment variables, either per role or per role group, where the more specific override (role group) has precedence over the less specific one (role). + +IMPORTANT: Do not override port numbers. This will lead to faulty installations. + +== Configuration Properties + +For a role or role group, at the same level as `config`, you can specify `configOverrides` for the following files: + +- `config.properties` +- `node.properties` +- `log.properties` +- `password-authenticator.properties` + +For a list of possible configuration properties consult the https://trino.io/docs/current/admin/properties.html[Trino Properties Reference]. + +[source,yaml] +---- +workers: + roleGroups: + default: + config: {} + replicas: 1 + configOverrides: + config.properties: + query.max-memory-per-node: "2GB" +---- + +Just as for the `config`, it is possible to specify this at role level as well: + +[source,yaml] +---- +workers: + configOverrides: + config.properties: + query.max-memory-per-node: "2GB" + roleGroups: + default: + config: {} + replicas: 1 +---- + +All override property values must be strings. The properties will be passed on without any escaping or formatting. + +== Environment Variables + +Environment variables can be (over)written by adding the `envOverrides` property.
+ +For example per role group: + +[source,yaml] +---- +workers: + roleGroups: + default: + config: {} + replicas: 1 + envOverrides: + JAVA_HOME: "path/to/java" +---- + +or per role: + +[source,yaml] +---- +workers: + envOverrides: + JAVA_HOME: "path/to/java" + roleGroups: + default: + config: {} + replicas: 1 +---- + +Here too, overriding properties such as `http-server.https.port` will lead to broken installations. + +== Resources + +=== Storage for data volumes + +You can mount a volume where data (config and logs of Trino) is stored by specifying https://kubernetes.io/docs/concepts/storage/persistent-volumes[PersistentVolumeClaims] for each individual role or role group: + +[source,yaml] +---- +workers: + config: + resources: + storage: + data: + capacity: 2Gi + roleGroups: + default: + config: + resources: + storage: + data: + capacity: 3Gi +---- + +In the above example, all Trino workers in the `default` group will store data (the location of the property `--data-dir`) on a `3Gi` volume. Additional role groups not specifying any resources will inherit the config provided on the role level (`2Gi` volume). This works the same for memory or CPU requests. + +By default, if nothing is configured in the custom resource for a role group, each Pod will have a local `2Gi` volume mounted for the data location, which mainly contains logs. + +=== Memory requests + +You can request a certain amount of memory for each individual role group as shown below: + +[source,yaml] +---- +workers: + roleGroups: + default: + config: + resources: + memory: + limit: '2Gi' +---- + +In this example, each Trino container in the `default` group will have a maximum of 2 gigabytes of memory. To be more precise, these memory limits apply to the container running Trino but not to any sidecar containers that are part of the pod. + +Setting this property will also automatically set the maximum Java heap size for the corresponding process to 80% of the available memory.
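As a rough illustration of that 80% rule (a sketch of the arithmetic only, not the operator's exact rounding behaviour), a `2Gi` (2048 MiB) container limit yields a heap of about 1.6 GiB:

[source,shell]
----
# 80% of a 2048 MiB memory limit, in MiB (integer arithmetic)
echo $((2048 * 80 / 100))
# prints: 1638
----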
Be aware that if the memory constraint is too low, the cluster might fail to start. If pods terminate with an `OOMKilled` status and the cluster doesn't start, try increasing the memory limit. + +For more details regarding Kubernetes memory requests and limits see: https://kubernetes.io/docs/tasks/configure-pod-container/assign-memory-resource/[Assign Memory Resources to Containers and Pods]. + +=== CPU requests + +Similarly to memory resources, you can also configure CPU limits, as shown below: + +[source,yaml] +---- +workers: + roleGroups: + default: + config: + resources: + cpu: + max: '500m' + min: '250m' +---- + +=== Defaults + +If nothing is specified, the operator will automatically set the following default values for resources: + +[source,yaml] +---- +workers: + roleGroups: + default: + config: + resources: + requests: + cpu: 200m + memory: 2Gi + limits: + cpu: "4" + memory: 2Gi + storage: + data: + capacity: 2Gi +---- + +WARNING: The default values are _most likely_ not sufficient to run a proper cluster in production. Please adapt according to your requirements. + +For more details regarding Kubernetes CPU limits see: https://kubernetes.io/docs/tasks/configure-pod-container/assign-cpu-resource/[Assign CPU Resources to Containers and Pods]. \ No newline at end of file diff --git a/docs/modules/usage_guide/pages/index.adoc b/docs/modules/usage_guide/pages/index.adoc new file mode 100644 index 00000000..e22b8d51 --- /dev/null +++ b/docs/modules/usage_guide/pages/index.adoc @@ -0,0 +1,14 @@ += Usage guide + +This section will help you to use the Stackable Operator for Trino. It shows how to set up a Trino cluster in different ways and how to test it with Hive and S3.
+ +== Guides + +The following guides are available here: + +* xref:security.adoc[Security considerations] +* xref:catalogs.adoc[Using Catalogs] +* xref:cluster.adoc[Creating a Trino cluster] +* xref:configuration.adoc[Configuration] +* xref:monitoring.adoc[Monitoring] +* xref:query.adoc[Testing Trino with Hive and S3] \ No newline at end of file diff --git a/docs/modules/usage_guide/pages/monitoring.adoc b/docs/modules/usage_guide/pages/monitoring.adoc new file mode 100644 index 00000000..fe8b94e2 --- /dev/null +++ b/docs/modules/usage_guide/pages/monitoring.adoc @@ -0,0 +1,5 @@ += Monitoring + +The managed Trino instances are automatically configured to export Prometheus metrics. See +xref:home:operators:monitoring.adoc[] for more details. + diff --git a/docs/modules/usage_guide/pages/query.adoc b/docs/modules/usage_guide/pages/query.adoc new file mode 100644 index 00000000..fdea041d --- /dev/null +++ b/docs/modules/usage_guide/pages/query.adoc @@ -0,0 +1,70 @@ += Testing Trino with Hive and S3 + +Create a schema and a table for the Iris data located in S3 and query the data. This assumes that the Iris dataset is available in the S3 bucket in `PARQUET` format; it can be downloaded https://www.kaggle.com/gpreda/iris-dataset/version/2?select=iris.parquet[here].
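The statements below can be executed with the Trino CLI; for example, against an insecure test cluster (the host, port and flags shown are from the earlier examples and depend on your setup):

[source,shell]
----
# Open an interactive SQL session against the coordinator
./trino.jar --insecure --server https://172.18.0.3:31748 --user=admin
----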
+ +== Create schema +[source,sql] +---- +CREATE SCHEMA IF NOT EXISTS hive.iris +WITH (location = 's3a://iris/'); +---- +which should return: +---- +CREATE SCHEMA +---- + +== Create table +[source,sql] +---- +CREATE TABLE IF NOT EXISTS hive.iris.iris_parquet ( + sepal_length DOUBLE, + sepal_width DOUBLE, + petal_length DOUBLE, + petal_width DOUBLE, + class VARCHAR +) +WITH ( + external_location = 's3a://iris/parq', + format = 'PARQUET' +); +---- +which should return: +---- +CREATE TABLE +---- + +== Query data +[source,sql] +---- +SELECT + sepal_length, + class +FROM hive.iris.iris_parquet +LIMIT 10; +---- + +which should return something like this: +---- + sepal_length | class +--------------+------------- + 5.1 | Iris-setosa + 4.9 | Iris-setosa + 4.7 | Iris-setosa + 4.6 | Iris-setosa + 5.0 | Iris-setosa + 5.4 | Iris-setosa + 4.6 | Iris-setosa + 5.0 | Iris-setosa + 4.4 | Iris-setosa + 4.9 | Iris-setosa +(10 rows) + +Query 20220210_161615_00000_a8nka, FINISHED, 1 node +https://172.18.0.5:30299/ui/query.html?20220210_161615_00000_a8nka +Splits: 18 total, 18 done (100.00%) +CPU Time: 0.7s total, 20 rows/s, 11.3KB/s, 74% active +Per Node: 0.3 parallelism, 5 rows/s, 3.02KB/s +Parallelism: 0.3 +Peak Memory: 0B +2.67 [15 rows, 8.08KB] [5 rows/s, 3.02KB/s] +---- \ No newline at end of file diff --git a/docs/modules/usage_guide/pages/security.adoc b/docs/modules/usage_guide/pages/security.adoc new file mode 100644 index 00000000..55c75f4b --- /dev/null +++ b/docs/modules/usage_guide/pages/security.adoc @@ -0,0 +1,65 @@ += Security considerations + +== Authentication + +We provide user authentication via a secret that can be referenced in the custom resource: +[source,yaml] +---- +authentication: + method: + multiUser: + userCredentialsSecret: + namespace: default + name: simple-trino-users-secret +---- + +These secrets need to be created manually before startup.
The secret may look like the following snippet: +[source,yaml] +---- +apiVersion: v1 +kind: Secret +metadata: + name: simple-trino-users-secret +type: kubernetes.io/opaque +stringData: + admin: $2y$10$89xReovvDLacVzRGpjOyAOONnayOgDAyIS2nW9bs5DJT98q17Dy5i + alice: $2y$10$HcCa4k9v2DRrD/g7e5vEz.Bk.1xg00YTEHOZjPX7oK3KqMSt2xT8W + bob: $2y$10$xVRXtYZnYuQu66SmruijPO8WHFM/UK5QPHTr.Nzf4JMcZSqt3W.2. +---- + +The username and password hash combinations are provided in the `stringData` field. The hashes are created using bcrypt with 10 rounds, for example: +[source] +---- +htpasswd -nbBC 10 admin admin +---- + +== Authorization + +In order to authorize Trino via OPA, a `ConfigMap` containing Rego rules for Trino has to be applied. The following example is an all-access Rego rule for testing with the user `admin`. Do not use it in production! + +[source,yaml] +---- +apiVersion: v1 +kind: ConfigMap +metadata: + name: opa-bundle-trino + labels: + opa.stackable.tech/bundle: "trino" +data: + trino.rego: | + package trino + + import future.keywords.in + + default allow = false + + allow { + is_admin + } + + is_admin() { + input.context.identity.user == "admin" + } +---- + +Users should write their own Rego rules for more complex OPA authorization. \ No newline at end of file
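To make a Trino cluster actually evaluate these rules, the cluster resource must reference the OPA cluster serving the bundle as well as the Rego package. The sketch below is an assumption, not a verified definition: the OPA cluster name `simple-opa` is hypothetical, and the field names under `spec.opa` may differ between operator versions, so consult the `TrinoCluster` CRD reference for your release:

[source,yaml]
----
apiVersion: trino.stackable.tech/v1alpha1
kind: TrinoCluster
metadata:
  name: simple-trino
spec:
  version: 396-stackable0.1.0
  opa:
    # Discovery ConfigMap of the OPA cluster (assumed name)
    configMapName: simple-opa
    # Rego package queried for authorization decisions
    package: trino
# ...
----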