Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update durable storage docs #14720

Closed

Conversation

adarshsanjeev
Copy link
Contributor

Update durable storage docs table.

  • Change alphabetical order of the table to be hierarchical.
  • Removed documentation for local durable storage.

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@LakshSingla
Copy link
Contributor

Currently, I am adding support for Azure and GCS storage connector. How would I add the documentation for those two alongside this connector. I am suspecting the many config values to overlap, but not all.

|`druid.msq.intermediate.storage.type` | `s3` if your deep storage is S3 | Required. The type of storage to use. Currently only `s3` is supported. |
|`druid.msq.intermediate.storage.chunkSize` | 100MiB | Optional. Defines the size of each chunk to temporarily store in `druid.msq.intermediate.storage.tempDir`. The chunk size must be between 5 MiB and 5 GiB. A large chunk size reduces the API calls made to the durable storage, however it requires more disk space to store the temporary chunks. Druid uses a default of 100MiB if the value is not provided.|
|`druid.msq.intermediate.storage.maxRetry` | 10 | Optional. Defines the max number times to attempt S3 API calls to avoid failures due to transient errors. |
|`druid.msq.intermediate.storage.bucket` | n/a | The bucket in S3 where you want to store intermediate files. |
|`druid.msq.intermediate.storage.prefix` | n/a | S3 prefix to store intermediate stage results. Provide a unique value for the prefix. Don't share the same prefix between clusters. If the location includes other files or directories, then they will get cleaned up as well. |
|`druid.msq.intermediate.storage.tempDir`| n/a | Required. Directory path on the local disk to temporarily store intermediate stage results. |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this duplicated?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so

|`druid.msq.intermediate.storage.type` | `s3` if your deep storage is S3 | Required. The type of storage to use. Currently only `s3` is supported. |
|`druid.msq.intermediate.storage.chunkSize` | 100MiB | Optional. Defines the size of each chunk to temporarily store in `druid.msq.intermediate.storage.tempDir`. The chunk size must be between 5 MiB and 5 GiB. A large chunk size reduces the API calls made to the durable storage, however it requires more disk space to store the temporary chunks. Druid uses a default of 100MiB if the value is not provided.|
|`druid.msq.intermediate.storage.maxRetry` | 10 | Optional. Defines the max number times to attempt S3 API calls to avoid failures due to transient errors. |
|`druid.msq.intermediate.storage.bucket` | n/a | The bucket in S3 where you want to store intermediate files. |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should do it in the order like: type > bucket > prefix > tempDir > maxRetry > chunkSize.
First three are required. Max retry and chunk size is something that we least expect the users to set, with chunk size being the least of these two. I am of the opinion that we can also undocument it.

@cryptoe cryptoe mentioned this pull request Aug 1, 2023
1 task
@cryptoe
Copy link
Contributor

cryptoe commented Aug 1, 2023

Currently, I am adding support for Azure and GCS storage connector. How would I add the documentation for those two alongside this connector. I am suspecting the many config values to overlap, but not all.

We can probably keep the 2 separate since we need this for 27.0 release.

@cryptoe cryptoe added this to the 27.0 milestone Aug 1, 2023
|`druid.msq.intermediate.storage.enable` | true | Required. Whether to enable durable storage for the cluster.|
|`druid.msq.intermediate.storage.maxRetry` | 10 | Optional. Defines the max number times to attempt S3 API calls to avoid failures due to transient errors. |
|`druid.msq.intermediate.storage.type` | `s3` if your deep storage is S3 | Required. The type of storage to use. Currently only `s3` is supported. |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
|`druid.msq.intermediate.storage.type` | `s3` if your deep storage is S3 | Required. The type of storage to use. Currently only `s3` is supported. |
|`druid.msq.intermediate.storage.type` | `s3` for Amazon S3 storage | Required. The type of storage to use. `s3` is the only supported type. |

avoid forward-looking language. The docs here are for the current version.

@@ -389,13 +389,13 @@ The following common service properties control how durable storage behaves:

|Parameter |Default | Description |
|-------------------|----------------------------------------|----------------------|
|`druid.msq.intermediate.storage.bucket` | n/a | The bucket in S3 where you want to store intermediate files. |
|`druid.msq.intermediate.storage.chunkSize` | 100MiB | Optional. Defines the size of each chunk to temporarily store in `druid.msq.intermediate.storage.tempDir`. The chunk size must be between 5 MiB and 5 GiB. A large chunk size reduces the API calls made to the durable storage, however it requires more disk space to store the temporary chunks. Druid uses a default of 100MiB if the value is not provided.|
|`druid.msq.intermediate.storage.enable` | true | Required. Whether to enable durable storage for the cluster.|
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Under what circumstances would I want to turn this off? (Since it defaults to "true" I'm assuming that "on" is what we recommend.). Either a brief statement here or a link to documentation about how to configure this (which may or may not exist)

|`druid.msq.intermediate.storage.type` | `s3` if your deep storage is S3 | Required. The type of storage to use. Currently only `s3` is supported. |
|`druid.msq.intermediate.storage.chunkSize` | 100MiB | Optional. Defines the size of each chunk to temporarily store in `druid.msq.intermediate.storage.tempDir`. The chunk size must be between 5 MiB and 5 GiB. A large chunk size reduces the API calls made to the durable storage, however it requires more disk space to store the temporary chunks. Druid uses a default of 100MiB if the value is not provided.|
|`druid.msq.intermediate.storage.maxRetry` | 10 | Optional. Defines the max number times to attempt S3 API calls to avoid failures due to transient errors. |
|`druid.msq.intermediate.storage.bucket` | n/a | The bucket in S3 where you want to store intermediate files. |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
|`druid.msq.intermediate.storage.bucket` | n/a | The bucket in S3 where you want to store intermediate files. |
|`druid.msq.intermediate.storage.bucket` | n/a | The S3 bucket to store intermediate files. |

|`druid.msq.intermediate.storage.type` | `s3` if your deep storage is S3 | Required. The type of storage to use. Currently only `s3` is supported. |
|`druid.msq.intermediate.storage.chunkSize` | 100MiB | Optional. Defines the size of each chunk to temporarily store in `druid.msq.intermediate.storage.tempDir`. The chunk size must be between 5 MiB and 5 GiB. A large chunk size reduces the API calls made to the durable storage, however it requires more disk space to store the temporary chunks. Druid uses a default of 100MiB if the value is not provided.|
|`druid.msq.intermediate.storage.maxRetry` | 10 | Optional. Defines the max number times to attempt S3 API calls to avoid failures due to transient errors. |
|`druid.msq.intermediate.storage.bucket` | n/a | The bucket in S3 where you want to store intermediate files. |
|`druid.msq.intermediate.storage.prefix` | n/a | S3 prefix to store intermediate stage results. Provide a unique value for the prefix. Don't share the same prefix between clusters. If the location includes other files or directories, then they will get cleaned up as well. |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
|`druid.msq.intermediate.storage.prefix` | n/a | S3 prefix to store intermediate stage results. Provide a unique value for the prefix. Don't share the same prefix between clusters. If the location includes other files or directories, then they will get cleaned up as well. |
|`druid.msq.intermediate.storage.prefix` | n/a | S3 prefix to store intermediate stage results. Provide a unique value for the prefix. Don't share the same prefix between clusters. If the location includes other files or directories, then they will get cleaned up as well. |

typo (double space)

317brian added a commit to 317brian/druid that referenced this pull request Aug 2, 2023
@adarshsanjeev
Copy link
Contributor Author

Closed in favour of #14609

cryptoe added a commit that referenced this pull request Aug 4, 2023
* cold tier wip

* wip

* copyedits

* wip

* copyedits

* copyedits

* wip

* wip

* update rules page

* typo

* typo

* update sidebar

* moves durable storage info to its own page in operations

* update screenshots

* add apache license

* Apply suggestions from code review

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* add query from deep storage tutorial stub

* address some of the feedback

* revert screenshot update. handled in separate pr

* load rule update

* wip tutorial

* reformat deep storage endpoints

* rest of tutorial

* typo

* cleanup

* screenshot and sidebar for tutorial

* add license

* typos

* Apply suggestions from code review

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* rest of review comments

* clarify where results are stored

* update api reference for durablestorage context param

* Apply suggestions from code review

Co-authored-by: Karan Kumar <karankumar1100@gmail.com>

* comments

* incorporate #14720

* address rest of comments

* missed one

* Update docs/api-reference/sql-api.md

* Update docs/api-reference/sql-api.md

---------

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
Co-authored-by: demo-kratia <56242907+demo-kratia@users.noreply.github.com>
Co-authored-by: Karan Kumar <karankumar1100@gmail.com>
317brian added a commit to 317brian/druid that referenced this pull request Aug 4, 2023
* cold tier wip

* wip

* copyedits

* wip

* copyedits

* copyedits

* wip

* wip

* update rules page

* typo

* typo

* update sidebar

* moves durable storage info to its own page in operations

* update screenshots

* add apache license

* Apply suggestions from code review

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* add query from deep storage tutorial stub

* address some of the feedback

* revert screenshot update. handled in separate pr

* load rule update

* wip tutorial

* reformat deep storage endpoints

* rest of tutorial

* typo

* cleanup

* screenshot and sidebar for tutorial

* add license

* typos

* Apply suggestions from code review

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* rest of review comments

* clarify where results are stored

* update api reference for durablestorage context param

* Apply suggestions from code review

Co-authored-by: Karan Kumar <karankumar1100@gmail.com>

* comments

* incorporate apache#14720

* address rest of comments

* missed one

* Update docs/api-reference/sql-api.md

* Update docs/api-reference/sql-api.md

---------

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
Co-authored-by: demo-kratia <56242907+demo-kratia@users.noreply.github.com>
Co-authored-by: Karan Kumar <karankumar1100@gmail.com>
(cherry picked from commit 3b5b6c6)
AmatyaAvadhanula pushed a commit that referenced this pull request Aug 4, 2023
* docs: query from deep storage (#14609)

* cold tier wip

* wip

* copyedits

* wip

* copyedits

* copyedits

* wip

* wip

* update rules page

* typo

* typo

* update sidebar

* moves durable storage info to its own page in operations

* update screenshots

* add apache license

* Apply suggestions from code review

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* add query from deep storage tutorial stub

* address some of the feedback

* revert screenshot update. handled in separate pr

* load rule update

* wip tutorial

* reformat deep storage endpoints

* rest of tutorial

* typo

* cleanup

* screenshot and sidebar for tutorial

* add license

* typos

* Apply suggestions from code review

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* rest of review comments

* clarify where results are stored

* update api reference for durablestorage context param

* Apply suggestions from code review

Co-authored-by: Karan Kumar <karankumar1100@gmail.com>

* comments

* incorporate #14720

* address rest of comments

* missed one

* Update docs/api-reference/sql-api.md

* Update docs/api-reference/sql-api.md

---------

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
Co-authored-by: demo-kratia <56242907+demo-kratia@users.noreply.github.com>
Co-authored-by: Karan Kumar <karankumar1100@gmail.com>
(cherry picked from commit 3b5b6c6)

* change code tab styles to Docusaurus2 format from Docu1 format

* fix spelling file

* doc cleanup

(cherry picked from commit 2218a13)

---------

Co-authored-by: Laksh Singla <lakshsingla@gmail.com>
317brian added a commit to 317brian/druid that referenced this pull request Aug 4, 2023
* cold tier wip

* wip

* copyedits

* wip

* copyedits

* copyedits

* wip

* wip

* update rules page

* typo

* typo

* update sidebar

* moves durable storage info to its own page in operations

* update screenshots

* add apache license

* Apply suggestions from code review

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* add query from deep storage tutorial stub

* address some of the feedback

* revert screenshot update. handled in separate pr

* load rule update

* wip tutorial

* reformat deep storage endpoints

* rest of tutorial

* typo

* cleanup

* screenshot and sidebar for tutorial

* add license

* typos

* Apply suggestions from code review

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>

* rest of review comments

* clarify where results are stored

* update api reference for durablestorage context param

* Apply suggestions from code review

Co-authored-by: Karan Kumar <karankumar1100@gmail.com>

* comments

* incorporate apache#14720

* address rest of comments

* missed one

* Update docs/api-reference/sql-api.md

* Update docs/api-reference/sql-api.md

---------

Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
Co-authored-by: demo-kratia <56242907+demo-kratia@users.noreply.github.com>
Co-authored-by: Karan Kumar <karankumar1100@gmail.com>
(cherry picked from commit 3b5b6c6)
(cherry picked from commit ed83c56)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants