Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Eng 2589 add documentation for the local data #1178

Merged
merged 9 commits into from
Apr 15, 2023

Conversation

Fanjia-Yan
Copy link
Contributor

Describe your changes and why you are making these changes

This PR adds documentation for using local data by introducing local data parameter as an alternative method in Accessing Data section. The motivation for putting the option here is that a new user can leverage the quickstart tutorial to quickly create his own workflow without the complication of hooking his database or uploading the data through Aqueduct UI(which is not immediately obvious). I also reminded the user to set the use_local flag to `true when publishing the workflow.
---Addition 1---
Screen Shot 2023-04-05 at 4 52 50 PM
---Addition 2----
Screen Shot 2023-04-05 at 4 53 09 PM

Related issue number (if any)

https://linear.app/aqueducthq/issue/ENG-2589/add-documentation-for-the-local-data-feature

Loom demo (if any)

[I can add one later]

Checklist before requesting a review

  • I have created a descriptive PR title. The PR title should complete the sentence "This PR...".
  • I have performed a self-review of my code.
  • I have included a small demo of the changes. For the UI, this would be a screenshot or a Loom video.
  • If this is a new feature, I have added unit tests and integration tests.
  • I have run the integration tests locally and they are passing.
  • I have run the linter script locally (See python3 scripts/run_linters.py -h for usage).
  • All features on the UI continue to work correctly.
  • Added one of the following CI labels:
    • run_integration_test: Runs integration tests
    • skip_integration_test: Skips integration tests (Should be used when changes are ONLY documentation/UI)

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@Fanjia-Yan
Copy link
Contributor Author

Some questions from my side:

  1. While I think the local data is good for a new user to know, since Quickstart Tutorial is the first tutorial that the user is going to use, will all the create_param() and use_local be too much for a user to digest?
  2. If so, will it be better to put it in Parameters Tutorial?

@vsreekanti
Copy link
Contributor

@Fanjia-Yan, just a note that in addition to documenting here, you should also update the GitBook documentation: https://github.com/aqueducthq/gitbook

@Fanjia-Yan Fanjia-Yan marked this pull request as ready for review April 10, 2023 19:07
"---\n",
"### Local Data Parameters\n",
"\n",
"In the case that we haven't acquire accesses to the databases, or we want to test out a workflow with a subset of a dataset, we can consume a local JSON, CSV, or Parquet file to create a parameter Artifact that serve as a temporary data."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit grammar and cleanup, change to:

You can also create a parameter to seamlessly pass in local data to your flow.

In the following example, we pass in a local CSV file that the flow can consume as a table artifact.

" as_type=ArtifactType.TABLE,\n",
" format=\"csv\")\n",
"\n",
"# The local data parameter we just create will have the same content as `reviews_table`. \n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would remove this comment and replace the following line with simply: data_param.get() so the user sees the data.

"id": "a2e121b8",
"metadata": {},
"source": [
"If you decides to publish flow using local data, set `use_local` to `True` in `publish_flow`."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit grammar: If you decide to publish a flow that refers to local data, you must set use_local to True in publish_flow.

@Fanjia-Yan Fanjia-Yan added the run_integration_test Triggers integration tests label Apr 14, 2023
@Fanjia-Yan Fanjia-Yan merged commit 142d63e into main Apr 15, 2023
@vsreekanti vsreekanti deleted the eng-2589-add-documentation-for-the-local-data branch April 18, 2023 16:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
run_integration_test Triggers integration tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants