-
Notifications
You must be signed in to change notification settings - Fork 427
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
In-browser assisted full cloud stack deployments #2816
Conversation
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add CodeRabbit Configration File (
|
@schustmi I might need you to take a quick look at the side-change documented in the PR description. It affects the way we build docker container images to run pipelines. |
5c3b989
to
1419d71
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a few comments, mostly minor things.
50ebeb7
to
68404fc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Docker changes LGTM
src/zenml/analytics/enums.py
Outdated
@@ -85,6 +85,9 @@ class AnalyticsEvent(str, Enum): | |||
DEPLOY_STACK_COMPONENT = "Stack component deployed" | |||
DESTROY_STACK_COMPONENT = "Stack component destroyed" | |||
|
|||
# Cloud stack deployment | |||
DEPLOY_STACK_CLOUD = "Stack deployed in cloud" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where else would a stack be deployed? 😅
Isn't there a better wording for this to explain what it really does?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe something with One Click
in there so we know this is what happened? Also, this is currently only being tracked in the CLI, which means we'll miss all users that run this from the dashboard (which I assume will be the majority of them?).
At some point it was mentioned that you're planning to add some special metadata to the components/stacks created using this feature, which we could then track in our Stack/Componenent/Connector registered
events. Is this still planned?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I named the event DEPLOY_FULL_STACK
. On the server side, there is the event REGISTERED_STACK
that we already had, but it will have new metadata fields:
wizard = True
if the full stack was registered via the CLI wizarddeployment = cloud-formation
if the full stack was deployed from scratch in AWSprovider = aws
if either happened
"""Return the ZenML stack that was deployed and registered. | ||
|
||
This method is called to retrieve a ZenML stack matching the deployment | ||
provider and stack name. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The stack name is not an input here. This I guess just gives the latest, I'm not sure how this would be used if there are multiple such stacks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check the comment below. I can't rely on the stack name to be the same because the user can change it before they deploy the infrastructure in AWS. Therefore I'm using another approach: I'm searching for the stack most likely to match the one deployed by the user in AWS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, just the docstring is wrong here. I guess this method is only used for polling until the stack gets deployed right?
IMO, this could again be improved (similar to my analytics comment) by some metadata on the stack. The lambda function registering the stack could add a specific label to the stack, which could then be passed to this endpoint when waiting for the stack.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added labels
to the full stack model. These are set differently depending on where and how the endpoint is called and used as metadata in the REGISTERED_STACK
event. They are also set on the stack components and service connectors being created (not yet possible to do so directly on the stack). These labels are also used to "detect" the stack here in fewer steps.
stack_name: str, | ||
location: Optional[str] = None, | ||
auth_context: AuthContext = Security(authorize), | ||
) -> Tuple[str, str]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we maybe already check here that the user has permissions to create service connectors, stacks and components? Otherwise I assume the lambda function downstream will fail, which leaves users with deployed infra that is not tracked in ZenML?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, but not something I can check in the client. I would need a new endpoint just for this purpose and I would need to check all permissions, not just the service connectors. Perhaps I can use the endpoint that @bcdurak already set up. Maybe add a "verify" flag to it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can;t we have those checks on this exact URL endpoint? Or what is the purpose for creating this URL other than to open it and create stacks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
LLM Finetuning template updates in |
76a42cd
to
5c44727
Compare
LLM Finetuning template updates in |
59df59e
to
a768870
Compare
f22eb4b
to
6804752
Compare
Add cloudshell instructions Update Update GCP templates WIP new zenml deploy CLI command Change AWS template URL Implemented zenml deploy CLI command Complete zenml stack deploy CLI command Adapt stack registration to new endpoint structure Add default repository configuration attr to all container registries Update template with proper sagemaker permissions Fix template permissions Final IAM permission fixes Fix AWS service connector permissions Remove unneeded permissions Fix container registry default repository setting support Fix the S3 artifact store to use the AWS region in the service connector Fix linter Improve zenml stack deploy CLI Update cloud formation template Fix cloud formation template Move cloud stack deployment logic behind the REST API Re-add local store check Use HTTPS for the server URL Fix stack link in zenml stack deploy CLI Fix docstring errors Fixed more docstrings Apply suggestions from code review Co-authored-by: Barış Can Durak <36421093+bcdurak@users.noreply.github.com> Co-authored-by: Michael Schuster <schustmi@users.noreply.github.com> Add labels to full stacks and use them in event metadata and to detect deployed stacks Update deployment scripts Add templates publish github action Fix bugs Rename stack labels to start with zenml: Fix service connector labels to be strings Fix linter errors Fix component labels in CLI and add stack labels to CLI Fix deployed stack detection to account for new labels Move AWS CF templates to their own subdir
713d44d
to
62f63f1
Compare
Describe changes
Implements a new automated experience of deploying cloud infrastructure and registering full cloud ZenML stacks two-in-one with minimal user input. This experience is meant to reduce the friction and technical difficulties usually encountered when a user has to create a cloud ZenML stack from scratch.
The
zenml stack deploy
CLI command has been repurposed to support this new experience as showcased here:AWS is currently the only supported provider with Google and Azure to follow shortly.
Side changes
Make the
zenml
repository name configurable in the container registry stack componentCurrently, ZenML uses the
.../zenml
docker repository by default for all pipeline images it builds/pushes. This can be overridden, but it's a docker settings that must be configured in code.At the same time, we want these full stack deployment templates to create unique cloud resources, which includes container registries. Therefore, we need the ability to configure the container registry stack components to use a "default repository name" that is different than
zenml
. This PR also adds that capability by modeling a new (optional)default_repository
configuration attribute to all container registry stack components.Use the region from the AWS service connector in the S3 artifact store
The S3 artifact store doesn't explicitly configure the AWS region in the s3fs client. This can lead to errors in environments such as Sagemaker, because the region might default to something else. This PR now uses the AWS region in the service connector to explicitly configure the s3fs client.
Deprecations
The
zenml stack deploy
CLI command has been renamed tozenml stack deploy-mlstack
and marked as deprecated. Users should use the newzenml stack deploy
CLI command to benefit from the new experience or use the mlstacks project stacks separately.Pre-requisites
Please ensure you have done the following:
develop
and the open PR is targetingdevelop
. If your branch wasn't based on develop read Contribution guide on rebasing branch to develop.Types of changes