Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

V2.0 Definition for snowpark #1402

Merged
merged 39 commits into from
Aug 10, 2024
Merged

V2.0 Definition for snowpark #1402

merged 39 commits into from
Aug 10, 2024

Conversation

sfc-gh-jsikorski
Copy link
Collaborator

@sfc-gh-jsikorski sfc-gh-jsikorski commented Aug 1, 2024

Pre-review checklist

  • I've confirmed that instructions included in README.md are still correct after my changes in the codebase.
  • I've added or updated automated unit tests to verify correctness of my new code.
  • I've added or updated integration tests to verify correctness of my new code.
  • I've confirmed that my changes are working by executing CLI's commands manually on MacOS.
  • I've confirmed that my changes are working by executing CLI's commands manually on Windows.
  • I've confirmed that my changes are up-to-date with the target branch.
  • I've described my changes in the release notes.
  • I've described my changes in the section below.

Changes description

Creating V2.0 project definition for Snowpark entities

sfc-gh-turbaszek and others added 8 commits July 26, 2024 14:07
# Conflicts:
#	src/snowflake/cli/plugins/snowpark/commands.py
#	src/snowflake/cli/plugins/streamlit/commands.py
#	src/snowflake/cli/templates/default_snowpark/snowflake.yml
#	src/snowflake/cli/templates/default_streamlit/snowflake.yml
# Conflicts:
#	src/snowflake/cli/api/feature_flags.py
#	src/snowflake/cli/api/project/schemas/entities/entities.py
#	src/snowflake/cli/api/project/schemas/entities/streamlit_entity.py
#	src/snowflake/cli/api/project/schemas/project_definition.py
#	tests/streamlit/__snapshots__/test_commands.ambr
#	tests/streamlit/test_commands.py
sfc-gh-jsikorski and others added 2 commits August 1, 2024 11:17
Co-authored-by: Tomasz Urbaszek <tomasz.urbaszek@snowflake.com>
Co-authored-by: Tomasz Urbaszek <tomasz.urbaszek@snowflake.com>
EntityOrList = Union[
AnnotatedEntity, List[AnnotatedEntity]
] # TODO: Probablyu should delete this line

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change neccessary? Are you building on current main?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just for simplicity - same thing is used 4 times, so seems reasonable to put it into variable

@sfc-gh-jsikorski sfc-gh-jsikorski marked this pull request as ready for review August 2, 2024 05:48
@sfc-gh-jsikorski sfc-gh-jsikorski requested review from a team as code owners August 2, 2024 05:48
Comment on lines 118 to 125
defaults: Optional[DefaultsField] = Field(
title="Default key/value entity values that are merged recursively for each entity.",
default=None,
)

env: Optional[Dict[str, Union[str, int, bool]]] = Field(
title="Default environment specification for this project.",
default=None,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really being added in this PR? How does it releat to Snowpark?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was moved from line 165. It seems a bit more consistent to have attributes first, then methods - not attribute, method, attribute, method etc.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's limit the changed to snowpark only. It's easier to revert or review when changes has well defined scope.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 150 to 172
def validate_entities(
cls, entities: Dict[str, AnnotatedEntity]
) -> Dict[str, AnnotatedEntity]:
for key, entity in entities.items():
# TODO Automatically detect TargetFields to validate
if entity.type == ApplicationEntity.get_type():
if isinstance(entity.from_, TargetField):
target_key = entity.from_.target
target_object = entity.from_
target_type = target_object.get_type()
cls._validate_target_field(target_key, target_type, entities)
if isinstance(entity, list):
for e in entity:
cls._validate_single_entity(e, entities)
else:
cls._validate_single_entity(entity, entities)
return entities

@classmethod
def _validate_single_entity(
cls, entity: Entity, entities: Dict[str, AnnotatedEntity]
):
if entity.type == ApplicationEntity.get_type():
if isinstance(entity.from_, TargetField):
target_key = entity.from_.target
target_object = entity.from_
target_type = target_object.get_type()
cls._validate_target_field(target_key, target_type, entities)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How this change relates to snowpark?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again - it was moved in file, to make the code more readable

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines 176 to 188
stage_names = {
entity.stage for entity in [*functions.values(), *procedures.values()]
}
stage_manager = StageManager()
project_name = (
pd.defaults.project_name if pd.defaults.project_name else "my_snowpark_project"
)
for stage in stage_names:
stage = FQN.from_string(stage).using_context()
stage_manager.create(
stage_name=stage, comment="deployments managed by Snowflake CLI"
)
artifact_stage_directory = get_app_stage_path(stage, project_name)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We create multiple stages but use only one artifact_stage_directory? I think we may want to consider doing

for stage in stages:
   create_stage(stage)
   for udf_sproc in stage_to_objects_map[stage]:
       _deploy_single_object()

WDYT? In this way we are processing stage by stage. In case of single stage there's no difference compared to current bahavior.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In what way? Currently new artifact_stage_directory is created for each stage.
We use common project name, as we can have multiple udfs on single stage, and need a name to group them.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This loop:

for stage in stage_names:
        stage = FQN.from_string(stage).using_context()
        stage_manager.create(
            stage_name=stage, comment="deployments managed by Snowflake CLI"
        )
        artifact_stage_directory = get_app_stage_path(stage, project_name)

Results in artifact_stage_directory being directory for last stage in stage_names. And this single value is used for all entities independent of their stages.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected - now it's all in a loop.

)
log.info("Building package using sources from: %s", snowpark_paths.source.path)
log.info("Building package using sources from:")
log.info(",".join(str(s) for s in snowpark_paths.sources))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of that duplication check, move project_definition to signature.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if i understand

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    cli_context = get_cli_context()
    pd = cli_context.project_definition
    if not pd.meets_version_requirement("2"):
        pd = _migrate_v1_snowpark_to_v2(pd)`

this duplicated code, we did the check in very first line in command definition. Consider making:

def _deploy_single_object(
   cli_contex,   <-- here
    manager: FunctionManager | ProcedureManager,
    object_type: ObjectType,
    object_definition: SnowparkEntity,
    existing_objects: Dict[str, Dict],
    snowflake_dependencies: List[str],
    stage_artifact_path: str,
):

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved it to a helper function, as it is used in bot build and deploy commands

sfc-gh-jsikorski and others added 3 commits August 2, 2024 12:27
Co-authored-by: Tomasz Urbaszek <tomasz.urbaszek@snowflake.com>
Co-authored-by: Patryk Czajka <patryk.czajka@snowflake.com>
default=[],
)
stage: str = Field(title="Stage in which artifacts will be stored")
src: str = Field(title="Folder where your code should be located")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

source and imports should be replaced with artifacts as per design

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it seems, that artifacts mean local files that will be uploaded - so imports should be separated, as those are the files that already exist server-side, and just need to be included in the SQL declaration

}
stage_manager = StageManager()
project_name = (
pd.defaults.project_name if pd.defaults.project_name else "my_snowpark_project"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we put the default "my_snowpark_project" value under SnowparkEntity?
Also, each defaults entry will automatically apply to each entity that has this field but was not specified by the user. Not sure this check is needed here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

sfc-gh-jsikorski and others added 8 commits August 6, 2024 19:01
# Conflicts:
#	src/snowflake/cli/_plugins/snowpark/commands.py
# Conflicts:
#	src/snowflake/cli/api/project/schemas/entities/entities.py
#	src/snowflake/cli/api/project/schemas/project_definition.py
Comment on lines 4 to 13
class SnowparkEntity(EntityBase):
pass


class FunctionEntity(SnowparkEntity):
pass


class ProcedureEntity(SnowparkEntity):
pass
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should all 3 be in the same file?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They`re so closely related, it seems logical

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can refactor later if we need to, but I imagine we would want to compose functions and procedures with other non-snowpark entities (maybe even as standalone?)

for stage in stage_names:
stage = FQN.from_string(stage).using_context()
stage_manager.create(fqn=stage, comment="deployments managed by Snowflake CLI")
artifact_stage_directory = get_app_stage_path(stage, pd.defaults.project_name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're assuming pd.defaults.project_name is defined, but shouldn't it be optional? What should be the stage if different values are defined on each entity?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ATM we have only one snowpark project value. I think we should introduce different stage paths after introducing mixins, as it will change much more logic. I.e. now we create only one artifact for all snowpark entities.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I now realize you used the defaults section to store global-like values for entities to reference, but it's not exactly its purpose. You can think of defaults as automatic mixins that are applied to all entities if they have a value of the same name and the user didn't specify it on the entity.

In this example sp1.project_name will be "proj1", and sp2.project_name will be "default_proj":

entities:
  sp1:
    type: snowpark
    project_name: proj1
  sp2:
    type: snowpark
  defaults:
    project_name: default_proj

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you're right- i used the default value when converting from v1 and went on with this logic.
Nevertheless - currently it doesn't change anything, as we use single artifact for all snowpark entities

from snowflake.cli.api.project.schemas.updatable_model import DiscriminatorField


class SnowparkEntityModel(EntityModelBase):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aren't we missing an entity type here?

Suggested change
class SnowparkEntityModel(EntityModelBase):
class SnowparkEntityModel(EntityModelBase):
type: Literal["snowpark"] = DiscriminatorField() # noqa: A003

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In what context snowpark would be used? Currently it's a base class for common elements between entities.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. For clarity, maybe we should have SnowparkEntityModel to also inherit from ABC and add "Base" to its name?
SnowparkFunctionEntity might also be more accurate.

Comment on lines 4 to 13
class SnowparkEntity(EntityBase):
pass


class FunctionEntity(SnowparkEntity):
pass


class ProcedureEntity(SnowparkEntity):
pass
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can refactor later if we need to, but I imagine we would want to compose functions and procedures with other non-snowpark entities (maybe even as standalone?)

Comment on lines 170 to 181
"defaults": {"stage": "dev"},
"entities": {
"function1": {
"type": "function",
"name": "name",
"handler": "app.hello",
"returns": "string",
"signature": [{"name": "name", "type": "string"}],
"runtime": "3.8",
"artifacts": "src",
}
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't we also have a test for "type": "procedure"?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, added

for stage in stage_names:
stage = FQN.from_string(stage).using_context()
stage_manager.create(fqn=stage, comment="deployments managed by Snowflake CLI")
artifact_stage_directory = get_app_stage_path(stage, pd.defaults.project_name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I now realize you used the defaults section to store global-like values for entities to reference, but it's not exactly its purpose. You can think of defaults as automatic mixins that are applied to all entities if they have a value of the same name and the user didn't specify it on the entity.

In this example sp1.project_name will be "proj1", and sp2.project_name will be "default_proj":

entities:
  sp1:
    type: snowpark
    project_name: proj1
  sp2:
    type: snowpark
  defaults:
    project_name: default_proj

@sfc-gh-jsikorski sfc-gh-jsikorski merged commit bb4ee2b into main Aug 10, 2024
16 checks passed
@sfc-gh-jsikorski sfc-gh-jsikorski deleted the jsikorski/snowpark-v2 branch August 10, 2024 06:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants