-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
V2.0 Definition for snowpark #1402
Conversation
# Conflicts: # src/snowflake/cli/plugins/snowpark/commands.py # src/snowflake/cli/plugins/streamlit/commands.py # src/snowflake/cli/templates/default_snowpark/snowflake.yml # src/snowflake/cli/templates/default_streamlit/snowflake.yml
# Conflicts: # src/snowflake/cli/api/feature_flags.py # src/snowflake/cli/api/project/schemas/entities/entities.py # src/snowflake/cli/api/project/schemas/entities/streamlit_entity.py # src/snowflake/cli/api/project/schemas/project_definition.py # tests/streamlit/__snapshots__/test_commands.ambr # tests/streamlit/test_commands.py
src/snowflake/cli/api/project/schemas/entities/snowpark_entity.py
Outdated
Show resolved
Hide resolved
src/snowflake/cli/api/project/schemas/entities/snowpark_entity.py
Outdated
Show resolved
Hide resolved
Co-authored-by: Tomasz Urbaszek <tomasz.urbaszek@snowflake.com>
Co-authored-by: Tomasz Urbaszek <tomasz.urbaszek@snowflake.com>
EntityOrList = Union[ | ||
AnnotatedEntity, List[AnnotatedEntity] | ||
] # TODO: Probablyu should delete this line | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this change neccessary? Are you building on current main?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's just for simplicity - same thing is used 4 times, so seems reasonable to put it into variable
defaults: Optional[DefaultsField] = Field( | ||
title="Default key/value entity values that are merged recursively for each entity.", | ||
default=None, | ||
) | ||
|
||
env: Optional[Dict[str, Union[str, int, bool]]] = Field( | ||
title="Default environment specification for this project.", | ||
default=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this really being added in this PR? How does it releat to Snowpark?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was moved from line 165. It seems a bit more consistent to have attributes first, then methods - not attribute, method, attribute, method etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's limit the changed to snowpark only. It's easier to revert or review when changes has well defined scope.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
def validate_entities( | ||
cls, entities: Dict[str, AnnotatedEntity] | ||
) -> Dict[str, AnnotatedEntity]: | ||
for key, entity in entities.items(): | ||
# TODO Automatically detect TargetFields to validate | ||
if entity.type == ApplicationEntity.get_type(): | ||
if isinstance(entity.from_, TargetField): | ||
target_key = entity.from_.target | ||
target_object = entity.from_ | ||
target_type = target_object.get_type() | ||
cls._validate_target_field(target_key, target_type, entities) | ||
if isinstance(entity, list): | ||
for e in entity: | ||
cls._validate_single_entity(e, entities) | ||
else: | ||
cls._validate_single_entity(entity, entities) | ||
return entities | ||
|
||
@classmethod | ||
def _validate_single_entity( | ||
cls, entity: Entity, entities: Dict[str, AnnotatedEntity] | ||
): | ||
if entity.type == ApplicationEntity.get_type(): | ||
if isinstance(entity.from_, TargetField): | ||
target_key = entity.from_.target | ||
target_object = entity.from_ | ||
target_type = target_object.get_type() | ||
cls._validate_target_field(target_key, target_type, entities) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How this change relates to snowpark?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again - it was moved in file, to make the code more readable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
stage_names = { | ||
entity.stage for entity in [*functions.values(), *procedures.values()] | ||
} | ||
stage_manager = StageManager() | ||
project_name = ( | ||
pd.defaults.project_name if pd.defaults.project_name else "my_snowpark_project" | ||
) | ||
for stage in stage_names: | ||
stage = FQN.from_string(stage).using_context() | ||
stage_manager.create( | ||
stage_name=stage, comment="deployments managed by Snowflake CLI" | ||
) | ||
artifact_stage_directory = get_app_stage_path(stage, project_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We create multiple stages but use only one artifact_stage_directory
? I think we may want to consider doing
for stage in stages:
create_stage(stage)
for udf_sproc in stage_to_objects_map[stage]:
_deploy_single_object()
WDYT? In this way we are processing stage by stage. In case of single stage there's no difference compared to current bahavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In what way? Currently new artifact_stage_directory
is created for each stage.
We use common project name, as we can have multiple udfs on single stage, and need a name to group them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This loop:
for stage in stage_names:
stage = FQN.from_string(stage).using_context()
stage_manager.create(
stage_name=stage, comment="deployments managed by Snowflake CLI"
)
artifact_stage_directory = get_app_stage_path(stage, project_name)
Results in artifact_stage_directory
being directory for last stage in stage_names
. And this single value is used for all entities independent of their stages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Corrected - now it's all in a loop.
) | ||
log.info("Building package using sources from: %s", snowpark_paths.source.path) | ||
log.info("Building package using sources from:") | ||
log.info(",".join(str(s) for s in snowpark_paths.sources)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of that duplication check, move project_definition to signature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if i understand
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cli_context = get_cli_context()
pd = cli_context.project_definition
if not pd.meets_version_requirement("2"):
pd = _migrate_v1_snowpark_to_v2(pd)`
this duplicated code, we did the check in very first line in command definition. Consider making:
def _deploy_single_object(
cli_contex, <-- here
manager: FunctionManager | ProcedureManager,
object_type: ObjectType,
object_definition: SnowparkEntity,
existing_objects: Dict[str, Dict],
snowflake_dependencies: List[str],
stage_artifact_path: str,
):
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved it to a helper function, as it is used in bot build
and deploy
commands
Co-authored-by: Tomasz Urbaszek <tomasz.urbaszek@snowflake.com>
Co-authored-by: Patryk Czajka <patryk.czajka@snowflake.com>
default=[], | ||
) | ||
stage: str = Field(title="Stage in which artifacts will be stored") | ||
src: str = Field(title="Folder where your code should be located") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
source
and imports
should be replaced with artifacts
as per design
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But it seems, that artifacts
mean local files that will be uploaded - so imports
should be separated, as those are the files that already exist server-side, and just need to be included in the SQL declaration
} | ||
stage_manager = StageManager() | ||
project_name = ( | ||
pd.defaults.project_name if pd.defaults.project_name else "my_snowpark_project" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we put the default "my_snowpark_project" value under SnowparkEntity
?
Also, each defaults entry will automatically apply to each entity that has this field but was not specified by the user. Not sure this check is needed here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
# Conflicts: # src/snowflake/cli/_plugins/snowpark/commands.py
# Conflicts: # src/snowflake/cli/api/project/schemas/entities/entities.py # src/snowflake/cli/api/project/schemas/project_definition.py
class SnowparkEntity(EntityBase): | ||
pass | ||
|
||
|
||
class FunctionEntity(SnowparkEntity): | ||
pass | ||
|
||
|
||
class ProcedureEntity(SnowparkEntity): | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should all 3 be in the same file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They`re so closely related, it seems logical
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can refactor later if we need to, but I imagine we would want to compose functions and procedures with other non-snowpark entities (maybe even as standalone?)
for stage in stage_names: | ||
stage = FQN.from_string(stage).using_context() | ||
stage_manager.create(fqn=stage, comment="deployments managed by Snowflake CLI") | ||
artifact_stage_directory = get_app_stage_path(stage, pd.defaults.project_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're assuming pd.defaults.project_name
is defined, but shouldn't it be optional? What should be the stage if different values are defined on each entity?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ATM we have only one snowpark project value. I think we should introduce different stage paths after introducing mixins, as it will change much more logic. I.e. now we create only one artifact for all snowpark entities.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I now realize you used the defaults
section to store global-like values for entities to reference, but it's not exactly its purpose. You can think of defaults
as automatic mixins that are applied to all entities if they have a value of the same name and the user didn't specify it on the entity.
In this example sp1.project_name
will be "proj1", and sp2.project_name
will be "default_proj":
entities:
sp1:
type: snowpark
project_name: proj1
sp2:
type: snowpark
defaults:
project_name: default_proj
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you're right- i used the default value when converting from v1 and went on with this logic.
Nevertheless - currently it doesn't change anything, as we use single artifact for all snowpark entities
from snowflake.cli.api.project.schemas.updatable_model import DiscriminatorField | ||
|
||
|
||
class SnowparkEntityModel(EntityModelBase): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aren't we missing an entity type here?
class SnowparkEntityModel(EntityModelBase): | |
class SnowparkEntityModel(EntityModelBase): | |
type: Literal["snowpark"] = DiscriminatorField() # noqa: A003 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In what context snowpark
would be used? Currently it's a base class for common elements between entities.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. For clarity, maybe we should have SnowparkEntityModel
to also inherit from ABC
and add "Base" to its name?
SnowparkFunctionEntity
might also be more accurate.
class SnowparkEntity(EntityBase): | ||
pass | ||
|
||
|
||
class FunctionEntity(SnowparkEntity): | ||
pass | ||
|
||
|
||
class ProcedureEntity(SnowparkEntity): | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can refactor later if we need to, but I imagine we would want to compose functions and procedures with other non-snowpark entities (maybe even as standalone?)
"defaults": {"stage": "dev"}, | ||
"entities": { | ||
"function1": { | ||
"type": "function", | ||
"name": "name", | ||
"handler": "app.hello", | ||
"returns": "string", | ||
"signature": [{"name": "name", "type": "string"}], | ||
"runtime": "3.8", | ||
"artifacts": "src", | ||
} | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we also have a test for "type": "procedure"
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, added
for stage in stage_names: | ||
stage = FQN.from_string(stage).using_context() | ||
stage_manager.create(fqn=stage, comment="deployments managed by Snowflake CLI") | ||
artifact_stage_directory = get_app_stage_path(stage, pd.defaults.project_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I now realize you used the defaults
section to store global-like values for entities to reference, but it's not exactly its purpose. You can think of defaults
as automatic mixins that are applied to all entities if they have a value of the same name and the user didn't specify it on the entity.
In this example sp1.project_name
will be "proj1", and sp2.project_name
will be "default_proj":
entities:
sp1:
type: snowpark
project_name: proj1
sp2:
type: snowpark
defaults:
project_name: default_proj
# Conflicts: # src/snowflake/cli/_plugins/snowpark/common.py
Pre-review checklist
Changes description
Creating V2.0 project definition for Snowpark entities