Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SNOW-1342243: Call getOrCreate at package import time so that sproc user does not need to #1910

Draft
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

sfc-gh-helmeleegy
Copy link
Contributor

  1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

    Fixes SNOW-1342243

  2. Fill out the following pre-review checklist:

    • I am adding a new automated test(s) to verify correctness of my new code
      • If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
    • I am adding new logging messages
    • I am adding a new telemetry message
    • I am adding new credentials
    • I am adding a new dependency
    • If this is a new feature/behavior, I'm adding the Local Testing parity changes.
  3. Please describe how your code solves the related issue.

    Call getOrCreate at package import time so that sproc user does not need to.

Copy link
Contributor

@sfc-gh-jrose sfc-gh-jrose left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change seems dangerous to me. Here are some scenarios that might break:

  • What happens when a user does not have credentials configured before importing this package.
  • What happens if the user ignores the default session and creates their own, then calls something that references get_active_session?
  • What happens if a user imports this package while using local testing mode?


# Call getOrCreate() at import time on behalf of user so that in a stored procedure,
# the user does not need to do it explicitly.
snowflake.snowpark.Session.SessionBuilder().getOrCreate()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sfc-gh-helmeleegy for snowpark pandas, we by design to ask the user to explicitly set up the session to stay consistent with sowpark usage, and we do not require customer to provide connection files , so getOrCreate() could fail if user does not have a connection file.

With stored sproc, i thought the session object is automatically created, and snowaprk pandas have the capablity to get the current active session for usage. For some reason, that seems not the case. @sfc-gh-sfan why it is happening in such way?

Copy link
Collaborator

@sfc-gh-yzou sfc-gh-yzou Jul 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually @sfc-gh-sfan I have a question, do you have a code pointer on how the session creation is handled in stored sproc today? One thing i am not sure here is if that getOrCreate will always work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After making the session argument optional in the proc's handler, the session object was not being created anymore in the case when the session argument was not specified. That's because it was not being passed along with the other arguments to the handler anymore.

@sfc-gh-zyao has a draft PR for pre-creating session objects even for session-less handlers, which seems promising. Until this becomes available, we will need to either pass the session argument, or create it at the sproc side using getOrCreate. Currently, it's left to the user to create it inside the body of the handler.

This PR attempts to automatically create the session object at package import time (only for sprocs) so that users do not need to worry about creating it themselves.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, i think the direction @sfc-gh-zyao is the right direction. Our original intention is to make the argument optional, not making the session object creation optional. I was talking to @sfc-gh-zyao , he mentioned he will work on this to get the fix in soon. I think we can just wait for this fix, instead of adding the workaround here, since it also took long time for the package to be released in stored sproc also.

@sfc-gh-helmeleegy sfc-gh-helmeleegy marked this pull request as draft July 12, 2024 20:55
CHANGELOG.md Outdated Show resolved Hide resolved
@@ -239,6 +239,20 @@ def sp_pow(session_, x, y):
assert pow_sp(2, 10, session=session) == 1024


def test_basic_snowpark_pandas_stored_procedure(session):
def return1(session_):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought the goal was to test that session is not required.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before this PR, the Snowpark python API would still require having a session argument for the sproc's handler. I updated the PR to allow for optional session argument even when using the Snowpark python API and added tests for both cases (with and without the session argument).

@sfc-gh-sfan would be great if you can confirm that the update to Snowpark python does make sense.

# Call getOrCreate() at import time on behalf of user so that in a stored procedure,
# the user does not need to do it explicitly.
if is_in_stored_procedure():
snowflake.snowpark.Session.SessionBuilder().getOrCreate()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's possible to create multiple sessions in a stored procedure, then there's an edge case where the user creates multiple Snowpark sessions in the stored procedure and then imports snowpark pandas. Before this change, that would work, but now, they'll get an error.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the need for this. Inside a python stored proc environment, we already have access to a session object. Why can't we use that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sfc-gh-aalam As pointed out in another thread here. Since we made the session argument optional (at the server/xp side), the session object was not being created anymore unless it was explicitly specified in the sproc's handler. Otherwise, it won't be created. This draft PR should make sure it's created in both cases.

@sfc-gh-mvashishtha I guess if we want, we can try-catch this scenario and suppress that error if it was raised - to make sure we keep the current behavior?

# Call getOrCreate() at import time on behalf of user so that in a stored procedure,
# the user does not need to do it explicitly.
if is_in_stored_procedure():
snowflake.snowpark.Session.SessionBuilder().getOrCreate()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add a TODO to stop doing this once Snowpark always creates the session automatically, with a link to the jira ticket for that?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not aware that snowpark has plan to create the session automatically. cc @sfc-gh-jdu do we have any plan for this?

The problem here is more for stored sproc, in stored sproc, user is suppose to use the session created by stored sproc which uses specific connection configurations, because the one used by users are not usable within stored sproc.
Originally, we thought stored sproc will always create the session, but seems it is only creating the session when the session parameter passed. The fix/workaround here seems make sure if snowpark pandas is used in stored sproc, make sure we create the session.
However, since session parameter is a required parameter for stored sproc before, it seems it is almost always creating a session object. I think we can always create a session object in snowpark python stored sproc, i don't see a reason why user will not use a session object with snowpark python stored sproc. cc @sfc-gh-sfan

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sfc-gh-yzou Please refer to the other conversation threads.

@sfc-gh-helmeleegy
Copy link
Contributor Author

sfc-gh-helmeleegy commented Jul 15, 2024

This change seems dangerous to me. Here are some scenarios that might break:

  • What happens when a user does not have credentials configured before importing this package.
  • What happens if the user ignores the default session and creates their own, then calls something that references get_active_session?
  • What happens if a user imports this package while using local testing mode?

These are all good points @sfc-gh-jrose. Do you think that they will be addressed after this other PR is merged? In this case, calling getOrCreate will actually get rather than create. Also, is there a way to detect if we are in a local testing mode, such that we can skip this workaround?

@sfc-gh-jrose
Copy link
Contributor

sfc-gh-jrose commented Jul 16, 2024

These are all good points @sfc-gh-jrose. Do you think that they will be addressed after this other PR is merged? In this case, calling getOrCreate will actually get rather than create. Also, is there a way to detect if we are in a local testing mode, such that we can skip this workaround?

With the code now gated by is_in_stored_procedure I don't think local testing is a concern.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NO-PANDAS-CHANGEDOC-UPDATES This PR does not update Snowpark pandas docs snowpark-pandas
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants