SNOW-1335071: add method DataFrame.transform #1400
Labels
feature
New feature or request
status-triage_done
Initial triage done, will be further handled by the driver team
Comments
github-actions
bot
changed the title
add method DataFrame.transform
SNOW-1335071: add method DataFrame.transform
Apr 18, 2024
Thanks for your feedback! We will look into supporting this.
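For context, here is what the requested method does in stock pandas. This is a minimal sketch that assumes only plain pandas and NumPy; it says nothing about how Snowpark pandas would implement `DataFrame.transform`, and the sample data is made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative data; not taken from the issue.
df = pd.DataFrame({"a": [1, 4, 9], "b": [16, 25, 36]})

# transform applies a function and returns a result with the same
# index and shape as the input.
doubled = df.transform(lambda col: col * 2)

# A NumPy ufunc works the same way, element by element.
roots = df.transform(np.sqrt)

print(doubled)
print(roots)
```

Unlike an aggregation, the result always keeps the row index of the input, which is the property callers usually rely on.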
sfc-gh-dszmolka
added
the
status-triage_done
Initial triage done, will be further handled by the driver team
label
Apr 29, 2024
sfc-gh-mvashishtha
added a commit
that referenced
this issue
May 4, 2024
) Currently the template says "What GitHub issue is this PR addressing", but we only want Jira numbers. We should always add a Snowflake JIRA number, even if a GitHub issue exists. If a user creates a GitHub issue and wants to reference it in a PR, a bot will create a SNOW-jira ticket for them, as in #1400. --------- Signed-off-by: sfc-gh-mvashishtha <mahesh.vashishtha@snowflake.com>
sfc-gh-vbudati
pushed a commit
that referenced
this issue
May 7, 2024
<!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-NNNNNNN 2. Fill out the following pre-review checklist: - [ ] I am adding a new automated test(s) to verify correctness of my new code - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency 3. Please describe how your code solves the related issue. Please write a short description of how your code change solves the related issue. Move Snowpark pandas modin import changelog to 1.15
sfc-gh-vbudati
added a commit
that referenced
this issue
May 7, 2024
<!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-1345607 2. Fill out the following pre-review checklist: - [ ] I am adding a new automated test(s) to verify correctness of my new code - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency 3. Please describe how your code solves the related issue. Fix README/md pip install command.
sfc-gh-rdurrani
added a commit
that referenced
this issue
May 7, 2024
<!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-1357748 2. Fill out the following pre-review checklist: - [x] I am adding a new automated test(s) to verify correctness of my new code - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency 3. Please describe how your code solves the related issue. This PR updates read_snowflake to use string matching for the order by warning.
sfc-gh-nkrishna
added a commit
that referenced
this issue
May 8, 2024
<!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-NNNNNNN 2. Fill out the following pre-review checklist: - [ ] I am adding a new automated test(s) to verify correctness of my new code - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency 3. Please describe how your code solves the related issue. This PR adds double quotes to the pip install message users see when installing Modin, to accommodate zsh. Signed-off-by: Naren Krishna <naren.krishna@snowflake.com>
sfc-gh-joshi
added a commit
that referenced
this issue
May 8, 2024
<!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-1370365 2. Fill out the following pre-review checklist: - [ ] I am adding a new automated test(s) to verify correctness of my new code - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency 3. Please describe how your code solves the related issue. This PR avoids UNION ALL operations when computing quantiles over 1-column datasets. This optimization has significant implications for `pd.qcut`, which frequently computes a large number of quantiles and previously would have had extremely high union counts in its queries. In particular, `test_qcut.py::test_qcut_two_columns` goes from 90 unions -> 0 unions and 34 joins -> 14 joins, and `series/test_quantile.py::test_quantile_large` goes from ~80 queries -> 6 queries.
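To make the link between quantiles and `pd.qcut` concrete, here is a small sketch in stock pandas. It assumes plain pandas rather than Snowpark pandas, so it shows only the user-facing calls and not the generated SQL or the union and join counts mentioned above; the data is illustrative.

```python
import pandas as pd

# Illustrative single-column data: the integers 1..100.
s = pd.Series(range(1, 101))

# Several quantiles of one column requested in a single call.
quartiles = s.quantile([0.25, 0.5, 0.75])

# qcut derives its bin edges from quantiles like these, then assigns
# each value to a bin; labels=False returns integer bin codes.
codes = pd.qcut(s, q=4, labels=False)
bin_sizes = codes.value_counts().sort_index()

print(quartiles)
print(bin_sizes)
```

Because `qcut` needs one quantile per bin edge, any per-quantile cost in the backend is multiplied by the number of bins, which is why the optimization matters most there.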
sfc-gh-vbudati
added a commit
that referenced
this issue
May 9, 2024
) <!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-1348621 2. Fill out the following pre-review checklist: - [x] I am adding a new automated test(s) to verify correctness of my new code - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency 3. Please describe how your code solves the related issue.

Bug:

```py
# Performing the following loc operation would fail.
>>> df = pd.DataFrame(
...     {
...         "one": pd.Series(np.random.randn(3), index=["a", "b", "c"]),
...         "two": pd.Series(np.random.randn(4), index=["a", "b", "c", "d"]),
...         "three": pd.Series(np.random.randn(3), index=["b", "c", "d"]),
...     }
... )
>>> df2 = df.copy()
>>> df2.loc["a", "three"] = 1.0
# However, when you take a closer look the issue is not with loc set but with
# the way the DataFrame was being generated. Ignore the numbers inside since
# these are randomly generated.
>>> df2
        one       two     three  # <-- notice how there are two rows of column names instead of one row
        one       two     three
a -0.238524  0.900504       NaN
b -1.603478 -0.715938  0.786343
c -0.603704 -1.046051  0.371374
d       NaN -0.019357  0.353722
NotImplementedError: loc set for multiindex is not yet implemented
# Expected result:
>>> df2
        one       two     three  # <-- only one row
a  0.357285 -1.225845       NaN
b  0.709229  1.120475  1.551948
c -2.173472  0.682472 -0.738533
d       NaN -1.211516  0.222008
```

This is a bug in concat on axis=1 when all the objects are Series.
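The expected behavior can be checked against stock pandas. The sketch below assumes plain pandas (not Snowpark pandas) and replaces the random values with fixed ones so the result is reproducible; it only demonstrates that a DataFrame built from Series has a single-level column index and that the `loc` assignment then succeeds.

```python
import pandas as pd

# Same shape as the report, with fixed values instead of random ones.
df = pd.DataFrame(
    {
        "one": pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"]),
        "two": pd.Series([1.0, 2.0, 3.0, 4.0], index=["a", "b", "c", "d"]),
        "three": pd.Series([1.0, 2.0, 3.0], index=["b", "c", "d"]),
    }
)

# The columns should form one flat level, not a MultiIndex.
column_levels = df.columns.nlevels

# With flat columns, setting a single cell through loc works.
df2 = df.copy()
df2.loc["a", "three"] = 1.0

print(column_levels)
print(df2)
```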
sfc-gh-azhan
added a commit
that referenced
this issue
May 9, 2024
<!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-1357611 Fix all quarantined pandas tests for 8.18 2. Fill out the following pre-review checklist: - [ ] I am adding a new automated test(s) to verify correctness of my new code - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency 3. Please describe how your code solves the related issue. Please write a short description of how your code change solves the related issue. remove all skipped tests from SNOW-1358681
sfc-gh-azhan
added a commit
that referenced
this issue
May 9, 2024
…lready (#1547) <!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-1348919 2. Fill out the following pre-review checklist: - [ ] I am adding a new automated test(s) to verify correctness of my new code - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency 3. Please describe how your code solves the related issue. Please write a short description of how your code change solves the related issue. This bug has been fixed in #1533
sfc-gh-nkrishna
added a commit
that referenced
this issue
May 10, 2024
<!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-1296779, SNOW-1254730 2. Fill out the following pre-review checklist: - [ ] I am adding a new automated test(s) to verify correctness of my new code - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency 3. Please describe how your code solves the related issue. This PR removes unused sproc fallback code from Snowpark pandas, now that all APIs using fallback have been replaced with NotImplementedError. --------- Signed-off-by: Naren Krishna <naren.krishna@snowflake.com>
sfc-gh-azhan
added a commit
that referenced
this issue
May 10, 2024
<!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-1347052 Update pandas API PuPr warning messages 2. Fill out the following pre-review checklist: - [ ] I am adding a new automated test(s) to verify correctness of my new code - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency 3. Please describe how your code solves the related issue. Please write a short description of how your code change solves the related issue. - change the words from "private preview" to "public preview" --------- Co-authored-by: Varnika Budati <varnika.budati@snowflake.com>
sfc-gh-mvashishtha
added a commit
that referenced
this issue
May 10, 2024
<!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-1374343 2. Fill out the following pre-review checklist: - [ ] I am adding a new automated test(s) to verify correctness of my new code - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency 3. Please describe how your code solves the related issue. CI times for this PR: - [GCP](https://github.com/snowflakedb/snowpark-python/actions/runs/9025175335/job/24800489292?pr=1553): 25 minutes - [AWS](https://github.com/snowflakedb/snowpark-python/actions/runs/9025175335/job/24800489826?pr=1553): 20 minutes - [Azure](https://github.com/snowflakedb/snowpark-python/actions/runs/9025175335/job/24800490421?pr=1553): 22 minutes These are a bit better than the usual times, but it's hard to tell because CI time is so variable (probably dependent on how many different jobs are running at the same time-- see SNOW-1347210). Let's make this commit and see whether the warehouses can handle the extra load without too much queueing. We have set `MAX_CONCURRENCY_LEVEL` on the warehouse for each cloud provider.
sfc-gh-nkumar
added a commit
that referenced
this issue
May 10, 2024
<!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-1375037 2. Fill out the following pre-review checklist: - [ ] I am adding a new automated test(s) to verify correctness of my new code - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency 3. Please describe how your code solves the related issue. Rewrite parts of qcut implementation to avoid joins completely. This results in significant performance improvement of qcut. With this change overall runtime of benchmark notebook reduced from 297 seconds to 66 seconds and number of sql queries reduced from 521 to 91.
sfc-gh-vbudati
added a commit
that referenced
this issue
May 10, 2024
<!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-1361200 2. Fill out the following pre-review checklist: - [x] I am adding a new automated test(s) to verify correctness of my new code - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency 3. Please describe how your code solves the related issue.
- The bug is caused by the implementation of `create_udtf_for_groupby_apply`. Groupby apply/transform use this method to create a UDTF.
- When multiple groupby apply/transform operations are performed on the same DataFrame, this method is called multiple times. However, it uses fixed names for the column labels used to create the OrderedDataFrame used with the UDTF. This is what causes the error "ambiguous column name 'ROW_POSITION_WITHIN_GROUP'".
- 'ROW_POSITION_WITHIN_GROUP' is a column label created and used by `create_udtf_for_groupby_apply`.
- The same issue occurs with the 'ORIGINAL_ROW_POSITION' column label.
- To solve this issue, I appended a random number to the end of these column labels to prevent the collision and eliminate the error.
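The idea of suffixing labels to avoid collisions can be sketched as follows. This is not the code from the PR: `unique_label` is a hypothetical helper, and the explicit check against already-used labels is an added assumption (the PR description only mentions appending a random number).

```python
import random


def unique_label(base: str, used: set[str], max_suffix: int = 10**6) -> str:
    """Return `base` with a random numeric suffix that is not in `used`.

    Hypothetical helper for illustration; the retry loop is an assumption
    that guards against the rare case of drawing the same number twice.
    """
    while True:
        candidate = f"{base}_{random.randint(0, max_suffix)}"
        if candidate not in used:
            used.add(candidate)
            return candidate


# Two groupby apply/transform calls on the same frame each need their own
# helper columns; with fixed names the second call would collide.
used_labels: set[str] = set()
first = unique_label("ROW_POSITION_WITHIN_GROUP", used_labels)
second = unique_label("ROW_POSITION_WITHIN_GROUP", used_labels)
print(first, second)
```

A bare random suffix makes collisions unlikely rather than impossible, which is why the sketch also tracks the labels it has handed out.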
sfc-gh-nkrishna
added a commit
that referenced
this issue
May 10, 2024
<!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-1374306 2. Fill out the following pre-review checklist: - [ ] I am adding a new automated test(s) to verify correctness of my new code - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency 3. Please describe how your code solves the related issue. This PR adds a docstring for resample. --------- Signed-off-by: Naren Krishna <naren.krishna@snowflake.com> Co-authored-by: Jonathan Shi <149419494+sfc-gh-joshi@users.noreply.github.com>
sfc-gh-stan
added a commit
that referenced
this issue
May 10, 2024
…ock tests (#1561) <!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-1063738 2. Fill out the following pre-review checklist: - [x] I am adding a new automated test(s) to verify correctness of my new code - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency 3. Please describe how your code solves the related issue. This PR changes `tox -e local` to run all tests in `tests/integ`, `tests/unit` and `tests/mock` (previously mock_unit) against Local Testing (except for modin tests specified in `SNOWFLAKE_PYTEST_IGNORE_MODIN_CMD`).
sfc-gh-vbudati
added a commit
that referenced
this issue
May 13, 2024
<!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-1326280 2. Fill out the following pre-review checklist: - [ ] I am adding a new automated test(s) to verify correctness of my new code - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency 3. Please describe how your code solves the related issue. Update the links in the documentation for `to_snowpark_pandas` from the LIMITED-ACCESS version to public! These links will not work until all of the Snowpark pandas documentation is public.
sfc-gh-aalam
added a commit
that referenced
this issue
May 13, 2024
<!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-NNNNNNN 2. Fill out the following pre-review checklist: - [ ] I am adding a new automated test(s) to verify correctness of my new code - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency 3. Please describe how your code solves the related issue. Please write a short description of how your code change solves the related issue.
sfc-gh-joshi
added a commit
that referenced
this issue
May 13, 2024
<!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-1373790 2. Fill out the following pre-review checklist: - [ ] I am adding a new automated test(s) to verify correctness of my new code - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency 3. Please describe how your code solves the related issue.

This PR prunes some duplicated + unsupported methods from Snowpark pandas API docs, and adds more comprehensive docstrings (mostly copied from pandas) for other methods. This PR does not add any doctests, nor does it change any meaningful code. All changes are listed below:

_De-duplicated listings_:
- Series
  - is_unique
  - duplicated
- DataFrame
  - round

Note that Series/DF head and tail are deliberately left duplicated; this matches pandas documentation, as they are mentioned under both the "Indexing, iteration" and "Reindexing / selection" headings.

_Removed unimplemented listing_:
- Series
  - kurtosis
- SeriesGroupBy
  - apply
- Resampler
  - groups
  - indices
  - get_group
  - apply
  - aggregate
  - transform
  - bfill
  - nearest
  - fillna
  - asfreq
  - nunique
  - first
  - last
  - interpolate
  - ohlc
  - pad
  - pipe
  - prod
  - quantile
  - sem
  - size
- Rolling
  - aggregate
  - apply
  - corr
  - count
  - cov
  - kurt
  - median
  - quantile
  - rank
  - sem
  - skew

_Added implemented listing_:
I did not comprehensively look for implemented methods that were not listed; these were just a few methods that I noticed in the course of checking other APIs.
- SeriesGroupBy
  - head
  - idxmax
  - idxmin
  - nunique
  - tail

_Improved documentation_:
- pd
  - qcut (I'm not sure why it wasn't inheriting pandas docs, but we should override them anyway since we don't implement all parameters)
- BasePandasDataset
  - convert_dtypes
  - rename_axis
  - values
  - ffill/pad
- Series
  - name
  - empty
  - hasnans
  - ndim
  - shape
  - rename_axis
  - quantile
- DataFrame
  - empty
  - quantile
  - select_dtypes
- GroupBy
  - std
  - var
  - rank
  - nunique
  - quantile
  - __iter__

--------- Co-authored-by: Varnika Budati <varnika.budati@snowflake.com>
sfc-gh-lmukhopadhyay
added a commit
that referenced
this issue
May 14, 2024
<!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-1373899 2. Fill out the following pre-review checklist: - [ ] I am adding a new automated test(s) to verify correctness of my new code - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency 3. Please describe how your code solves the related issue. Updates notebook testing workflow with increased cell timeout, and adds SnowparkPandasAPIDemo.ipynb notebook from customer demo and SnowflakeChainTesting.ipynb which was previously blocked. --------- Signed-off-by: Labanya Mukhopadhyay <labanya.mukhopadhyay@snowflake.com>
sfc-gh-oplaton
added a commit
that referenced
this issue
May 14, 2024
<!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-0 2. Fill out the following pre-review checklist: - [ ] I am adding a new automated test(s) to verify correctness of my new code - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [x] I am adding a new dependency 3. Please describe how your code solves the related issue. Update `ast_pb2.py` (already present in the repository). Add the `setuptools` dependencies required for development. Include the module path for `ast_pb2.py` in the manifest, so that the file makes it into the Snowpark wheel.
sfc-gh-vbudati
added a commit
that referenced
this issue
May 14, 2024
<!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-1375263 2. Fill out the following pre-review checklist: - [ ] I am adding a new automated test(s) to verify correctness of my new code - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency 3. Please describe how your code solves the related issue. Updating the changelog to reflect what was actually released with v1.15.0a1 and what is new.
sfc-gh-jdu
added a commit
that referenced
this issue
Oct 1, 2024
<!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-NNNNNNN 2. Fill out the following pre-review checklist: - [ ] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. 3. Please describe how your code solves the related issue. The parameter value is changed.
sfc-gh-yzou
added a commit
that referenced
this issue
Oct 2, 2024
) <!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> SNOW-1566362 2. Fill out the following pre-review checklist: - [x] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. 3. Please describe how your code solves the related issue. 1) Move sql counter check infrastructure from modin to snowpark in general 2) applies sql counter check to test_cte.py
sfc-gh-yzou
added a commit
that referenced
this issue
Oct 3, 2024
… in query generator (#2387) <!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> SNOW-1706295 2. Fill out the following pre-review checklist: - [x] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. 3. Please describe how your code solves the related issue. The snowflake plan query generation is not used in actual generation, but only used by the testing. This pr does the following: 1) remove the SnowflakePlan overwrite in the code and test the query generator in a different way 2) add sql counter check for the test
sfc-gh-yzou
added a commit
that referenced
this issue
Oct 3, 2024
<!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> SNOW-1678113 2. Fill out the following pre-review checklist: - [x] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. 3. Please describe how your code solves the related issue. In the reported case, unstack calls pivot table underneath, and the customer ends up with about 2000 columns after the pivot. The pivot itself took about 1~2 seconds to finish, but after the pivot the code starts looping over all columns and calling append_columns in each iteration here [snowpark-python/src/snowflake/snowpark/modin/plugin/_internal/pivot_utils.py at 272e4e1 · snowflakedb/snowpark-python](https://github.com/snowflakedb/snowpark-python/blob/272e4e1ee5da84f8ac0abfefda95aab3b0bf4d7e/src/snowflake/snowpark/modin/plugin/_internal/pivot_utils.py#L704).
append_columns eventually calls select with all existing projected columns, and a check is performed on each column here [snowpark-python/src/snowflake/snowpark/modin/plugin/_internal/ordered_dataframe.py at main · snowflakedb/snowpark-python](https://github.com/snowflakedb/snowpark-python/blob/main/src/snowflake/snowpark/modin/plugin/_internal/ordered_dataframe.py#L616). Our profiling shows that each check took about 0.015 s, so 2000 checks take about 30 seconds, and the Snowpark select took about 0.5 seconds since it needs to perform SQL simplification. Since there is an outer loop of 2000 iterations, overall it could take about (30.5*2000) s = 61,000 s, which is close to 17 h. To handle the issue, we did the following: 1) use "*" for append_columns to avoid checking each column 2) instead of calling append_columns in each iteration, collect all columns to append and call append_columns only once. With manual testing, the customer case now takes about 8 s to finish the reported unstack. TODO: add this to our performance benchmark https://snowflakecomputing.atlassian.net/browse/SNOW-1706311
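The worst-case estimate in the description above can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch: the function name `looped_cost_seconds` is made up for illustration, and the model assumes that every loop iteration re-checks all 2000 columns and then runs one select, which is how the description frames the problem.

```python
# Figures quoted in the PR description (profiling numbers, not re-measured here).
PER_CHECK_SECONDS = 0.015   # cost of checking one projected column
SELECT_SECONDS = 0.5        # cost of one select, including SQL simplification
NUM_COLUMNS = 2000          # columns produced by the pivot


def looped_cost_seconds(num_columns: int, per_check: float, select: float) -> float:
    """Hypothetical cost model: one append per column, each append
    re-checking every column and then issuing a single select."""
    cost_per_iteration = num_columns * per_check + select
    return num_columns * cost_per_iteration


total = looped_cost_seconds(NUM_COLUMNS, PER_CHECK_SECONDS, SELECT_SECONDS)
print(f"{total:.0f} s, about {total / 3600:.1f} h")
```

The quadratic term (`num_columns * num_columns * per_check`) dominates, which is why removing the per-column check and appending all columns in a single call has such a large effect.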
sfc-gh-jdu
added a commit
that referenced
this issue
Oct 4, 2024
…ilation stage is applied (#2385) <!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-1703599 2. Fill out the following pre-review checklist: - [ ] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. 3. Please describe how your code solves the related issue. We shouldn't call session._table_exists inside resolve, but we can call it before resolve.
sfc-gh-helmeleegy
added a commit
that referenced
this issue
Oct 4, 2024
… Series.tz_localize (#2398) <!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-1677892, SNOW-1677897 2. Fill out the following pre-review checklist: - [x] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. 3. Please describe how your code solves the related issue. Add support for DataFrame.tz_localize and Series.tz_localize.
sfc-gh-helmeleegy
added a commit
that referenced
this issue
Oct 4, 2024
…Series.tz_convert (#2399) <!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-1677888, SNOW-1677890 2. Fill out the following pre-review checklist: - [x] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. 3. Please describe how your code solves the related issue. Add support for DataFrame.tz_convert and Series.tz_convert.
sfc-gh-nkrishna
pushed a commit
that referenced
this issue
Oct 4, 2024
<!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> ADHOC: Fix a misspelling 2. Fill out the following pre-review checklist: - [ ] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. 3. Please describe how your code solves the related issue. Just noticed a misspelling of "ambiguous"; fixing it.
sfc-gh-yzou
added a commit
that referenced
this issue
Oct 7, 2024
<!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> SNOW-1566363 2. Fill out the following pre-review checklist: - [x] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. 3. Please describe how your code solves the related issue. 1. add status for number of selectStatement with complexity merged 2.add status for number of cte created during repeated subquery elimination
sfc-gh-rdurrani
added a commit
that referenced
this issue
Oct 7, 2024
…value (#2213) <!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-1649172 2. Fill out the following pre-review checklist: - [ ] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. 3. Please describe how your code solves the related issue. When doing `df.loc[x] = series`, an error occurs because series does not have the same number of columns as the dataframe being set. Instead, the Series should be transposed and set, regardless of whether it has an equal number of rows as the dataframe has columns. --------- Co-authored-by: Varnika Budati <varnika.budati@snowflake.com>
sfc-gh-jdu
added a commit
that referenced
this issue
Oct 7, 2024
…is already closed (#2409) <!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-1727163 2. Fill out the following pre-review checklist: - [x] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. 3. Please describe how your code solves the related issue. It's possible that the session is already closed before garbage collection kicks in, where we should avoid sending drop table sql and eliminate the warning
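The idea in the description above (skip the cleanup SQL when the session has already been closed by the time garbage collection runs) can be sketched as follows. This is a minimal illustration, not the Snowpark implementation: `FakeSession`, `TempTable`, and `_drop_temp_table` are hypothetical names, and the only real API used is the standard library's `weakref.finalize`.

```python
import weakref


class FakeSession:
    """Hypothetical stand-in for a database session (not the Snowpark API)."""

    def __init__(self) -> None:
        self.closed = False
        self.executed: list[str] = []

    def run(self, sql: str) -> None:
        if self.closed:
            raise RuntimeError("session is closed")
        self.executed.append(sql)

    def close(self) -> None:
        self.closed = True


def _drop_temp_table(session: FakeSession, name: str) -> None:
    # If the session is already closed, sending SQL would only fail or warn,
    # so the cleanup is skipped.
    if session.closed:
        return
    session.run(f"DROP TABLE IF EXISTS {name}")


class TempTable:
    """Hypothetical handle whose backing table is dropped on garbage collection."""

    def __init__(self, session: FakeSession, name: str) -> None:
        self.name = name
        # weakref.finalize runs the callback at most once, either when this
        # object is collected or when the finalizer is called explicitly.
        self._finalizer = weakref.finalize(self, _drop_temp_table, session, name)
```

Checking `session.closed` inside the callback, rather than when the handle is created, matters because the session may be closed at any point between creation and collection.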
sfc-gh-rdurrani
added a commit
that referenced
this issue
Oct 8, 2024
…read_snowflake` tests (#2408) <!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-1726720 2. Fill out the following pre-review checklist: - [ ] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. 3. Please describe how your code solves the related issue. Use `session.sql.to_pandas()` instead of `native_pd.DataFrame(session.sql.collect)` to generate expected DataFrames when testing `read_snowflake`, so that if the expected DataFrame is empty, but has metadata, e.g. columns, that data is passed on to the expected DataFrame.
sfc-gh-jdu
added a commit
that referenced
this issue
Oct 8, 2024
…atement instead of SnowflakeTable (#2411) <!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-1727512 2. Fill out the following pre-review checklist: - [x] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. 3. Please describe how your code solves the related issue. It should be the same as session.table(...).select_statement. Otherwise, we will cache metadata on wrong source_plan in the future.
sfc-gh-azhan
added a commit
that referenced
this issue
Oct 8, 2024
<!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> SNOW-1690717 2. Fill out the following pre-review checklist: - [ ] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. 3. Please describe how your code solves the related issue. This is the first PR to support Snowpark Python functions in pandas apply. It only introduces `sin` as the first example.
sfc-gh-helmeleegy
added a commit
that referenced
this issue
Oct 8, 2024
…na and SeriesGroupBy.fillna (#2417) <!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-1728471 2. Fill out the following pre-review checklist: - [ ] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. 3. Please describe how your code solves the related issue. Fix changelog entry placement for DataFrameGroupBy.fillna and SeriesGroupBy.fillna.
sfc-gh-yzou
added a commit
that referenced
this issue
Oct 10, 2024
… in query generator (#2387) <!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> SNOW-1706295 2. Fill out the following pre-review checklist: - [x] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. 3. Please describe how your code solves the related issue. The snowflake plan query generation is not used in actual generation, but only used by the testing. This pr does the following: 1) remove the SnowflakePlan overwrite in the code and test the query generator in a different way 2) add sql counter check for the test
sfc-gh-yzou
added a commit
that referenced
this issue
Oct 10, 2024
<!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> SNOW-1678113 2. Fill out the following pre-review checklist: - [x] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. 3. Please describe how your code solves the related issue. In the reported case, unstack calls pivot table underneath, and the customer ends up with about 2000 columns after the pivot. The pivot itself took about 1~2 seconds to finish, but after the pivot the code starts looping over all columns and calling append_columns in each iteration here [snowpark-python/src/snowflake/snowpark/modin/plugin/_internal/pivot_utils.py at 272e4e1 · snowflakedb/snowpark-python](https://github.com/snowflakedb/snowpark-python/blob/272e4e1ee5da84f8ac0abfefda95aab3b0bf4d7e/src/snowflake/snowpark/modin/plugin/_internal/pivot_utils.py#L704).
append_columns eventually calls select with all existing projected columns, and a check is performed on each column here [snowpark-python/src/snowflake/snowpark/modin/plugin/_internal/ordered_dataframe.py at main · snowflakedb/snowpark-python](https://github.com/snowflakedb/snowpark-python/blob/main/src/snowflake/snowpark/modin/plugin/_internal/ordered_dataframe.py#L616). Our profiling shows that each check took about 0.015 s, so 2000 checks take about 30 seconds, and the Snowpark select took about 0.5 seconds since it needs to perform SQL simplification. Since there is an outer loop of 2000 iterations, overall it could take about (30.5*2000) s = 61,000 s, which is close to 17 h. To handle the issue, we did the following: 1) use "*" for append_columns to avoid checking each column 2) instead of calling append_columns in each iteration, collect all columns to append and call append_columns only once. With manual testing, the customer case now takes about 8 s to finish the reported unstack. TODO: add this to our performance benchmark https://snowflakecomputing.atlassian.net/browse/SNOW-1706311
sfc-gh-yzou
pushed a commit
that referenced
this issue
Oct 10, 2024
…ilation stage is applied (#2385) <!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-1703599 2. Fill out the following pre-review checklist: - [ ] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. 3. Please describe how your code solves the related issue. We shouldn't call session._table_exists inside resolve, but we can call it before resolve.
sfc-gh-yzou
pushed a commit
that referenced
this issue
Oct 10, 2024
… Series.tz_localize (#2398) <!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-1677892, SNOW-1677897 2. Fill out the following pre-review checklist: - [x] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. 3. Please describe how your code solves the related issue. Add support for DataFrame.tz_localize and Series.tz_localize.
sfc-gh-yzou
pushed a commit
that referenced
this issue
Oct 10, 2024
…Series.tz_convert (#2399) <!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-1677888, SNOW-1677890 2. Fill out the following pre-review checklist: - [x] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. 3. Please describe how your code solves the related issue. Add support for DataFrame.tz_convert and Series.tz_convert.
sfc-gh-yzou
pushed a commit
that referenced
this issue
Oct 10, 2024
<!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> ADHOC: Fix a misspelling 2. Fill out the following pre-review checklist: - [ ] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. 3. Please describe how your code solves the related issue. Just noticed a misspelling of "ambiguous"; fixing it.
sfc-gh-yzou
added a commit
that referenced
this issue
Oct 10, 2024
<!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> SNOW-1566363 2. Fill out the following pre-review checklist: - [x] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. 3. Please describe how your code solves the related issue. 1. add status for number of selectStatement with complexity merged 2.add status for number of cte created during repeated subquery elimination
sfc-gh-yzou
pushed a commit
that referenced
this issue
Oct 10, 2024
…value (#2213) <!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-1649172 2. Fill out the following pre-review checklist: - [ ] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. 3. Please describe how your code solves the related issue. When doing `df.loc[x] = series`, an error occurs because series does not have the same number of columns as the dataframe being set. Instead, the Series should be transposed and set, regardless of whether it has an equal number of rows as the dataframe has columns. --------- Co-authored-by: Varnika Budati <varnika.budati@snowflake.com>
sfc-gh-yzou
pushed a commit
that referenced
this issue
Oct 10, 2024
…is already closed (#2409) <!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-1727163 2. Fill out the following pre-review checklist: - [x] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. 3. Please describe how your code solves the related issue. It's possible that the session is already closed before garbage collection kicks in, where we should avoid sending drop table sql and eliminate the warning
sfc-gh-yzou
pushed a commit
that referenced
this issue
Oct 10, 2024
…read_snowflake` tests (#2408) <!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-1726720 2. Fill out the following pre-review checklist: - [ ] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. 3. Please describe how your code solves the related issue. Use `session.sql.to_pandas()` instead of `native_pd.DataFrame(session.sql.collect)` to generate expected DataFrames when testing `read_snowflake`, so that if the expected DataFrame is empty, but has metadata, e.g. columns, that data is passed on to the expected DataFrame.
sfc-gh-yzou
added a commit
that referenced
this issue
Oct 10, 2024
…2403) <!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> SNOW-1708573 2. Fill out the following pre-review checklist: - [x] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. 3. Please describe how your code solves the related issue. 1) Add sql_counter check for test points in large query breakdown 2) Refine the sql_counter check for test_cte, and allow it to run on env without pandas
sfc-gh-oplaton
added a commit
that referenced
this issue
Oct 10, 2024
<!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-1491303, SNOW-1621201, SNOW-1629946, SNOW-1621208 2. Fill out the following pre-review checklist: - [ ] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. 3. Please describe how your code solves the related issue. Pick up the updated model from the [corresponding server-side work](snowflakedb/snowflake#217938). Support `DataFrame.join_table_function`. Support `DataFrame.select` with `TableFunctionCall` parameters. Switch `functions.explode`, `explode_outer`, `flatten` and `functions.table_function` to look like built-in function. We don't use the fact that they're table functions anywhere yet (as opposed to scalar functions). We can reintroduce the distinction easily, if necessary. Simplify `Session.table_function` by taking advantage of indirect function calls. Support `TableFunctionCall` and its member functions. Introduce `build_call_table_function_apply` and `build_indirect_table_fn_apply` for the specialized cases that require the corresponding entities. Remove `build_session_table_fn_apply` and `build_table_fn_apply`. Update type annotations. 
There are instances that require accumulating a `proto.Expr` (similar to much of the column and built-in function functionality) until the root value can be assigned in the context of an `AstBatch`. Introduce a `build_intermediate_stmt` which takes any Python object that has an `_ast` attribute and assigns the value in a statement if necessary. Random thought: The IR client captures all built-in function calls by unqualified name. We should fix this at some point.
sfc-gh-nkumar
added a commit
that referenced
this issue
Oct 10, 2024
<!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-NNNNNNN 2. Fill out the following pre-review checklist: - [ ] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. 3. Please describe how your code solves the related issue. Please write a short description of how your code change solves the related issue. --------- Co-authored-by: Adam Ling <adam.ling@snowflake.com>
sfc-gh-lninobrijaldo
pushed a commit
that referenced
this issue
Oct 11, 2024
<!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> ADHOC: Fix a misspelling 2. Fill out the following pre-review checklist: - [ ] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. 3. Please describe how your code solves the related issue. Just notice a misspelling of "ambiguous" , fixing it
sfc-gh-rsureshbabu
added a commit
that referenced
this issue
Oct 11, 2024
…trace` API (#2418) <!--- Please answer these questions before creating your pull request. Thanks! ---> 1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR. <!--- In this section, please add a Snowflake Jira issue number. Note that if a corresponding GitHub issue exists, you should still include the Snowflake Jira issue number. For example, for GitHub issue #1400, you should add "SNOW-1335071" here. ---> Fixes SNOW-1626173 2. Fill out the following pre-review checklist: - [x] I am adding a new automated test(s) to verify correctness of my new code - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing - [ ] I am adding new logging messages - [ ] I am adding a new telemetry message - [ ] I am adding new credentials - [ ] I am adding a new dependency - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes. 3. Please describe how your code solves the related issue. Please write a short description of how your code change solves the related issue.
Apply a function/callable to Snowpark DF
What is the current behavior?
What is the desired behavior?
df = df.transform(func1).transform(func2)
How would this improve `snowflake-snowpark-python`?
Make Snowpark code more consistent
References, Other Background
`.transform()` is available in PySpark: https://api-docs.databricks.com/python/pyspark/latest/pyspark.sql/api/pyspark.sql.DataFrame.transform.html
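To make the requested chaining concrete, here is a minimal sketch of the semantics PySpark's `DataFrame.transform` provides: call the function with the frame as its first argument and return the resulting frame. `ToyFrame`, `add_total` and `keep_at_least` are hypothetical stand-ins invented for illustration. They are not part of Snowpark, and this is not a proposed implementation for `snowflake-snowpark-python`.

```python
from typing import Any, Callable, Dict, List


class ToyFrame:
    """Hypothetical stand-in for a DataFrame; holds rows as plain dicts."""

    def __init__(self, rows: List[Dict[str, Any]]) -> None:
        self.rows = rows

    def transform(
        self, func: Callable[..., "ToyFrame"], *args: Any, **kwargs: Any
    ) -> "ToyFrame":
        # Call func with this frame first, then any extra arguments, and
        # require a frame back so that calls can be chained.
        result = func(self, *args, **kwargs)
        if not isinstance(result, ToyFrame):
            raise TypeError(
                f"func returned {type(result).__name__}, expected ToyFrame"
            )
        return result


def add_total(df: ToyFrame) -> ToyFrame:
    # Hypothetical step: add a derived "total" column to every row.
    return ToyFrame([{**r, "total": r["price"] * r["qty"]} for r in df.rows])


def keep_at_least(df: ToyFrame, minimum: int) -> ToyFrame:
    # Hypothetical step: keep rows whose total reaches the threshold.
    return ToyFrame([r for r in df.rows if r["total"] >= minimum])


df = ToyFrame([{"price": 2, "qty": 3}, {"price": 5, "qty": 4}])
df = df.transform(add_total).transform(keep_at_least, minimum=10)
print(df.rows)  # [{'price': 5, 'qty': 4, 'total': 20}]
```

Without such a method the same pipeline has to be written inside-out as `keep_at_least(add_total(df), minimum=10)`, which is the readability gap the request describes.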