SNOW-1335071: add method DataFrame.transform #1400

Open
ChuliangXiao opened this issue Apr 18, 2024 · 1 comment
Labels: `feature` (New feature or request), `status-triage_done` (Initial triage done, will be further handled by the driver team)

Comments

@ChuliangXiao
Contributor

ChuliangXiao commented Apr 18, 2024

Apply a function/callable to Snowpark DF

What is the current behavior?

```python
df = func1(df)
df = func2(df)
# or
df = func2(func1(df))
```

What is the desired behavior?

```python
df = df.transform(func1).transform(func2)
```
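Below is a minimal sketch of the requested method. It follows the semantics of PySpark's `DataFrame.transform(func, *args, **kwargs)`: the callable receives the DataFrame plus any extra arguments, and it must return a DataFrame. `ToyFrame`, `add_one`, and `keep_positive` are hypothetical stand-ins that let the sketch run without a Snowflake session. None of this is Snowpark's API.

```python
from typing import Any, Callable, List


class ToyFrame:
    """Hypothetical stand-in for a DataFrame: a list of row dicts."""

    def __init__(self, rows: List[dict]) -> None:
        self.rows = rows

    def transform(self, func: Callable[..., "ToyFrame"], *args: Any, **kwargs: Any) -> "ToyFrame":
        # Apply func to this frame, forwarding extra arguments (as PySpark does).
        result = func(self, *args, **kwargs)
        # PySpark asserts that the result is a DataFrame; this sketch raises
        # TypeError instead, so a broken chain fails at the offending step.
        if not isinstance(result, ToyFrame):
            raise TypeError(f"func returned {type(result).__name__}, expected ToyFrame")
        return result


def add_one(df: ToyFrame, col: str) -> ToyFrame:
    return ToyFrame([{**r, col: r[col] + 1} for r in df.rows])


def keep_positive(df: ToyFrame, col: str) -> ToyFrame:
    return ToyFrame([r for r in df.rows if r[col] > 0])


df = ToyFrame([{"x": -2}, {"x": 0}, {"x": 3}])
out = df.transform(add_one, "x").transform(keep_positive, col="x")
print(out.rows)  # [{'x': 1}, {'x': 4}]
```

The chain reads left to right in the order the steps run, which is the readability gain this issue asks for over `func2(func1(df))`.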

How would this improve snowflake-snowpark-python?

Make Snowpark code more consistent with PySpark, and let a sequence of transformations be written as one left-to-right chain.

References, Other Background

`.transform()` is available in PySpark:
https://api-docs.databricks.com/python/pyspark/latest/pyspark.sql/api/pyspark.sql.DataFrame.transform.html

@ChuliangXiao ChuliangXiao added the feature New feature or request label Apr 18, 2024
@github-actions github-actions bot changed the title add method DataFrame.transform SNOW-1335071: add method DataFrame.transform Apr 18, 2024
@sfc-gh-aling
Contributor

Thanks for your feedback! We will look into supporting this `transform` function call.
cc: @sfc-gh-yixie @sfc-gh-jdu @sfc-gh-aalam This is for feature parity with PySpark.

@sfc-gh-dszmolka sfc-gh-dszmolka added the status-triage_done Initial triage done, will be further handled by the driver team label Apr 29, 2024
sfc-gh-mvashishtha added a commit that referenced this issue May 4, 2024

Currently the template says "What GitHub issue is this PR addressing",
but we only want Jira numbers.

We should always add a Snowflake JIRA number, even if a GitHub issue
exists.

If a user creates a GitHub issue and wants to reference it in a PR, a
bot will create a SNOW-jira ticket for them, as in #1400.

---------
Signed-off-by: sfc-gh-mvashishtha <mahesh.vashishtha@snowflake.com>
sfc-gh-vbudati pushed a commit that referenced this issue May 7, 2024
<!---
Please answer these questions before creating your pull request. Thanks!
--->

1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

   <!---
   In this section, please add a Snowflake Jira issue number.

   Note that if a corresponding GitHub issue exists, you should still include
   the Snowflake Jira issue number. For example, for GitHub issue #1400, you should
   add "SNOW-1335071" here.
   --->

   Fixes SNOW-NNNNNNN

2. Fill out the following pre-review checklist:

   - [ ] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency

3. Please describe how your code solves the related issue.

Please write a short description of how your code change solves the
related issue.

Move Snowpark pandas modin import changelog to 1.15
sfc-gh-vbudati added a commit that referenced this issue May 7, 2024
1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

   Fixes SNOW-1345607

2. Fill out the following pre-review checklist:

   - [ ] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency

3. Please describe how your code solves the related issue.

   Fix README/md pip install command.
sfc-gh-rdurrani added a commit that referenced this issue May 7, 2024
1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

   Fixes SNOW-1357748

2. Fill out the following pre-review checklist:

   - [x] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency

3. Please describe how your code solves the related issue.

This PR updates read_snowflake to use string matching for the order by
warning.
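Below is a sketch of what string matching for an ORDER BY warning could look like. The regex, the function name `warn_if_order_by`, and the warning text are all hypothetical, because the commit does not show the actual matching rule or message. The word boundaries keep identifiers such as `border_by` from matching.

```python
import re
import warnings

# Hypothetical pattern: the words ORDER and BY separated by any whitespace,
# matched case-insensitively and only as whole words.
_ORDER_BY_PATTERN = re.compile(r"\border\s+by\b", re.IGNORECASE)


def warn_if_order_by(query: str) -> bool:
    """Emit a warning and return True if the query text contains ORDER BY."""
    if _ORDER_BY_PATTERN.search(query):
        # Hypothetical message; the real read_snowflake warning text may differ.
        warnings.warn(
            "The query contains ORDER BY; the ordering may not be preserved "
            "in the resulting DataFrame.",
            UserWarning,
            stacklevel=2,
        )
        return True
    return False


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    flagged = warn_if_order_by("SELECT * FROM t ORDER\n  BY a")
    not_flagged = warn_if_order_by("SELECT border_by FROM t")

print(flagged, not_flagged, len(caught))  # True False 1
```

Plain text matching also fires on ORDER BY inside subqueries, window clauses such as `OVER (ORDER BY ...)`, or string literals. A warning tolerates those false positives better than an error would.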
sfc-gh-nkrishna added a commit that referenced this issue May 8, 2024
1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

   Fixes SNOW-NNNNNNN

2. Fill out the following pre-review checklist:

   - [ ] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency

3. Please describe how your code solves the related issue.

This PR adds double quotes to the pip install message users see when
installing Modin, to accommodate zsh.
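In zsh, unquoted square brackets form a glob pattern, so `pip install snowflake-snowpark-python[modin]` fails there with "no matches found". Below is a sketch that builds the hint with the requirement in double quotes. `pip_install_hint` is a hypothetical helper, and the real message wording may differ.

```python
def pip_install_hint(package: str, extra: str) -> str:
    """Build a pip command line that works unchanged in bash and zsh."""
    # The double quotes keep zsh from expanding "[extra]" as a filename glob.
    return f'pip install "{package}[{extra}]"'


hint = pip_install_hint("snowflake-snowpark-python", "modin")
print(hint)  # pip install "snowflake-snowpark-python[modin]"
```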

Signed-off-by: Naren Krishna <naren.krishna@snowflake.com>
sfc-gh-joshi added a commit that referenced this issue May 8, 2024
1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

   Fixes SNOW-1370365

2. Fill out the following pre-review checklist:

   - [ ] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency

3. Please describe how your code solves the related issue.

This PR avoids UNION ALL operations when computing quantiles over
1-column datasets. This optimization has significant implications for
`pd.qcut`, which frequently computes a large number of quantiles and
previously produced queries with extremely high union counts.
In particular, `test_qcut.py::test_qcut_two_columns` goes from 90 unions
to 0 unions and from 34 joins to 14 joins, and
`series/test_quantile.py::test_quantile_large` goes from ~80 queries to
6 queries.
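To show why the UNION ALL count grows with the number of quantiles, the sketch below compares two ways of generating SQL for several quantiles of one column. The first builds one SELECT per quantile and stacks them with UNION ALL. The second builds a single SELECT with one aggregate per quantile. This is an assumption about the general shape of the optimization, not the PR's code. `PERCENTILE_CONT(q) WITHIN GROUP (ORDER BY col)` is standard Snowflake SQL, and `T`/`A` are placeholder names.

```python
from typing import List


def quantiles_union_all(table: str, col: str, qs: List[float]) -> str:
    # One SELECT per quantile, stacked into rows: len(qs) - 1 UNION ALLs.
    parts = [
        f"SELECT {q} AS q, PERCENTILE_CONT({q}) WITHIN GROUP (ORDER BY {col}) AS v "
        f"FROM {table}"
        for q in qs
    ]
    return "\nUNION ALL\n".join(parts)


def quantiles_single_select(table: str, col: str, qs: List[float]) -> str:
    # Every quantile is a separate aggregate over the same single scan.
    aggs = ", ".join(
        f"PERCENTILE_CONT({q}) WITHIN GROUP (ORDER BY {col}) AS q_{i}"
        for i, q in enumerate(qs)
    )
    return f"SELECT {aggs} FROM {table}"


qs = [0.1, 0.25, 0.5, 0.75, 0.9]
union_sql = quantiles_union_all("T", "A", qs)
single_sql = quantiles_single_select("T", "A", qs)
print(union_sql.count("UNION ALL"), single_sql.count("UNION ALL"))  # 4 0
```

The single SELECT returns one column per quantile rather than one row per quantile. A caller that needs the row-per-quantile layout would still have to reshape the result, for example with UNPIVOT or on the client.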
sfc-gh-vbudati added a commit that referenced this issue May 9, 2024
<!---
Please answer these questions before creating your pull request. Thanks!
--->

1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

   Fixes SNOW-1348621

2. Fill out the following pre-review checklist:

   - [x] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency

3. Please describe how your code solves the related issue.

   Bug:
```py
# Performing the following loc operation would fail.
>>> df = pd.DataFrame(
    {
        "one": pd.Series(np.random.randn(3), index=["a", "b", "c"]),
        "two": pd.Series(np.random.randn(4), index=["a", "b", "c", "d"]),
        "three": pd.Series(np.random.randn(3), index=["b", "c", "d"]),
    }
)
>>> df2 = df.copy()
>>> df2.loc["a", "three"] = 1.0

# However, when you take a closer look the issue is not with loc set but with the way the DataFrame was being generated. Ignore the numbers inside since these are randomly generated.
>>> df2
        one       two     three               # <-- notice how there are two rows of column names instead of one row
        one       two     three
a -0.238524  0.900504       NaN
b -1.603478 -0.715938  0.786343
c -0.603704 -1.046051  0.371374
d       NaN -0.019357  0.353722
NotImplementedError: loc set for multiindex is not yet implemented

# Expected result:
>>> df2
        one       two     three              # <-- only one row
a  0.357285 -1.225845       NaN
b  0.709229  1.120475  1.551948
c -2.173472  0.682472 -0.738533
d       NaN -1.211516  0.222008
```

This is a bug in concat on axis=1 when all the objects are Series.
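The expected output above aligns the three Series on the union of their indexes and uses each Series name as a single column label. The stdlib-only sketch below makes that expected shape concrete. Series are modeled as plain dicts, and `concat_columns` is a hypothetical helper, not Snowpark pandas code. It also keeps first-seen index order instead of reproducing pandas' exact sorting rules.

```python
import math
from typing import Dict, List, Tuple

Series = Dict[str, float]  # index label -> value


def concat_columns(named: Dict[str, Series]) -> Tuple[List[str], List[str], List[List[float]]]:
    """Outer-join Series on their index, one column per Series (like axis=1)."""
    # Union of all index labels, in first-seen order.
    index: List[str] = []
    for s in named.values():
        for label in s:
            if label not in index:
                index.append(label)
    # Column labels are just the Series names: one header level, not two.
    columns = list(named)
    # Missing (label, Series) combinations become NaN.
    rows = [[s.get(label, math.nan) for s in named.values()] for label in index]
    return columns, index, rows


columns, index, rows = concat_columns(
    {
        "one": {"a": 1.0, "b": 2.0, "c": 3.0},
        "two": {"a": 4.0, "b": 5.0, "c": 6.0, "d": 7.0},
        "three": {"b": 8.0, "c": 9.0, "d": 10.0},
    }
)
print(columns)  # ['one', 'two', 'three']
print(index)    # ['a', 'b', 'c', 'd']
```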
sfc-gh-azhan added a commit that referenced this issue May 9, 2024
1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

   Fixes SNOW-1357611 Fix all quarantined pandas tests for 8.18

2. Fill out the following pre-review checklist:

   - [ ] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency

3. Please describe how your code solves the related issue.

Please write a short description of how your code change solves the
related issue.

remove all skipped tests from SNOW-1358681
sfc-gh-azhan added a commit that referenced this issue May 9, 2024
…lready (#1547)

1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

   Fixes SNOW-1348919

2. Fill out the following pre-review checklist:

   - [ ] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency

3. Please describe how your code solves the related issue.

Please write a short description of how your code change solves the
related issue.

This bug has been fixed in
#1533
sfc-gh-nkrishna added a commit that referenced this issue May 10, 2024
1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

   Fixes SNOW-1296779, SNOW-1254730

2. Fill out the following pre-review checklist:

   - [ ] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency

3. Please describe how your code solves the related issue.

This PR removes unused sproc fallback code from Snowpark pandas, now
that all APIs using fallback have been replaced with
NotImplementedError.

---------

Signed-off-by: Naren Krishna <naren.krishna@snowflake.com>
sfc-gh-azhan added a commit that referenced this issue May 10, 2024
1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

   Fixes SNOW-1347052 Update pandas API PuPr warning messages

2. Fill out the following pre-review checklist:

   - [ ] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency

3. Please describe how your code solves the related issue.

Please write a short description of how your code change solves the
related issue.

- change the words from "private preview" to "public preview"

---------

Co-authored-by: Varnika Budati <varnika.budati@snowflake.com>
sfc-gh-mvashishtha added a commit that referenced this issue May 10, 2024
1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

   Fixes SNOW-1374343

2. Fill out the following pre-review checklist:

   - [ ] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency

3. Please describe how your code solves the related issue.

CI times for this PR:
- [GCP](https://github.com/snowflakedb/snowpark-python/actions/runs/9025175335/job/24800489292?pr=1553): 25 minutes
- [AWS](https://github.com/snowflakedb/snowpark-python/actions/runs/9025175335/job/24800489826?pr=1553): 20 minutes
- [Azure](https://github.com/snowflakedb/snowpark-python/actions/runs/9025175335/job/24800490421?pr=1553): 22 minutes

These are a bit better than the usual times, but it's hard to tell
because CI time is so variable (probably dependent on how many different
jobs are running at the same time; see SNOW-1347210).

Let's make this commit and see whether the warehouses can handle the
extra load without too much queueing. We have set
`MAX_CONCURRENCY_LEVEL` on the warehouse for each cloud provider.
sfc-gh-nkumar added a commit that referenced this issue May 10, 2024
1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

   Fixes SNOW-1375037

2. Fill out the following pre-review checklist:

   - [ ] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency

3. Please describe how your code solves the related issue.

Rewrite parts of the qcut implementation to avoid joins completely. This
significantly improves qcut performance.
With this change, the overall runtime of the benchmark notebook dropped from 297
seconds to 66 seconds, and the number of SQL queries dropped from 521 to 91.
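For readers unfamiliar with `qcut`, the sketch below shows what it computes. It places bin edges at evenly spaced quantiles using linear interpolation, which is pandas' default. It then assigns each value a bin code, with right-closed bins and the minimum included in the first bin. This is a single-machine illustration of the semantics, not the join-free SQL rewrite in this PR. It also ignores duplicate edges, labels, and NaN handling.

```python
import math
from bisect import bisect_left
from typing import List, Sequence


def quantile_edges(values: Sequence[float], q: int) -> List[float]:
    """Edges at quantiles 0, 1/q, ..., 1 using linear interpolation."""
    s = sorted(values)
    n = len(s)
    edges = []
    for i in range(q + 1):
        pos = (i / q) * (n - 1)  # fractional position in the sorted data
        lo = math.floor(pos)
        hi = min(lo + 1, n - 1)
        edges.append(s[lo] + (s[hi] - s[lo]) * (pos - lo))
    return edges


def qcut_codes(values: Sequence[float], q: int) -> List[int]:
    """Bin code per value; bins are (e[i], e[i+1]], and bin 0 includes the min."""
    edges = quantile_edges(values, q)
    # bisect_left returns the index of the first edge >= x, so x falls in the
    # bin just before it; max(..., 0) puts the minimum value into bin 0.
    return [max(bisect_left(edges, x) - 1, 0) for x in values]


print(quantile_edges([1, 2, 3, 4], 2))  # [1.0, 2.5, 4.0]
print(qcut_codes([1, 2, 3, 4], 2))      # [0, 0, 1, 1]
```

Finding each value's bin by searching a short sorted list of edges is one reason qcut can avoid joining the data back against a table of bin boundaries.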
sfc-gh-vbudati added a commit that referenced this issue May 10, 2024
1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

   Fixes SNOW-1361200

2. Fill out the following pre-review checklist:

   - [x] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency

3. Please describe how your code solves the related issue.
- The bug is caused by the implementation of
`create_udtf_for_groupby_apply`. Groupby apply/transform use this method
to create a UDTF.
- When multiple groupby apply/transform operations are performed on the
same DataFrame, this method is called multiple times. However, it uses
fixed names for the column labels of the OrderedDataFrame it builds for
the UDTF, which causes the error "ambiguous column name
'ROW_POSITION_WITHIN_GROUP'".
- `ROW_POSITION_WITHIN_GROUP` is a column label created and used by
`create_udtf_for_groupby_apply`.
- The same issue occurs with the `ORIGINAL_ROW_POSITION` column label.
- To solve this issue, I appended a random number to the end of these
column labels to prevent the collision and eliminate the error.
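Below is a sketch of the fix described above. It derives the internal column labels from a random suffix so that repeated calls on the same DataFrame do not produce duplicate names. The helper name `fresh_internal_labels` and the retry loop are assumptions added for illustration; the PR only describes appending a random number.

```python
import random
from typing import Collection, Dict, Optional

# Internal labels named in the PR description.
BASE_LABELS = ("ROW_POSITION_WITHIN_GROUP", "ORIGINAL_ROW_POSITION")


def fresh_internal_labels(
    existing: Collection[str], rng: Optional[random.Random] = None
) -> Dict[str, str]:
    """Map each base label to a suffixed label not already in ``existing``."""
    if rng is None:
        rng = random.Random()
    while True:
        suffix = rng.randint(0, 10**9)
        labels = {base: f"{base}_{suffix}" for base in BASE_LABELS}
        # Retry in the unlikely case that a suffixed label is already taken.
        if not any(label in existing for label in labels.values()):
            return labels


first = fresh_internal_labels(existing={"A", "B"})
# A second groupby apply on the same frame must not reuse the first labels.
second = fresh_internal_labels(existing={"A", "B", *first.values()})
print(first, second)
```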
sfc-gh-nkrishna added a commit that referenced this issue May 10, 2024
1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

   Fixes SNOW-1374306

2. Fill out the following pre-review checklist:

   - [ ] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency

3. Please describe how your code solves the related issue.

   This PR adds a docstring for resample.

---------

Signed-off-by: Naren Krishna <naren.krishna@snowflake.com>
Co-authored-by: Jonathan Shi <149419494+sfc-gh-joshi@users.noreply.github.com>
sfc-gh-stan added a commit that referenced this issue May 10, 2024
…ock tests (#1561)

1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

   Fixes SNOW-1063738

2. Fill out the following pre-review checklist:

   - [x] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency

3. Please describe how your code solves the related issue.

This PR changes `tox -e local` to run all tests in `tests/integ`,
`tests/unit` and `tests/mock` (previously mock_unit) against Local
Testing (except for modin tests specified in
`SNOWFLAKE_PYTEST_IGNORE_MODIN_CMD`).
sfc-gh-vbudati added a commit that referenced this issue May 13, 2024
1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

   Fixes SNOW-1326280

2. Fill out the following pre-review checklist:

   - [ ] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency

3. Please describe how your code solves the related issue.

Update the links in the documentation for `to_snowpark_pandas` from the
LIMITED-ACCESS version to public! These links will not work until all of
the Snowpark pandas documentation is public.
sfc-gh-aalam added a commit that referenced this issue May 13, 2024
1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

   Fixes SNOW-NNNNNNN

2. Fill out the following pre-review checklist:

   - [ ] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency

3. Please describe how your code solves the related issue.

Please write a short description of how your code change solves the
related issue.
sfc-gh-joshi added a commit that referenced this issue May 13, 2024
1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

   Fixes SNOW-1373790

2. Fill out the following pre-review checklist:

   - [ ] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency

3. Please describe how your code solves the related issue.

This PR prunes some duplicated + unsupported methods from Snowpark
pandas API docs, and adds more comprehensive docstrings (mostly copied
from pandas) for other methods. This PR does not add any doctests, nor
does it change any meaningful code.

All changes are listed below:
_De-duplicated listings_:
- Series
  - is_unique
  - duplicated
- DataFrame
  - round

Note that Series/DF head and tail are deliberately left duplicated; this
matches the pandas documentation, as they are mentioned under both the
"Indexing, iteration" and "Reindexing / selection" headings.

_Removed unimplemented listing_:
- Series
  - kurtosis
- SeriesGroupBy
  - apply
- Resampler
  - groups
  - indices
  - get_group
  - apply
  - aggregate
  - transform
  - bfill
  - nearest
  - fillna
  - asfreq
  - nunique
  - first
  - last
  - interpolate
  - ohlc
  - pad
  - pipe
  - prod
  - quantile
  - sem
  - size
- Rolling
  - aggregate
  - apply
  - corr
  - count
  - cov
  - kurt
  - median
  - quantile
  - rank
  - sem
  - skew

_Added implemented listing_:
I did not comprehensively look for implemented methods that were not
listed; these were just a few methods that I noticed in the course of
checking other APIs.
- SeriesGroupBy
  - head
  - idxmax
  - idxmin
  - nunique
  - tail

_Improved documentation_:
- pd
  - qcut (I'm not sure why it wasn't inheriting pandas docs, but we should
  override them anyway since we don't implement all parameters)
- BasePandasDataset
  - convert_dtypes
  - rename_axis
  - values
  - ffill/pad
- Series
  - name
  - empty
  - hasnans
  - ndim
  - shape
  - rename_axis
  - quantile
- DataFrame
  - empty
  - quantile
  - select_dtypes
- GroupBy
  - std
  - var
  - rank
  - nunique
  - quantile
  - __iter__

---------

Co-authored-by: Varnika Budati <varnika.budati@snowflake.com>
sfc-gh-lmukhopadhyay added a commit that referenced this issue May 14, 2024
1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

   Fixes SNOW-1373899

2. Fill out the following pre-review checklist:

   - [ ] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency

3. Please describe how your code solves the related issue.

Updates notebook testing workflow with increased cell timeout, and adds
SnowparkPandasAPIDemo.ipynb notebook from customer demo and
SnowflakeChainTesting.ipynb which was previously blocked.

---------

Signed-off-by: Labanya Mukhopadhyay <labanya.mukhopadhyay@snowflake.com>
sfc-gh-oplaton added a commit that referenced this issue May 14, 2024
1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

   Fixes SNOW-0

2. Fill out the following pre-review checklist:

   - [ ] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [x] I am adding a new dependency

3. Please describe how your code solves the related issue.

Update `ast_pb2.py` (already present in the repository).
Add the `setuptools` dependencies required for development.
Include the module path for `ast_pb2.py` in the manifest, so that the
file makes it into the Snowpark wheel.
sfc-gh-vbudati added a commit that referenced this issue May 14, 2024
1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

   Fixes SNOW-1375263

2. Fill out the following pre-review checklist:

   - [ ] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency

3. Please describe how your code solves the related issue.

Updating the changelog to reflect what was actually released with
v1.15.0a1 and what is new.
sfc-gh-jdu added a commit that referenced this issue Oct 1, 2024
1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

   Fixes SNOW-NNNNNNN

2. Fill out the following pre-review checklist:

   - [ ] I am adding a new automated test(s) to verify correctness of my new code
   - [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
   - [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes.

3. Please describe how your code solves the related issue.

   The parameter value is changed.
sfc-gh-yzou added a commit that referenced this issue Oct 2, 2024
)

1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

 SNOW-1566362

2. Fill out the following pre-review checklist:

- [x] I am adding a new automated test(s) to verify correctness of my new code
- [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
- [ ] I am adding new logging messages
- [ ] I am adding a new telemetry message
- [ ] I am adding new credentials
- [ ] I am adding a new dependency
- [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes.

3. Please describe how your code solves the related issue.
1) Move the SQL counter check infrastructure from modin to Snowpark in
general.
2) Apply the SQL counter check to test_cte.py.
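A SQL counter check of this kind can be pictured as a context manager that records every query issued inside a block and fails if the count differs from the expected value. The sketch below illustrates only that idea; `FakeSession`, `run_query`, and `sql_counter` are hypothetical names, not the actual helpers moved by this PR.

```python
from contextlib import contextmanager


class FakeSession:
    """Hypothetical stand-in for a session; notifies listeners of every query."""

    def __init__(self):
        self.listeners = []

    def run_query(self, sql):
        # Let every active counter observe the query before "executing" it.
        for listener in self.listeners:
            listener(sql)
        return []


@contextmanager
def sql_counter(session, query_count):
    """Fail if the block issues a different number of queries than expected."""
    seen = []
    callback = seen.append
    session.listeners.append(callback)
    try:
        yield seen
    finally:
        # Always detach, even if the block raised.
        session.listeners.remove(callback)
    # Only reached when the block completed without raising.
    if len(seen) != query_count:
        raise AssertionError(f"expected {query_count} queries, got {len(seen)}: {seen}")


session = FakeSession()
with sql_counter(session, query_count=2):
    session.run_query("SELECT 1")
    session.run_query("SELECT 2")
```

Attaching the counter through a listener keeps the check independent of the code under test, which only has to route its queries through the session.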
sfc-gh-yzou added a commit that referenced this issue Oct 3, 2024
… in query generator (#2387)

1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

SNOW-1706295

2. Fill out the following pre-review checklist:

- [x] I am adding a new automated test(s) to verify correctness of my new code
- [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
- [ ] I am adding new logging messages
- [ ] I am adding a new telemetry message
- [ ] I am adding new credentials
- [ ] I am adding a new dependency
- [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes.

3. Please describe how your code solves the related issue.
The SnowflakePlan query generation is not used in actual query generation;
it is only used in testing. This PR does the following:
1) Remove the SnowflakePlan overwrite in the code and test the query
generator in a different way.
2) Add an SQL counter check for the test.
sfc-gh-yzou added a commit that referenced this issue Oct 3, 2024
1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

SNOW-1678113

2. Fill out the following pre-review checklist:

- [x] I am adding a new automated test(s) to verify correctness of my new code
- [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
- [ ] I am adding new logging messages
- [ ] I am adding a new telemetry message
- [ ] I am adding new credentials
- [ ] I am adding a new dependency
- [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes.

3. Please describe how your code solves the related issue.
In the reported case, unstack uses a pivot table underneath, and the
customer ended up with about 2000 columns after the pivot. The pivot itself
took about 1-2 seconds to finish, but after the pivot, the code loops
over all columns and calls append_columns in each iteration here:
[snowpark-python/src/snowflake/snowpark/modin/plugin/_internal/pivot_utils.py
at 272e4e1 ·
snowflakedb/snowpark-python](https://github.com/snowflakedb/snowpark-python/blob/272e4e1ee5da84f8ac0abfefda95aab3b0bf4d7e/src/snowflake/snowpark/modin/plugin/_internal/pivot_utils.py#L704).
append_columns eventually calls select with all existing projected
columns, and a check is performed on each column here:
[snowpark-python/src/snowflake/snowpark/modin/plugin/_internal/ordered_dataframe.py
at main ·
snowflakedb/snowpark-python](https://github.com/snowflakedb/snowpark-python/blob/main/src/snowflake/snowpark/modin/plugin/_internal/ordered_dataframe.py#L616)

Our profiling shows that each check took about 0.015 s, so 2000 checks take
about 30 seconds, and the Snowpark select took about 0.5 seconds since it
needs to perform SQL simplification. Since there is an outer loop of
2000 iterations, the total could be about (30.5 * 2000) s = 61,000 s, which is close to 17 hours.

To address the issue, we did the following:
1) Use "*" for append_columns to avoid checking each column.
2) Instead of calling append_columns in each loop iteration, collect all columns
to append and call append_columns only once.
With manual testing, the reported customer unstack case now takes about 8 seconds
to finish.
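The quadratic cost of the original loop, and why batching removes it, can be seen in a toy model. This sketch is hypothetical: `ToyFrame` only mimics the behavior described above (each `append_columns` call re-checks every column already projected), it does not model the `"*"` change, and it is not the modin plugin code.

```python
class ToyFrame:
    """Toy model: each append_columns call re-checks every projected column."""

    def __init__(self, columns):
        self.columns = list(columns)
        self.checks = 0  # number of per-column checks performed so far

    def append_columns(self, new_columns):
        # Mimics select(existing..., new...): every existing column is checked.
        self.checks += len(self.columns)
        self.columns.extend(new_columns)


def append_one_by_one(frame, new_columns):
    # Original pattern: one append_columns call per loop iteration.
    for col in new_columns:
        frame.append_columns([col])
    return frame


def append_batched(frame, new_columns):
    # Fixed pattern: collect everything first, then call append_columns once.
    frame.append_columns(list(new_columns))
    return frame


new_cols = [f"c{i}" for i in range(2000)]
slow = append_one_by_one(ToyFrame(["index"]), new_cols)
fast = append_batched(ToyFrame(["index"]), new_cols)
print(slow.checks, fast.checks)  # prints: 2001000 1
```

With 2000 new columns, the one-by-one pattern performs about two million column checks while the batched pattern performs one, and both end with the same column list.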

TODO:
add this to our performance benchmark
https://snowflakecomputing.atlassian.net/browse/SNOW-1706311
sfc-gh-jdu added a commit that referenced this issue Oct 4, 2024
…ilation stage is applied (#2385)

1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   Fixes SNOW-1703599

2. Fill out the following pre-review checklist:

- [ ] I am adding a new automated test(s) to verify correctness of my new code
- [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
- [ ] I am adding new logging messages
- [ ] I am adding a new telemetry message
- [ ] I am adding new credentials
- [ ] I am adding a new dependency
- [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes.

3. Please describe how your code solves the related issue.

We shouldn't call session._table_exists inside resolve, but we can call
it before resolve.
sfc-gh-helmeleegy added a commit that referenced this issue Oct 4, 2024
… Series.tz_localize (#2398)

1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   Fixes SNOW-1677892, SNOW-1677897

2. Fill out the following pre-review checklist:

- [x] I am adding a new automated test(s) to verify correctness of my new code
- [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
- [ ] I am adding new logging messages
- [ ] I am adding a new telemetry message
- [ ] I am adding new credentials
- [ ] I am adding a new dependency
- [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes.

3. Please describe how your code solves the related issue.

   Add support for DataFrame.tz_localize and Series.tz_localize.
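For readers unfamiliar with the pandas semantics being added: `tz_localize` attaches a time zone to naive timestamps without changing their wall-clock value. A minimal standard-library sketch of that behavior, assuming a fixed UTC-05:00 offset purely for illustration (this is not the Snowpark pandas implementation, and `tz_localize` here is a hypothetical helper):

```python
from datetime import datetime, timedelta, timezone

# Hypothetical fixed UTC-05:00 zone; real code would typically use a named zone.
UTC_MINUS_5 = timezone(timedelta(hours=-5))


def tz_localize(values, tz):
    """Attach `tz` to naive datetimes without changing their wall-clock time."""
    result = []
    for value in values:
        if value.tzinfo is not None:
            # Localizing an already-aware value is an error; converting is separate.
            raise TypeError("already tz-aware; convert instead of localizing")
        result.append(value.replace(tzinfo=tz))
    return result


naive = [datetime(2024, 10, 4, 9, 30)]
localized = tz_localize(naive, UTC_MINUS_5)
print(localized[0].isoformat())  # 2024-10-04T09:30:00-05:00
```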
sfc-gh-helmeleegy added a commit that referenced this issue Oct 4, 2024
…Series.tz_convert (#2399)

1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   Fixes SNOW-1677888, SNOW-1677890

2. Fill out the following pre-review checklist:

- [x] I am adding a new automated test(s) to verify correctness of my new code
- [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
- [ ] I am adding new logging messages
- [ ] I am adding a new telemetry message
- [ ] I am adding new credentials
- [ ] I am adding a new dependency
- [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes.

3. Please describe how your code solves the related issue.

   Add support for DataFrame.tz_convert and Series.tz_convert.
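`tz_convert`, by contrast, operates on tz-aware timestamps and changes only the zone they are expressed in, preserving the underlying instant. A standard-library sketch, assuming fixed offsets purely for illustration (again not the Snowpark pandas code; `tz_convert` here is a hypothetical helper):

```python
from datetime import datetime, timedelta, timezone

# Hypothetical fixed-offset zones used only for illustration.
UTC_MINUS_5 = timezone(timedelta(hours=-5))
UTC_PLUS_1 = timezone(timedelta(hours=1))


def tz_convert(values, tz):
    """Express tz-aware datetimes in `tz`; the underlying instant is unchanged."""
    result = []
    for value in values:
        if value.tzinfo is None:
            # Naive values have no instant to convert; they must be localized first.
            raise TypeError("tz-naive value; localize before converting")
        result.append(value.astimezone(tz))
    return result


aware = [datetime(2024, 10, 4, 9, 30, tzinfo=UTC_MINUS_5)]
converted = tz_convert(aware, UTC_PLUS_1)
print(converted[0].isoformat())  # 2024-10-04T15:30:00+01:00
print(converted[0] == aware[0])  # True: same instant, different wall clock
```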
sfc-gh-nkrishna pushed a commit that referenced this issue Oct 4, 2024
1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   ADHOC: Fix a misspelling

2. Fill out the following pre-review checklist:

- [ ] I am adding a new automated test(s) to verify correctness of my new code
- [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
- [ ] I am adding new logging messages
- [ ] I am adding a new telemetry message
- [ ] I am adding new credentials
- [ ] I am adding a new dependency
- [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes.

3. Please describe how your code solves the related issue.

   Just noticed a misspelling of "ambiguous"; fixing it.
sfc-gh-yzou added a commit that referenced this issue Oct 7, 2024
1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

SNOW-1566363

2. Fill out the following pre-review checklist:

- [x] I am adding a new automated test(s) to verify correctness of my new code
- [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
- [ ] I am adding new logging messages
- [ ] I am adding a new telemetry message
- [ ] I am adding new credentials
- [ ] I am adding a new dependency
- [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes.

3. Please describe how your code solves the related issue.

1. Add status for the number of SelectStatements with complexity merged.
2. Add status for the number of CTEs created during repeated subquery
elimination.
sfc-gh-rdurrani added a commit that referenced this issue Oct 7, 2024
…value (#2213)

1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   Fixes SNOW-1649172

2. Fill out the following pre-review checklist:

- [ ] I am adding a new automated test(s) to verify correctness of my new code
- [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
- [ ] I am adding new logging messages
- [ ] I am adding a new telemetry message
- [ ] I am adding new credentials
- [ ] I am adding a new dependency
- [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes.

3. Please describe how your code solves the related issue.

When doing `df.loc[x] = series`, an error occurs because the Series does not
have the same number of columns as the DataFrame being set. Instead, the
Series should be transposed and set, regardless of whether it has as many
rows as the DataFrame has columns.
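In native pandas, `df.loc[x] = series` aligns the Series index with the DataFrame's column labels, so the Series acts like a transposed row and columns missing from the Series become missing values. A toy, dictionary-based sketch of that alignment; the `set_row` helper and the row-of-dicts layout are hypothetical and only illustrate the expected semantics:

```python
def set_row(frame, row_label, series):
    """Set frame row `row_label` from a label->value mapping, aligned on columns.

    `frame` is a toy dict: {"columns": [...], "rows": {label: {column: value}}}.
    Columns absent from `series` become None; extra series labels are ignored.
    """
    aligned = {col: series.get(col) for col in frame["columns"]}
    frame["rows"][row_label] = aligned
    return frame


df = {
    "columns": ["a", "b", "c"],
    "rows": {0: {"a": 1, "b": 2, "c": 3}},
}
# The "Series" has fewer labels than the frame has columns; that is fine.
set_row(df, 0, {"a": 10, "c": 30})
print(df["rows"][0])  # {'a': 10, 'b': None, 'c': 30}
```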

---------

Co-authored-by: Varnika Budati <varnika.budati@snowflake.com>
sfc-gh-jdu added a commit that referenced this issue Oct 7, 2024
…is already closed (#2409)

1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   Fixes SNOW-1727163

2. Fill out the following pre-review checklist:

- [x] I am adding a new automated test(s) to verify correctness of my new code
- [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
- [ ] I am adding new logging messages
- [ ] I am adding a new telemetry message
- [ ] I am adding new credentials
- [ ] I am adding a new dependency
- [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes.

3. Please describe how your code solves the related issue.

It's possible that the session is already closed before garbage
collection kicks in; in that case, we should avoid sending the DROP TABLE SQL
and eliminate the warning.
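The guard described above can be sketched with the standard library's `weakref.finalize`: the cleanup callback checks whether the session is still open before sending `DROP TABLE`. This is a hypothetical sketch; `ToySession`, `TempTableHandle`, and the `closed` flag are illustrative stand-ins, not Snowpark's temp-table cleanup code.

```python
import weakref


class ToySession:
    """Hypothetical stand-in for a session that can be closed."""

    def __init__(self):
        self.closed = False
        self.executed = []  # SQL text that was actually sent

    def sql(self, query):
        if self.closed:
            raise RuntimeError("session is already closed")
        self.executed.append(query)

    def close(self):
        self.closed = True


def _drop_temp_table(session, table_name):
    # Invoked when the owning handle is garbage collected (or at interpreter exit).
    if session.closed:
        return  # session is gone: skip the DROP instead of failing or warning
    session.sql(f"DROP TABLE IF EXISTS {table_name}")


class TempTableHandle:
    """Hypothetical owner of a temp table that is dropped on garbage collection."""

    def __init__(self, session, table_name):
        self.table_name = table_name
        # The callback must not reference `self`, or the handle would never be collected.
        self._finalizer = weakref.finalize(self, _drop_temp_table, session, table_name)


session = ToySession()
handle = TempTableHandle(session, "TMP_1")
session.close()
del handle  # on CPython the finalizer runs here and skips the DROP
print(session.executed)  # []
```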
sfc-gh-rdurrani added a commit that referenced this issue Oct 8, 2024
…read_snowflake` tests (#2408)

1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   Fixes SNOW-1726720

2. Fill out the following pre-review checklist:

- [ ] I am adding a new automated test(s) to verify correctness of my new code
- [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
- [ ] I am adding new logging messages
- [ ] I am adding a new telemetry message
- [ ] I am adding new credentials
- [ ] I am adding a new dependency
- [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes.

3. Please describe how your code solves the related issue.

Use `session.sql.to_pandas()` instead of
`native_pd.DataFrame(session.sql.collect)` to generate expected
DataFrames when testing `read_snowflake`, so that if the expected
DataFrame is empty but has metadata (e.g. columns), that metadata is
carried over to the expected DataFrame.
sfc-gh-jdu added a commit that referenced this issue Oct 8, 2024
…atement instead of SnowflakeTable (#2411)

1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   Fixes SNOW-1727512

2. Fill out the following pre-review checklist:

- [x] I am adding a new automated test(s) to verify correctness of my new code
- [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
- [ ] I am adding new logging messages
- [ ] I am adding a new telemetry message
- [ ] I am adding new credentials
- [ ] I am adding a new dependency
- [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes.

3. Please describe how your code solves the related issue.

It should be the same as session.table(...).select_statement. Otherwise,
we will cache metadata on the wrong source_plan in the future.
sfc-gh-azhan added a commit that referenced this issue Oct 8, 2024
1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   SNOW-1690717

2. Fill out the following pre-review checklist:

- [ ] I am adding a new automated test(s) to verify correctness of my new code
- [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
- [ ] I am adding new logging messages
- [ ] I am adding a new telemetry message
- [ ] I am adding new credentials
- [ ] I am adding a new dependency
- [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes.

3. Please describe how your code solves the related issue.

This is the first PR to support Snowpark Python functions in pandas
apply. It only introduces `sin` as the first example.
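One plausible way to picture this change: when the function passed to `apply` is a recognized Snowpark Python function, it can be turned into a SQL expression directly rather than being run as Python code. The sketch below is hypothetical throughout; `snowpark_sin` is a local stand-in for `snowflake.snowpark.functions.sin`, and `SUPPORTED_FUNCTIONS` / `translate_apply` are illustrative names. The PR itself only states that `sin` is the first supported function.

```python
def snowpark_sin(column_name):
    """Stand-in for snowflake.snowpark.functions.sin; returns a SQL expression string."""
    return f"SIN({column_name})"


# Hypothetical registry of functions that can be pushed down as SQL.
SUPPORTED_FUNCTIONS = {snowpark_sin}


def translate_apply(func, column_name):
    """Return a SQL expression if `func` can be pushed down, else None."""
    if func in SUPPORTED_FUNCTIONS:
        return func(column_name)
    return None  # caller would fall back to the existing (e.g. UDF-based) path


print(translate_apply(snowpark_sin, '"A"'))  # SIN("A")
print(translate_apply(lambda x: x + 1, '"A"'))  # None
```

Starting from a registry keeps each newly supported function a one-line addition, which fits the PR's stated plan of introducing `sin` first.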
sfc-gh-helmeleegy added a commit that referenced this issue Oct 8, 2024
…na and SeriesGroupBy.fillna (#2417)

1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   Fixes SNOW-1728471

2. Fill out the following pre-review checklist:

- [ ] I am adding a new automated test(s) to verify correctness of my new code
- [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
- [ ] I am adding new logging messages
- [ ] I am adding a new telemetry message
- [ ] I am adding new credentials
- [ ] I am adding a new dependency
- [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes.

3. Please describe how your code solves the related issue.

Fix changelog entry placement for DataFrameGroupBy.fillna and
SeriesGroupBy.fillna.
sfc-gh-yzou added a commit that referenced this issue Oct 10, 2024
… in query generator (#2387)

1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

SNOW-1706295

2. Fill out the following pre-review checklist:

- [x] I am adding a new automated test(s) to verify correctness of my new code
- [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
- [ ] I am adding new logging messages
- [ ] I am adding a new telemetry message
- [ ] I am adding new credentials
- [ ] I am adding a new dependency
- [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes.

3. Please describe how your code solves the related issue.
The SnowflakePlan query generation is not used in actual query generation;
it is only used in testing. This PR does the following:
1) Remove the SnowflakePlan overwrite in the code and test the query
generator in a different way.
2) Add an SQL counter check for the test.
sfc-gh-yzou added a commit that referenced this issue Oct 10, 2024
1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

SNOW-1678113

2. Fill out the following pre-review checklist:

- [x] I am adding a new automated test(s) to verify correctness of my new code
- [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
- [ ] I am adding new logging messages
- [ ] I am adding a new telemetry message
- [ ] I am adding new credentials
- [ ] I am adding a new dependency
- [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes.

3. Please describe how your code solves the related issue.
In the reported case, unstack uses a pivot table underneath, and the
customer ended up with about 2000 columns after the pivot. The pivot itself
took about 1-2 seconds to finish, but after the pivot, the code loops
over all columns and calls append_columns in each iteration here:
[snowpark-python/src/snowflake/snowpark/modin/plugin/_internal/pivot_utils.py
at 272e4e1 ·
snowflakedb/snowpark-python](https://github.com/snowflakedb/snowpark-python/blob/272e4e1ee5da84f8ac0abfefda95aab3b0bf4d7e/src/snowflake/snowpark/modin/plugin/_internal/pivot_utils.py#L704).
append_columns eventually calls select with all existing projected
columns, and a check is performed on each column here:
[snowpark-python/src/snowflake/snowpark/modin/plugin/_internal/ordered_dataframe.py
at main ·
snowflakedb/snowpark-python](https://github.com/snowflakedb/snowpark-python/blob/main/src/snowflake/snowpark/modin/plugin/_internal/ordered_dataframe.py#L616)

Our profiling shows that each check took about 0.015 s, so 2000 checks take
about 30 seconds, and the Snowpark select took about 0.5 seconds since it
needs to perform SQL simplification. Since there is an outer loop of
2000 iterations, the total could be about (30.5 * 2000) s = 61,000 s, which is close to 17 hours.

To address the issue, we did the following:
1) Use "*" for append_columns to avoid checking each column.
2) Instead of calling append_columns in each loop iteration, collect all columns
to append and call append_columns only once.
With manual testing, the reported customer unstack case now takes about 8 seconds
to finish.

TODO:
add this to our performance benchmark
https://snowflakecomputing.atlassian.net/browse/SNOW-1706311
sfc-gh-yzou pushed a commit that referenced this issue Oct 10, 2024
…ilation stage is applied (#2385)

1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   Fixes SNOW-1703599

2. Fill out the following pre-review checklist:

- [ ] I am adding a new automated test(s) to verify correctness of my new code
- [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
- [ ] I am adding new logging messages
- [ ] I am adding a new telemetry message
- [ ] I am adding new credentials
- [ ] I am adding a new dependency
- [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes.

3. Please describe how your code solves the related issue.

We shouldn't call session._table_exists inside resolve, but we can call
it before resolve.
sfc-gh-yzou pushed a commit that referenced this issue Oct 10, 2024
… Series.tz_localize (#2398)

1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   Fixes SNOW-1677892, SNOW-1677897

2. Fill out the following pre-review checklist:

- [x] I am adding a new automated test(s) to verify correctness of my new code
- [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
- [ ] I am adding new logging messages
- [ ] I am adding a new telemetry message
- [ ] I am adding new credentials
- [ ] I am adding a new dependency
- [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes.

3. Please describe how your code solves the related issue.

   Add support for DataFrame.tz_localize and Series.tz_localize.
sfc-gh-yzou pushed a commit that referenced this issue Oct 10, 2024
…Series.tz_convert (#2399)

1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   Fixes SNOW-1677888, SNOW-1677890

2. Fill out the following pre-review checklist:

- [x] I am adding a new automated test(s) to verify correctness of my new code
- [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
- [ ] I am adding new logging messages
- [ ] I am adding a new telemetry message
- [ ] I am adding new credentials
- [ ] I am adding a new dependency
- [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes.

3. Please describe how your code solves the related issue.

   Add support for DataFrame.tz_convert and Series.tz_convert.
sfc-gh-yzou pushed a commit that referenced this issue Oct 10, 2024
1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   ADHOC: Fix a misspelling

2. Fill out the following pre-review checklist:

- [ ] I am adding a new automated test(s) to verify correctness of my new code
- [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
- [ ] I am adding new logging messages
- [ ] I am adding a new telemetry message
- [ ] I am adding new credentials
- [ ] I am adding a new dependency
- [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes.

3. Please describe how your code solves the related issue.

   Just noticed a misspelling of "ambiguous"; fixing it.
sfc-gh-yzou added a commit that referenced this issue Oct 10, 2024
1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

SNOW-1566363

2. Fill out the following pre-review checklist:

- [x] I am adding a new automated test(s) to verify correctness of my new code
- [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
- [ ] I am adding new logging messages
- [ ] I am adding a new telemetry message
- [ ] I am adding new credentials
- [ ] I am adding a new dependency
- [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes.

3. Please describe how your code solves the related issue.

1. Add status for the number of SelectStatements with complexity merged.
2. Add status for the number of CTEs created during repeated subquery
elimination.
sfc-gh-yzou pushed a commit that referenced this issue Oct 10, 2024
…value (#2213)

1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   Fixes SNOW-1649172

2. Fill out the following pre-review checklist:

- [ ] I am adding a new automated test(s) to verify correctness of my new code
- [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
- [ ] I am adding new logging messages
- [ ] I am adding a new telemetry message
- [ ] I am adding new credentials
- [ ] I am adding a new dependency
- [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes.

3. Please describe how your code solves the related issue.

When doing `df.loc[x] = series`, an error occurs because the Series does not
have the same number of columns as the DataFrame being set. Instead, the
Series should be transposed and set, regardless of whether it has as many
rows as the DataFrame has columns.

---------

Co-authored-by: Varnika Budati <varnika.budati@snowflake.com>
sfc-gh-yzou pushed a commit that referenced this issue Oct 10, 2024
…is already closed (#2409)

1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   Fixes SNOW-1727163

2. Fill out the following pre-review checklist:

- [x] I am adding a new automated test(s) to verify correctness of my new code
- [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
- [ ] I am adding new logging messages
- [ ] I am adding a new telemetry message
- [ ] I am adding new credentials
- [ ] I am adding a new dependency
- [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes.

3. Please describe how your code solves the related issue.

The session may already be closed by the time garbage collection kicks
in. In that case we should avoid sending the DROP TABLE SQL, which also
eliminates the warning.
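A minimal sketch of the guard, assuming the drop is issued from a finalizer (`__del__`) that runs when the object is garbage collected. `FakeSession` and `TempTableHandle` are hypothetical stand-ins, not Snowpark classes, and the real cleanup path may be structured differently.

```python
from typing import List


class FakeSession:
    """Hypothetical stand-in for a session that records executed SQL."""

    def __init__(self) -> None:
        self.closed = False
        self.executed: List[str] = []

    def sql(self, query: str) -> None:
        if self.closed:
            raise RuntimeError("session is closed")
        self.executed.append(query)

    def close(self) -> None:
        self.closed = True


class TempTableHandle:
    """Hypothetical object that drops its temp table when collected."""

    def __init__(self, session: FakeSession, name: str) -> None:
        self.session = session
        self.name = name

    def __del__(self) -> None:
        # The session may already be closed when the collector runs;
        # skip the DROP instead of failing or emitting a warning.
        if self.session.closed:
            return
        self.session.sql(f"DROP TABLE IF EXISTS {self.name}")


session = FakeSession()
handle = TempTableHandle(session, "TMP_1")
session.close()
del handle  # CPython runs __del__ here: the refcount drops to zero
print(session.executed)  # []

open_session = FakeSession()
h2 = TempTableHandle(open_session, "TMP_2")
del h2
print(open_session.executed)  # ['DROP TABLE IF EXISTS TMP_2']
```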
sfc-gh-yzou pushed a commit that referenced this issue Oct 10, 2024
…read_snowflake` tests (#2408)

<!---
Please answer these questions before creating your pull request. Thanks!
--->

1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   <!---
   In this section, please add a Snowflake Jira issue number.
   
   Note that if a corresponding GitHub issue exists, you should still include
   the Snowflake Jira issue number. For example, for GitHub issue #1400, you should
   add "SNOW-1335071" here.
    --->

   Fixes SNOW-1726720

2. Fill out the following pre-review checklist:

- [ ] I am adding new automated test(s) to verify the correctness of my new code
- [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
- [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes.

3. Please describe how your code solves the related issue.

Use `session.sql(query).to_pandas()` instead of
`native_pd.DataFrame(session.sql(query).collect())` to generate expected
DataFrames when testing `read_snowflake`, so that if the query result is
empty but still has metadata, e.g. columns, that metadata is passed on
to the expected DataFrame.
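A pure-Python sketch of why the switch matters, assuming `collect()` yields only row objects, so a frame built from an empty list of rows has nothing to infer column names from, while `to_pandas()` can read the result's column metadata. `QueryResult`, `columns_via_rows` and `columns_via_metadata` are hypothetical names used only for illustration.

```python
from collections import namedtuple
from dataclasses import dataclass, field
from typing import List

# Hypothetical row type standing in for the rows returned by collect().
Row = namedtuple("Row", ["A", "B"])


@dataclass
class QueryResult:
    """Hypothetical query result: data rows plus the column metadata."""

    column_names: List[str]
    rows: List[Row] = field(default_factory=list)


def columns_via_rows(result: QueryResult) -> List[str]:
    # Mirrors building a frame from collected rows: only the data is used,
    # so an empty result yields no column names.
    return list(result.rows[0]._fields) if result.rows else []


def columns_via_metadata(result: QueryResult) -> List[str]:
    # Mirrors a conversion that reads the result metadata directly,
    # so the column names survive even with zero rows.
    return list(result.column_names)


empty = QueryResult(column_names=["A", "B"])
print(columns_via_rows(empty))      # []
print(columns_via_metadata(empty))  # ['A', 'B']
```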
sfc-gh-yzou added a commit that referenced this issue Oct 10, 2024
…2403)

<!---
Please answer these questions before creating your pull request. Thanks!
--->

1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   <!---
   In this section, please add a Snowflake Jira issue number.
   
   Note that if a corresponding GitHub issue exists, you should still include
   the Snowflake Jira issue number. For example, for GitHub issue #1400, you should
   add "SNOW-1335071" here.
    --->

SNOW-1708573

2. Fill out the following pre-review checklist:

- [x] I am adding new automated test(s) to verify the correctness of my new code
- [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
- [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes.

3. Please describe how your code solves the related issue.

1) Add sql_counter checks for the test points in large query breakdown.
2) Refine the sql_counter check for test_cte, and allow it to run in
environments without pandas.
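A minimal sketch of what a query-count check does, assuming the test harness can observe every query that is sent. It uses only the standard library, so it needs no pandas. `RecordingConnection` and `expect_query_count` are hypothetical and are not the project's actual sql_counter utility.

```python
from contextlib import contextmanager
from typing import Iterator, List


class RecordingConnection:
    """Hypothetical connection that records every query it runs."""

    def __init__(self) -> None:
        self.queries: List[str] = []

    def execute(self, query: str) -> None:
        self.queries.append(query)


@contextmanager
def expect_query_count(conn: RecordingConnection, expected: int) -> Iterator[None]:
    """Hypothetical checker: fail if the block issues a different number of queries."""
    start = len(conn.queries)
    yield
    actual = len(conn.queries) - start
    if actual != expected:
        raise AssertionError(f"expected {expected} queries, got {actual}")


conn = RecordingConnection()
with expect_query_count(conn, 2):
    conn.execute("CREATE TEMP TABLE t1 AS SELECT 1")
    conn.execute("SELECT * FROM t1")
```

Counting the delta from `start` rather than the absolute total lets several checks run against the same connection within one test.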
sfc-gh-oplaton added a commit that referenced this issue Oct 10, 2024
<!---
Please answer these questions before creating your pull request. Thanks!
--->

1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   <!---
   In this section, please add a Snowflake Jira issue number.
   
   Note that if a corresponding GitHub issue exists, you should still include
   the Snowflake Jira issue number. For example, for GitHub issue #1400, you should
   add "SNOW-1335071" here.
    --->

   Fixes SNOW-1491303, SNOW-1621201, SNOW-1629946, SNOW-1621208

2. Fill out the following pre-review checklist:

- [ ] I am adding new automated test(s) to verify the correctness of my new code
- [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
- [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes.

3. Please describe how your code solves the related issue.

Pick up the updated model from the [corresponding server-side
work](snowflakedb/snowflake#217938).

Support `DataFrame.join_table_function`.

Support `DataFrame.select` with `TableFunctionCall` parameters.

Switch `functions.explode`, `explode_outer`, `flatten` and
`functions.table_function` to look like built-in functions. We don't use
the fact that they're table functions (as opposed to scalar functions)
anywhere yet. We can reintroduce the distinction easily if necessary.

Simplify `Session.table_function` by taking advantage of indirect
function calls.

Support `TableFunctionCall` and its member functions.

Introduce `build_call_table_function_apply` and
`build_indirect_table_fn_apply` for the specialized cases that require
the corresponding entities.
Remove `build_session_table_fn_apply` and `build_table_fn_apply`.
Update type annotations.

There are instances that require accumulating a `proto.Expr` (similar to
much of the column and built-in function functionality) until the root
value can be assigned in the context of an `AstBatch`. Introduce a
`build_intermediate_stmt` which takes any Python object that has an
`_ast` attribute and assigns the value in a statement if necessary.
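A rough sketch of the idea behind `build_intermediate_stmt`, assuming "if necessary" means "only if the object carries an `_ast` that has not been assigned to a statement yet". `Expr`, `AstBatch.assign` and `TableFunctionCallLike` are hypothetical simplifications; the real `AstBatch`, the protobuf types, and the actual function signature differ.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Expr:
    """Hypothetical stand-in for a proto.Expr node."""

    op: str
    args: List[Any] = field(default_factory=list)


@dataclass
class AstBatch:
    """Hypothetical batch that turns expressions into numbered statements."""

    stmts: List[Expr] = field(default_factory=list)
    # Maps id(expr) -> statement id so an expression is assigned only once.
    assigned: Dict[int, int] = field(default_factory=dict)

    def assign(self, expr: Expr) -> int:
        key = id(expr)
        if key not in self.assigned:
            self.stmts.append(expr)
            self.assigned[key] = len(self.stmts) - 1
        return self.assigned[key]


def build_intermediate_stmt(batch: AstBatch, obj: Any) -> Optional[int]:
    """Sketch: assign obj._ast to a statement if obj carries an AST."""
    ast = getattr(obj, "_ast", None)
    if ast is None:
        return None  # nothing accumulated, so nothing to assign
    return batch.assign(ast)


class TableFunctionCallLike:
    """Hypothetical object that accumulates an expression in _ast."""

    def __init__(self, name: str) -> None:
        self._ast = Expr("apply", [name])


batch = AstBatch()
call = TableFunctionCallLike("flatten")
first = build_intermediate_stmt(batch, call)
second = build_intermediate_stmt(batch, call)  # already assigned: reused
print(first, second, len(batch.stmts))  # 0 0 1
```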

Random thought: The IR client captures all built-in function calls by
unqualified name. We should fix this at some point.
sfc-gh-nkumar added a commit that referenced this issue Oct 10, 2024
<!---
Please answer these questions before creating your pull request. Thanks!
--->

1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   <!---
   In this section, please add a Snowflake Jira issue number.
   
   Note that if a corresponding GitHub issue exists, you should still include
   the Snowflake Jira issue number. For example, for GitHub issue #1400, you should
   add "SNOW-1335071" here.
    --->

   Fixes SNOW-NNNNNNN

2. Fill out the following pre-review checklist:

- [ ] I am adding new automated test(s) to verify the correctness of my new code
- [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
- [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes.

3. Please describe how your code solves the related issue.

Please write a short description of how your code change solves the
related issue.

---------

Co-authored-by: Adam Ling <adam.ling@snowflake.com>
sfc-gh-lninobrijaldo pushed a commit that referenced this issue Oct 11, 2024
<!---
Please answer these questions before creating your pull request. Thanks!
--->

1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   <!---
   In this section, please add a Snowflake Jira issue number.
   
   Note that if a corresponding GitHub issue exists, you should still include
   the Snowflake Jira issue number. For example, for GitHub issue #1400, you should
   add "SNOW-1335071" here.
    --->

   ADHOC: Fix a misspelling

2. Fill out the following pre-review checklist:

- [ ] I am adding new automated test(s) to verify the correctness of my new code
- [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
- [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes.

3. Please describe how your code solves the related issue.

   Just noticed a misspelling of "ambiguous"; fixing it.
sfc-gh-rsureshbabu added a commit that referenced this issue Oct 11, 2024
…trace` API (#2418)

<!---
Please answer these questions before creating your pull request. Thanks!
--->

1. Which Jira issue is this PR addressing? Make sure that there is an
accompanying issue to your PR.

   <!---
   In this section, please add a Snowflake Jira issue number.
   
   Note that if a corresponding GitHub issue exists, you should still include
   the Snowflake Jira issue number. For example, for GitHub issue #1400, you should
   add "SNOW-1335071" here.
    --->

   Fixes SNOW-1626173

2. Fill out the following pre-review checklist:

- [x] I am adding new automated test(s) to verify the correctness of my new code
- [ ] If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
   - [ ] I am adding new logging messages
   - [ ] I am adding a new telemetry message
   - [ ] I am adding new credentials
   - [ ] I am adding a new dependency
- [ ] If this is a new feature/behavior, I'm adding the Local Testing parity changes.

3. Please describe how your code solves the related issue.

Please write a short description of how your code change solves the
related issue.