Write partitioned Parquet file using to_parquet #23283

Closed
ispmarin opened this issue Oct 22, 2018 · 4 comments
Labels
IO Parquet
Milestone
0.24.0

Comments

@ispmarin

Hi,

I'm trying to write a partitioned Parquet file using the to_parquet function:

df.to_parquet('table_name', engine='pyarrow', partition_cols=['partone', 'parttwo'])
TypeError: __cinit__() got an unexpected keyword argument 'partition_cols'

Problem description

It was my understanding that the to_parquet method passes the kwargs through to pyarrow and saves a partitioned table.

Expected Output

Partitioned Parquet file saved.

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.0-5-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.4
pytest: None
pip: 18.0
setuptools: 32.3.1
Cython: None
numpy: 1.15.2
scipy: 1.1.0
pyarrow: 0.11.0
xarray: None
IPython: 7.0.1
sphinx: None
patsy: None
dateutil: 2.7.3
pytz: 2018.5
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 3.0.0
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

Thanks!

@TomAugspurger
Contributor

pandas uses pyarrow.parquet.write_table. It seems like multi-part Datasets are written using pyarrow.parquet.write_to_dataset.

I'm not sure whether it makes sense for us to (optionally) use write_to_dataset, or whether pyarrow should support partition_cols in write_table.

cc @wesm if you have thoughts here.

@TomAugspurger added the IO Parquet label Oct 22, 2018
@xhochy
Contributor

xhochy commented Oct 22, 2018

In the case of partition_cols, one should use write_to_dataset. write_table is a much simpler, lower-level function.
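
For reference, a minimal sketch of writing such a frame as a partitioned dataset by calling pyarrow.parquet.write_to_dataset directly (the frame contents and root_path here are illustrative, based on the column names in the report above):

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

# Illustrative frame using the partition columns from the report.
df = pd.DataFrame({
    'partone': ['a', 'a', 'b'],
    'parttwo': [1, 2, 1],
    'value': [0.1, 0.2, 0.3],
})

# write_to_dataset creates one Hive-style directory per partition value,
# e.g. table_name/partone=a/parttwo=1/<part>.parquet
table = pa.Table.from_pandas(df)
pq.write_to_dataset(table, root_path='table_name',
                    partition_cols=['partone', 'parttwo'])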

@TomAugspurger
Contributor

So, pandas could look for kwargs like partition_cols (any others?) and, if that's detected, use write_to_dataset(table, ...). That seems fine to me.
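
A rough sketch of that dispatch, assuming the engine already holds a pyarrow Table (an illustration of the idea, not the actual pandas implementation; the helper name _write is hypothetical):

import pyarrow.parquet as pq

def _write(table, path, partition_cols=None, **kwargs):
    # Hypothetical helper: route to write_to_dataset when partition_cols
    # is supplied, otherwise keep the single-file write_table path.
    if partition_cols is not None:
        pq.write_to_dataset(table, root_path=path,
                            partition_cols=partition_cols, **kwargs)
    else:
        pq.write_table(table, path, **kwargs)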

@anjsudh
Contributor

anjsudh commented Oct 24, 2018

Will pick this up

anjsudh added a commit to anjsudh/pandas that referenced this issue Oct 25, 2018
anjsudh added a commit to anjsudh/pandas that referenced this issue Oct 26, 2018
anjsudh added a commit to anjsudh/pandas that referenced this issue Oct 26, 2018
anjsudh added a commit to anjsudh/pandas that referenced this issue Oct 27, 2018
anjsudh added a commit to anjsudh/pandas that referenced this issue Oct 27, 2018
anjsudh added a commit to anjsudh/pandas that referenced this issue Oct 27, 2018
anjsudh added a commit to anjsudh/pandas that referenced this issue Oct 27, 2018
@jreback jreback added this to the 0.24.0 milestone Oct 28, 2018
JustinZhengBC pushed a commit to JustinZhengBC/pandas that referenced this issue Nov 14, 2018
tm9k1 pushed a commit to tm9k1/pandas that referenced this issue Nov 19, 2018
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this issue Feb 28, 2019
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this issue Feb 28, 2019