Read Parquet file with fastparquet instead of Arrow #5285

Closed
trgiangdo opened this issue Nov 30, 2022 · 5 comments · Fixed by #5297
Labels
bug 🦗 (Something isn't working) · P2 (Minor bugs or low-priority feature requests) · pandas concordance 🐼 (Functionality that does not match pandas) · pandas.io

Comments

@trgiangdo
Contributor

Hello Modin team,

I'm trying to read a Parquet file using

import modin.pandas
modin.pandas.read_parquet("example.parquet", engine="fastparquet")

In my environment, only fastparquet is installed, and read_parquet raises:

ImportError: Missing optional dependency 'pyarrow'. pyarrow is required to read parquet files.

However, if I remove lines 606-609 from parquet_dispatcher.py, the read function works fine, since I believe Modin supports both fastparquet and pyarrow.

Can we also check for fastparquet here?
Or is there a specific reason not to?

Thank you in advance for answering my question,

@pyrito
Collaborator

pyrito commented Nov 30, 2022

Hi @trgiangdo, thank you for raising this issue! I was able to replicate it on the latest master. I think you're right: we should add fastparquet to that check as well! I think this slipped past our tests because pyarrow is installed in our CI runners.

@trgiangdo would you like to contribute the change to Modin? We can help you with your PR! 😄

@pyrito added the pandas concordance 🐼, pandas.io, and P2 labels and removed the question ❓ and Triage 🩹 labels Nov 30, 2022
@mvashishtha mvashishtha added the bug 🦗 Something isn't working label Nov 30, 2022
@trgiangdo
Contributor Author

Yes, this should be a simple fix.

My main concern is that we have to wait until the next release of Modin for this to work.

Is there any possible way to add a hotfix to the current version of Modin?

@pyrito
Collaborator

pyrito commented Nov 30, 2022

We could merge the fix into master, and you could work off the latest master, if that works for your codebase. I also think the next minor release should be coming soon.

@trgiangdo
Contributor Author

@pyrito @mvashishtha Can you share when the next minor version will be released? It would help us a lot. Thank you.

@mvashishtha
Collaborator

@trgiangdo the next planned release is 0.18.0, scheduled for December 7. After the fix for #5285 is merged and before the next release comes out, you can install from master or from a specific commit to get the fix.
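Installing from master or from a specific commit can be done with pip's git support. The following is a minimal sketch, assuming Modin's public GitHub repository URL and a pip version that accepts PEP 508 direct references; `pip_install_command` is a hypothetical helper that only builds the command line and does not run it.

```python
import shlex
import sys
from typing import List

# Assumed location of Modin's public git repository.
MODIN_REPO = "https://github.com/modin-project/modin.git"


def pip_install_command(ref: str = "master") -> List[str]:
    """Build a pip command that installs Modin from a git branch or commit.

    `ref` can be a branch name such as "master" or a full commit SHA.
    The requirement uses the PEP 508 direct-reference form "name @ git+URL@ref".
    """
    requirement = f"modin @ git+{MODIN_REPO}@{ref}"
    return [sys.executable, "-m", "pip", "install", requirement]


if __name__ == "__main__":
    # Print the command rather than running it; pass the list to
    # subprocess.check_call() to actually perform the install.
    print(shlex.join(pip_install_command("master")))
```

To pin to the exact commit that contains the fix, pass its full SHA as `ref` instead of `"master"`; pinning avoids picking up unrelated changes that land on master before the release.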

anmyachev pushed a commit that referenced this issue Dec 1, 2022
…ormat (#5297)

Signed-off-by: trgiangdo <dtr.giang.1299@gmail.com>

3 participants