map_chunked implementation #99

Open
wants to merge 3 commits into
base: master

Conversation

lorenzorubi-db

Adds a new DataExplorer method, map_chunked, as an alternative to map:

  • map processes the tables one by one
  • map_chunked processes the tables in chunks of size tables_per_chunk
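The difference between the two modes can be sketched on a plain list of table names. This is only an illustration: `map_one_by_one` and `map_in_chunks` are hypothetical names, not the DataExplorer methods, and the sketch assumes that `f` receives a whole chunk (a list of tables) in the chunked case, which the description above does not confirm.

```python
from typing import Any, Callable


def map_one_by_one(tables: list[str], f: Callable[[str], Any]) -> list[Any]:
    """Call f once per table (hypothetical stand-in for map)."""
    return [f(table) for table in tables]


def map_in_chunks(
    tables: list[str],
    f: Callable[[list[str]], Any],
    tables_per_chunk: int,
) -> list[Any]:
    """Call f once per chunk of at most tables_per_chunk tables
    (hypothetical stand-in for map_chunked; assumes f takes a list)."""
    if tables_per_chunk < 1:
        raise ValueError("tables_per_chunk must be at least 1")
    results = []
    # Step through the list in strides of tables_per_chunk; the last
    # chunk may be shorter than the others.
    for start in range(0, len(tables), tables_per_chunk):
        chunk = tables[start:start + tables_per_chunk]
        results.append(f(chunk))
    return results
```

With three tables and `tables_per_chunk=2`, `f` would be called twice: once with two tables and once with the remaining one.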

discoverx/explorer.py Outdated
@@ -197,6 +198,39 @@ def map(self, f) -> list[any]:

return res

def map_chunked(self, f: Callable, tables_per_chunk: int, **kwargs) -> list[any]:
Contributor


Suggested change
- def map_chunked(self, f: Callable, tables_per_chunk: int, **kwargs) -> list[any]:
+ def map_chunked(self, f: Callable, tables_per_chunk: int, **kwargs) -> list[Any]:

any is a built-in function, not a type. Use Any from the typing module in the annotation.
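A minimal sketch of the distinction, using a hypothetical helper `apply_all` that is not part of the PR: `typing.Any` belongs in annotations, while the built-in `any` is a callable that tests an iterable for a truthy element.

```python
from typing import Any, Callable


def apply_all(f: Callable[[str], Any], items: list[str]) -> list[Any]:
    """Hypothetical helper: the return annotation uses typing.Any."""
    return [f(item) for item in items]


# The built-in any() is an ordinary function over iterables,
# so it does not describe a type.
has_truthy = any([False, True])
```

Python does not reject `list[any]` at runtime, so the mistake tends to surface only under a static type checker.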

setup.py Outdated
@@ -34,6 +34,7 @@
"delta-spark>=2.2.0",
"pandas<2.0.0", # From 2.0.0 onwards, pandas does not support iteritems() anymore, spark.createDataFrame will fail
"numpy<1.24", # From 1.24 onwards, module 'numpy' has no attribute 'bool'.
"more_itertools",
Contributor


Create an LPP ticket for this; otherwise, re-implement the single function. Don't add a whole library for the sake of one function.
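If the only thing needed from more_itertools is a chunking helper comparable to `more_itertools.chunked` (an assumption, since the diff above does not show which function is imported), a standard-library re-implementation could look roughly like this:

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def chunked(iterable: Iterable[T], n: int) -> Iterator[list[T]]:
    """Yield successive lists of up to n items from iterable.

    Sketch of a stdlib-only replacement; the final list may be
    shorter than n when the input length is not a multiple of n.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    iterator = iter(iterable)
    while True:
        # islice pulls at most n items from the shared iterator.
        chunk = list(islice(iterator, n))
        if not chunk:
            return
        yield chunk
```

Because it is a generator, the argument check runs when iteration starts, not when `chunked` is called.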

2 participants