-
Notifications
You must be signed in to change notification settings - Fork 651
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FEAT-#4527: Add Modin logging to AxisPartition and BlockPartition #7079
Conversation
3fc0ed0
to
63f6ad6
Compare
…Partition classes Signed-off-by: arunjose696 <arunjose696@gmail.com>
63f6ad6
to
a7e3b50
Compare
|
||
class BaseDataframeAxisPartition(ABC): # pragma: no cover | ||
|
||
class BaseDataframeAxisPartition(ABC, ClassLogger): # pragma: no cover |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we use the methods of this class inside remote functions, how does this addition of ClassLogger here affect logging for remote functions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I understand BaseDataframeAxisPartition's methods would call the remote functions, I could not see the other way around where methods of BaseDataframeAxisPartition are called in remote functions, I could see data of the partitions are send to remote functions and operations are performed on them, Is there anyplace where partitions would be directly passed to remote functions?
When running remote functions from the methods of BaseDataframeAxisPartition the class logger would work normally as it would log BaseDataframeAxisPartition methods and leave out logging of remote functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BaseDataframeAxisPartition's methods do not call the remote functions. Instead, engine specific virtual partitions call remote functions inside of which BaseDataframeAxisPartition's methods get called. Here is an example where we pass an object reference to the method of BaseDataframeAxisPartition and object references to data partitions in the remote function.
modin/modin/core/execution/ray/implementations/pandas_on_ray/partitioning/virtual_partition.py
Lines 176 to 192 in 93b4e2a
return _deploy_ray_func.options( | |
num_returns=(num_splits if lengths is None else len(lengths)) | |
* (1 + cls._PARTITIONS_METADATA_LEN), | |
**({"max_retries": max_retries} if max_retries is not None else {}), | |
).remote( | |
cls._get_deploy_axis_func(), | |
*f_args, | |
num_splits, | |
maintain_partitioning, | |
*partitions, | |
axis=axis, | |
f_to_deploy=func, | |
f_len_args=len(f_args), | |
f_kwargs=f_kwargs, | |
manual_partition=manual_partition, | |
lengths=lengths, | |
return_generator=True, |
Given that, I wonder if this logging works inside remote functions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it Thanks
When we enable the logger with LogMode.enable()
in modin code, the logger gets enabled only on the main process, The logger remains disabled in workers so the remote functions execute the function without logging so info about the function wouldn't be logged from remote function.
@@ -93,6 +93,7 @@ def __init__( | |||
) | |||
) | |||
|
|||
@disable_logging |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why should we disable logging for this method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logging requires modules like sys, which might already have been cleaned up by the time del is invoked(As we couldnt be sure when del would be invoked). This leads to ModuleNotFoundError. So disabling logging of del
What do these changes do?
In #4527 it was mentioned to add modin logging to different layers, Among these logging support has already been added to Partition Manager, The current PR adds logging to Block and Axis partitions.
For logging in remote function it would be recommended to use external tools specific to backends for example Ray timeline for ray backend , UNIDIST_MPI_LOG for unidist etc.
flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py
black --check modin/ asv_bench/benchmarks scripts/doc_checker.py
git commit -s
docs/development/architecture.rst
is up-to-date