
FEAT-#7202: Use custom resources for Ray #7205

Merged: 7 commits merged into modin-project:main on Apr 23, 2024

Conversation

@YarShev (Collaborator) commented Apr 19, 2024

What do these changes do?

  • first commit message and PR title follow format outlined here

    NOTE: If you edit the PR title to match this format, you need to add another commit (even if it's empty) or amend your last commit for the CI job that checks the PR title to pick up the new PR title.

  • passes flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py
  • passes black --check modin/ asv_bench/benchmarks scripts/doc_checker.py
  • signed commit with git commit -s
  • Resolves Use custom resources for Ray to schedule a task on a concrete node #7202
  • tests passing
  • module layout described at docs/development/architecture.rst is up-to-date

@anmyachev (Collaborator)

@YarShev do we need to adjust the procedure for determining the total number of available cores, depending on these custom resources?

@YarShev (Collaborator, Author) commented Apr 20, 2024

Custom resources have nothing to do with num_cpus, so no adjustment is needed.

@anmyachev (Collaborator)

> Custom resources have nothing to do with num_cpus, so no adjustment is needed.

Isn’t it possible to use these resources to limit the number of nodes on which computations will run? If so, we would end up splitting the data into far more partitions than can be executed in parallel.
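
To make the concern concrete, here is a minimal pure-Ray sketch (not Modin code; the resource name and numbers are illustrative) of how a per-task custom-resource requirement caps concurrency no matter how many partitions exist:

import time

import ray

# The node advertises 8 CPUs but only 1 unit of "special_hardware".
ray.init(num_cpus=8, resources={"special_hardware": 1})

@ray.remote(resources={"special_hardware": 1})
def work(i):
    time.sleep(1)
    return i

# All 8 CPUs are idle, yet only one task can hold the whole "special_hardware"
# unit at a time, so these 8 tasks run one after another (~8 seconds total).
start = time.time()
ray.get([work.remote(i) for i in range(8)])
print(f"elapsed: {time.time() - start:.1f}s")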

@YarShev (Collaborator, Author) commented Apr 22, 2024

Your comment pointed me to a problem in the current setup: if the user sets resources={"special_hardware": 1}, we pass this parameter as is to remote functions, which limits parallelism to a single remote task at a time. I would preprocess the resources as follows before passing them to remote functions. What do you think?

resources_per_task = {}

# Iterate over the configured custom resources; v / v / num_cpus == 1 / num_cpus
# of each resource per task, so up to num_cpus such tasks can run concurrently.
for k, v in RayCustomResources.get().items():
    resources_per_task[k] = v / v / num_cpus
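
A self-contained sketch of how this preprocessing plays out in plain Ray (a dict stands in for the config; every name here is illustrative): each task requests only 1 / num_cpus of every configured resource, so up to num_cpus tasks can hold the resource at once instead of serializing.

import ray

num_cpus = 8
user_resources = {"special_hardware": 1}  # what the user configured

ray.init(num_cpus=num_cpus, resources=user_resources)

# Per-task share: v / v / num_cpus == 1 / num_cpus of each resource.
resources_per_task = {k: v / v / num_cpus for k, v in user_resources.items()}

@ray.remote
def work(i):
    return i

# Each task asks for 0.125 of "special_hardware", so 8 of them fit on the
# node concurrently rather than one at a time.
futures = [work.options(resources=resources_per_task).remote(i) for i in range(32)]
print(ray.get(futures))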

modin/config/envvars.py (outdated review thread, resolved)
anmyachev previously approved these changes Apr 22, 2024
@@ -126,6 +127,7 @@ def initialize_ray(
     "object_store_memory": object_store_memory,
     "_redis_password": redis_password,
     "_memory": object_store_memory,
+    "resources": RayInitCustomResources.get(),
anmyachev (Collaborator):

Can we add a test for this case?

YarShev (Collaborator, Author):

What exactly would you like to test with this?

anmyachev (Collaborator):

We do not currently test the situation where this config is set to something other than None.
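
A hedged sketch of what such a test might look like, assuming the config is exposed from modin.config with the usual .put() setter and that Modin initializes Ray lazily on first use (both are assumptions here, not verified against the final code):

import ray

import modin.config as cfg
import modin.pandas as pd

# Set the custom resources before Ray is initialized by Modin (assumed config name).
cfg.RayInitCustomResources.put({"special_hardware": 1})

# Creating a DataFrame triggers engine initialization.
df = pd.DataFrame({"a": [1, 2, 3]})

# The custom resource should now be advertised by the cluster.
assert "special_hardware" in ray.cluster_resources()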

Comment on lines +331 to +332
>>> with context(RayTaskCustomResources={"special_hardware": 0.001}):
... df.<op>
anmyachev (Collaborator):

It's good that there is now an option to limit concurrency, but it only works for Ray. Let's create an issue for the rest of the engines.
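
For the arithmetic behind the docstring example above: if a node advertises 1 unit of "special_hardware" and each task requests 0.001 units, at most 1000 such tasks can run on that node at once. A hedged usage sketch (the exact import paths and the .put() setter are assumptions):

import modin.config as cfg
import modin.pandas as pd
from modin.config import context

# Nodes must advertise the resource for the per-task requests to be satisfiable.
cfg.RayInitCustomResources.put({"special_hardware": 1})

df = pd.DataFrame({"a": range(10_000)})

# 1 unit available / 0.001 per task => at most 1000 concurrent tasks.
with context(RayTaskCustomResources={"special_hardware": 0.001}):
    result = df + 1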

YarShev (Collaborator, Author):

This config is not generic but specific to Ray, and I am not sure there is a way to limit concurrency for the other engines. If we want something similar for them, we can open an issue and explore the options then. Do you still think we should create an issue now?

anmyachev (Collaborator):

Limiting concurrency via context looks like a good feature for an advanced user. We can create a low-priority issue for now.

self.func, self.data, *self.args, **self.kwargs
)
result, length, width, ip = remote_exec_func.options(
resources=RayTaskCustomResources.get()
anmyachev (Collaborator):

I wonder if we should just call it RayTaskResources (and likewise RayInitResources), since this config is used to pass values to resources.

YarShev (Collaborator, Author):

I would prefer to be explicit here, as Ray itself calls these custom resources - https://docs.ray.io/en/latest/ray-core/scheduling/resources.html#custom-resources.

anmyachev merged commit 71b8da4 into modin-project:main on Apr 23, 2024.
46 checks passed.