Add Actor Init Callback #2221

Merged
danieljanes merged 49 commits into main from actor_on_init_callback
Aug 23, 2023
Contributor

@jafermarq jafermarq commented Aug 17, 2023

This extends #1969 by adding support for Actors to accept an optional, user-defined callback that is executed upon actor creation.

One of the most obvious use cases for this callback is running Flower with TensorFlow (TF). By default, TF maps the entire VRAM, so users are likely to hit OOM immediately (even if their workloads don't actually need all the VRAM). Luckily, we can work around this by enabling GPU memory growth. This is exactly what the built-in enable_tf_gpu_growth callback in src/.../simulation/ray_transport/utils.py allows you to do.

You can easily test this by editing the lines around start_simulation in the TF simulation example:

# keep rest of sim.py as is.
....

def main() -> None:

    from flwr.simulation.ray_transport.utils import enable_tf_gpu_growth

    # Enable GPU growth in the main thread (the one the server will most likely use
    # to run global evaluation on the GPU).
    enable_tf_gpu_growth()

    # Start Flower simulation
    fl.simulation.start_simulation(
        client_fn=client_fn,
        num_clients=NUM_CLIENTS,
        client_resources={"num_cpus": 2, "num_gpus": 0.25},  # <-- pack 4 actors onto each GPU
        config=fl.server.ServerConfig(num_rounds=5),
        strategy=fl.server.strategy.FedAvg(
            fraction_fit=0.1,
            fraction_evaluate=0.1,
            min_fit_clients=10,
            min_evaluate_clients=10,
            min_available_clients=NUM_CLIENTS,
        ),
        actor_kwargs={
            "on_actor_init_fn": enable_tf_gpu_growth  # <-- same function, executed upon actor init
        },
    )
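
For readers who want to see the idea in isolation, here is a minimal sketch of an optional init callback, written without Ray. `ActorSketch`, `record_init` and `events` are hypothetical names made up for illustration; this is not the actual actor class touched by this PR, only an assumption about the general shape of the mechanism.

```python
from typing import Callable, List, Optional


class ActorSketch:
    """Hypothetical stand-in for a simulation actor (not Flower's actual class)."""

    def __init__(self, on_actor_init_fn: Optional[Callable[[], None]] = None) -> None:
        # The callback is optional: run it once, at creation time, if one was given.
        if on_actor_init_fn is not None:
            on_actor_init_fn()


# Record each callback invocation so the behaviour is easy to inspect.
events: List[str] = []


def record_init() -> None:
    events.append("init")


# Four actors created with the callback, one created without it.
actors = [ActorSketch(on_actor_init_fn=record_init) for _ in range(4)]
plain_actor = ActorSketch()
```

In the real setup each actor would presumably live in its own process, which is why a per-process setting such as TF GPU growth has to be applied inside every actor rather than only in the main thread.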

jafermarq and others added 30 commits June 25, 2023 13:45
Co-authored-by: Daniel J. Beutel <daniel@flower.dev>
@jafermarq jafermarq changed the base branch from main to VCE_with_actorpool August 17, 2023 14:50
@jafermarq jafermarq added the enhancement New feature or request label Aug 17, 2023
@jafermarq jafermarq marked this pull request as ready for review August 21, 2023 07:59
@jafermarq jafermarq changed the title Adding Actor init callback Add Actor Init Callback Aug 21, 2023
Base automatically changed from VCE_with_actorpool to main August 23, 2023 08:34
src/py/flwr/simulation/ray_transport/utils.py
log(ERROR, ex)
raise ex
except Exception as ex:
log(ERROR, "Do you have Tensorflow installed?")
Member

We can log this before the try by checking if TF is None (we should also return from the function in that case).

An exception on this line could be many things (whatever the experimental TensorFlow function decides to raise).

Contributor Author

True. I moved the TF check to the top of the function.
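
A rough sketch of the shape discussed in this thread, with the TF check before the `try` and an early return. This is not the code merged in this PR: the function name `enable_gpu_growth_sketch`, the `tf_module` parameter (added only so the function can be exercised without TensorFlow) and the boolean return value are assumptions for illustration. The TF calls used are `tf.config.list_physical_devices` and `tf.config.experimental.set_memory_growth`.

```python
import logging

try:
    import tensorflow as tf  # optional dependency
except ImportError:
    tf = None

log = logging.getLogger("gpu_growth_sketch")


def enable_gpu_growth_sketch(tf_module=tf) -> bool:
    """Enable memory growth on all visible GPUs; return False if TF is missing."""
    # Check for TensorFlow first, so a missing install is reported clearly
    # and is not confused with errors raised by TensorFlow itself.
    if tf_module is None:
        log.error("Do you have TensorFlow installed?")
        return False

    try:
        gpus = tf_module.config.list_physical_devices("GPU")
        for gpu in gpus:
            # Allocate GPU memory on demand instead of mapping all VRAM up front.
            tf_module.config.experimental.set_memory_growth(gpu, True)
    except RuntimeError as ex:
        # TensorFlow raises RuntimeError if the GPUs were already initialized.
        log.error(ex)
        raise
    return True
```

With this layout, anything raised inside the `try` comes from TensorFlow rather than from a missing import.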

src/py/flwr/simulation/ray_transport/ray_actor.py
jafermarq and others added 2 commits August 23, 2023 09:51
Co-authored-by: Daniel J. Beutel <daniel@flower.dev>
Member

@danieljanes danieljanes left a comment

Lgtm!

@danieljanes danieljanes merged commit 4ed211d into main Aug 23, 2023
27 checks passed
@danieljanes danieljanes deleted the actor_on_init_callback branch August 23, 2023 09:19
alessiomora pushed a commit to alessiomora/flower that referenced this pull request Aug 30, 2023