Add Actor Init Callback #2221

jafermarq · 2023-08-17T14:49:36Z

This extends #1969 by letting Actors accept an optional user-defined callback that is executed upon object creation.
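As a rough illustration of the pattern (not the code in this PR), an actor that takes an optional init callback could look like the sketch below. `SketchActor`, `record_init` and `calls` are hypothetical names used only for this example, and the assumption that the callback takes no arguments is mine.

```python
from typing import Callable, List, Optional


class SketchActor:
    """Hypothetical stand-in for an actor; not Flower's actual class."""

    def __init__(self, on_actor_init_fn: Optional[Callable[[], None]] = None) -> None:
        # The callback is optional: run it once, at construction time, if provided.
        if on_actor_init_fn is not None:
            on_actor_init_fn()


calls: List[str] = []


def record_init() -> None:
    # Example callback: just records that it was invoked.
    calls.append("initialised")


SketchActor()                               # no callback: nothing happens
SketchActor(on_actor_init_fn=record_init)   # callback runs once during __init__
```

Running the callback inside `__init__` means any per-process setup happens before the actor handles its first task.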

The most obvious use for this callback is running Flower with TensorFlow (TF). Because TF maps the entire VRAM by default, users are likely to hit OOM immediately, even if their workloads don't actually need all of the VRAM. Luckily, there is a workaround: enabling GPU memory growth. This is exactly what the built-in enable_tf_gpu_growth callback in src/.../simulation/ray_transport/utils.py does.

You can easily test this by editing the lines around start_simulation in the TF simulation example:

# keep rest of sim.py as is.
....

def main() -> None:

    from flwr.simulation.ray_transport.utils import enable_tf_gpu_growth

    # Enable GPU growth in the main thread too: the server runs there and will
    # quite likely use the GPU for global evaluation.
    enable_tf_gpu_growth()

    # Start Flower simulation
    fl.simulation.start_simulation(
        client_fn=client_fn,
        num_clients=NUM_CLIENTS,
        client_resources={"num_cpus": 2, "num_gpus": 0.25},  # <-- pack 4 actors onto each GPU
        config=fl.server.ServerConfig(num_rounds=5),
        strategy=fl.server.strategy.FedAvg(
            fraction_fit=0.1,
            fraction_evaluate=0.1,
            min_fit_clients=10,
            min_evaluate_clients=10,
            min_available_clients=NUM_CLIENTS,
        ),
        actor_kwargs={
            "on_actor_init_fn": enable_tf_gpu_growth  # <-- same function, executed upon actor init
        },
    )


danieljanes · 2023-08-23T08:48:32Z

src/py/flwr/simulation/ray_transport/utils.py

+                log(ERROR, ex)
+                raise ex
+    except Exception as ex:
+        log(ERROR, "Do you have Tensorflow installed?")


We can log this before the try by checking if TF is None (we should also return from the function in that case).

An exception on this line could be many things (whatever the experimental TensorFlow function decides to raise).

True. I moved the TF check to the top of the function.
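For readers following the thread, the structure discussed above (check for a missing TF first and return, then let any error from the TF call be logged and re-raised) might be sketched as follows. This is a sketch under assumptions, not the merged code: `enable_gpu_growth_sketch`, the `logger` name and the boolean return value are hypothetical, and the standard `logging` module stands in for Flower's `log` helper.

```python
import logging

try:
    import tensorflow as tf
except ImportError:
    tf = None  # TensorFlow is optional; handled explicitly below

logger = logging.getLogger("sketch")


def enable_gpu_growth_sketch() -> bool:
    """Request memory growth on every visible GPU.

    Returns False if TensorFlow is not installed, True otherwise.
    """
    # Check for a missing TF up front and return, instead of relying on a
    # broad `except` further down to guess what went wrong.
    if tf is None:
        logger.error("Do you have TensorFlow installed?")
        return False

    for gpu in tf.config.list_physical_devices("GPU"):
        try:
            tf.config.experimental.set_memory_growth(gpu, True)
        except Exception as ex:
            # Whatever the experimental TF call raises is logged and re-raised.
            logger.error(ex)
            raise
    return True
```

With the import check separated out, an exception inside the loop can only come from TensorFlow itself, so the "is TF installed?" message is no longer shown for unrelated failures.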

src/py/flwr/simulation/ray_transport/ray_actor.py

Co-authored-by: Daniel J. Beutel <daniel@flower.dev>

danieljanes

LGTM!

jafermarq and others added 30 commits June 25, 2023 13:45

VCE with ActorPool

71d50fb

each clientproxy gets its result (discarding other logic in server.py)

6f91e70

Merge branch 'main' into VCE_with_actorpool

73d4bae

custom ActorPool addressable by each ClientProxy

f6588dd

parse CPU/GPU resources to spawn correct size of ActorPool; other small changes

a1ee630

handling clinents failure

8b09717

formatting

248cba5

client failure OK; actor restarted if it fails; after N fails actor is removed from pool

5ff115a

fix

e15eb45

tolerant to full node disconnect

72a7bde

Merge branch 'main' into VCE_with_actorpool

20449ba

better

45b4021

consolidate actorpool and utilities

9456470

better handling of exception when actor dies for good

ea967c4

w/ previous

5386bff

minor changes

591e667

suport for node abrupt disconnect

3e9d1a8

reverting lock positioning

18de9e4

minor changes

0213d9d

minor changes; hints on how to try if simulation fails

7a3c791

Merge branch 'main' into VCE_with_actorpool

256b471

Merge branch 'main' into VCE_with_actorpool

774dacb

Format

c47780f

Merge branch 'VCE_with_actorpool' of github.com:adap/flower into VCE_with_actorpool

fdd55da

Update src/py/flwr/simulation/ray_transport/ray_actor.py

36ba7e7

Co-authored-by: Daniel J. Beutel <daniel@flower.dev>

Apply suggestions from code review

b0238b9

Co-authored-by: Daniel J. Beutel <daniel@flower.dev>

tweaks post-comments; fix to pool_size_from_resources

73ef836

Added TF actor with GPU fix; now you can specify which VCEActor to use

6f2c374

minor tweaks

58d6722

option to specify actor scheduling strategy

b6f5fa7

jafermarq added 13 commits August 15, 2023 17:06

.

d851f4c

tweaks

c4883f3

tweaks, fixes for serialisation

9b2d27e

Merge branch 'main' into VCE_with_actorpool

5cfaa84

types and more

ff48ee1

w/ previous

400e89d

p37

ed5b01d

yes

fcb42a1

actor generator; periodically check for cluster growth

151a507

w/ previous

f524fff

better assesment of number of actors that fit in cluster; fix

de7e252

minor tweaks

935a4ca

on_init_func support for Actors; added default func for TF workloads

3773156

jafermarq changed the base branch from main to VCE_with_actorpool August 17, 2023 14:50

jafermarq added the enhancement New feature or request label Aug 17, 2023

jafermarq marked this pull request as ready for review August 21, 2023 07:59

jafermarq requested review from danieljanes and tanertopal as code owners August 21, 2023 07:59

jafermarq changed the title ~~Adding Actor init callback~~ Add Actor Init Callback Aug 21, 2023

Base automatically changed from VCE_with_actorpool to main August 23, 2023 08:34

jafermarq added 3 commits August 23, 2023 09:41

merge w/ main

1ba9c81

format

0a9886a

more formatting

5bea733

danieljanes requested changes Aug 23, 2023

jafermarq and others added 2 commits August 23, 2023 09:51

Update src/py/flwr/simulation/ray_transport/ray_actor.py

5edfae6

Co-authored-by: Daniel J. Beutel <daniel@flower.dev>

update

04643b3

danieljanes approved these changes Aug 23, 2023

danieljanes merged commit 4ed211d into main Aug 23, 2023
27 checks passed

danieljanes deleted the actor_on_init_callback branch August 23, 2023 09:19

alessiomora pushed a commit to alessiomora/flower that referenced this pull request Aug 30, 2023

Add Actor Init Callback (adap#2221)

dd15825
