Add FedMLB baseline #2340
…ading state. Implemented a custom server with `MyServer` in `server.py`. Changed `main.py` accordingly. Updated README.md.
with previous review.
Co-authored-by: Javier <jafermarq@users.noreply.github.com>
```
python -m fedmlb.dataset_preparation dataset_config.alpha_dirichlet=0.6 total_clients=500
```
Note that, to reproduce those settings, we leverage the `.txt` files
contained in the `client_data` folder in this project. Such files store
Any chance we could remove the `client_data` files from the directory? (They are ~7MB in total.) Is there an obvious way of constructing those files via a not-too-complex script? We could of course ask people to git-clone them from the original repo you mention below, but that might not always be reliable.
Those files could be constructed via a script that uses a Dirichlet distribution, just as in the original paper. The files you mention contain the IDs of the images assigned to each client: a Dirichlet distribution with a specific concentration parameter determines how many images of each label a client receives, and that number of images is then selected at random from the pool of images with that label. However, if you re-ran such a script, you would not be able to reproduce those specific per-client dataset compositions unless you knew the seed used for the pseudo-random number generation (and were probably also running on the same machine).
For this reason, for reproducibility purposes, I decided to compose the clients' datasets exactly as they were crafted in the original paper.
So, in principle, I can either produce a script that generates the dataset composition (basically the `.txt` files) following a Dirichlet distribution of labels among clients with a given concentration parameter (but the datasets would differ from those of the original code), or find a better way of storing the data contained in the files under `client_data`.
In the original code, you can find those generation scripts here.
For now, I've deleted some unused `.txt` files from the folder.
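For reference, the kind of generation script discussed above could look roughly like the sketch below. It is not the original paper's script and will not reproduce the files under `client_data`; the function name `dirichlet_partition`, its parameters, and the choice to draw images without replacement across clients are all assumptions made for illustration.

```python
import numpy as np


def dirichlet_partition(labels, num_clients, alpha, samples_per_client, seed=0):
    """Hypothetical helper: return one array of image IDs per client."""
    rng = np.random.default_rng(seed)  # fixed seed so reruns give the same split
    labels = np.asarray(labels)
    classes = np.unique(labels)
    # Pool of not-yet-assigned image IDs for each label, shuffled once.
    pools = {c: list(rng.permutation(np.flatnonzero(labels == c))) for c in classes}
    partition = []
    for _ in range(num_clients):
        # Per-client label proportions drawn from a symmetric Dirichlet.
        proportions = rng.dirichlet(alpha * np.ones(len(classes)))
        # Number of images requested for each label for this client.
        counts = rng.multinomial(samples_per_client, proportions)
        ids = []
        for c, n in zip(classes, counts):
            # Assumption: no image is shared between clients, so a client
            # may receive fewer images than requested if a pool runs out.
            take = min(int(n), len(pools[c]))
            ids.extend(pools[c][:take])
            del pools[c][:take]
        partition.append(np.array(sorted(ids), dtype=int))
    return partition


# Toy usage: 10 labels with 100 images each, split among 5 clients.
toy_labels = np.repeat(np.arange(10), 100)
parts = dirichlet_partition(toy_labels, num_clients=5, alpha=0.6,
                            samples_per_client=50, seed=42)
```

With the same seed and the same NumPy version, repeated calls return the same split, but that split is unrelated to the one used in the original paper, which is why the stored `.txt` files are still needed for exact reproduction.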
Ok. I'll think about this and discuss it with the others in the team. The files are now 4MB (down from 7MB), so that's nice to see. There are other baselines that also have some not-so-small files as part of their proposed PR, so I'll update this thread once I figure out the best way to deal with these. Maybe keeping them is fine. Let's see...
Hi @alessiomora,
Just a small comment for the `pyproject.toml`. I also enabled the tests, but a small formatting issue was flagged.
Looks great!
Issue
Implementation of FedMLB for the SoR initiative.
Description
Implementation of FedMLB for the SoR initiative.
Related issues/PRs
Issue #2048