
FedML 0.8.4

@FedML-AI-admin released this 20 Jun 17:47
· 2409 commits to master since this release
23fbb84

What's Changed

New Features in 0.8.4

At FedML, our mission is to remove the friction and pain points of moving your ML & AI models from R&D into production-scale distributed and federated training & serving via our no-code MLOps platform.
We are happy to announce release 0.8.4, which is filled with new capabilities, bug fixes, and enhancements. A key announcement is the launch of FedLLM, which simplifies and reduces the cost of training and serving large language models. You can read more about it on our blog post.

New Features

  • [CoreEngine] Added local HTTP APIs to disable, enable, and query the status of the agent:

curl -X POST http://localhost:40800/fedml/api/v2/disableAgent -d '{}'
curl -X POST http://localhost:40800/fedml/api/v2/enableAgent -d '{}'
curl -X POST http://localhost:40800/fedml/api/v2/queryAgentStatus -d '{}'
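The same agent API calls above can be made from Python. A minimal sketch using only the standard library (the helper names are illustrative, not part of FedML's SDK):

```python
import json
import urllib.request

BASE_URL = "http://localhost:40800/fedml/api/v2"

def agent_request(action: str) -> urllib.request.Request:
    # Build a POST request with an empty JSON body, mirroring the curl calls above.
    return urllib.request.Request(
        f"{BASE_URL}/{action}",
        data=json.dumps({}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def query_agent_status() -> dict:
    # Requires a running local FedML agent on port 40800.
    with urllib.request.urlopen(agent_request("queryAgentStatus")) as resp:
        return json.loads(resp.read())
```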

Bug Fixes

  • [CoreEngine] Created distinct device ids when running multiple Docker containers to simulate multiple clients or silos on one machine. The device id is now the product id plus a random id.

  • [CoreEngine] Fixed a device assignment issue in get_torch_device in distributed training mode.

  • [Serving] Fixed the exceptions that occurred when recovering at startup after upgrading.

  • [CoreEngine] Fixed the device id issue when running in Docker on macOS.

  • [App] Fixed an issue in the FedProx + SAGE graph regression and graph classification apps.

  • [App] Fixed an issue with the heart disease app failing when running in MLOps.

  • [App] Fixed an issue with the heart disease app’s performance curve.

  • [App/Android] Enhanced Android starting/stopping mechanism and fixed the following issues:

    ◦ Fixed the status display after stopping a run.
    ◦ When a run is stopped during an unfinished round, the MNN process now remains in the IDLE state (previously it went OFFLINE).
    ◦ When a run is stopped after a round completes, training now stops as expected.
    ◦ Corrected the Python server TAG in the logs, so the server is now easy to find in log output.
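The [CoreEngine] device-id fix above composes the device id from the product id plus a random id. A minimal sketch of that scheme (the function name and separator are illustrative assumptions, not FedML's actual implementation):

```python
import secrets

def make_device_id(product_id: str) -> str:
    # Illustrative sketch: append a random hex suffix so multiple Docker
    # containers on one host each get a distinct device id.
    return f"{product_id}-{secrets.token_hex(4)}"
```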

Enhancements

  • [Serving] Added a post-deployment check that tests the inference backend and verifies its response once model deployment finishes.

  • [CoreEngine/Serving] Set the GPU option based on CUDA availability when running the inference backend, and optimized the MQTT connection check.

  • [CoreEngine] Stored model caches in the user’s home directory when running federated learning.

  • [CoreEngine] Added the device id to the monitor message when processing inference requests.

  • [CoreEngine] Reported runner exceptions, and ignored the exception raised when the bootstrap section is missing from fedml_config.yaml.
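The model-cache enhancement above moves caches under the user’s home directory. A hypothetical sketch of such a path helper (the subdirectory layout is an assumption, not FedML’s actual structure):

```python
import os

def model_cache_dir(run_id: str) -> str:
    # Illustrative sketch: resolve a per-run cache path under the user's
    # home directory, so caches survive container restarts.
    return os.path.join(os.path.expanduser("~"), ".fedml", "model_cache", run_id)
```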