Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: local registry #421

Merged
merged 1 commit into from
May 31, 2023
Merged

Conversation

GustavBaumgart
Copy link
Collaborator

Description

A local registry saves the metrics (which are currently implemented on a per-round basis) to a ~/flame-log directory. The structure of this directory is flame-log --> job_id --> task_id --> {model, params, metrics}. Each metric is saved in its own separate csv.

Additionally, the method signatures for call were changed, and the method signatures for save_params were aligned to what the MLflow registry has used.

Type of Change

  • Bug Fix
  • New Feature
  • Breaking Change
  • Refactor
  • Documentation
  • Other (please describe)

Checklist

  • I have read the contributing guidelines
  • Existing issues have been referenced (where applicable)
  • I have verified this change is not present in other open pull requests
  • Functionality is documented
  • All code style checks pass
  • New code contribution is covered by automated tests
  • All new and existing tests pass

@codecov-commenter
Copy link

codecov-commenter commented May 25, 2023

Codecov Report

Merging #421 (202f798) into main (53a850e) will not change coverage.
The diff coverage is n/a.

❗ Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more.

@@           Coverage Diff           @@
##             main     #421   +/-   ##
=======================================
  Coverage   14.86%   14.86%           
=======================================
  Files          48       48           
  Lines        2832     2832           
=======================================
  Hits          421      421           
  Misses       2382     2382           
  Partials       29       29           

Copy link
Contributor

@myungjin myungjin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left comments.

lib/python/flame/config.py Show resolved Hide resolved
@@ -70,7 +70,7 @@ def internal_init(self) -> None:
"""Initialize internal state for role."""
self.registry_client = registry_provider.get(self.config.registry.sort)
# initialize registry client
self.registry_client(self.config.registry.uri, self.config.job.job_id)
self.registry_client(self.config.registry.uri, self.config)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Passing self.config alone may be sufficient since registry.uri can be accessed from config.

@@ -81,7 +81,7 @@ def internal_init(self) -> None:

self.registry_client = registry_provider.get(self.config.registry.sort)
# initialize registry client
self.registry_client(self.config.registry.uri, self.config.job.job_id)
self.registry_client(self.config.registry.uri, self.config)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

@@ -71,7 +71,7 @@ def internal_init(self) -> None:

self.registry_client = registry_provider.get(self.config.registry.sort)
# initialize registry client
self.registry_client(self.config.registry.uri, self.config.job.job_id)
self.registry_client(self.config.registry.uri, self.config)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

@@ -21,6 +21,7 @@
from .object_factory import ObjectFactory
from .registry.dummy import DummyRegistryClient
from .registry.mlflow import MLflowRegistryClient
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please refactor other imports to use absolute path

def setup_run(self, name: str) -> None:
"""Set up a run."""
# set up directories
home_dir = os.path.expanduser("~")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps it may be better to use pathlib?
An example usage:

from pathlib import Path
print(Path.home())

log_dir = os.path.join(home_dir, "flame-log")
self.registry_path = os.path.join(log_dir, self.job_id, self.task_id)

os.system(f'rm -rf {self.registry_path}')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps better to use shutil.rmtree()?

@@ -23,7 +23,7 @@
from mlflow.exceptions import MlflowException
from mlflow.protos.databricks_pb2 import RESOURCE_ALREADY_EXISTS, ErrorCode

from ..config import Hyperparameters
from ..config import (Hyperparameters, Config)
from .abstract import AbstractRegistryClient
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use absolute path

os.mkdir(os.path.join(self.registry_path, "metrics"))
os.mkdir(os.path.join(self.registry_path, "model"))
os.mkdir(os.path.join(self.registry_path, "params"))
#os.mkdir(os.path.join(self.registry_path, "artifact"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove it if this is not needed

os.makedirs(self.registry_path)
os.mkdir(os.path.join(self.registry_path, "metrics"))
os.mkdir(os.path.join(self.registry_path, "model"))
os.mkdir(os.path.join(self.registry_path, "params"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of lines 57-60, how about the following?

for dir in ["metrics", "model", "params"]:
    os.makedirs(os.path.join(self.registry_path, dir))

Copy link
Collaborator

@jaemin-shin jaemin-shin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments. Agree to Myungjin's comments too!

self.model_versions = defaultdict(lambda: 1)

def save_metrics(self, epoch: int, metrics: Optional[dict[str,
float]]) -> None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better merge line 68 with 67. Setting up formatter would help

"""Save a model in a model registry."""
model_folder = os.path.join(self.registry_path, "model", name)
if not os.path.exists(model_folder):
os.mkdir(model_folder)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
os.mkdir(model_folder)
os.makedirs(model_folder, exist_ok=True)

You may consider merging 95 and 96 into one line like this

import tensorflow
return tensorflow.keras.models.load_model(model_path)

return None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about we raise an error like ModuleNotFoundError, instead of return None?

@GustavBaumgart GustavBaumgart force-pushed the registry branch 2 times, most recently from 202f798 to f640cce Compare May 27, 2023 01:03
@GustavBaumgart GustavBaumgart marked this pull request as ready for review May 30, 2023 20:59
Copy link
Contributor

@myungjin myungjin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left a few comments; please consider to address them

if os.path.exists(self.registry_path):
shutil.rmtree(self.registry_path)

os.makedirs(self.registry_path)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem necessary because line 65 will create any intermediate directories in the path.

shutil.rmtree(self.registry_path)

os.makedirs(self.registry_path)
for directory in ["metrics", "model", "params"]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These ("metrics", "model" and "params") are used in several places in the code.
In this case, making them as constants may be better to reduce the chance of introducing a bug.

with open(filename, "w") as file:
csv_writer = csv.writer(file)
csv_writer.writerow(["round", metric])
csv_writer.writerow([epoch, metrics[metric]])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think lines 74-83 can be simplified in the following way:

exists = os.path.exists(filename):
with open(filename, "a+") as file:
    csv_writer = csv.writer(file)
    csv_writer.writerow(["round", metric]) if not exists else None
    csv_writer.writerow([epoch, metrics[metric]])

Copy link
Contributor

@myungjin myungjin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left a few comments; please consider to address them

Copy link
Collaborator

@jaemin-shin jaemin-shin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for me! Left one minor comment.

if ml_framework == MLFramework.PYTORCH:
import torch

# is not safe but matches behavior of MLFlow registry
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what part of the code does this comment indicates? could be confusing as there are same comment at different location in line 124. Also, the comment itself could be more detailed to be helpful, by explaining which behavior of MLFlow registry it refers to

A local registry saves the metrics (which are currently implemented on a per-round basis) to a ~/flame-log directory.
The structure of this directory is flame-log --> job_id --> task_id --> {model, params, metrics}.
Each metric is saved in its own separate csv.

Additionally, the method signatures for __call__ were changed, and the method signatures for save_params were aligned to what the MLFlow registry has used.
mlflow_runname was moved from flame/common/util.py to flame/registry/mlflow.py.
Copy link
Contributor

@myungjin myungjin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@GustavBaumgart GustavBaumgart merged commit 6ffabf8 into cisco-open:main May 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants