This repository has been archived by the owner on Sep 18, 2024. It is now read-only.

refactor: user experience improvement after fine-tuning #522

Merged
merged 29 commits into main from refactor-stubs-commons
Sep 8, 2022

Conversation

bwanglzu
Member

@bwanglzu bwanglzu commented Sep 1, 2022

Closes #511 #514

  1. Take the callback stubs out; depend on finetuner-stubs instead. DONE
  2. Take the model stubs out; depend on finetuner-stubs instead. DONE
  3. Add finetuner-commons as a soft dependency: pip install "finetuner[full]". DONE
  4. Provide extra functionality based on finetuner-commons, including get_model and preprocess_and_collate. DONE:
# after fine-tuning
artifact = run.save_artifact('my-model')
# artifacts include model and preprocess/collate function
# re-build the model 
model = finetuner.get_model(artifact=artifact)
# preprocess, collate and encode
finetuner.encode(model=model, data=da)

  • This PR references an open issue
  • I have added a line about this change to CHANGELOG

@bwanglzu bwanglzu linked an issue Sep 1, 2022 that may be closed by this pull request
@bwanglzu bwanglzu changed the title refactor: take out callbacks refactor: user experiment improvement after fine-tuning Sep 1, 2022
@bwanglzu bwanglzu changed the title refactor: user experiment improvement after fine-tuning refactor: user experience improvement after fine-tuning Sep 1, 2022
@github-actions github-actions bot added the area/testing This issue/PR affects testing label Sep 6, 2022
@github-actions github-actions bot removed the area/testing This issue/PR affects testing label Sep 6, 2022
@bwanglzu
Member Author

bwanglzu commented Sep 6, 2022

related PR in core: https://github.com/jina-ai/finetuner-core/pull/307

@guenthermi
Member

For the CLIP models, the stubs used here before are quite different from the stubs in finetuner-core (and finetuner-commons): there was only one stub instead of two for clip-text and clip-vision. This should be taken care of when the code is refactored, e.g., in the get_row method:

def get_row(model_stub) -> Tuple[str, ...]:
    """Get table row."""
    return (
        model_stub.name,
        model_stub.task,
        str(model_stub.output_shape[1]),
        model_stub.architecture,
        model_stub.description,
    )
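For illustration, the method above could be exercised with a minimal stand-in stub; the `_FakeStub` dataclass and its field values below are hypothetical, but they show why the two-stub CLIP layout yields one table row per tower:

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class _FakeStub:
    """Hypothetical stand-in for a model stub from finetuner-stubs."""
    name: str
    task: str
    output_shape: Tuple[int, int]
    architecture: str
    description: str


def get_row(model_stub) -> Tuple[str, ...]:
    """Get table row."""
    return (
        model_stub.name,
        model_stub.task,
        str(model_stub.output_shape[1]),
        model_stub.architecture,
        model_stub.description,
    )


# CLIP now ships as two stubs (one per tower), so the listing has two rows:
stubs = [
    _FakeStub('clip-text', 'text-to-image', (1, 512), 'transformer', 'CLIP text tower'),
    _FakeStub('clip-vision', 'text-to-image', (1, 512), 'transformer', 'CLIP vision tower'),
]
rows = [get_row(s) for s in stubs]
```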

@bwanglzu
Member Author

bwanglzu commented Sep 7, 2022

@github-actions github-actions bot added the area/testing This issue/PR affects testing label Sep 7, 2022
@bwanglzu bwanglzu marked this pull request as ready for review September 8, 2022 06:47
@github-actions github-actions bot added size/l and removed size/m labels Sep 8, 2022
@@ -6,6 +6,6 @@ version = 0.5.2
# E203 is whitespace before ':' - which occurs in numpy slicing, e.g. in
# dists[2 * i : 2 * i + 2, :]
# W503 is line break before binary operator - happens when black splits up lines
ignore = E203, W503
ignore = E203, W503, F405, F403
Member


what do these ignore?

Member Author


they silence the warnings triggered by `import *` from stubs and commons (F403: star import used; F405: name may be undefined, or defined from star imports)

setup.py Outdated
install_requires=_main_deps,
install_requires=[
'docarray[common]>=0.13.31',
'finetuner-stubs==0.0.1b1',
Member


we don't have an official version yet?

Member Author


until we release and trigger core CD

@@ -0,0 +1 @@
from stubs.callback import * # noqa F401
Member


why do you create a directory here? why not a callback.py module?

@@ -0,0 +1,17 @@
from stubs.model import * # noqa F401
Member


same here, why not a model.py module?

@@ -74,12 +75,22 @@ def logs(self) -> str:
experiment_name=self._experiment_name, run_name=self._name
)

def stream_logs(self) -> Iterator[str]:
def stream_logs(self, interval: int = 5) -> Iterator[str]:
Member


let's document the interval argument here
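One hedged sketch of how the new `interval` argument might be documented and used; the polling-loop shape and the `poll`/`is_finished` callables below are assumptions standing in for the client's internal API, not the PR's actual code:

```python
import time
from typing import Callable, Iterator, List


def stream_logs(
    poll: Callable[[], List[str]],
    is_finished: Callable[[], bool],
    interval: float = 5,
) -> Iterator[str]:
    """Yield log lines as they arrive until the run finishes.

    :param poll: returns the log lines that arrived since the last call
        (stand-in for the client's internal fetch).
    :param is_finished: returns True once the run has completed.
    :param interval: number of seconds to sleep between consecutive
        polls of the server for new log lines.
    :yield: newly arrived log lines.
    """
    while True:
        for line in poll():
            yield line
        if is_finished():
            return
        time.sleep(interval)
```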

)
}
def _list_models() -> Dict[str, model.ModelStubType]:
from stubs import model as model_stub
Member


why not import at the top? I think importing inside a function like this can become a bad habit

@@ -161,7 +172,7 @@ def fit(
:param learning_rate: learning rate for the optimizer.
:param epochs: Number of epochs for fine-tuning.
:param batch_size: Number of items to include in a batch.
:param callbacks: List of callback stub objects. See the `finetuner.callback`
:param callbacks: List of callback stub objects.`
Member


there is a stray ` char at the end

select_model: Optional[str] = None,
gpu: bool = False,
logging_level: str = 'WARNING',
):
Member


add type hint

@@ -296,3 +307,82 @@ def get_token() -> str:
:return: user token as string object.
"""
return ft.get_token()


def get_model(
Member


I don't like the get_model name; maybe build_engine?

Member Author

@bwanglzu bwanglzu Sep 8, 2022


no, think about when we support PyTorch; it won't be an ORT engine anymore

:param select_model: Finetuner run artifacts might contain multiple models. In
such cases you can select which model to deploy using this argument. For CLIP
fine-tuning, you can choose either `clip-vision` or `clip-text`.
:param gpu: if specified to True, use cuda device for inference.
Member


logging level is missing from the docstring
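A hedged sketch of the full signature with the missing parameter documented; the parameter names and defaults match the diff shown above, but the docstring wording for logging_level is an assumption and the body is elided:

```python
from typing import Optional


def get_model(
    artifact: str,
    select_model: Optional[str] = None,
    gpu: bool = False,
    logging_level: str = 'WARNING',
):
    """Re-build the model based on the model inference session with ONNX.

    :param artifact: Specify a finetuner run artifact. Can be a path to a
        local directory, a path to a local zip file, or a Hubble artifact ID.
    :param select_model: Finetuner run artifacts might contain multiple
        models; for CLIP fine-tuning, choose `clip-vision` or `clip-text`.
    :param gpu: if set to True, use the cuda device for inference.
    :param logging_level: logging level for the inference session,
        e.g. 'WARNING' or 'INFO' (wording assumed for illustration).
    """
    ...  # body elided
```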

Member

@guenthermi guenthermi left a comment


Added some minor comments

"""Re-build the model based on the model inference session with ONNX.

:param artifact: Specify a finetuner run artifact. Can be a path to a local
directory, a path to a local zip file or a Hubble artifact ID. Individual
Member


Suggested change
directory, a path to a local zip file or a Hubble artifact ID. Individual
directory, a path to a local zip file, or a Hubble artifact ID. Individual

data: DocumentArray,
batch_size: int = 32,
):
"""Process, collate and encode the `DocumentArray` with embeddings.
Member


I think it is "[Pre-]process"

..Note::
please install finetuner[full] to include all the dependencies.
"""
for i, batch in enumerate(data.batch(batch_size)):
Member


where do you need the `i`?
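The reviewer's point can be sketched in isolation: when the batch index is unused, plain iteration (or slicing, as below) suffices and `enumerate` adds nothing. The `embed` callable here is a hypothetical stand-in for the model's encoding step:

```python
from typing import Callable, List


def encode(
    embed: Callable[[List[str]], List[List[float]]],
    data: List[str],
    batch_size: int = 32,
) -> List[List[float]]:
    """Embed data batch by batch; the batch index itself is never
    needed, so no enumerate() is required."""
    embeddings: List[List[float]] = []
    for start in range(0, len(data), batch_size):
        embeddings.extend(embed(data[start:start + batch_size]))
    return embeddings
```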

Member

@guenthermi guenthermi left a comment


LGTM

@bwanglzu bwanglzu merged commit 3e84b18 into main Sep 8, 2022
@bwanglzu bwanglzu deleted the refactor-stubs-commons branch September 8, 2022 10:00