Support execution of metrics on a remote host #568

Merged on Mar 4, 2024 (125 commits)
Commits
73bbc6b
adding service and remote metric
assaftibm Jan 15, 2024
d4eb457
new MetricPipeline metrics.rag.mrr
matanor Jan 18, 2024
3c7f0d2
adjust field names
matanor Jan 18, 2024
f7a5aba
update Perplexity implementation
matanor Jan 18, 2024
5118513
add metrics.rag.context_relevancy
matanor Jan 18, 2024
70d0d4c
fix init
matanor Jan 18, 2024
ce4c52f
a new tool for running metrics on a dataframe
matanor Jan 18, 2024
cc48004
adding more metrics to rag and to evaluate
assaftibm Jan 21, 2024
06ceade
fix answer relevance
assaftibm Jan 22, 2024
0745cee
fix instance score
assaftibm Jan 22, 2024
cfaafc5
update to relative imports, as needed within unitxt
matanor Jan 22, 2024
041e8da
flip the order such that the prediction (e.g. the retrieved context) …
matanor Jan 22, 2024
c43864a
add comments
matanor Jan 22, 2024
bd47f0e
rename to eval_utils.py
matanor Jan 22, 2024
7d81efc
add an import of eval_utils.py
matanor Jan 22, 2024
f005398
save reference scores in a list
matanor Jan 22, 2024
c1379d2
add expected reference_scores to perplexity.py expected outputs
matanor Jan 22, 2024
4d4281f
add context_perplexity
matanor Jan 22, 2024
6d81ee7
add context_perplexity
matanor Jan 22, 2024
9491aa7
add new evaluate_rag example
matanor Jan 22, 2024
cc4e34a
add import of eval_utils
matanor Jan 22, 2024
a7712b0
add comments explaining review questions
matanor Jan 23, 2024
7bf9206
add context_preplexity.json
matanor Jan 23, 2024
df0985a
fix context perplexity
assaftibm Jan 23, 2024
a09caf5
service
assaftibm Jan 25, 2024
d705384
merge
assaftibm Jan 25, 2024
1e4d65e
Merge branch 'main' into service
matanor Jan 25, 2024
0349281
Merge remote-tracking branch 'remotes/origin/main' into service
matanor Jan 29, 2024
8de9fae
move RemoteMetric from src/unitxt/test_utils/metrics.py to src/unitxt…
matanor Jan 30, 2024
82b68a4
update the api of the metric service, and the result returned by the …
matanor Jan 30, 2024
a454748
modify RemoteMetric not to inherit from GlobalMetric. to avoid runnin…
matanor Jan 30, 2024
09b6f18
add log prints to metric service
matanor Feb 1, 2024
bfc6411
support artifact_identifier in Artifact objects
matanor Feb 1, 2024
3f8fb9c
support for remote metrics
matanor Feb 1, 2024
3e12cb0
reorganize code
matanor Feb 1, 2024
cd8c025
add docstrings
matanor Feb 1, 2024
d18da61
add tests for reading the remote metrics config vars from the environ…
matanor Feb 1, 2024
784be4f
update update_instance_scores() and set_global_score()
matanor Feb 1, 2024
ed18abc
add missing return statement in wrap_inner_metric_pipeline_metric()
matanor Feb 4, 2024
af1e6b8
set fixed version to metric service requirements
matanor Feb 4, 2024
5fdb92b
assume the dockerfile runs from unitxt/service/metric
matanor Feb 4, 2024
0728dfb
use same dir imports for the service code
matanor Feb 4, 2024
6da9061
add metric service related command to make file
matanor Feb 4, 2024
14bb1eb
update HF env params location
matanor Feb 4, 2024
44c0e9f
add init_logger()
matanor Feb 4, 2024
db8a4ef
support build_number and release_version in metric service image names
matanor Feb 4, 2024
4e646d1
report request handling time in INFO logging level
matanor Feb 4, 2024
bc09936
update metric service commands to accept only one param tag_name
matanor Feb 4, 2024
fade3da
remove --proxy-headers
matanor Feb 4, 2024
761bc2c
restore --proxy-headers, since removing it did not solve the authenti…
matanor Feb 4, 2024
9d1a1b7
add locking around metric computation
matanor Feb 5, 2024
c656817
run main.py from docker
matanor Feb 5, 2024
875c739
downgrade to cuda11.6.1, to support running with older cuda drivers
matanor Feb 5, 2024
d75a103
use nvidia/cuda:12.1.1-cudnn8-devel-ubuntu20.04
matanor Feb 5, 2024
8a0773e
update ubuntu setup
matanor Feb 5, 2024
e312e36
use 11.8 cuda in image
matanor Feb 6, 2024
623971a
move unitxt imports
matanor Feb 6, 2024
b5c4b1b
update docker using dockerfile that works for another service
matanor Feb 6, 2024
1f33658
remove unitxt from requirements.txt
matanor Feb 6, 2024
dbaca07
restore installation of requirements
matanor Feb 6, 2024
4534e74
use dockerfile from sbert service
matanor Feb 6, 2024
032bd45
restore unitxt to requirements.txt
matanor Feb 6, 2024
f2ea3d1
comment out conda install commands
matanor Feb 6, 2024
3e234f4
fix copying of code
matanor Feb 6, 2024
9af95dc
add installation of cffi to fix "pyo3_runtime.PanicException: Python …
matanor Feb 6, 2024
5644d7a
add conda install of torch 1.12.1
matanor Feb 6, 2024
18db41a
move conda install to start of script
matanor Feb 7, 2024
c88d527
add comment
matanor Feb 7, 2024
2f17333
support GPU usage in compute_batch()
matanor Feb 7, 2024
6d97979
set batch_size to 16 in BertScore
matanor Feb 7, 2024
ffb7931
use latest unitxt in metric service requirements.txt
matanor Feb 7, 2024
7d8e1f7
replace unitxt requirement installation: remove it from the requireme…
matanor Feb 7, 2024
ea3d0b4
explicitly set the device for the Reward metric
matanor Feb 8, 2024
0f6d58e
update prints
matanor Feb 8, 2024
78d0be1
update prints
matanor Feb 8, 2024
170f499
explicitly set device in SentenceBert
matanor Feb 8, 2024
f0f064c
refactor
matanor Feb 8, 2024
d8bdc2b
Merge remote-tracking branch 'remotes/origin/main' into service
matanor Feb 8, 2024
7f42e73
update following changes in service.metrics.client_config
matanor Feb 8, 2024
4985ba7
restore version from main
matanor Feb 8, 2024
ad4b8c8
revert changes to perplexity.py
matanor Feb 8, 2024
69f1821
clean dockerfile
matanor Feb 8, 2024
1a79e6d
clean dockerfile
matanor Feb 8, 2024
47d78ee
add docstrings and comments
matanor Feb 8, 2024
38ed68e
remove use of ApplyMetric
matanor Feb 8, 2024
e428178
add docstrings and explanations
matanor Feb 8, 2024
b1e81eb
remove prints
matanor Feb 8, 2024
df833fd
RemoteMetric must have a main_score
matanor Feb 8, 2024
b9616ab
add start_metrics_http_service()
matanor Feb 8, 2024
6305e2a
add pydantic required for the metric service api
matanor Feb 8, 2024
58104f6
move metric service api from api.py into unitxt: places in metric_uti…
matanor Feb 8, 2024
3a659af
update import of RemoteMetric
matanor Feb 8, 2024
81060d8
restore init of main_score to None, otherwise the main_score is consi…
matanor Feb 8, 2024
ccc2529
add test_remote_service_with_valid_response()
matanor Feb 8, 2024
56bdd4e
move client_config.py functionality into metric_utils.py
matanor Feb 8, 2024
2a3cf13
Merge remote-tracking branch 'remotes/origin/main' into service
matanor Feb 8, 2024
87632b2
Merge remote-tracking branch 'remotes/origin/main' into service
matanor Feb 11, 2024
d57ed6d
remove test_service.py
matanor Feb 11, 2024
6e53912
add doc strings, type hints
matanor Feb 11, 2024
b153a14
add doc strings, small code update
matanor Feb 11, 2024
28523ca
remove type hints that cause an import error
matanor Feb 11, 2024
5af1df1
Merge branch 'main' into service
matanor Feb 11, 2024
1a845d7
Merge remote-tracking branch 'remotes/origin/main' into service
matanor Feb 15, 2024
da473ea
remove pydantic from the service requirements, it is only used in the…
matanor Feb 15, 2024
b91056e
update the BUILD_DIR env parameter
matanor Feb 15, 2024
06d93d0
remove the pydantic dependency
matanor Feb 25, 2024
7501694
move service code into src/unitxt/service.
matanor Feb 25, 2024
da41a1b
use plain dicts for request and response
matanor Feb 25, 2024
7e26d6a
add __init__.py files to new packages
matanor Feb 25, 2024
febca2a
update to run the server module
matanor Feb 25, 2024
b0ed177
remove use of buildx (not needed)
matanor Feb 25, 2024
dba6e9e
Merge remote-tracking branch 'remotes/origin/main' into service
matanor Feb 25, 2024
63c968a
rename metric -> metric_name
matanor Feb 26, 2024
9e5f0c5
restore usage of first_step to disable confidence interval
matanor Feb 26, 2024
c4cf224
rename metric_artifact -> metric
matanor Feb 26, 2024
7dd5f03
disable confidence interval for remote metrics
matanor Feb 26, 2024
a2c950e
add disable_confidence_interval_calculation and set_n_resamples for R…
matanor Feb 26, 2024
3963e41
fix endpoint following rename of 'metric' -> 'metric_name'
matanor Feb 26, 2024
5770822
remove get_env_variable
matanor Feb 26, 2024
933456c
add an option to start the service using a command unitxt-metrics-ser…
matanor Feb 29, 2024
0ccf9c1
add an option to start the service using a command unitxt-metrics-ser…
matanor Feb 29, 2024
a62f638
Merge remote-tracking branch 'remotes/origin/main' into service
matanor Feb 29, 2024
8eaaf52
switch to using the new settings classes
matanor Feb 29, 2024
c3ebf69
add default to remote_metrics setting
matanor Feb 29, 2024
2e4f627
Merge branch 'main' into service
matanor Mar 4, 2024
18 changes: 18 additions & 0 deletions Makefile
@@ -63,3 +63,21 @@ metric:
build:
	format
	pypi

# command: make tag_name=${TAG_NAME} metric-service-build
# example: make tag_name=unitxt-service-metric:b1v0.1 metric-service-build
# Use the unitxt dir as the build context for docker, so the entire codebase
# can be copied into the image. This way the latest code changes are integrated into
# the image, without requiring a formal unitxt release.
metric-service-build:
	cd $(DIR) && docker build --tag $(tag_name) --file $(DIR)/src/unitxt/service/metrics/Dockerfile .

# command: make tag_name=${TAG_NAME} metric-service-run-bash
# example: make tag_name=unitxt-service-metric:b1v0.1 metric-service-run-bash
metric-service-run-bash:
	docker run -it $(tag_name) /bin/bash

# command: make tag_name=${TAG_NAME} metric-service-run
# example: make tag_name=unitxt-service-metric:b1v0.1 metric-service-run
metric-service-run:
	docker run -p 8000:8000 --memory=20g $(tag_name)
5 changes: 4 additions & 1 deletion pyproject.toml
@@ -91,4 +91,7 @@ line-ending = "auto"


[tool.ruff.lint.pydocstyle]
convention = "google"
convention = "google"

[tool.ruff.flake8-bugbear]
extend-immutable-calls = ["fastapi.Depends", "fastapi.params.Depends", "fastapi.Query", "fastapi.params.Query"]
5 changes: 5 additions & 0 deletions requirements/service.rqr
@@ -0,0 +1,5 @@
torch==1.12.1
fastapi==0.109.0
uvicorn[standard]==0.27.0.post1
python-jose[cryptography]==3.3.0
transformers
3 changes: 2 additions & 1 deletion requirements/tests.rqr
@@ -3,9 +3,10 @@ transformers
sentence_transformers
ibm-cos-sdk
opendatasets
httpretty~=1.1.4
editdistance
rouge-score
nltk
sacrebleu
scikit-learn
jiwer
jiwer
1 change: 1 addition & 0 deletions setup.py
@@ -57,6 +57,7 @@
    entry_points={
        "console_scripts": [
            "unitxt-explore=unitxt.ui:launch",
            "unitxt-metrics-service=unitxt.service.metrics.main:start_metrics_http_service",
        ],
    },
)
32 changes: 31 additions & 1 deletion src/unitxt/eval_utils.py
@@ -3,6 +3,8 @@

import pandas as pd

from .artifact import verbosed_fetch_artifact
from .metric_utils import get_remote_metrics_endpoint, get_remote_metrics_names
from .operator import SequentialOperator
from .stream import MultiStream

@@ -22,9 +24,16 @@
    compute_conf_intervals: Optional[bool] = False,
):
    global_scores = {}
    remote_metrics = get_remote_metrics_names()
    for metric_name in metric_names:
        multi_stream = MultiStream.from_iterables({"test": dataset}, copying=True)
        metrics_operator = SequentialOperator(steps=[metric_name])
        if metric_name in remote_metrics:
            metric = verbosed_fetch_artifact(metric_name)
            metric_step = as_remote_metric(metric)
        else:
            # The SequentialOperator below will handle the load of the metric from its name
            metric_step = metric_name
        metrics_operator = SequentialOperator(steps=[metric_step])

        if not compute_conf_intervals:
            first_step = metrics_operator.steps[0]
@@ -59,3 +68,24 @@
        compute_conf_intervals=compute_conf_intervals,
    )
    return pd.DataFrame(results), pd.DataFrame(global_scores)


def as_remote_metric(metric):
    """Wrap a metric with a RemoteMetric.

    Currently, only wrapping the inner metric of a MetricPipeline is supported.
    """
    from .metrics import MetricPipeline, RemoteMetric

    remote_metrics_endpoint = get_remote_metrics_endpoint()
    if isinstance(metric, MetricPipeline):
        metric = RemoteMetric.wrap_inner_metric_pipeline_metric(
            metric_pipeline=metric,
            remote_metrics_endpoint=remote_metrics_endpoint,
        )
    else:
        raise ValueError(
            f"Unexpected remote metric type {type(metric)} for the metric named '{metric.artifact_identifier}'. "
            f"Remotely executed metrics should be MetricPipeline objects."
        )
    return metric
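
Reviewer note: the dispatch added to eval_utils.py can be read as "wrap the metric if its name is configured as remote, otherwise pass the name through for local loading". The sketch below illustrates that flow only. `PipelineStandIn`, `RemoteStandIn`, `wrap_as_remote`, `pick_metric_step` and the `catalog` dict are hypothetical names invented here; they are not unitxt APIs, and the real code fetches the artifact with `verbosed_fetch_artifact` and wraps it with `RemoteMetric`.

```python
from typing import Dict, List, Union


class PipelineStandIn:
    """Hypothetical stand-in for unitxt's MetricPipeline."""

    def __init__(self, artifact_identifier: str):
        self.artifact_identifier = artifact_identifier


class RemoteStandIn:
    """Hypothetical stand-in for RemoteMetric: records what it wraps and where it sends it."""

    def __init__(self, wrapped: PipelineStandIn, endpoint: str):
        self.wrapped = wrapped
        self.endpoint = endpoint


def wrap_as_remote(metric: object, endpoint: str) -> RemoteStandIn:
    # Same guard as as_remote_metric(): only pipeline metrics can be wrapped.
    if not isinstance(metric, PipelineStandIn):
        raise ValueError(f"Unexpected remote metric type {type(metric)}.")
    return RemoteStandIn(wrapped=metric, endpoint=endpoint)


def pick_metric_step(
    metric_name: str,
    remote_names: List[str],
    catalog: Dict[str, object],
    endpoint: str,
) -> Union[str, RemoteStandIn]:
    # Remote metrics are fetched and wrapped; local ones stay as plain names.
    if metric_name in remote_names:
        return wrap_as_remote(catalog[metric_name], endpoint)
    return metric_name


catalog = {
    "metrics.rag.context_relevance": PipelineStandIn("metrics.rag.context_relevance")
}
remote_names = ["metrics.rag.context_relevance"]
endpoint = "http://127.0.0.1:8000/compute"

remote_step = pick_metric_step(
    "metrics.rag.context_relevance", remote_names, catalog, endpoint
)
local_step = pick_metric_step("metrics.rouge", remote_names, catalog, endpoint)
```

Keeping the local branch as a plain name means the existing loading path is untouched when no remote metrics are configured.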
94 changes: 93 additions & 1 deletion src/unitxt/metric_utils.py
@@ -1,7 +1,9 @@
from typing import Iterable, List
import json
from typing import Any, Dict, Iterable, List, Optional

from datasets import Features, Value

from .dataclass import Dataclass
from .operator import (
    MultiStreamOperator,
    SequentialOperatorInitilizer,
@@ -17,6 +19,7 @@
)
from .register import _reset_env_local_catalogs, register_all_artifacts
from .schema import UNITXT_DATASET_SCHEMA
from .settings_utils import get_settings
from .stream import MultiStream, Stream


@@ -140,3 +143,92 @@

    stream = multi_stream[split_name]
    return list(stream)


"""
The API of a metric service:
- MetricRequest: A single input request to the metrics service.
- MetricResponse: A response returned from a metrics service.
"""


class InstanceInput(Dataclass):
    """A single input instance sent to a metric service."""

    prediction: Any
    references: List[Any]
    additional_inputs: Optional[Dict] = None


class MetricRequest(Dataclass):
    """A request to a metrics service, includes a list of input instances."""

    instance_inputs: List[InstanceInput]


class MetricResponse(Dataclass):
    """A response produced by a metrics service, includes the computed scores."""

    # A list of instance score dictionaries. Each dictionary contains the
    # score names and score values for a single instance.
    instances_scores: List[Dict[str, Any]]
    # The global scores dictionary, containing global score names and values.
    # These are scores computed over the entire set of input instances, e.g.
    # an average over a score computed per instance.
    global_score: Dict[str, Any]
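
Reviewer note: to make the request and response shapes concrete, here is a minimal sketch of what a payload with these fields might look like once serialized. It uses the standard library's `dataclasses` as a stand-in for unitxt's own `Dataclass` base, which is an assumption made so the snippet runs on its own. The score name `accuracy` and all values are made up for illustration.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict, List, Optional


# Stand-ins that copy the field names of InstanceInput and MetricRequest.
# They are built on stdlib dataclasses, not on unitxt's Dataclass (assumption).
@dataclass
class InstanceInput:
    prediction: Any
    references: List[Any]
    additional_inputs: Optional[Dict] = None


@dataclass
class MetricRequest:
    instance_inputs: List[InstanceInput]


request = MetricRequest(
    instance_inputs=[
        InstanceInput(prediction="Paris", references=["Paris", "paris"]),
        InstanceInput(prediction="Rome", references=["Milan"]),
    ]
)

# asdict() recurses into nested dataclasses, so the result is JSON-serializable.
payload = json.dumps(asdict(request))

# A response carrying the MetricResponse fields: one score dict per instance,
# plus one dict of scores computed over all instances. Values are invented.
response = {
    "instances_scores": [{"accuracy": 1.0}, {"accuracy": 0.0}],
    "global_score": {"accuracy": 0.5},
}
```

Each entry in `instances_scores` lines up by position with an entry in `instance_inputs`, which is what lets a client write the scores back onto the right instances.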


"""
Functionality for loading the remote metrics configuration from local environment variables.
"""

# A list of metrics to be executed remotely.
# For example: '["metrics.rag.context_relevance","metrics.rag.bert_k_precision"]'
# This value should be a valid json list
UNITXT_REMOTE_METRICS = "UNITXT_REMOTE_METRICS"

# The remote endpoint on which the remote metrics are available.
# For example, 'http://127.0.0.1:8000/compute'
UNITXT_REMOTE_METRICS_ENDPOINT = "UNITXT_REMOTE_METRICS_ENDPOINT"
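
Reviewer note: a rough sketch of how the two variables above might be set and validated. It reads `os.environ` directly, whereas the PR goes through `get_settings()`, and it returns an empty list when the variable is unset; both are assumptions of this sketch. The helper names `read_remote_metric_names` and `read_remote_endpoint` are hypothetical.

```python
import json
import os
from typing import List

REMOTE_METRICS_VAR = "UNITXT_REMOTE_METRICS"
REMOTE_ENDPOINT_VAR = "UNITXT_REMOTE_METRICS_ENDPOINT"


def read_remote_metric_names() -> List[str]:
    """Parse the JSON list of remote metric names (hypothetical helper)."""
    raw = os.environ.get(REMOTE_METRICS_VAR)
    if not raw:
        # Assumption: no remote metrics are configured when the variable is unset.
        return []
    names = json.loads(raw)
    if not isinstance(names, list) or not all(isinstance(n, str) for n in names):
        raise RuntimeError(
            f"'{REMOTE_METRICS_VAR}' must hold a JSON list of strings, got: {raw}"
        )
    return names


def read_remote_endpoint() -> str:
    """Return the configured endpoint, failing loudly if it is missing."""
    endpoint = os.environ.get(REMOTE_ENDPOINT_VAR)
    if endpoint is None:
        raise RuntimeError(f"'{REMOTE_ENDPOINT_VAR}' is not set.")
    return endpoint


# The example values are the ones given in the comments above.
os.environ[REMOTE_METRICS_VAR] = (
    '["metrics.rag.context_relevance", "metrics.rag.bert_k_precision"]'
)
os.environ[REMOTE_ENDPOINT_VAR] = "http://127.0.0.1:8000/compute"

names = read_remote_metric_names()
endpoint = read_remote_endpoint()
```

Note that the list must be valid JSON, so the metric names need double quotes inside the single-quoted shell or Python string.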


def get_remote_metrics_names() -> List[str]:
    """Load the remote metrics names from an environment variable.

    Returns:
        List[str] - names of metrics to be executed remotely.
    """
    settings = get_settings()
    remote_metrics = settings.remote_metrics
    if remote_metrics:
        remote_metrics = json.loads(remote_metrics)
    if not isinstance(remote_metrics, list):
        raise RuntimeError(
            f"Unexpected value {remote_metrics} for the '{UNITXT_REMOTE_METRICS}' environment variable. "
            f"The value is expected to be a list of metric names in json format."
        )
    for remote_metric in remote_metrics:
        if not isinstance(remote_metric, str):
            raise RuntimeError(
                f"Unexpected value {remote_metric} within the '{UNITXT_REMOTE_METRICS}' environment variable. "
                f"The value is expected to be a string but its type is {type(remote_metric)}."
            )
    return remote_metrics


def get_remote_metrics_endpoint() -> str:
    """Load the remote metrics endpoint from an environment variable.

    Returns:
        str - The remote endpoint on which the remote metrics are available.
    """
    settings = get_settings()
    try:
        remote_metrics_endpoint = settings.remote_metrics_endpoint
    except AttributeError as e:
        raise RuntimeError(
            f"Unexpected None value for '{UNITXT_REMOTE_METRICS_ENDPOINT}'. "
            f"Running remote metrics requires defining an "
            f"endpoint in the environment variable '{UNITXT_REMOTE_METRICS_ENDPOINT}'."
        ) from e
    return remote_metrics_endpoint