Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/queue metrics #860

Merged
merged 18 commits into from
Jan 11, 2023
Merged

Conversation

alvarorsant
Copy link
Contributor

Fix in tests, duplicate prometheus registry error

Copy link
Contributor

@adriangonz adriangonz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @alvarorsant,

Thanks for adding these changes!

Besides some naming conventions on the metrics, and some of the changes introduced in the tests, I think this looks great! We're almost there.

mlserver/parallel/dispatcher.py Show resolved Hide resolved
mlserver/parallel/dispatcher.py Outdated Show resolved Hide resolved
tests/batching/test_adaptive.py Outdated Show resolved Hide resolved
tests/metrics/test_endpoint.py Outdated Show resolved Hide resolved
tests/metrics/test_rest.py Outdated Show resolved Hide resolved
@adriangonz
Copy link
Contributor

Hey @alvarorsant ,

Just noticed some of the comments were marked as resolved, but I can't see any newer commits. Is there anything else that needs to be pushed before the next review cycle?

@alvarorsant
Copy link
Contributor Author

Hi @adriangonz,
I've pushed the changes, let's see if everything is OK.

Thanks in advance for you dedication.

Copy link
Contributor

@adriangonz adriangonz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice one! Thanks for applying those changes and thanks a lot for your massive effort on this PR. I know there have been a few back and forths, so thanks for your patience! This is a great contribution to have in MLServer! 🚀

Once the tests and linter are green, this should be ready to go ahead. Let me know if you need any help with those. 👍

tests/batching/test_adaptive.py Outdated Show resolved Hide resolved
tests/metrics/test_endpoint.py Show resolved Hide resolved
@adriangonz
Copy link
Contributor

BTW on the CI checks, linter seems to be complaining about a few unused imports. You can run this locally by calling make lint.

flake8 .
./tests/grpc/test_servicers.py:3:1: F401 '.conftest.delete_registry' imported but unused
./tests/grpc/test_servicers.py:4:1: F401 '..metrics.conftest.prometheus_registry' imported but unused
./tests/grpc/test_servicers.py:23:5: F811 redefinition of unused 'prometheus_registry' from line 4
./tests/grpc/test_servicers.py:32:5: F811 redefinition of unused 'prometheus_registry' from line 4
./tests/grpc/test_servicers.py:41:5: F811 redefinition of unused 'prometheus_registry' from line 4
./tests/grpc/test_servicers.py:50:5: F811 redefinition of unused 'prometheus_registry' from line 4
./tests/grpc/test_servicers.py:61:5: F811 redefinition of unused 'prometheus_registry' from line 4
./tests/grpc/test_servicers.py:77:5: F811 redefinition of unused 'prometheus_registry' from line 4
./tests/grpc/test_servicers.py:98:5: F811 redefinition of unused 'prometheus_registry' from line 4
./tests/grpc/test_servicers.py:127:5: F811 redefinition of unused 'prometheus_registry' from line 4
./tests/grpc/test_servicers.py:153:5: F811 redefinition of unused 'prometheus_registry' from line 4
./tests/grpc/test_servicers.py:164:5: F811 redefinition of unused 'prometheus_registry' from line 4
./tests/grpc/test_servicers.py:174:5: F811 redefinition of unused 'prometheus_registry' from line 4
./tests/grpc/test_servicers.py:186:5: F811 redefinition of unused 'prometheus_registry' from line 4
./tests/grpc/test_servicers.py:188:5: F811 redefinition of unused 'delete_registry' from line 3
./tests/grpc/test_servicers.py:205:5: F811 redefinition of unused 'prometheus_registry' from line 4
./tests/grpc/conftest.py:17:1: F811 redefinition of unused 'CollectorRegistry' from line 7
./tests/grpc/test_model_repository.py:20:1: F401 '..metrics.conftest.prometheus_registry' imported but unused
./tests/grpc/test_model_repository.py:91:5: F811 redefinition of unused 'prometheus_registry' from line 20
./tests/grpc/test_model_repository.py:108:5: F811 redefinition of unused 'prometheus_registry' from line 20
./tests/metrics/conftest.py:4:1: F401 'prometheus_client.registry.REGISTRY' imported but unused
./tests/metrics/conftest.py:4:1: F401 'prometheus_client.registry.CollectorRegistry' imported but unused
./tests/kafka/test_server.py:10:1: F401 '..metrics.conftest.prometheus_registry' imported but unused
./tests/kafka/test_server.py:15:5: F811 redefinition of unused 'prometheus_registry' from line 10
./tests/kafka/test_server.py:37:5: F811 redefinition of unused 'prometheus_registry' from line 10
./tests/batching/test_adaptive.py:15:1: F401 'prometheus_client.registry.CollectorRegistry' imported but unused
./tests/batching/test_hooks.py:6:1: F401 'prometheus_client.registry.CollectorRegistry' imported but unused
./tests/rest/test_endpoints.py:14:1: F401 '..metrics.conftest.prometheus_registry' imported but unused
./tests/rest/test_endpoints.py:15:1: F401 '.conftest.delete_registry' imported but unused
./tests/rest/test_endpoints.py:19:21: F811 redefinition of unused 'prometheus_registry' from line 14
./tests/rest/test_endpoints.py:26:22: F811 redefinition of unused 'prometheus_registry' from line 14
./tests/rest/test_endpoints.py:34:5: F811 redefinition of unused 'prometheus_registry' from line 14
./tests/rest/test_endpoints.py:42:25: F811 redefinition of unused 'prometheus_registry' from line 14
./tests/rest/test_endpoints.py:54:5: F811 redefinition of unused 'prometheus_registry' from line 14
./tests/rest/test_endpoints.py:71:5: F811 redefinition of unused 'prometheus_registry' from line 14
./tests/rest/test_endpoints.py:90:5: F811 redefinition of unused 'prometheus_registry' from line 14
./tests/rest/test_endpoints.py:112:5: F811 redefinition of unused 'prometheus_registry' from line 14
./tests/rest/test_endpoints.py:122:5: F811 redefinition of unused 'prometheus_registry' from line 14
./tests/rest/test_endpoints.py:134:5: F811 redefinition of unused 'prometheus_registry' from line 14
./tests/rest/test_endpoints.py:146:5: F811 redefinition of unused 'delete_registry' from line 15
./tests/rest/test_endpoints.py:149:5: F811 redefinition of unused 'prometheus_registry' from line 14
./tests/rest/test_endpoints.py:163:5: F811 redefinition of unused 'prometheus_registry' from line 14
./tests/rest/test_custom.py:2:1: F401 '..metrics.conftest.prometheus_registry' imported but unused
./tests/rest/test_custom.py:7:5: F811 redefinition of unused 'prometheus_registry' from line 2
./tests/rest/test_custom.py:18:5: F811 redefinition of unused 'prometheus_registry' from line 2
./tests/rest/test_server.py:6:1: F401 '..metrics.conftest.prometheus_registry' imported but unused
./tests/rest/test_server.py:11:5: F811 redefinition of unused 'prometheus_registry' from line 6
./tests/rest/test_server.py:25:5: F811 redefinition of unused 'prometheus_registry' from line 6

Similarly, tests seem to fail due to a import error as well. You can kick these off locally by calling python -m pytest (or make test, but this will be a bit slower).

tests/grpc/conftest.py:22: in <module>
    from ..metrics.conftest import prometheus_registry  # noqa: F401
E   ImportError: cannot import name 'prometheus_registry' from 'tests.metrics.conftest' (/home/runner/work/MLServer/MLServer/tests/metrics/conftest.py)

@alvarorsant
Copy link
Contributor Author

how can I solve the last errors of linter?

@adriangonz
Copy link
Contributor

Hey @alvarorsant ,

It may be that you're running an older version of some of the artefacts.

Could you try to rebase from master?

@alvarorsant
Copy link
Contributor Author

Done!

@alvarorsant
Copy link
Contributor Author

If I do make lint locally, it works.
All done! ✨ 🍰 ✨
8 files reformatted, 278 files left unchanged.
flake8 .
mypy ./mlserver
Success: no issues found in 86 source files
for _runtime in ./runtimes/*;
do
mypy $_runtime || exit 1;
done
Success: no issues found in 8 source files
Success: no issues found in 21 source files
Success: no issues found in 31 source files
Success: no issues found in 7 source files
Success: no issues found in 16 source files
Success: no issues found in 6 source files
Success: no issues found in 7 source files
Success: no issues found in 7 source files
mypy ./benchmarking
Success: no issues found in 2 source files
mypy ./docs/examples
Success: no issues found in 4 source files

Check if something has changed after generation

git
--no-pager diff
--exit-code
.

@adriangonz
Copy link
Contributor

Hey @alvarorsant ,

It seems one of the rebases missed the latest auto-generated gRPC stubs from master. Not sure why they weren't showing on the PR's files diff.

I've pushed them manually to this branch now, so hopefully linter should pass in CI now.

@adriangonz
Copy link
Contributor

Hey @alvarorsant ,

It seems some of the runtime tests get stuck while running. I tried running these locally, and the errors I get look along the following lines:

======================================================= short test summary info ========================================================
ERROR runtimes/mlflow/tests/rest/test_endpoints.py::test_ping[-/health] - ValueError: Duplicated timeseries in CollectorRegistry: {'parallel_request_queue_bucket', 'parallel_request_queue_sum', 'parallel_r...
ERROR runtimes/mlflow/tests/rest/test_endpoints.py::test_ping[file:-/ping] - ValueError: Duplicated timeseries in CollectorRegistry: {'parallel_request_queue_bucket', 'parallel_request_queue_sum', 'parallel_r...
ERROR runtimes/mlflow/tests/rest/test_endpoints.py::test_ping[file:-/health] - ValueError: Duplicated timeseries in CollectorRegistry: {'parallel_request_queue_bucket', 'parallel_request_queue_sum', 'parallel_r...
ERROR runtimes/mlflow/tests/rest/test_endpoints.py::test_version[] - ValueError: Duplicated timeseries in CollectorRegistry: {'parallel_request_queue_bucket', 'parallel_request_queue_sum', 'parallel_r...
ERROR runtimes/mlflow/tests/rest/test_endpoints.py::test_version[file:] - ValueError: Duplicated timeseries in CollectorRegistry: {'parallel_request_queue_bucket', 'parallel_request_queue_sum', 'parallel_r...
ERROR runtimes/mlflow/tests/rest/test_invocations.py::test_invocations_invalid_content_type[-] - ValueError: Duplicated timeseries in CollectorRegistry: {'parallel_request_queue_bucket', 'parallel_request_queue_sum', 'parallel_r...
ERROR runtimes/mlflow/tests/rest/test_invocations.py::test_invocations_invalid_content_type[-None] - ValueError: Duplicated timeseries in CollectorRegistry: {'parallel_request_queue_bucket', 'parallel_request_queue_sum', 'parallel_r...
ERROR runtimes/mlflow/tests/rest/test_invocations.py::test_invocations_invalid_content_type[-application/pdf] - ValueError: Duplicated timeseries in CollectorRegistry: {'parallel_request_queue_bucket', 'parallel_request_queue_sum', 'parallel_r...
ERROR runtimes/mlflow/tests/rest/test_invocations.py::test_invocations_invalid_content_type[file:-] - ValueError: Duplicated timeseries in CollectorRegistry: {'parallel_request_queue_bucket', 'parallel_request_queue_sum', 'parallel_r...
ERROR runtimes/mlflow/tests/rest/test_invocations.py::test_invocations_invalid_content_type[file:-None] - ValueError: Duplicated timeseries in CollectorRegistry: {'parallel_request_queue_bucket', 'parallel_request_queue_sum', 'parallel_r...
ERROR runtimes/mlflow/tests/rest/test_invocations.py::test_invocations_invalid_content_type[file:-application/pdf] - ValueError: Duplicated timeseries in CollectorRegistry: {'parallel_request_queue_bucket', 'parallel_request_queue_sum', 'parallel_r...
ERROR runtimes/mlflow/tests/rest/test_invocations.py::test_invocations_tensor[-application/json-payload0] - ValueError: Duplicated timeseries in CollectorRegistry: {'parallel_request_queue_bucket', 'parallel_request_queue_sum', 'parallel_r...
ERROR runtimes/mlflow/tests/rest/test_invocations.py::test_invocations_tensor[-application/json-payload1] - ValueError: Duplicated timeseries in CollectorRegistry: {'parallel_request_queue_bucket', 'parallel_request_queue_sum', 'parallel_r...
ERROR runtimes/mlflow/tests/rest/test_invocations.py::test_invocations_tensor[-application/json-payload2] - ValueError: Duplicated timeseries in CollectorRegistry: {'parallel_request_queue_bucket', 'parallel_request_queue_sum', 'parallel_r...
ERROR runtimes/mlflow/tests/rest/test_invocations.py::test_invocations_tensor[file:-application/json-payload0] - ValueError: Duplicated timeseries in CollectorRegistry: {'parallel_request_queue_bucket', 'parallel_request_queue_sum', 'parallel_r...
ERROR runtimes/mlflow/tests/rest/test_invocations.py::test_invocations_tensor[file:-application/json-payload1] - ValueError: Duplicated timeseries in CollectorRegistry: {'parallel_request_queue_bucket', 'parallel_request_queue_sum', 'parallel_r...
ERROR runtimes/mlflow/tests/rest/test_invocations.py::test_invocations_tensor[file:-application/json-payload2] - ValueError: Duplicated timeseries in CollectorRegistry: {'parallel_request_queue_bucket', 'parallel_request_queue_sum', 'parallel_r...
ERROR runtimes/mlflow/tests/rest/test_invocations.py::test_invocations_dataframe[-application/json-payload0] - ValueError: Duplicated timeseries in CollectorRegistry: {'parallel_request_queue_bucket', 'parallel_request_queue_sum', 'parallel_r...
ERROR runtimes/mlflow/tests/rest/test_invocations.py::test_invocations_dataframe[-application/json-payload1] - ValueError: Duplicated timeseries in CollectorRegistry: {'parallel_request_queue_bucket', 'parallel_request_queue_sum', 'parallel_r...
ERROR runtimes/mlflow/tests/rest/test_invocations.py::test_invocations_dataframe[file:-application/json-payload0] - ValueError: Duplicated timeseries in CollectorRegistry: {'parallel_request_queue_bucket', 'parallel_request_queue_sum', 'parallel_r...
ERROR runtimes/mlflow/tests/rest/test_invocations.py::test_invocations_dataframe[file:-application/json-payload1] - ValueError: Duplicated timeseries in CollectorRegistry: {'parallel_request_queue_bucket', 'parallel_request_queue_sum', 'parallel_r...

You should be able to run these locally with the command below (note that you can also do make test which will run the entire suite):

tox -e mlflow

@alvarorsant
Copy link
Contributor Author

It's already corrected

@alvarorsant
Copy link
Contributor Author

alvarorsant commented Dec 20, 2022

runtimes/mlflow/tests/rest/conftest.py:19: error: The return type of a generator function should be "Generator" or one of its supertypes [misc]
Found 1 error in 1 file (checked 16 source files) ¿?

I already used this method without a problem

@alvarorsant
Copy link
Contributor Author

Hi @adriangonz ,
please, could you check it out now?

@alvarorsant
Copy link
Contributor Author

alvarorsant commented Jan 10, 2023

Hi @adriangonz,
I consider the changes have been approved. In which release do they all come in?

@adriangonz
Copy link
Contributor

Hey @alvarorsant ,

It seems some tests have failed due to an issue that's now fixed in master. Could you rebase to ensure the latest is on the PR?

@alvarorsant
Copy link
Contributor Author

Done , I've already updated it

@adriangonz
Copy link
Contributor

adriangonz commented Jan 11, 2023

Hey @alvarorsant ,

It seems during your last merge from master, the old gRPC stubs came back again (i.e. the ones diverging from master). This is why, in general, I don't trust merges much and prefer to do rebases instead. They also tend to keep the history cleaner.

Given this PR has already been opened for a while, to keep things moving, I've pushed the new ones from master manually. Hope that's alright with you, and hope that fixes the linter.

@adriangonz
Copy link
Contributor

All seems good now! Thanks a lot for your effort on this one @alvarorsant 👍

Regarding releases, this will likely come out on the 1.3 release, which is due in march.

@adriangonz adriangonz merged commit f13a129 into SeldonIO:master Jan 11, 2023
@alvarorsant
Copy link
Contributor Author

Thanks @adriangonz

@alvarorsant alvarorsant deleted the feature/queue-metrics branch January 11, 2023 14:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants