Add worker_queue_request, batch_queue and worker_processes list metrics #826
Conversation
Hey @alvarorsant,

Thanks for picking this effort back up 👍

I've added a couple of comments below. Overall, I feel we should keep these changes small and focus just on tracking the request queues.

Besides that and the review comments, could you also add a couple of tests to ensure the new metrics get exposed correctly? You can model these off the tests under `tests.metrics`, like the following one from `MLServer/tests/metrics/test_rest.py` (lines 19 to 45 in `f5297d4`):
```python
async def test_rest_metrics(
    metrics_client: MetricsClient,
    rest_client: RESTClient,
    inference_request: InferenceRequest,
    sum_model: MLModel,
):
    await rest_client.wait_until_ready()

    metric_name = "rest_server_requests"

    # Get metrics for the REST server before sending any requests
    metrics = await metrics_client.metrics()
    rest_server_requests = find_metric(metrics, metric_name)
    assert rest_server_requests is None

    expected_handled = 5
    await asyncio.gather(
        *[
            rest_client.infer(sum_model.name, inference_request)
            for _ in range(expected_handled)
        ]
    )

    metrics = await metrics_client.metrics()
    rest_server_requests = find_metric(metrics, metric_name)
    assert rest_server_requests is not None
    assert len(rest_server_requests.samples) == 1
    assert rest_server_requests.samples[0].value == expected_handled
```
```diff
-        self._batching_task = schedule_with_callback(
-            self._batcher(), self._batching_task_callback
-        )
+        self._batching_task = asyncio.create_task(self._batcher())
```
What was the reason for replacing the `schedule_with_callback` call? Under the hood, `schedule_with_callback` does the same thing; it's essentially a helper to avoid duplicating these two lines.
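For context, here is a minimal sketch of what a helper like `schedule_with_callback` presumably wraps, assuming (as the comment above implies) that it pairs task creation with a done-callback; the signature is illustrative, not copied from MLServer's source:

```python
import asyncio
from typing import Any, Callable, Coroutine


# Hypothetical sketch: the helper bundles the two lines that would
# otherwise be duplicated at every call site.
def schedule_with_callback(
    coro: Coroutine, cb: Callable[[asyncio.Task], Any]
) -> asyncio.Task:
    # Schedule the coroutine on the running event loop...
    task = asyncio.create_task(coro)
    # ...and register a callback that runs once the task finishes
    # (e.g. to surface exceptions raised in the background task).
    task.add_done_callback(cb)
    return task
```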
I didn't touch that part; it must be a matter of my branch needing an update against master.
Is there any chance this was a leftover from the previous PR? Either way, could you try to rebase from `master` to see if it goes away?
```python
def _workers_processes_monitor(self, loop: AbstractEventLoop):
    process_request_count.observe(float(len(asyncio.all_tasks(loop))))
```
This `loop` would only represent the main process' loop, so it wouldn't capture the workers' queued asyncio tasks.

For this first iteration, I'd keep it simple and focus just on tracking the queue, as in the sketch below.
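As a rough illustration of what "tracking the queue" could look like, here is a hedged sketch using a `prometheus_client` Gauge; the metric name and the helper are assumptions for illustration, not the actual names in this PR:

```python
from prometheus_client import Gauge

# Assumed metric name, for illustration only
parallel_request_queue_size = Gauge(
    "parallel_request_queue_size",
    "Number of inference requests currently waiting in the workers' queue",
)


def sample_queue_size(queue) -> None:
    # Called whenever a request is enqueued or dequeued, so the gauge
    # reflects the current backlog rather than the main loop's
    # asyncio tasks.
    parallel_request_queue_size.set(queue.qsize())
```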
So I will focus only on the queued requests in this PR, and I'll add a couple of tests.
Regarding the tests, I noticed you mount an aiohttp server in a fixture. However, I'd need to test the metrics inside the code in the `mlserver` folder, and I don't know how to fit real metrics into a mocked server.
Wouldn't it be the same as in the current metrics tests? Those spin up an actual MLServer instance, send a couple of requests, and then scrape the metrics endpoint to compare the metrics before and after (asserting that they've increased).
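For instance, a test for the new queue metric could follow the same before/after pattern as `test_rest_metrics` above, reusing the same fixtures; the metric name and the exact assertions here are assumptions, not the final ones for this PR:

```python
import asyncio


async def test_queue_request_metric(
    metrics_client: MetricsClient,
    rest_client: RESTClient,
    inference_request: InferenceRequest,
    sum_model: MLModel,
):
    await rest_client.wait_until_ready()
    metric_name = "batch_request_queue"  # assumed metric name

    # Send a handful of requests so the queue metric gets sampled
    expected_handled = 5
    await asyncio.gather(
        *[
            rest_client.infer(sum_model.name, inference_request)
            for _ in range(expected_handled)
        ]
    )

    # Scrape the metrics endpoint and assert the new metric is exposed
    metrics = await metrics_client.metrics()
    queue_metric = find_metric(metrics, metric_name)
    assert queue_metric is not None
```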
Closing in favour of #860.