[FEAT] Add integration with `huggingface_hub.utils.telemetry` #5218

davidberenstein1957 · 2024-07-12T12:55:41Z

Description

This PR changes the server telemetry to gather metrics for API endpoint calls. This is a first iteration; more usage metrics can be added later.

The metrics gathered include the user ID and some system info, such as the server ID (a UUID generated once when the Argilla server starts).

It also deprecates the old telemetry key. From the huggingface_hub documentation: "huggingface_hub includes a helper to send telemetry data. This information helps us debug issues and prioritize new features. Users can disable telemetry collection at any time by setting the HF_HUB_DISABLE_TELEMETRY=1 environment variable. Telemetry is also disabled in offline mode (i.e. when setting HF_HUB_OFFLINE=1)."
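To make the opt-out rules quoted above concrete, here is a minimal sketch of the check. The function name `telemetry_enabled` and the set of accepted "true" values are assumptions for illustration; this is not the huggingface_hub implementation, only the two documented environment variables are taken from the text.

```python
import os
from typing import Mapping, Optional

# Assumed set of values treated as "true"; the library's exact parsing may differ.
_TRUE_VALUES = {"1", "ON", "YES", "TRUE"}


def _is_true(value: Optional[str]) -> bool:
    """Return True when an environment value looks like an enabled flag."""
    return value is not None and value.strip().upper() in _TRUE_VALUES


def telemetry_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Hypothetical helper: telemetry is off if either documented flag is set."""
    env = os.environ if env is None else env
    if _is_true(env.get("HF_HUB_DISABLE_TELEMETRY")):
        return False  # explicit opt-out
    if _is_true(env.get("HF_HUB_OFFLINE")):
        return False  # offline mode also disables telemetry
    return True
```

Passing the environment as an argument keeps the check easy to test without touching the real process environment.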

OUTDATED

Adds telemetry for:

General Idea:
I've structured the data to come in through URLs/topics like dataset/settings/vectorsettings/create or dataset/records/suggestions/read, along with some generalized metadata per URL/topic, such as the count or the type of suggestion or setting.
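A minimal sketch of the topic layout described above (this section is marked OUTDATED, so it reflects the early design only). The helper name `build_event` and the allowed action set are hypothetical; only the two example topics come from the description.

```python
from typing import Any, Dict, Optional, Tuple

# Assumed CRUD action set for this sketch.
ALLOWED_ACTIONS = {"create", "read", "update", "delete"}


def build_event(
    *parts: str, action: str, count: Optional[int] = None, **metadata: Any
) -> Tuple[str, Dict[str, Any]]:
    """Join the resource path and the action into a topic, plus generic metadata."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    topic = "/".join([*parts, action])
    data: Dict[str, Any] = dict(metadata)
    if count is not None:
        data["count"] = count  # only attached when the caller provides it
    return topic, data


topic, data = build_event("dataset", "settings", "vectorsettings", action="create")
print(topic)
```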

To discuss:

What to do with list methods. I currently track list-like operations by sending each individual item as a read, along with one read that carries a count. I did this because it might be interesting to get the total number of users, workspaces, etc. Should we move this over to list as a separate CRUD action? Do we also want to capture each individual update?
A similar logic applies to bulk operations. Should bulk_crud be separate CRUD actions?
I don't track user/dataset/workspace-specific list operations, like list_users_workspace or list_datasets_user.
I don't track metadata and vector updates at the record level; however, we DO keep track of operations on suggestions and responses.
@frascuchon was there a reason to include the header along with user/login operations? Otherwise I will rewrite this a bit and include the user/login as user/read.

Follow up

[FEATURE] add FastAPI endpoint metrics to telemetry #5224 @frascuchon

Closes #5204

Type of change

Improvement (change adding some improvement to an existing functionality)

How Has This Been Tested

NA

Checklist

I added relevant documentation
I followed the style guidelines of this project
I did a self-review of my code
I made corresponding changes to the documentation
I confirm My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
I have added relevant notes to the CHANGELOG.md file (See https://keepachangelog.com/)

davidberenstein1957 · 2024-07-12T12:58:25Z

@dvsrepo I added some initial work. It is still in progress, but I added you as a reviewer so you can keep track.

dvsrepo · 2024-07-15T13:05:11Z

Looks good, just left two small comments!

frascuchon · 2024-08-27T09:28:52Z

argilla-server/src/argilla_server/api/handlers/v1/workspaces.py

+    for workspace in workspaces:
+        await telemetry_client.track_crud_workspace(action="read", workspace=workspace)
+    await telemetry_client.track_crud_workspace(action="read", workspace=None, count=len(workspaces))


I don't understand these lines. For other list endpoints we just track the resource count, but here we also track the workspaces individually. What's the motivation?

@frascuchon It is to keep track of list-like operations without differentiating too much in the CRUD naming. If you think it is useful for development, I will create a separate "list" action. Otherwise, I will leave it. I think mixing the individual calls with the list-like call here was unintentional on my part.

I reviewed the code and see that I did keep track of the list-like operations in other places too.
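To compare the two approaches discussed in this thread, here is a small sketch. `RecordingTelemetryClient` is a hypothetical stand-in that records events in memory; it is synchronous for brevity, whereas the handlers in the diff await the real client. Only the `track_crud_workspace` name and its `action`, `workspace`, and `count` arguments are taken from the diff.

```python
from typing import Any, Dict, List, Optional


class RecordingTelemetryClient:
    """Hypothetical stand-in that records events instead of sending them."""

    def __init__(self) -> None:
        self.events: List[Dict[str, Any]] = []

    def track_crud_workspace(
        self, action: str, workspace: Optional[str] = None, count: Optional[int] = None
    ) -> None:
        self.events.append({"action": action, "workspace": workspace, "count": count})


def track_per_item(client: RecordingTelemetryClient, workspaces: List[str]) -> None:
    # Approach in the diff: one "read" per workspace, then one "read" with the count.
    for workspace in workspaces:
        client.track_crud_workspace(action="read", workspace=workspace)
    client.track_crud_workspace(action="read", count=len(workspaces))


def track_as_list(client: RecordingTelemetryClient, workspaces: List[str]) -> None:
    # Alternative raised in review: a single "list" event carrying only the count.
    client.track_crud_workspace(action="list", count=len(workspaces))
```

With N workspaces the first approach emits N + 1 events per request, while the second always emits one, which matters if each event turns into a telemetry call.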

jfcalvo · 2024-08-27T09:54:18Z

argilla-server/src/argilla_server/api/handlers/v1/datasets/datasets.py

+    for field in dataset.fields:
+        await telemetry_client.track_crud_dataset_setting(
+            action="read", dataset=dataset, setting_name="fields", setting=field
+        )


Do we know how these telemetry requests are done? Are they synchronous? UDP?

We should check that we are not spending a lot of time executing these requests so we are not adding additional time to the API endpoint requests.

@jfcalvo The underlying code can be found here: https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/utils/_telemetry.py
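On the latency concern: the usual way to keep telemetry off the request path is to enqueue the event and let a background thread deliver it. The sketch below illustrates that general pattern only; `BackgroundTelemetry` and `slow_sender` are hypothetical names, and this is not the code from the linked module, which should be checked directly for its actual behavior.

```python
import queue
import threading
import time
from typing import Callable, Dict, Optional, Tuple

Event = Tuple[str, Dict[str, str]]


class BackgroundTelemetry:
    """Hypothetical non-blocking sender: callers enqueue, a daemon thread delivers."""

    def __init__(self, sender: Callable[[str, Dict[str, str]], None]) -> None:
        self._sender = sender
        self._queue: "queue.Queue[Event]" = queue.Queue()
        threading.Thread(target=self._worker, daemon=True).start()

    def send(self, topic: str, user_agent: Optional[Dict[str, str]] = None) -> None:
        # Only an in-memory put happens on the caller's thread.
        self._queue.put((topic, dict(user_agent or {})))

    def _worker(self) -> None:
        while True:
            topic, user_agent = self._queue.get()
            try:
                self._sender(topic, user_agent)
            except Exception:
                # Telemetry failures must never surface to the request handler.
                pass
            finally:
                self._queue.task_done()

    def flush(self) -> None:
        # Block until everything queued so far has been handed to the sender.
        self._queue.join()


def slow_sender(topic: str, user_agent: Dict[str, str]) -> None:
    # Stand-in for a network call that takes about 50 ms.
    time.sleep(0.05)
```

If the sender were instead called synchronously inside an async endpoint, each event would add its full round-trip time to the response and block the event loop while doing so.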

jfcalvo · 2024-08-27T10:26:25Z

argilla-server/src/argilla_server/api/handlers/v1/records.py

+    action = "create"
    if await Suggestion.get_by(db, record_id=record_id, question_id=suggestion_create.question_id):
        response.status_code = status.HTTP_200_OK
+        action = "update"


Maybe we can have an upsert action so you don't need to add logic trying to know if it's a create or an update?

We can also rename it to "upsert", but I wanted to avoid capturing every edge case ("publish", "search", "read", "list", "upsert", "update") because I thought it might be a bit much for metrics/telemetry. @frascuchon @jfcalvo if you feel it would help development, I can make a finer distinction.
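The trade-off between the two options can be sketched as follows. `resolve_action` is a hypothetical helper, not part of the PR; it only shows that an "upsert" action removes the need to know whether the suggestion already exists.

```python
from typing import Literal

Action = Literal["create", "update", "upsert"]


def resolve_action(already_exists: bool, use_upsert: bool = False) -> Action:
    """Pick the telemetry action name for a create-or-update endpoint."""
    if use_upsert:
        # One action name; no existence lookup is needed just for telemetry.
        return "upsert"
    # Finer distinction, at the cost of branching on the lookup result.
    return "update" if already_exists else "create"
```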

frascuchon · 2024-08-27T11:15:53Z

argilla-server/src/argilla_server/telemetry.py

-            context = self._system_info.copy()
+            user_agent.update(self._system_info)
+        if count is not None:
+            user_agent["count"] = count


What's the meaning of count, and what is it used for?

Count is used for list operations.

Why not just send it as part of the data content for the actions that measure list operations?
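The two placements of `count` under discussion can be sketched like this. Both function names are hypothetical; the first mirrors the shape of the diff (system info merged into the user agent, `count` added when present), the second keeps `count` in a per-action data payload as suggested in the last comment.

```python
from typing import Any, Dict, Optional, Sequence


def build_user_agent(
    system_info: Dict[str, Any],
    user_agent: Optional[Dict[str, Any]] = None,
    count: Optional[int] = None,
) -> Dict[str, Any]:
    """Shape from the diff: count sits next to the system info on every call."""
    merged: Dict[str, Any] = dict(user_agent or {})  # copy, never mutate the input
    merged.update(system_info)
    if count is not None:
        merged["count"] = count
    return merged


def build_list_data(resources: Sequence[Any]) -> Dict[str, Any]:
    """Alternative: count belongs only to the data of list actions."""
    return {"count": len(resources)}
```

Keeping `count` in the action's own data means non-list events never carry a field that has no meaning for them.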

# Description

This PR adds a middleware component to track the API endpoint's usage.

**Type of change**

- New feature (non-breaking change which adds functionality)
- Refactor (change restructuring the codebase without changing functionality)
- Improvement (change adding some improvement to an existing functionality)

**How Has This Been Tested**

**Checklist**

- I added relevant documentation
- I followed the style guidelines of this project
- I did a self-review of my code
- I made corresponding changes to the documentation
- I confirm My changes generate no new warnings
- I have added tests that prove my fix is effective or that my feature works
- I have added relevant notes to the CHANGELOG.md file (See https://keepachangelog.com/)

---------

Co-authored-by: José Francisco Calvo <jose@argilla.io>

…5445)

# Description

This PR restores the server_id for telemetry purposes and also adds the user.id and user.role when tracking API requests.

**Type of change**

- Improvement (change adding some improvement to an existing functionality)
- Documentation update

**How Has This Been Tested**

**Checklist**

- I added relevant documentation
- I followed the style guidelines of this project
- I did a self-review of my code
- I made corresponding changes to the documentation
- I confirm My changes generate no new warnings
- I have added tests that prove my fix is effective or that my feature works
- I have added relevant notes to the CHANGELOG.md file (See https://keepachangelog.com/)

# Description

This PR adds the track startup method defined in #5441 and includes the persistent_storage_enabled info as part of the system info.

**Type of change**

- Improvement (change adding some improvement to an existing functionality)

**How Has This Been Tested**

**Checklist**

- I added relevant documentation
- I followed the style guidelines of this project
- I did a self-review of my code
- I made corresponding changes to the documentation
- I confirm My changes generate no new warnings
- I have added tests that prove my fix is effective or that my feature works
- I have added relevant notes to the CHANGELOG.md file (See https://keepachangelog.com/)

davidberenstein1957 added 6 commits July 12, 2024 14:19

Update TelemetryClient to use huggingface_hub.utils

431f296

Update userCRUD telemetry tracking

2d4aef4

Update workspace CRUD telemetry tracking

71cf614

Update workspace telemetry from list_user_workspaces method

e1924e4

Fix arguments passed to track_crud_workspace in list_user_workspaces

11a1f85

Fix await to telemetry call

9adfc16

davidberenstein1957 linked an issue Jul 12, 2024 that may be closed by this pull request

[FEATURE] add huggingface_hub.utils.send_telemetry to the argilla-server #5204

Closed

davidberenstein1957 requested a review from dvsrepo July 12, 2024 12:57

davidberenstein1957 added 9 commits July 12, 2024 16:40

Update `telemetry_client: TelemetryClient = Depends(get_telemetry_client),` instead of direct `_TELEMETRY_CLIENT` import

0109e44

Add telemetry methods, dataset, workspace, user, settings, records, record_subtopic

038d8f6

Add telemetry methods fields

dbcebdc

Add telemetry methods metadata_properties

5581837

Add telemetry methods questions

9d7316d

Add telemetry methods records

01d8af7

Add telemetry methods to responses

cea525c

Add telemetry methods to `users`

aa9c6ca

Add telemetry suggestions

b694ce1

dvsrepo reviewed Jul 15, 2024

davidberenstein1957 added 7 commits July 16, 2024 10:57

Update track_crud_dataset_setting processing

ebf139e

Merge branch 'develop' into feat/5204-feature-add-huggingface_hubutilssend_telemetry-to-the-argilla-server

fc2055c

Add enable_telemetry check

77dd130

Remove disable_send

6979112

Deprecate ARGILLA_ENABLE_TELEMETRY env var

9bffe18

Update test_telemetry

e6763cc

Add enable telemetry to post_init

a94ab7c

frascuchon reviewed Aug 27, 2024

frascuchon requested a review from jfcalvo August 27, 2024 09:37

davidberenstein1957 added 2 commits August 27, 2024 11:43

docs: update telemetry sections

659d9a2

update: usage from record_subtopic to record_suggestions and record_responses

c725352

jfcalvo reviewed Aug 27, 2024

refactor: introduced track_error specific method

93d46b1

jfcalvo reviewed Aug 27, 2024

refactor: name search operation like "search"

f5901b9

refactor: add "me" to user operations

refactor: add "list" to list-like operations

frascuchon reviewed Aug 27, 2024

frascuchon and others added 3 commits September 2, 2024 11:46

Merge branch 'develop' into feat/5204-feature-add-huggingface_hubutilssend_telemetry-to-the-argilla-server

43ef896

chore: Remove all non-general endpoint telemetry related-code

8ce3621

frascuchon reviewed Sep 2, 2024

Update argilla/mkdocs.yml

c7c22f8

frascuchon reviewed Sep 2, 2024

frascuchon and others added 7 commits September 2, 2024 14:13

chore: Revert doc change

32f3baa

chore: revert doc changes

0c9c608

chore: Remove unused attribute

00b6caa

chore: Align the user.id registration

d93c27b

chore: review docs

cd45e6b

frascuchon changed the title ~~Add huggingface_hub.utils.telemetry~~ [FEAT] Add integration with huggingface_hub.utils.telemetry Sep 3, 2024

chore: Update CHANGELOG

0c51124

frascuchon merged commit ebd1b0f into develop Sep 3, 2024
12 checks passed

frascuchon deleted the feat/5204-feature-add-huggingface_hubutilssend_telemetry-to-the-argilla-server branch September 3, 2024 09:13
