[DRAFT] feat: simplify telemetry #5441

davidberenstein1957 · 2024-08-29T11:24:02Z

Description

Closes NA


codecov · 2024-08-29T14:55:16Z

Codecov Report

Attention: Patch coverage is 92.20779% with 6 lines in your changes missing coverage. Please review.

Project coverage is 91.30%. Comparing base (62b1c12) to head (d59ff6c).
Report is 93 commits behind head on feat/5204-feature-add-huggingface_hubutilssend_telemetry-to-the-argilla-server.

Files with missing lines	Patch %	Lines
argilla-server/src/argilla_server/telemetry.py	89.47%	4 Missing ⚠️
argilla-server/src/argilla_server/_app.py	33.33%	2 Missing ⚠️

Additional details and impacted files

@@                                                Coverage Diff                                                 @@
##           feat/5204-feature-add-huggingface_hubutilssend_telemetry-to-the-argilla-server    #5441      +/-   ##
==================================================================================================================
+ Coverage                                                                           91.13%   91.30%   +0.17%     
==================================================================================================================
  Files                                                                                 139      139              
  Lines                                                                                5716     5774      +58     
==================================================================================================================
+ Hits                                                                                 5209     5272      +63     
+ Misses                                                                                507      502       -5

Flag	Coverage Δ
argilla-server	`91.30% <92.20%> (+0.17%)`	⬆️

Flags with carried forward coverage won't be shown.


frascuchon · 2024-09-02T08:32:38Z

argilla-server/src/argilla_server/telemetry.py

        if dataset:
-            for attr in attributes:
+            for attr in _RELEVANT_ATTRIBUTES + ["distribution"]:


The track_crud_dataset method is used for dataset creation. In that case, no fields, questions, or other settings relationships will be provided, so this code is not necessary. Also, the distribution is a column, not a relationship, so the is_relationship_loaded check is not needed. You could track the distribution by accessing it directly. I think this improves the readability of the code.
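A minimal sketch of what reading the distribution directly could look like. `DatasetStub` and `distribution_type` are hypothetical names used only for illustration, and the `"overlap"` value is an assumed example; the `strategy` and `min_submitted` keys are the ones used elsewhere in this diff.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetStub:
    """Hypothetical stand-in for the dataset model; only the column used here."""

    distribution: dict = field(
        default_factory=lambda: {"strategy": "overlap", "min_submitted": 1}
    )


def distribution_type(dataset) -> str:
    # A column is loaded together with the row, so it can be read directly
    # without an is_relationship_loaded check.
    distribution = dataset.distribution
    return f"distribution_{distribution['strategy']}_{distribution['min_submitted']}"
```

With this shape, the relationship loop is no longer needed on dataset creation, and the distribution becomes a single attribute read.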

frascuchon · 2024-09-02T08:33:13Z

argilla-server/src/argilla_server/telemetry.py

+            for attr in _RELEVANT_ATTRIBUTES:
+                if dataset.is_relationship_loaded(attr):
+                    obtained_attr_list = getattr(dataset, attr)
+                    await self.track_resource_size(
+                        crud_action=crud_action, setting_name=f"dataset/{attr}", count=len(obtained_attr_list)
+                    )
+


This info won't be available on dataset creation, so I would suggest removing it.

frascuchon · 2024-09-02T08:34:21Z

argilla-server/src/argilla_server/telemetry.py

+        This method is used to track the creation, update, and deletion of dataset settings. These
+        settings include fields, questions, vectors settings, and metadata properties.
+
+        Args:


I think we use the Parameters format instead.
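As an illustration only, the docstring could be rewritten with a `Parameters` section roughly as below. The class name, the method signature, the parameter descriptions, and the exact layout of the section are assumptions, not the project's confirmed convention.

```python
class TelemetryClientSketch:
    """Hypothetical client used only to show the docstring layout."""

    async def track_crud_dataset_setting(
        self, crud_action: str, setting_name: str, count: int = 1
    ) -> None:
        """Track the creation, update, and deletion of dataset settings.

        These settings include fields, questions, vectors settings, and
        metadata properties.

        Parameters:
            crud_action: The action performed on the setting (assumed values
                such as "create", "update", or "delete").
            setting_name: The kind of setting being tracked.
            count: How many settings the event refers to.
        """
```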

frascuchon · 2024-09-02T08:43:29Z

argilla-server/src/argilla_server/api/handlers/v1/datasets/datasets.py

+            await telemetry_client.track_resource_size(
+                crud_action="read", setting_name="dataset", count=len(dataset_list)
+            )


I think adding the dataset count when listing the datasets for the owner is a convoluted way of computing the metric. It might make more sense to send this information from somewhere else in the system. I think I should rethink these metrics and include them in a second iteration.

Also, I'm not sure about the need for crud_action="read" here, since this is not related to a resource CRUD action (we are computing a size).

frascuchon · 2024-09-02T08:47:50Z

argilla-server/src/argilla_server/api/handlers/v1/users.py

-    await telemetry_client.track_crud_user(action="list", user=None, is_oauth=False, count=len(users))
-    for user in users:
-        await telemetry_client.track_crud_user(action="read", user=user, is_oauth=False)
+    await telemetry_client.track_resource_size(crud_action="read", setting_name="user", count=len(users))


Similarly, computing this metric here may be hard to understand later. It would be better to reduce the number of metrics for the first iteration and think about these stats in a second iteration.

frascuchon · 2024-09-02T08:48:37Z

argilla-server/src/argilla_server/api/handlers/v1/users.py

 ):
    await authorize(current_user, UserPolicy.list_workspaces)

    user = await User.get_or_raise(db, user_id)

    if user.is_owner:
        workspaces = await accounts.list_workspaces(db)
+        await telemetry_client.track_resource_size(crud_action="read", setting_name="workspace", count=len(workspaces))


Same comment here.

frascuchon · 2024-09-02T08:58:29Z

argilla-server/src/argilla_server/telemetry.py

+            return {"type": "vector"}
+        elif isinstance(setting, dict):
+            return {"type": f"distribution_{setting['strategy']}_{setting['min_submitted']}"}
+        raise NotImplementedError("Expected a setting to be processed.")


Having this here is a bit scary. We don't want to break endpoint functionality because the telemetry module cannot extract info. Also, in general, we must be sure not to propagate errors when sending telemetry info.
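One way to keep telemetry failures away from the endpoints, sketched under assumptions: `describe_setting` returns a fallback value instead of raising, and `safe_track` swallows and logs any exception from a tracking coroutine. Both helper names and the `"unknown"` fallback are hypothetical and not part of the PR.

```python
import asyncio
import logging

_LOGGER = logging.getLogger("telemetry_sketch")


def describe_setting(setting) -> dict:
    """Best-effort description of a setting; never raises for unknown input."""
    if isinstance(setting, dict) and {"strategy", "min_submitted"} <= setting.keys():
        return {"type": f"distribution_{setting['strategy']}_{setting['min_submitted']}"}
    # Fallback instead of NotImplementedError, so the caller keeps working.
    return {"type": "unknown"}


async def safe_track(track_fn, *args, **kwargs) -> bool:
    """Await a tracking coroutine function and report whether it succeeded."""
    try:
        await track_fn(*args, **kwargs)
        return True
    except Exception:
        # Telemetry must never break the request; log and move on.
        _LOGGER.warning("Telemetry call failed and was ignored", exc_info=True)
        return False
```

An endpoint would then call something like `await safe_track(client.track_data, ...)`, so a bug in the telemetry code only shows up as a log entry.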

frascuchon · 2024-09-02T09:01:05Z

argilla-server/src/argilla_server/telemetry.py

+        if crud_action:
+            user_agent["request.endpoint.method"] = crud_action


What's the purpose of this attribute? We are only tracking "create" actions, and my previous PR already tracks endpoint method info.

frascuchon · 2024-09-02T09:03:39Z

argilla-server/src/argilla_server/telemetry.py


-    async def track_data(self, topic: str, user_agent: dict, include_system_info: bool = True, count: int = 1):
-        library_name = "argilla"
+    async def track_data(self, topic: str, count: int = None, crud_action: str = None, user_agent: dict = None) -> None:


Since track_data is the base method for sending telemetry info, I would simplify the signature and move count and crud_action into the data. Also, the name user_agent reveals implementation details of how the data is sent and is not aligned with the method name.

async def track_data(self, topic, data: dict) ...
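A rough sketch of how the simplified signature might be used, assuming the specific tracking methods build the payload and pass it as `data`. The class name, the in-memory `sent` list that stands in for the real sending call, and the `"{setting_name}/size"` topic format are all hypothetical.

```python
import asyncio
from typing import Any, Dict, List, Tuple


class SimplifiedTelemetryClient:
    """Hypothetical client: records events in memory instead of sending them."""

    def __init__(self) -> None:
        self.sent: List[Tuple[str, Dict[str, Any]]] = []

    async def track_data(self, topic: str, data: Dict[str, Any]) -> None:
        # Base method: only a topic and a payload, no transport details.
        self.sent.append((topic, dict(data)))

    async def track_resource_size(self, setting_name: str, count: int) -> None:
        # count travels inside the payload rather than in the signature.
        await self.track_data(topic=f"{setting_name}/size", data={"count": count})
```

Keeping `count` and `crud_action` inside `data` means new metrics can be added without touching the base method.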

frascuchon · 2024-09-02T09:09:44Z

argilla-server/src/argilla_server/telemetry.py

+                    await self.track_resource_size(
+                        crud_action=crud_action, setting_name=f"dataset/{attr}", count=len(obtained_attr_list)
+                    )
+
    async def track_crud_dataset_setting(


Even if the telemetry library won't send data when disabled, we should avoid making unnecessary calls and processing data when telemetry is disabled. Also, if there is an error in our code, users have no way to disable telemetry tracking entirely.
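A sketch of an early-exit guard, assuming the client carries its own `enabled` flag. The flag, the class, and the `sizes_loader` callback are hypothetical and only illustrate skipping both the send and the data preparation.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class GuardedTelemetryClient:
    """Hypothetical client that does no work at all when disabled."""

    enabled: bool = True
    sent: List[Tuple[str, dict]] = field(default_factory=list)

    async def track_data(self, topic: str, data: dict) -> None:
        if not self.enabled:
            return
        self.sent.append((topic, data))

    async def track_dataset_sizes(self, sizes_loader: Callable[[], Dict[str, int]]) -> None:
        if not self.enabled:
            # Return before computing anything, not only before sending.
            return
        for name, count in sizes_loader().items():
            await self.track_data(f"dataset/{name}", {"count": count})
```

Checking the flag at the top of each tracking method also gives users a single switch that bypasses any buggy telemetry code path.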

# Description

This PR adds the track startup method defined in #5441 and includes perstitent_storaged_enbled info as part of the system info.

**Type of change**

- Improvement (change adding some improvement to an existing functionality)

**How Has This Been Tested**

**Checklist**

- I added relevant documentation
- I followed the style guidelines of this project
- I did a self-review of my code
- I made corresponding changes to the documentation
- I confirm my changes generate no new warnings
- I have added tests that prove my fix is effective or that my feature works
- I have added relevant notes to the CHANGELOG.md file (See https://keepachangelog.com/)

davidberenstein1957 added 30 commits July 12, 2024 14:19

Update TelemetryClient to use huggingface_hub.utils

431f296

Update userCRUD telemetry tracking

2d4aef4

Update workspace CRUD telemetry tracking

71cf614

Update workspace telemetry from list_user_workspaces method

e1924e4

Fix arguments passed to track_crud_workspace in list_user_workspaces

11a1f85

Fix await to telemetry call

9adfc16

Update `telemetry_client: TelemetryClient = Depends(get_telemetry_cli…

0109e44

…ent),` instead of direct `_TELEMETRY_CLIENT` import

Add telemetry methods, dataset, workspace, user, settings, records, r…

038d8f6

…ecord_subtopic

Add telemetry methods fields

dbcebdc

Add telemetry methods metadata_properties

5581837

Add telemetry methods questions

9d7316d

Add telemetry methods records

01d8af7

Add telemetry methods to responses

cea525c

Add telemetry methods to `users`

aa9c6ca

Add telemetry suggestions

b694ce1

Update track_crud_dataset_setting processing

ebf139e

Merge branch 'develop' into feat/5204-feature-add-huggingface_hubutil…

fc2055c

…ssend_telemetry-to-the-argilla-server

Add enable_telemetry check

77dd130

Remove disable_send

6979112

Deprecate ARGILLA_ENABLE_TELEMETRY env var

9bffe18

Update test_telemetry

e6763cc

Add enable telemetry to post_init

a94ab7c

Add UUID to str covnersion

a6f7c0f

Run tests with enabled telemetry

a70c590

Remove telemetry client

d762dc3

Fix tests errors

4addc7b

Update test_telemetry fixture

ac7601c

Update disable telemetry env var

c72c4b0

Fix tests dataset creation

538c268

Fix failing tests due to unloaded DatabaseModels

0b167eb

davidberenstein1957 added 5 commits August 27, 2024 12:15

refactor: introduced track_error specific method

93d46b1

refactor: name search operation like "search"

f5901b9

rafactor: add "me" to user operations refactor: add "list" to like-like operations

feat: simplify telemetry

5540dce

fix: resolve failing tests

b892cfd

feat: add record creation

c4085ee

feat: add dataset size tracking

davidberenstein1957 marked this pull request as ready for review August 29, 2024 17:42

davidberenstein1957 requested a review from frascuchon August 29, 2024 17:43

frascuchon reviewed Aug 30, 2024

argilla-server/src/argilla_server/_app.py (outdated)

frascuchon reviewed Aug 30, 2024

argilla-server/src/argilla_server/telemetry.py (outdated)

davidberenstein1957 added 3 commits August 30, 2024 15:09

fix: replace startup shutdown with lifespan

7117939

refactor: cleanup some code

ab5c6ef

fix: application lifespan

d59ff6c

frascuchon reviewed Sep 2, 2024


frascuchon mentioned this pull request Sep 2, 2024

[FEAT] argilla server: track servert startup #5447

Merged

frascuchon added this to the v2.2.0 milestone Sep 3, 2024

Base automatically changed from feat/5204-feature-add-huggingface_hubutilssend_telemetry-to-the-argilla-server to develop September 3, 2024 09:13

Merge branch 'develop' into feat/telemetry-simplified

020c617

frascuchon changed the title ~~feat: simplify telemetry~~ [DRAFT] feat: simplify telemetry Sep 9, 2024

davidberenstein1957 closed this Sep 10, 2024
