[FEATURE] add huggingface_hub.utils.send_telemetry to the argilla-server #5204

Closed
davidberenstein1957 opened this issue Jul 11, 2024 · 0 comments · Fixed by #5218
Labels
area: architecture Indicates that an issue or pull request is related to the architecture

@davidberenstein1957
Member

Is your feature request related to a problem? Please describe.

We want to integrate the Argilla server with Hugging Face Hub telemetry.

Describe the solution you'd like

  • update the documentation
  • update the telemetry environment variable
  • integrate telemetry into the default server methods
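A minimal sketch of how a server method might call the helper named in the title. The `TelemetryClient` class, its `track` method, and the `library_name="argilla"` value are hypothetical; only the keyword arguments passed to the sender (`topic`, `library_name`, `library_version`, `user_agent`) follow `huggingface_hub.utils.send_telemetry`. The sender is injectable so the sketch can run without network access or `huggingface_hub` installed.

```python
from typing import Any, Callable, Dict, Optional

# Anything that behaves like huggingface_hub.utils.send_telemetry.
Sender = Callable[..., None]


def _default_sender() -> Sender:
    # Imported lazily so this module loads even without huggingface_hub installed.
    from huggingface_hub.utils import send_telemetry

    return send_telemetry


class TelemetryClient:
    """Hypothetical thin wrapper that server methods call to report usage."""

    def __init__(self, library_version: str, sender: Optional[Sender] = None) -> None:
        self.library_version = library_version
        self._sender = sender

    def track(self, topic: str, data: Optional[Dict[str, Any]] = None) -> None:
        sender = self._sender or _default_sender()
        # send_telemetry returns early when HF_HUB_DISABLE_TELEMETRY=1 or HF_HUB_OFFLINE=1.
        sender(
            topic=topic,
            library_name="argilla",  # hypothetical library name
            library_version=self.library_version,
            user_agent=data or {},
        )
```

Passing a recording function as `sender` (for example `lambda **kwargs: calls.append(kwargs)`) makes the payload easy to inspect in tests.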

Describe alternatives you've considered

Additional context

@davidberenstein1957 davidberenstein1957 added this to the v2.0.0 milestone Jul 11, 2024
@davidberenstein1957 davidberenstein1957 self-assigned this Jul 11, 2024
@davidberenstein1957 davidberenstein1957 linked a pull request Jul 12, 2024 that will close this issue
@davidberenstein1957 davidberenstein1957 modified the milestones: v2.0.0, v2.1.0 Jul 22, 2024
@nataliaElv nataliaElv added the area: architecture Indicates that an issue or pull request is related to the architecture label Aug 1, 2024
frascuchon added a commit that referenced this issue Sep 3, 2024
# Description

This PR extends the server telemetry to gather metrics on API endpoint
calls. This is a first iteration; more usage metrics may be added
later.
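The PR does not say how endpoint calls are intercepted (middleware, decorators, or explicit calls inside handlers). One hypothetical option is a decorator around async handlers that records an event after each successful call; `track_endpoint`, `EVENTS`, and `get_dataset` are illustrative names, not Argilla code.

```python
import asyncio
import functools
from typing import Any, Awaitable, Callable, Dict, List, TypeVar

T = TypeVar("T")

# Hypothetical in-memory sink; a real server would forward events to its telemetry client.
EVENTS: List[Dict[str, Any]] = []


def track_endpoint(topic: str) -> Callable[[Callable[..., Awaitable[T]]], Callable[..., Awaitable[T]]]:
    """Record one telemetry event per successful call of an async endpoint handler."""

    def decorator(handler: Callable[..., Awaitable[T]]) -> Callable[..., Awaitable[T]]:
        @functools.wraps(handler)
        async def wrapper(*args: Any, **kwargs: Any) -> T:
            result = await handler(*args, **kwargs)
            # Only reached if the handler did not raise.
            EVENTS.append({"topic": topic})
            return result

        return wrapper

    return decorator


@track_endpoint("dataset/read")
async def get_dataset(dataset_id: str) -> Dict[str, str]:
    # Hypothetical handler standing in for a real API endpoint.
    return {"id": dataset_id}


if __name__ == "__main__":
    print(asyncio.run(get_dataset("abc")))
```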

The metrics gathered include the user ID and some system information,
such as the server ID (a UUID generated once when the Argilla server
starts).
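A minimal sketch of a server ID generated once per process and combined with basic system details. It assumes the ID only needs to live as long as the server process (the text does not say whether it is persisted); the field names in `system_info` are hypothetical.

```python
import platform
import uuid
from functools import lru_cache
from typing import Dict


@lru_cache(maxsize=1)
def server_id() -> str:
    # Generated on first call; every later call returns the same cached UUID.
    return str(uuid.uuid4())


def system_info(user_id: str) -> Dict[str, str]:
    """Hypothetical payload mixing the user ID with basic system details."""
    return {
        "server_id": server_id(),
        "user_id": user_id,
        "system": platform.system(),
        "machine": platform.machine(),
        "python_version": platform.python_version(),
    }
```

Calling `server_id()` once at startup fixes the value for the rest of the process.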

It also deprecates the old telemetry key ("`huggingface_hub` includes a
helper to send telemetry data. This information helps us debug issues
and prioritize new features. Users can disable telemetry collection at
any time by setting the `HF_HUB_DISABLE_TELEMETRY=1` environment variable.
Telemetry is also disabled in offline mode (i.e. when setting
`HF_HUB_OFFLINE=1`).")
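The opt-out rule quoted above can be expressed as a small check, useful if the server wants to skip building payloads entirely. This sketch mirrors the documented behavior rather than reproducing the library's code; the set of accepted "true" strings is an assumption modeled on how `huggingface_hub` parses boolean environment variables.

```python
import os
from typing import Mapping, Optional

# Strings treated as "true" (assumed, compared case-insensitively).
_TRUE_VALUES = {"1", "ON", "YES", "TRUE"}


def _is_true(value: Optional[str]) -> bool:
    return value is not None and value.upper() in _TRUE_VALUES


def telemetry_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Return False when the user opted out or the Hub client runs offline."""
    disabled = _is_true(env.get("HF_HUB_DISABLE_TELEMETRY"))
    offline = _is_true(env.get("HF_HUB_OFFLINE"))
    return not (disabled or offline)
```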

### OUTDATED

Adds telemetry for:

- [x] users
- [x] workspaces
- [x] datasets
- [x] login users
- [x] server errors
- [x] records
- [x] responses
- [x] suggestions
- [x] metadata
- [x] vectors
- [x] deprecate the old telemetry key ("`huggingface_hub` includes a
helper to send telemetry data. This information helps us debug issues
and prioritize new features. Users can disable telemetry collection at
any time by setting the `HF_HUB_DISABLE_TELEMETRY=1` environment
variable. Telemetry is also disabled in offline mode (i.e. when setting
`HF_HUB_OFFLINE=1`).")
- [x] write documentation (done in #5253)
- [x] add automatic task distribution (to be done after #5136)
- [x] include gradio-app/gradio#8884

General idea:
I’ve structured the data to come in through URLs/topics like
`dataset/settings/vectorsettings/create` or
`dataset/records/suggestions/read`, along with some generalized metadata
per URL/topic, such as the `count` or `type` of a suggestion or setting.
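The topic scheme described above can be sketched as a resource path joined with a CRUD action, paired with free-form metadata. `build_topic`, `build_event`, and the restriction to four CRUD actions are assumptions for illustration; the metadata values in the usage line are made up.

```python
from typing import Any, Dict, Tuple

# Hypothetical set of allowed actions; `list` or `bulk_crud` could be added later.
CRUD_ACTIONS = {"create", "read", "update", "delete"}


def build_topic(*resource_path: str, action: str) -> str:
    """Join resource segments and an action, e.g. 'dataset/records/suggestions/read'."""
    if not resource_path:
        raise ValueError("at least one resource segment is required")
    if action not in CRUD_ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    return "/".join([*resource_path, action])


def build_event(*resource_path: str, action: str, **metadata: Any) -> Tuple[str, Dict[str, Any]]:
    """Pair a topic with generalized metadata such as `count` or `type`."""
    return build_topic(*resource_path, action=action), dict(metadata)


# Usage: a read of three model suggestions on a dataset's records.
topic, meta = build_event("dataset", "records", "suggestions", action="read", count=3, type="model")
```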

To discuss:
- What should we do with `list` methods? I currently track list-like
operations by sending a `read` event for each individual item, plus one
`read` event with a count. I did this because it might be interesting to
know the total number of users, workspaces, etc. Should we move this to
`list` as a separate CRUD action? Do we also want to capture each
individual update?
- Similar logic applies to bulk operations: should `bulk_crud` be a
separate CRUD action?
- I don't track user/dataset/workspace-specific list operations, like
`list_users_workspace` or `list_datasets_user`.
- I don't track metadata and vector updates at the record level;
however, we DO track operations on suggestions and responses.
- @frascuchon, was there a reason to include the `header` with
`user/login` operations? Otherwise, I will rewrite this slightly and
track `user/login` as `user/read`.
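The current `list` behavior from the first point (one `read` per item plus one `read` carrying a count) might look like the sketch below. The `track_list` helper, the event dictionaries, and the `id` field are hypothetical; moving to a dedicated `list` action would only change the topic of the summary event.

```python
from typing import Any, Callable, Dict, Iterable

Event = Dict[str, Any]


def track_list(resource: str, items: Iterable[Dict[str, Any]], emit: Callable[[Event], None]) -> None:
    """Emit one `read` event per item, then a summary `read` with the total count."""
    count = 0
    for item in items:
        emit({"topic": f"{resource}/read", "id": item.get("id")})
        count += 1
    emit({"topic": f"{resource}/read", "count": count})
```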

Follow up
- #5224 @frascuchon

Closes #5204

**Type of change**

- Improvement (change adding some improvement to an existing
functionality)

**How Has This Been Tested**
NA

**Checklist**

- I added relevant documentation
- I followed the style guidelines of this project
- I did a self-review of my code
- I made corresponding changes to the documentation
- I confirm my changes generate no new warnings
- I have added tests that prove my fix is effective or that my feature
works
- I have added relevant notes to the CHANGELOG.md file (See
https://keepachangelog.com/)

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Paco Aranda <frascuchon@gmail.com>
Co-authored-by: José Francisco Calvo <jose@argilla.io>
Co-authored-by: Francisco Aranda <francis@argilla.io>