[FEATURE] add FastAPI endpoint metrics to telemetry #5224

Closed
davidberenstein1957 opened this issue Jul 15, 2024 · 0 comments

davidberenstein1957 commented Jul 15, 2024

Is your feature request related to a problem? Please describe.

We currently have no insight into usage metrics for the Argilla API endpoints.

Describe the solution you'd like

Implement a general FastAPI middleware: https://fastapi.tiangolo.com/tutorial/middleware/
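
As a rough illustration of the middleware idea, here is a minimal sketch written as a pure ASGI middleware, so it needs only the standard library. The class name `EndpointMetricsMiddleware`, the in-memory counters, and the `/api/v1/datasets` path are assumptions for illustration, not Argilla's actual implementation.

```python
import asyncio
import time
from collections import Counter


class EndpointMetricsMiddleware:
    """Pure ASGI middleware that counts HTTP calls per (method, path)."""

    def __init__(self, app):
        self.app = app
        self.calls = Counter()  # (method, path) -> number of calls
        self.durations = {}     # (method, path) -> total seconds spent

    async def __call__(self, scope, receive, send):
        # Pass non-HTTP traffic (lifespan, websocket) through untouched.
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return

        key = (scope["method"], scope["path"])
        start = time.perf_counter()
        try:
            await self.app(scope, receive, send)
        finally:
            # Record the call even if the endpoint raised.
            elapsed = time.perf_counter() - start
            self.calls[key] += 1
            self.durations[key] = self.durations.get(key, 0.0) + elapsed


async def demo_app(scope, receive, send):
    """Stand-in ASGI app used only to exercise the middleware."""
    await send({"type": "http.response.start", "status": 200, "headers": []})
    await send({"type": "http.response.body", "body": b"ok"})


async def call_once(app, method, path):
    """Drive one fake request through an ASGI app and return the sent messages."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    await app({"type": "http", "method": method, "path": path}, receive, send)
    return sent


if __name__ == "__main__":
    metrics = EndpointMetricsMiddleware(demo_app)
    asyncio.run(call_once(metrics, "GET", "/api/v1/datasets"))
    print(dict(metrics.calls))
```

With FastAPI, a class of this shape can be registered through `app.add_middleware(EndpointMetricsMiddleware)`. A real version would forward the counters to the telemetry client instead of keeping them in memory.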

Describe alternatives you've considered

NA

Additional context

https://docs.sentry.io/platforms/python/integrations/fastapi/

@davidberenstein1957 davidberenstein1957 self-assigned this Jul 15, 2024
@davidberenstein1957 davidberenstein1957 changed the title [FEATURE] add endpoint metrics to telemetry [FEATURE] add FastAPI endpoint metrics to telemetry Jul 15, 2024
@davidberenstein1957 davidberenstein1957 added this to the v2.1.0 milestone Jul 23, 2024
@davidberenstein1957 davidberenstein1957 modified the milestones: v2.1.0, v2.2.0 Aug 27, 2024
frascuchon added a commit that referenced this issue Sep 3, 2024
# Description

This PR changes the server telemetry to gather metrics for API
endpoint calls. This is a first iteration; further usage metrics can
be added later.

The metrics gathered include the user ID and some system info, such as
the server ID (a UUID generated once when the Argilla server starts).
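
A minimal sketch of that metadata, assuming a module-level UUID stands in for "generated once at server start". The function name `build_user_agent` and the field names are hypothetical, not the names used in the PR.

```python
import platform
import uuid

# Generated once at import time, standing in for "once when the server starts".
SERVER_ID = uuid.uuid4()


def build_user_agent(user_id):
    """Return the metadata attached to every telemetry event (hypothetical fields)."""
    return {
        "server_id": str(SERVER_ID),
        "user_id": str(user_id),
        "system": platform.system(),
        "python_version": platform.python_version(),
    }


if __name__ == "__main__":
    print(build_user_agent("some-user-id"))
```

Because `SERVER_ID` is created once per process, every event sent by the same server run carries the same server ID, while the user ID varies per request.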

Also, it deprecates the old telemetry KEY (quoting the huggingface_hub
documentation: "huggingface_hub includes a helper to send telemetry
data. This information helps us debug issues and prioritize new
features. Users can disable telemetry collection at any time by setting
the HF_HUB_DISABLE_TELEMETRY=1 environment variable. Telemetry is also
disabled in offline mode (i.e. when setting HF_HUB_OFFLINE=1).").
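
The opt-out rule quoted above can be sketched as follows. This is only an illustration: `telemetry_enabled` is a hypothetical helper, and the set of accepted truthy spellings is an assumption. huggingface_hub evaluates these variables itself, so a server that relies on its helper would not need this check.

```python
import os

# Assumption: which spellings count as "set". Only "1" appears in the quoted text.
TRUE_VALUES = {"1", "ON", "YES", "TRUE"}


def telemetry_enabled(env=None):
    """Return False when telemetry is disabled or offline mode is on."""
    env = os.environ if env is None else env

    def is_true(name):
        return env.get(name, "").strip().upper() in TRUE_VALUES

    return not (is_true("HF_HUB_DISABLE_TELEMETRY") or is_true("HF_HUB_OFFLINE"))


if __name__ == "__main__":
    print(telemetry_enabled({"HF_HUB_DISABLE_TELEMETRY": "1"}))
```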

### OUTDATED

Adds telemetry for:

- [x] users
- [x] workspaces
- [x] datasets
- [x] login users
- [x] server errors
- [x] records
- [x] responses
- [x] suggestions
- [x] metadata
- [x] vectors
- [x] deprecate old telemetry KEY (quoting the huggingface_hub
documentation: "huggingface_hub includes a helper to send telemetry
data. This information helps us debug issues and prioritize new
features. Users can disable telemetry collection at any time by setting
the HF_HUB_DISABLE_TELEMETRY=1 environment variable. Telemetry is also
disabled in offline mode (i.e. when setting HF_HUB_OFFLINE=1).")
- [x] write documentation (done in #5253)
- [x] add automatic task distribution (should be done after #5136)
- [x] include gradio-app/gradio#8884

General Idea:
I’ve structured the data to come in through URLs/topics like
`dataset/settings/vectorsettings/create` or
`dataset/records/suggestions/read`, along with some generalized metadata
per URL/topic, like the `count` or `type` of a suggestion or setting.
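
The topic scheme described above could be sketched like this. The helper names `build_topic` and `build_event`, and the exact set of CRUD actions, are assumptions for illustration.

```python
# Assumption: the four classic CRUD actions terminate every topic.
CRUD_ACTIONS = {"create", "read", "update", "delete"}


def build_topic(*resources, action):
    """Join a resource path and a CRUD action into a topic string."""
    if action not in CRUD_ACTIONS:
        raise ValueError(f"unknown CRUD action: {action!r}")
    return "/".join([*resources, action])


def build_event(*resources, action, count=None, kind=None):
    """Build a telemetry event carrying optional `count` and `type` metadata."""
    event = {"topic": build_topic(*resources, action=action)}
    if count is not None:
        event["count"] = count
    if kind is not None:
        event["type"] = kind
    return event


if __name__ == "__main__":
    print(build_event("dataset", "settings", "vectorsettings", action="create"))
    print(build_event("dataset", "records", "suggestions", action="read", count=3))
```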

To discuss:
- What to do with `list` methods? I currently track `list`-like calls by
sending each individual item as a `read`, along with one `read` that
carries a count. I did this because it might be interesting to get the
total number of users, workspaces, etc. Should we move this over to
`list` as a separate CRUD action? Do we also want to capture each
individual update?
- A similar logic applies to bulk operations. Should `bulk_crud` be a
separate CRUD action?
- I don't track user/dataset/workspace-specific list operations, like
list_users_workspace or list_datasets_user.
- I don't track metadata and vector updates on a record level; however,
we DO keep track of operations on suggestions and responses.
- @frascuchon was there a reason to include the `header` along with
`user/login` operations? Otherwise I will rewrite this a bit and include
the `user/login` as `user/read`.
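
To make the first discussion point concrete, here is a small sketch of how a `list`-like call maps to `read` events under the current approach: one aggregate `read` with a count, plus one `read` per item. The function name `events_for_list` and the `id` field are hypothetical.

```python
def events_for_list(topic_prefix, item_ids):
    """Expand a list-like call into one counted `read` plus one `read` per item."""
    topic = f"{topic_prefix}/read"
    events = [{"topic": topic, "count": len(item_ids)}]
    events.extend({"topic": topic, "id": str(item_id)} for item_id in item_ids)
    return events


if __name__ == "__main__":
    for event in events_for_list("workspace", ["ws-1", "ws-2"]):
        print(event)
```

A separate `list` action would replace the first event's topic with something like `workspace/list`, which keeps aggregate counts distinguishable from single-item reads.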

Follow up
- #5224 @frascuchon

Closes #5204

**Type of change**

- Improvement (change adding some improvement to an existing
functionality)

**How Has This Been Tested**
NA

**Checklist**

- I added relevant documentation
- I followed the style guidelines of this project
- I did a self-review of my code
- I made corresponding changes to the documentation
- I confirm my changes generate no new warnings
- I have added tests that prove my fix is effective or that my feature
works
- I have added relevant notes to the CHANGELOG.md file (See
https://keepachangelog.com/)

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Paco Aranda <frascuchon@gmail.com>
Co-authored-by: José Francisco Calvo <jose@argilla.io>
Co-authored-by: Francisco Aranda <francis@argilla.io>
@frascuchon frascuchon modified the milestones: v2.2.0, v2.1.0 Sep 9, 2024