[DRAFT] feat: simplify telemetry #5441

Closed
wants to merge 65 commits into from
Changes from 64 commits
Commits
65 commits
431f296
Update `TelemetryClient` to use `huggingface_hub.utils`
davidberenstein1957 Jul 12, 2024
2d4aef4
Update `user` CRUD telemetry tracking
davidberenstein1957 Jul 12, 2024
71cf614
Update `workspace` CRUD telemetry tracking
davidberenstein1957 Jul 12, 2024
e1924e4
Update `workspace` telemetry from `list_user_workspaces` method
davidberenstein1957 Jul 12, 2024
11a1f85
Fix arguments passed to `track_crud_workspace` in `list_user_workspaces`
davidberenstein1957 Jul 12, 2024
9adfc16
Fix `await` to `telemetry` call
davidberenstein1957 Jul 12, 2024
0109e44
Update `telemetry_client: TelemetryClient = Depends(get_telemetry_cli…
davidberenstein1957 Jul 12, 2024
038d8f6
Add telemetry methods, dataset, workspace, user, settings, records, r…
davidberenstein1957 Jul 15, 2024
dbcebdc
Add telemetry methods `fields`
davidberenstein1957 Jul 15, 2024
5581837
Add telemetry methods `metadata_properties`
davidberenstein1957 Jul 15, 2024
9d7316d
Add telemetry methods `questions`
davidberenstein1957 Jul 15, 2024
01d8af7
Add telemetry methods `records`
davidberenstein1957 Jul 15, 2024
cea525c
Add telemetry methods to `responses`
davidberenstein1957 Jul 15, 2024
aa9c6ca
Add telemetry methods to `users`
davidberenstein1957 Jul 15, 2024
b694ce1
Add telemetry `suggestions`
davidberenstein1957 Jul 15, 2024
ebf139e
Update `track_crud_dataset_setting` processing
davidberenstein1957 Jul 16, 2024
fc2055c
Merge branch 'develop' into feat/5204-feature-add-huggingface_hubutil…
davidberenstein1957 Jul 16, 2024
77dd130
Add `enable_telemetry` check
davidberenstein1957 Jul 16, 2024
6979112
Remove `disable_send`
davidberenstein1957 Jul 16, 2024
9bffe18
Deprecate `ARGILLA_ENABLE_TELEMETRY` env var
davidberenstein1957 Jul 16, 2024
e6763cc
Update `test_telemetry`
davidberenstein1957 Jul 16, 2024
a94ab7c
Add enable telemetry to post_init
davidberenstein1957 Jul 16, 2024
a6f7c0f
Add `UUID` to `str` conversion
davidberenstein1957 Jul 16, 2024
a70c590
Run tests with enabled telemetry
davidberenstein1957 Jul 16, 2024
d762dc3
Remove `telemetry` client
davidberenstein1957 Jul 16, 2024
4addc7b
Fix tests errors
davidberenstein1957 Jul 16, 2024
ac7601c
Update `test_telemetry` fixture
davidberenstein1957 Jul 16, 2024
c72c4b0
Update disable telemetry env var
davidberenstein1957 Jul 16, 2024
538c268
Fix tests dataset creation
davidberenstein1957 Jul 17, 2024
0b167eb
Fix failing tests due to unloaded DatabaseModels
davidberenstein1957 Jul 17, 2024
4fcbbd6
Add tests telemetry crud datasets
davidberenstein1957 Jul 17, 2024
8428939
Update tests coverage telemetry tracking
davidberenstein1957 Jul 17, 2024
daea0b2
Merge branch 'develop' into feat/5204-feature-add-huggingface_hubutil…
davidberenstein1957 Jul 17, 2024
18c5d0f
Update argilla-server/src/argilla_server/settings.py
davidberenstein1957 Jul 17, 2024
0230bfa
Remove Python version from system info
davidberenstein1957 Jul 18, 2024
cbacf35
Update tests to also check assert call track_data
davidberenstein1957 Jul 18, 2024
2473c82
Add documentation for telemetry information (#5253)
davidberenstein1957 Jul 22, 2024
35c9e43
Add async telemetry client
davidberenstein1957 Jul 22, 2024
df9a0fc
Update `test_telemetry`
davidberenstein1957 Jul 22, 2024
c64cc29
Update list-like to basic CRUD
davidberenstein1957 Jul 23, 2024
8b0c752
Merge branch 'develop' into feat/5204-feature-add-huggingface_hubutil…
davidberenstein1957 Aug 19, 2024
0ed746b
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Aug 19, 2024
fef47b8
Resolved utcnow() deprecation
davidberenstein1957 Aug 19, 2024
b982497
Merge branch 'feat/5204-feature-add-huggingface_hubutilssend_telemetr…
davidberenstein1957 Aug 19, 2024
e332696
Update to always add error type to overview
davidberenstein1957 Aug 19, 2024
a1c83a1
Add dataset distribution tracking to telemetry
davidberenstein1957 Aug 19, 2024
17b0403
Revert UTC change
davidberenstein1957 Aug 19, 2024
f6e9f38
fix failing tests
davidberenstein1957 Aug 19, 2024
68d6e53
fix failing tests
davidberenstein1957 Aug 19, 2024
de181a9
Remove server id from telemetry to be more GDPR compliant
davidberenstein1957 Aug 19, 2024
b823e2e
Update telemetry workflow
davidberenstein1957 Aug 20, 2024
bb2dfd5
Merge branch 'develop' into feat/5204-feature-add-huggingface_hubutil…
frascuchon Aug 26, 2024
cad43a1
chore: add huggingface_hub to dependencies
davidberenstein1957 Aug 26, 2024
5fd9492
Merge branch 'feat/5204-feature-add-huggingface_hubutilssend_telemetr…
davidberenstein1957 Aug 26, 2024
659d9a2
docs: update telemetry sections
davidberenstein1957 Aug 27, 2024
c725352
update: usage from record_subtopic to record_suggestions and record_r…
davidberenstein1957 Aug 27, 2024
93d46b1
refactor: introduced track_error specific method
davidberenstein1957 Aug 27, 2024
f5901b9
refactor: name search operation like "search"
davidberenstein1957 Aug 27, 2024
5540dce
feat: simplify telemetry
davidberenstein1957 Aug 29, 2024
b892cfd
fix: resolve failing tests
davidberenstein1957 Aug 29, 2024
c4085ee
feat: add record creation
davidberenstein1957 Aug 29, 2024
7117939
fix: replace startup shutdown with lifespan
davidberenstein1957 Aug 30, 2024
ab5c6ef
refactor: cleanup some code
davidberenstein1957 Aug 30, 2024
d59ff6c
fix: application lifespan
davidberenstein1957 Aug 30, 2024
020c617
Merge branch 'develop' into feat/telemetry-simplified
davidberenstein1957 Sep 4, 2024
2 changes: 1 addition & 1 deletion .github/workflows/argilla-frontend.deploy-environment.yml
@@ -63,7 +63,7 @@ jobs:
ADMIN_API_KEY=${{ steps.credentials.outputs.admin }}
ANNOTATOR_PASSWORD=${{ steps.credentials.outputs.annotator }}
ANNOTATOR_API_KEY=${{ steps.credentials.outputs.annotator }}
ARGILLA_ENABLE_TELEMETRY=0
HF_HUB_DISABLE_TELEMETRY=1
API_BASE_URL=https://dev.argilla.io/

- name: Post credentials in Slack
2 changes: 1 addition & 1 deletion .github/workflows/argilla-server.yml
@@ -52,7 +52,7 @@ jobs:
- 5432:5432

env:
ARGILLA_ENABLE_TELEMETRY: 0
HF_HUB_DISABLE_TELEMETRY: 1

steps:
- name: Checkout Code 🛎
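Both workflows move from the Argilla-specific `ARGILLA_ENABLE_TELEMETRY=0` to the huggingface_hub convention `HF_HUB_DISABLE_TELEMETRY=1`. The server-side check is not part of this excerpt (it lives in `argilla_server/settings.py`, touched in commit 18c5d0f); the sketch below is only a guess at its shape, based on the "Deprecate `ARGILLA_ENABLE_TELEMETRY` env var" commit, and the deprecation handling is an assumption.

```python
# Hedged sketch, not the code from settings.py: how a server could honor the
# huggingface_hub variable while still warning users of the deprecated one.
import os
import warnings


def telemetry_enabled() -> bool:
    # huggingface_hub convention: any truthy value disables telemetry.
    if os.getenv("HF_HUB_DISABLE_TELEMETRY", "0").lower() in ("1", "true", "yes", "on"):
        return False

    # Assumed backwards compatibility for the deprecated Argilla variable.
    legacy = os.getenv("ARGILLA_ENABLE_TELEMETRY")
    if legacy is not None:
        warnings.warn(
            "ARGILLA_ENABLE_TELEMETRY is deprecated; set HF_HUB_DISABLE_TELEMETRY=1 to opt out.",
            DeprecationWarning,
        )
        return legacy.lower() not in ("0", "false", "no", "off")

    return True
```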
13 changes: 8 additions & 5 deletions argilla-server/pdm.lock

Some generated files are not rendered by default.

4 changes: 3 additions & 1 deletion argilla-server/pyproject.toml
@@ -57,6 +57,8 @@ dependencies = [
"typer >= 0.6.0, < 0.10.0", # spaCy only supports typer<0.10.0
"packaging>=23.2",
"psycopg2-binary>=2.9.9",
# For Telemetry
"huggingface_hub>=0.13,<1",
]

[project.optional-dependencies]
@@ -106,7 +108,7 @@ log_format = "%(asctime)s %(name)s %(levelname)s %(message)s"
log_date_format = "%Y-%m-%d %H:%M:%S"
log_cli = "True"
testpaths = ["tests"]
env = ["ARGILLA_ENABLE_TELEMETRY=0"]
env = ["HF_HUB_DISABLE_TELEMETRY=0"]

[tool.coverage.run]
concurrency = ["greenlet", "thread", "multiprocessing"]
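The only new runtime dependency is huggingface_hub, whose `utils.send_telemetry` helper replaces the previous analytics client. Below is a minimal, hedged sketch of how the simplified `TelemetryClient` could delegate to it; the class name and the awaited `track_*` style mirror the handlers in this diff, but the body is illustrative rather than the actual implementation in `argilla_server/telemetry.py`.

```python
# Sketch only: delegating telemetry to huggingface_hub.utils.send_telemetry,
# which silently no-ops when HF_HUB_DISABLE_TELEMETRY (or HF_HUB_OFFLINE) is set.
from typing import Any, Dict, Optional

from huggingface_hub.utils import send_telemetry

SERVER_VERSION = "0.0.0"  # assumption: the real value comes from the package version


class TelemetryClient:
    async def track_data(self, topic: str, data: Optional[Dict[str, Any]] = None) -> None:
        # send_telemetry fires a HEAD request from a background thread, so the
        # awaiting request handler is never blocked on network I/O.
        send_telemetry(
            topic=f"argilla/server/{topic}",
            library_name="argilla-server",
            library_version=SERVER_VERSION,
            user_agent=data or {},
        )
```

Because the heavy lifting happens off the request thread, the async methods exist mainly so call sites read uniformly as `await telemetry_client.track_...`.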
9 changes: 8 additions & 1 deletion argilla-server/src/argilla_server/_app.py
@@ -40,6 +40,7 @@
from argilla_server.search_engine import get_search_engine
from argilla_server.settings import settings
from argilla_server.static_rewrite import RewriteStaticFiles
from argilla_server.telemetry import get_telemetry_client

_LOGGER = logging.getLogger("argilla")

@@ -50,7 +51,9 @@
show_telemetry_warning()
await configure_database()
await configure_search_engine()
await get_telemetry_client().track_server_startup()

yield
await get_telemetry_client().track_server_shutdown()



def create_server_app() -> FastAPI:
@@ -174,10 +177,14 @@
" https://docs.argilla.io/latest/reference/argilla-server/telemetry/\n\n"
"Telemetry is currently enabled. If you want to disable it, you can configure\n"
"the environment variable before relaunching the server:\n\n"
f'{"#set ARGILLA_ENABLE_TELEMETRY=0" if os.name == "nt" else "$>export ARGILLA_ENABLE_TELEMETRY=0"}'
f'{"#set HF_HUB_DISABLE_TELEMETRY=1" if os.name == "nt" else "$>export HF_HUB_DISABLE_TELEMETRY=1"}'
)
_LOGGER.warning(message)

message += "\n\n "
message += "#set HF_HUB_DISABLE_TELEMETRY=1" if os.name == "nt" else "$>export HF_HUB_DISABLE_TELEMETRY=1"
message += "\n"


async def _create_oauth_allowed_workspaces(db: AsyncSession):
from argilla_server.security.settings import settings as security_settings
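The lifespan hook above only shows the calls to `track_server_startup()` and `track_server_shutdown()`; the payload they send is defined elsewhere and is not part of this excerpt. As a hedged illustration (field names are assumptions; per the commit history, the Python version and a per-server id were deliberately dropped), a startup event might report only coarse, non-identifying details:

```python
# Hypothetical payload for the startup/shutdown events; reuses the track_data
# delegation to huggingface_hub.utils.send_telemetry sketched earlier.
import os
import platform
from typing import Any, Dict, Optional

from huggingface_hub.utils import send_telemetry


class TelemetryClient:
    async def track_data(self, topic: str, data: Optional[Dict[str, Any]] = None) -> None:
        send_telemetry(topic=f"argilla/server/{topic}", library_name="argilla-server", user_agent=data or {})

    async def track_server_startup(self) -> None:
        # Assumed fields: coarse, non-identifying deployment info only.
        await self.track_data(
            topic="startup",
            data={
                "system": platform.system(),                   # e.g. "Linux"
                "machine": platform.machine(),                 # e.g. "x86_64"
                "docker": str(os.path.exists("/.dockerenv")),  # rough container check
            },
        )

    async def track_server_shutdown(self) -> None:
        await self.track_data(topic="shutdown")
```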
@@ -22,14 +22,14 @@
from argilla_server.api.policies.v1 import DatasetPolicy, MetadataPropertyPolicy, authorize, is_authorized
from argilla_server.api.schemas.v1.datasets import (
Dataset as DatasetSchema,
UsersProgress,
)
from argilla_server.api.schemas.v1.datasets import (
DatasetCreate,
DatasetMetrics,
DatasetProgress,
Datasets,
DatasetUpdate,
UsersProgress,
)
from argilla_server.api.schemas.v1.fields import Field, FieldCreate, Fields
from argilla_server.api.schemas.v1.metadata_properties import (
@@ -73,12 +73,16 @@ async def list_current_user_datasets(
db: AsyncSession = Depends(get_async_db),
workspace_id: Optional[UUID] = None,
current_user: User = Security(auth.get_current_user),
telemetry_client: TelemetryClient = Depends(get_telemetry_client),
):
await authorize(current_user, DatasetPolicy.list(workspace_id))

if not workspace_id:
if current_user.is_owner:
dataset_list = await datasets.list_datasets(db)
await telemetry_client.track_resource_size(
crud_action="read", setting_name="dataset", count=len(dataset_list)
)
Comment on lines +83 to +85

Member: I think adding the dataset count when listing the datasets for the owner is a bit of a convoluted way of computing the metric. It might make more sense to send this information somewhere else in the system. I think I should rethink these metrics and include them in a second iteration.

Member: Also, I'm not sure about the need for crud_action="read" here, since it is not related to a resource CRUD action (we compute a size).

else:
await current_user.awaitable_attrs.datasets
dataset_list = current_user.datasets
@@ -90,7 +94,10 @@

@router.get("/datasets/{dataset_id}/fields", response_model=Fields)
async def list_dataset_fields(
*, db: AsyncSession = Depends(get_async_db), dataset_id: UUID, current_user: User = Security(auth.get_current_user)
*,
db: AsyncSession = Depends(get_async_db),
dataset_id: UUID,
current_user: User = Security(auth.get_current_user),
):
dataset = await Dataset.get_or_raise(db, dataset_id, options=[selectinload(Dataset.fields)])

@@ -101,7 +108,10 @@ async def list_dataset_fields(

@router.get("/datasets/{dataset_id}/vectors-settings", response_model=VectorsSettings)
async def list_dataset_vector_settings(
*, db: AsyncSession = Depends(get_async_db), dataset_id: UUID, current_user: User = Security(auth.get_current_user)
*,
db: AsyncSession = Depends(get_async_db),
dataset_id: UUID,
current_user: User = Security(auth.get_current_user),
):
dataset = await Dataset.get_or_raise(db, dataset_id, options=[selectinload(Dataset.vectors_settings)])

@@ -112,7 +122,10 @@ async def list_dataset_vector_settings(

@router.get("/me/datasets/{dataset_id}/metadata-properties", response_model=MetadataProperties)
async def list_current_user_dataset_metadata_properties(
*, db: AsyncSession = Depends(get_async_db), dataset_id: UUID, current_user: User = Security(auth.get_current_user)
*,
db: AsyncSession = Depends(get_async_db),
dataset_id: UUID,
current_user: User = Security(auth.get_current_user),
):
dataset = await Dataset.get_or_raise(db, dataset_id, options=[selectinload(Dataset.metadata_properties)])

@@ -127,7 +140,10 @@ async def list_current_user_dataset_metadata_properties(

@router.get("/datasets/{dataset_id}", response_model=DatasetSchema)
async def get_dataset(
*, db: AsyncSession = Depends(get_async_db), dataset_id: UUID, current_user: User = Security(auth.get_current_user)
*,
db: AsyncSession = Depends(get_async_db),
dataset_id: UUID,
current_user: User = Security(auth.get_current_user),
):
dataset = await Dataset.get_or_raise(db, dataset_id)

@@ -184,18 +200,24 @@ async def get_dataset_users_progress(
async def create_dataset(
*,
db: AsyncSession = Depends(get_async_db),
telemetry_client: TelemetryClient = Depends(get_telemetry_client),
dataset_create: DatasetCreate,
current_user: User = Security(auth.get_current_user),
):
await authorize(current_user, DatasetPolicy.create(dataset_create.workspace_id))

return await datasets.create_dataset(db, dataset_create.dict())
dataset = await datasets.create_dataset(db, dataset_create.dict())

await telemetry_client.track_crud_dataset(crud_action="create", dataset=dataset)

return dataset


@router.post("/datasets/{dataset_id}/fields", status_code=status.HTTP_201_CREATED, response_model=Field)
async def create_dataset_field(
*,
db: AsyncSession = Depends(get_async_db),
telemetry_client: TelemetryClient = Depends(get_telemetry_client),
dataset_id: UUID,
field_create: FieldCreate,
current_user: User = Security(auth.get_current_user),
@@ -204,7 +226,11 @@ async def create_dataset_field(

await authorize(current_user, DatasetPolicy.create_field(dataset))

return await datasets.create_field(db, dataset, field_create)
field = await datasets.create_field(db, dataset, field_create)

await telemetry_client.track_crud_dataset_setting(crud_action="create", setting_name="fields", setting=field)

return field


@router.post(
Expand All @@ -214,6 +240,7 @@ async def create_dataset_metadata_property(
*,
db: AsyncSession = Depends(get_async_db),
search_engine: SearchEngine = Depends(get_search_engine),
telemetry_client: TelemetryClient = Depends(get_telemetry_client),
dataset_id: UUID,
metadata_property_create: MetadataPropertyCreate,
current_user: User = Security(auth.get_current_user),
@@ -222,7 +249,13 @@

await authorize(current_user, DatasetPolicy.create_metadata_property(dataset))

return await datasets.create_metadata_property(db, search_engine, dataset, metadata_property_create)
metadata_property = await datasets.create_metadata_property(db, search_engine, dataset, metadata_property_create)

await telemetry_client.track_crud_dataset_setting(
crud_action="create", setting_name="metadata_properties", setting=metadata_property
)

return metadata_property


@router.post(
Expand All @@ -232,6 +265,7 @@ async def create_dataset_vector_settings(
*,
db: AsyncSession = Depends(get_async_db),
search_engine: SearchEngine = Depends(get_search_engine),
telemetry_client: TelemetryClient = Depends(get_telemetry_client),
dataset_id: UUID,
vector_settings_create: VectorSettingsCreate,
current_user: User = Security(auth.get_current_user),
@@ -240,15 +274,20 @@

await authorize(current_user, DatasetPolicy.create_vector_settings(dataset))

return await datasets.create_vector_settings(db, search_engine, dataset, vector_settings_create)
vector_setting = await datasets.create_vector_settings(db, search_engine, dataset, vector_settings_create)

await telemetry_client.track_crud_dataset_setting(
crud_action="create", setting_name="vectors_settings", setting=vector_setting
)

return vector_setting


@router.put("/datasets/{dataset_id}/publish", response_model=DatasetSchema)
async def publish_dataset(
*,
db: AsyncSession = Depends(get_async_db),
search_engine: SearchEngine = Depends(get_search_engine),
telemetry_client: TelemetryClient = Depends(get_telemetry_client),
dataset_id: UUID,
current_user: User = Security(auth.get_current_user),
) -> Dataset:
@@ -265,14 +304,7 @@

await authorize(current_user, DatasetPolicy.publish(dataset))

dataset = await datasets.publish_dataset(db, search_engine, dataset)

telemetry_client.track_data(
action="PublishedDataset",
data={"questions": list(set([question.settings["type"] for question in dataset.questions]))},
)

return dataset
return await datasets.publish_dataset(db, search_engine, dataset)


@router.delete("/datasets/{dataset_id}", response_model=DatasetSchema)
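Every create handler above now awaits `track_crud_dataset(...)` or `track_crud_dataset_setting(crud_action=..., setting_name=..., setting=...)`. The sketch below shows one plausible shape for those methods, inferred purely from the call sites in this diff; the actual implementation lives in `argilla_server/telemetry.py` and may differ.

```python
# Hypothetical method shapes, reconstructed from the handler call sites only.
from typing import Any, Dict, Optional


class TelemetryClient:
    async def track_data(self, topic: str, data: Optional[Dict[str, Any]] = None) -> None:
        ...  # delegate to huggingface_hub.utils.send_telemetry, as sketched above

    async def track_crud_dataset(self, crud_action: str, dataset: Any = None) -> None:
        # Assumption: only the action is reported, never dataset names or contents.
        await self.track_data(topic=f"dataset/{crud_action}")

    async def track_crud_dataset_setting(
        self,
        crud_action: str,    # "create", "read", "update", "delete"
        setting_name: str,   # "fields", "questions", "metadata_properties", "vectors_settings"
        setting: Any = None,
        count: Optional[int] = None,
    ) -> None:
        data: Dict[str, Any] = {}
        if count is not None:
            data["count"] = count
        if setting is not None and getattr(setting, "settings", None):
            # e.g. a question/field type such as "rating" or "text"; an assumption that
            # mirrors the `question.settings["type"]` usage removed from publish_dataset.
            data["type"] = setting.settings.get("type")
        await self.track_data(topic=f"dataset/{setting_name}/{crud_action}", data=data)
```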
@@ -25,13 +25,17 @@
from argilla_server.database import get_async_db
from argilla_server.models import Dataset, User
from argilla_server.security import auth
from argilla_server.telemetry import TelemetryClient, get_telemetry_client

router = APIRouter()


@router.get("/datasets/{dataset_id}/questions", response_model=Questions)
async def list_dataset_questions(
*, db: AsyncSession = Depends(get_async_db), dataset_id: UUID, current_user: User = Security(auth.get_current_user)
*,
db: AsyncSession = Depends(get_async_db),
dataset_id: UUID,
current_user: User = Security(auth.get_current_user),
):
dataset = await Dataset.get_or_raise(db, dataset_id, options=[selectinload(Dataset.questions)])

@@ -44,6 +48,7 @@ async def list_dataset_questions(
async def create_dataset_question(
*,
db: AsyncSession = Depends(get_async_db),
telemetry_client: TelemetryClient = Depends(get_telemetry_client),
dataset_id: UUID,
question_create: QuestionCreate,
current_user: User = Security(auth.get_current_user),
@@ -61,4 +66,8 @@ async def create_dataset_question(

await authorize(current_user, DatasetPolicy.create_question(dataset))

return await questions.create_question(db, dataset, question_create)
question = await questions.create_question(db, dataset, question_create)

await telemetry_client.track_crud_dataset_setting(crud_action="create", setting_name="questions", setting=question)

return question
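Several commits update the `test_telemetry` fixture so handlers like the one above can be asserted against. A hedged sketch of such a test for the question endpoint follows; the fixture names (`async_client`, `owner_auth_header`, `test_telemetry`, `dataset`), the `/api/v1` prefix, and the rating payload are assumptions, not necessarily what this PR's test suite uses.

```python
# Sketch of a handler-level telemetry test; fixture names are assumed, and
# `test_telemetry` is assumed to be a mocked TelemetryClient.
from unittest.mock import ANY

import pytest


@pytest.mark.asyncio
async def test_create_question_is_tracked(async_client, owner_auth_header, test_telemetry, dataset):
    response = await async_client.post(
        f"/api/v1/datasets/{dataset.id}/questions",
        headers=owner_auth_header,
        json={
            "name": "quality",
            "title": "Quality",
            "settings": {"type": "rating", "options": [{"value": v} for v in range(1, 6)]},
        },
    )

    assert response.status_code == 201
    test_telemetry.track_crud_dataset_setting.assert_called_once_with(
        crud_action="create", setting_name="questions", setting=ANY
    )
```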
@@ -12,7 +12,6 @@
# See the License for the specific language governing permissions and
# limitations under the License.

import re
from typing import Any, Dict, List, Optional, Tuple, Union
from uuid import UUID

@@ -46,7 +45,7 @@
)
from argilla_server.contexts import datasets, search
from argilla_server.database import get_async_db
from argilla_server.enums import RecordSortField, ResponseStatusFilter, SortOrder
from argilla_server.enums import RecordSortField, ResponseStatusFilter
from argilla_server.errors.future import MissingVectorError, NotFoundError, UnprocessableEntityError
from argilla_server.errors.future.base_errors import MISSING_VECTOR_ERROR_CODE
from argilla_server.models import Dataset, Field, Record, User, VectorSettings
@@ -58,7 +57,6 @@
get_search_engine,
)
from argilla_server.security import auth
from argilla_server.telemetry import TelemetryClient, get_telemetry_client
from argilla_server.utils import parse_query_param, parse_uuids

LIST_DATASET_RECORDS_LIMIT_DEFAULT = 50
@@ -305,7 +303,6 @@ async def search_current_user_dataset_records(
*,
db: AsyncSession = Depends(get_async_db),
search_engine: SearchEngine = Depends(get_search_engine),
telemetry_client: TelemetryClient = Depends(get_telemetry_client),
dataset_id: UUID,
body: SearchRecordsQuery,
include: Optional[RecordIncludeParam] = Depends(parse_record_include_param),