feat(DENG-4577): add monitoring bigeye_usage view to allow us to get more insight into bq usage and costs from bigeye #6260

Draft: wants to merge 2 commits into base: main

@kik-kik kik-kik commented Sep 25, 2024

Description

This adds a monitoring.bigeye_usage view that gives us the data structure needed to build a Looker dashboard for monitoring BigEye-related spending and usage in BigQuery.

┆Issue is synchronized with this Jira Task

@kik-kik kik-kik requested review from Marlene-M-Hirose and wwyc and removed request for Marlene-M-Hirose September 25, 2024 13:51
@@ -48,6 +48,7 @@ jobs_by_project AS (
user_email,
REGEXP_EXTRACT(query, r'Username: (.*?),') AS username,
REGEXP_EXTRACT(query, r'Query ID: (\w+), ') AS query_id,
REGEXP_EXTRACT(query, r'\'metric_id:(\d+)\'') AS bigeye_metric_id,
@kik-kik (Contributor, Author) commented:

@Marlene-M-Hirose @wwyc was there a specific reason why we didn't just pass the query through here? There is another piece of information I could extract here for BigEye, but I wasn't sure whether I should be polluting this query with BigEye-specific details. If I had the query available here, I could just do the extraction downstream in my BigEye monitoring view.

@Marlene-M-Hirose (Contributor) commented Sep 26, 2024

@kik-kik, I don't think adding this BigEye-specific information here is a good idea. This table is supposed to focus on BigQuery usage, not on monitoring BigEye, so I would prefer to keep BigEye separate from it.

We didn't add the query because we didn't need the query. We only needed the query_id and the username from the query.
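
For reference, here is a minimal Python sketch of what the two BigEye regexes in this diff would extract if the query text were available downstream. The patterns are copied from the diff; the sample query text, the helper names (regexp_extract, bigeye_fields), and the metric id are hypothetical, since the real BigEye query format is not shown in this PR. Python's re module stands in for BigQuery's REGEXP_EXTRACT here, which is an approximation.

```python
import re
from typing import Dict, Optional


def regexp_extract(text: str, pattern: str) -> Optional[str]:
    """Rough stand-in for BigQuery REGEXP_EXTRACT with one capturing group:
    returns the group from the first match, or None when nothing matches."""
    match = re.search(pattern, text)
    return match.group(1) if match else None


def bigeye_fields(query: str) -> Dict[str, Optional[str]]:
    """Apply the two patterns from the diff to a query string."""
    metric_id = regexp_extract(query, r"'metric_id:(\d+)'")
    # The diff lower-cases the query before matching, then strips backticks.
    table = regexp_extract(
        query.lower(), r"`__grain_start__` from `([\w\d_\-\.`]+) as"
    )
    return {
        "bigeye_metric_id": metric_id,
        "bigeye_monitored_table_id": (
            table.replace("`", "") if table is not None else None
        ),
    }


# Hypothetical query text, shaped only to satisfy the two patterns above.
SAMPLE_QUERY = (
    "SELECT MAX(submission_timestamp) AS `__grain_start__` "
    "FROM `moz-fx-data-shared-prod.telemetry.main` AS t "
    "-- 'metric_id:12345'"
)

if __name__ == "__main__":
    print(bigeye_fields(SAMPLE_QUERY))
```

Because the table pattern also accepts backticks inside the capture, a reference quoted per part (`project`.`dataset`.`table`) would come out as the same dotted id once the backticks are stripped.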

@dataops-ci-bot

Integration report for "feat: add extract of bigeye table id checked in the bq information_schema"

sql.diff

Only in /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/monitoring: bigeye_usage
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/monitoring/bigeye_usage/metadata.yaml /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/monitoring/bigeye_usage/metadata.yaml
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/monitoring/bigeye_usage/metadata.yaml	1970-01-01 00:00:00.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/monitoring/bigeye_usage/metadata.yaml	2024-09-25 15:20:14.000000000 +0000
@@ -0,0 +1,13 @@
+friendly_name: Bigeye Usage
+description: |-
+  Please provide a description for the query
+owners: []
+labels: {}
+bigquery: null
+workgroup_access:
+- role: roles/bigquery.dataViewer
+  members:
+  - workgroup:mozilla-confidential
+references:
+  view.sql:
+  - moz-fx-data-shared-prod.monitoring.bigquery_usage
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/monitoring/bigeye_usage/view.sql /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/monitoring/bigeye_usage/view.sql
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/monitoring/bigeye_usage/view.sql	1970-01-01 00:00:00.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/monitoring/bigeye_usage/view.sql	2024-09-25 15:17:39.000000000 +0000
@@ -0,0 +1,21 @@
+SELECT
+  submission_date,
+  reference_project_id,
+  reference_dataset_id,
+  reference_table_id,
+  creation_date,
+  task_duration,
+  total_terabytes_processed,
+  total_terabytes_billed,
+  total_slot_ms,
+  cost,
+  job_id,
+  user_email AS service_account,
+  bigeye_metric_id,
+  bigeye_monitored_table_id,
+FROM
+  `moz-fx-data-shared-prod.monitoring.bigquery_usage`
+WHERE
+  DATE(submission_date) = "2024-09-24"
+  AND user_type = "bigeye"
+  AND total_slot_ms IS NOT NULL
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/monitoring/bigquery_usage/view.sql /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/monitoring/bigquery_usage/view.sql
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/monitoring/bigquery_usage/view.sql	2024-09-25 15:17:42.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/monitoring/bigquery_usage/view.sql	2024-09-25 15:17:39.000000000 +0000
@@ -18,6 +18,8 @@
       THEN "search-terms"
     WHEN user_email LIKE "%airflow%"
       THEN "airflow"
+    WHEN user_email LIKE "%bigeye%"
+      THEN "bigeye"
     WHEN ENDS_WITH(user_email, "mozilla.com")
       THEN "individual"
     WHEN ENDS_WITH(user_email, "mozillafoundation.org")
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/monitoring_derived/bigquery_usage_v2/query.sql /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/monitoring_derived/bigquery_usage_v2/query.sql
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/monitoring_derived/bigquery_usage_v2/query.sql	2024-09-25 15:17:42.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/monitoring_derived/bigquery_usage_v2/query.sql	2024-09-25 15:21:33.000000000 +0000
@@ -48,6 +48,12 @@
       user_email,
       REGEXP_EXTRACT(query, r'Username: (.*?),') AS username,
       REGEXP_EXTRACT(query, r'Query ID: (\w+), ') AS query_id,
+      REGEXP_EXTRACT(query, r'\'metric_id:(\d+)\'') AS bigeye_metric_id,
+      REPLACE(
+        REGEXP_EXTRACT(LOWER(query), r'`__grain_start__` from `([\w\d_\-\.`]+) as'),
+        "`",
+        ""
+      ) AS bigeye_monitored_table_id,
     FROM
       `{{project}}.region-us.INFORMATION_SCHEMA.JOBS_BY_PROJECT` AS jp
     LEFT JOIN
@@ -103,6 +109,8 @@
   jo.error_reason,
   jo.error_message,
   jo.resource_warning,
+  jo.bigeye_metric_id,
+  jo.bigeye_monitored_table_id,
   @submission_date AS submission_date,
 FROM
   jobs_by_org AS jo
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/monitoring_derived/bigquery_usage_v2/schema.yaml /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/monitoring_derived/bigquery_usage_v2/schema.yaml
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/monitoring_derived/bigquery_usage_v2/schema.yaml	2024-09-25 15:17:42.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/monitoring_derived/bigquery_usage_v2/schema.yaml	2024-09-25 15:17:39.000000000 +0000
@@ -126,6 +126,16 @@
   description: The warning message that appears if the resource usage is above the internal threshold of the system
 
 - mode: NULLABLE
+  name: bigeye_metric_id
+  type: STRING
+  description: Metric ID corresponding to the metric executed by BigEye (Data Observability Platform).
+
+- mode: NULLABLE
+  name: bigeye_monitored_table_id
+  type: STRING
+  description: Fully qualified table id of the table BigEye runs a monitor against (volume and freshness).
+
+- mode: NULLABLE
   name: submission_date
   type: DATE
   description: Date Airflow DAG is executed, and partition date
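
A note on the user_type change in monitoring/bigquery_usage/view.sql: the CASE expression is first-match-wins, so the position of the new "%bigeye%" branch matters. The Python sketch below mirrors only the branches visible in the diff context (airflow, bigeye, mozilla.com) and then applies the same filter as the new bigeye_usage view. The "other" fallback, the function names, and the sample emails are assumptions for illustration; the full CASE in the real view has more branches than are shown here.

```python
from typing import Optional


def classify_user_type(user_email: str) -> str:
    """Mirror the visible CASE branches in order; the first match wins."""
    if "airflow" in user_email:  # user_email LIKE "%airflow%"
        return "airflow"
    if "bigeye" in user_email:  # user_email LIKE "%bigeye%"
        return "bigeye"
    if user_email.endswith("mozilla.com"):  # ENDS_WITH(user_email, "mozilla.com")
        return "individual"
    return "other"  # hypothetical fallback, not taken from the diff


def is_bigeye_usage_row(user_email: str, total_slot_ms: Optional[int]) -> bool:
    """Same condition as the view: user_type = "bigeye" AND total_slot_ms IS NOT NULL."""
    return classify_user_type(user_email) == "bigeye" and total_slot_ms is not None


if __name__ == "__main__":
    # Hypothetical addresses, for illustration only.
    for email in (
        "bigeye-monitor@example-project.iam.gserviceaccount.com",
        "bigeye.fan@mozilla.com",
        "someone@mozilla.com",
    ):
        print(email, "->", classify_user_type(email))
```

Since the bigeye branch sits above the mozilla.com branch, any address containing "bigeye" is classified as bigeye even when it ends with mozilla.com.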


@kik-kik kik-kik self-assigned this Sep 26, 2024
@kik-kik kik-kik added the enhancement New feature or request label Sep 26, 2024
3 participants