Add schema.yaml to urlbar_events (sql_generator) #4595

Merged: 6 commits, merged on Nov 27, 2023

17 changes: 14 additions & 3 deletions sql_generators/urlbar_events/__init__.py
@@ -2,13 +2,15 @@
 import os
 from enum import Enum
 from pathlib import Path
+import shutil

 import click
 from jinja2 import Environment, FileSystemLoader

 from bigquery_etl.cli.utils import use_cloud_function_option
 from bigquery_etl.format_sql.formatter import reformat
 from bigquery_etl.util.common import render, write_sql
+from bigquery_etl.schema import SCHEMA_FILE, Schema

 THIS_PATH = Path(os.path.dirname(__file__))
 TABLE_NAME = "urlbar_events"
@@ -53,18 +55,20 @@ def generate(target_project, output_dir, use_cloud_function):
                 app_name=browser.name,
             )
         )
+        full_table_id = f"{target_project}.{browser.name}_derived.{TABLE_NAME}_v2"
+        full_view_id = f"{target_project}.{browser.name}.{TABLE_NAME}"

         write_sql(
             output_dir=output_dir,
-            full_table_id=f"{target_project}.{browser.name}_derived.{TABLE_NAME}_v2",
+            full_table_id=full_table_id,
             basename="query.sql",
             sql=query_sql,
             skip_existing=False,
         )

         write_sql(
             output_dir=output_dir,
-            full_table_id=f"{target_project}.{browser.name}_derived.{TABLE_NAME}_v2",
+            full_table_id=full_table_id,
             basename="metadata.yaml",
             sql=render(
                 metadata_template,
@@ -78,7 +82,7 @@ def generate(target_project, output_dir, use_cloud_function):

         write_sql(
             output_dir=output_dir,
-            full_table_id=f"{target_project}.{browser.name}.{TABLE_NAME}",
+            full_table_id=full_view_id,
             basename="view.sql",
             sql=reformat(
                 view_template.render(
@@ -88,3 +92,10 @@ def generate(target_project, output_dir, use_cloud_function):
             ),
             skip_existing=False,
         )
+
+        final_path = Path(os.path.join(output_dir, *list(full_table_id.split(".")[-2:])))
+        source_schema_path = THIS_PATH / "templates" / "schema.yaml"
+        final_schema_path = final_path / "schema.yaml"
+
+        if os.path.exists(source_schema_path):
+            shutil.copyfile(source_schema_path, final_schema_path)
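The new block at the end of the loop reuses `full_table_id` to place `schema.yaml` in the same `<output_dir>/<dataset>/<table>/` directory that receives `query.sql` and `metadata.yaml`. The sketch below isolates that path rule in two hypothetical helpers, `schema_destination` and `copy_schema` (neither is part of bigquery-etl). It assumes, as the PR's code appears to, that `output_dir` already points inside the project directory, so only the last two parts of the table ID are used. The PR does not create the directory itself, presumably relying on the earlier `write_sql` calls; the sketch calls `mkdir` so it also works standalone.

```python
from __future__ import annotations

import shutil
import tempfile
from pathlib import Path


def schema_destination(output_dir: str | Path, full_table_id: str) -> Path:
    """Same rule as the PR: drop the project, keep <dataset>/<table> under output_dir."""
    dataset, table = full_table_id.split(".")[-2:]
    return Path(output_dir) / dataset / table / "schema.yaml"


def copy_schema(source: Path, output_dir: str | Path, full_table_id: str) -> Path | None:
    """Copy `source` next to the generated query; return the destination, or None if absent."""
    if not source.exists():
        # Mirrors the PR's os.path.exists() guard: a missing template is silently skipped.
        return None
    destination = schema_destination(output_dir, full_table_id)
    # Unlike the PR, create the directory here instead of relying on earlier write_sql calls.
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, destination)
    return destination


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        template = Path(tmp) / "schema.yaml"
        template.write_text("fields: []\n")
        # Hypothetical table ID; the real one is built from target_project and browser.name.
        dest = copy_schema(template, Path(tmp) / "sql", "example-project.example_derived.urlbar_events_v2")
        print(dest.relative_to(tmp))
```

Because the copy is unconditional apart from the existence check, rerunning the generator overwrites any hand-edited `schema.yaml` in the output directory, which matches the `skip_existing=False` behavior of the other generated files.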
157 changes: 157 additions & 0 deletions sql_generators/urlbar_events/templates/schema.yaml
@@ -0,0 +1,157 @@
---
fields:
- name: submission_date
  type: DATE
  mode: NULLABLE
- name: glean_client_id
  type: STRING
  mode: NULLABLE
- name: legacy_telemetry_client_id
  type: STRING
  mode: NULLABLE
- name: sample_id
  type: INTEGER
  mode: NULLABLE
- name: event_name
  type: STRING
  mode: NULLABLE
  description: Name of the 'urlbar' event represented by this row- 'engagement' or
    'abandonment'
- name: event_timestamp
  type: INTEGER
  mode: NULLABLE
  description: Glean event timestamp
- name: event_id
  type: STRING
  mode: NULLABLE
  description: Row identifier UUID. When unnesting the results column, use
    'COUNT(DISTINCT event_id)' to count events.
- name: experiments
  type: RECORD
  mode: REPEATED
  fields:
  - name: key
    type: STRING
    mode: NULLABLE
  - name: value
    type: RECORD
    mode: NULLABLE
    fields:
    - name: branch
      type: STRING
      mode: NULLABLE
    - name: extra
      type: RECORD
      mode: NULLABLE
      fields:
      - name: enrollment_id
        type: STRING
        mode: NULLABLE
      - name: type
        type: STRING
        mode: NULLABLE
- name: seq
  type: INTEGER
  mode: NULLABLE
  description: ping_info.seq from the events ping. Use together with
    event_timestamp for event sequencing.
- name: normalized_channel
  type: STRING
  mode: NULLABLE
- name: normalized_country_code
  type: STRING
  mode: NULLABLE
- name: normalized_engine
  type: STRING
  mode: NULLABLE
  description: Normalized default search engine
- name: pref_data_collection
  type: BOOLEAN
  mode: NULLABLE
  description: Has the user opted into Firefox Suggest data collection, aka
    Suggest Online.
- name: pref_sponsored_suggestions
  type: BOOLEAN
  mode: NULLABLE
  description: Are Firefox Suggest sponsored suggestions enabled
- name: pref_fx_suggestions
  type: BOOLEAN
  mode: NULLABLE
  description: Is Firefox Suggest enabled (nonsponsored suggestions)
- name: engagement_type
  type: STRING
  mode: NULLABLE
  description: How the user selected the result. Eg. 'click', 'enter'.
- name: interaction
  type: STRING
  mode: NULLABLE
  description: How the user started the search action. Eg. 'typed', 'pasted'.
- name: num_chars_typed
  type: INTEGER
  mode: NULLABLE
- name: num_chars_typed
  type: INTEGER
  mode: NULLABLE
  description: Length of the query string typed by the user
- name: num_total_results
  type: INTEGER
  mode: NULLABLE
  description: Number of results displayed
- name: selected_position
  type: INTEGER
  mode: NULLABLE
  description: Rank of the selected result, starting from 1, if any.
- name: selected_result
  type: STRING
  mode: NULLABLE
  description: Raw type identifier for the selected result, if any. Eg.
    'search_suggest', 'bookmark'.
- name: results
  type: RECORD
  mode: REPEATED
  description: Array listing info about each result displayed.
  fields:
  - name: position
    type: INTEGER
    mode: NULLABLE
    description: Display rank of this result, starting from 1.
  - name: result_type
    type: STRING
    mode: NULLABLE
    description: Raw type identifier for this result.
  - name: product_result_type
    type: STRING
    mode: NULLABLE
    description: Product type identifier for this result.
  - name: result_group
    type: STRING
    mode: NULLABLE
    description: Result group this result belongs to. Eg. 'heuristic', 'suggest'.
- name: product_selected_result
  type: STRING
  mode: NULLABLE
  description: Product type identifier for the selected result, if any. Eg.
    'wikipedia_enhanced', 'default_partner_search_suggestion'.
- name: event_action
  type: STRING
  mode: NULLABLE
  description: Action taken by the user which generated the event- 'engaged',
    'abandoned', or 'annoyance'.
- name: is_terminal
  type: BOOLEAN
  mode: NULLABLE
  description: Did the event action cause the search session to end? Filter on
    'is_terminal = TRUE' to count unique search sessions.
- name: engaged_result_type
  type: STRING
  mode: NULLABLE
  description: Raw type identifier for the selected result, if any.
- name: product_engaged_result_type
  type: STRING
  mode: NULLABLE
  description: Product type identifier for the selected result, if any.
- name: annoyance_signal_type
  type: STRING
  mode: NULLABLE
  description: Annoyance option selected, if any. This uses the value of
    'engagement_type' when 'event_action' is annoyance. Eg. 'dismiss', 'help'.
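Schema files like this one can be sanity-checked before deployment: BigQuery does not allow two columns with the same name (even if they differ only in case), and field names inside a RECORD must likewise be unique within that record. In the file above, `num_chars_typed` is listed twice. The following is a minimal sketch of such a check; `find_duplicate_fields` is a hypothetical helper, not part of bigquery-etl, and it operates on the already-parsed `fields` list so it needs only the standard library.

```python
from __future__ import annotations

from collections import Counter
from typing import Any


def find_duplicate_fields(fields: list[dict[str, Any]], prefix: str = "") -> list[str]:
    """Return dotted paths of names that occur more than once within one `fields` list.

    Names are compared case-insensitively (and reported lower-cased), since BigQuery
    treats column names that differ only in case as duplicates.
    """
    counts = Counter(f["name"].lower() for f in fields)
    duplicates = [prefix + name for name, n in counts.items() if n > 1]
    # Nested RECORD fields only need to be unique within their parent, so recurse per record.
    for f in fields:
        if f.get("fields"):
            duplicates.extend(find_duplicate_fields(f["fields"], prefix + f["name"] + "."))
    return duplicates


if __name__ == "__main__":
    schema = {
        "fields": [
            {"name": "num_chars_typed", "type": "INTEGER"},
            {"name": "num_chars_typed", "type": "INTEGER"},
            {"name": "results", "type": "RECORD", "fields": [{"name": "position"}]},
        ]
    }
    print(find_duplicate_fields(schema["fields"]))  # ['num_chars_typed']
```

To check the real template, parse it first, for example with PyYAML's `yaml.safe_load`, and pass `schema["fields"]`; for the file as merged, the only name reported should be `num_chars_typed`.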
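The `event_id` and `is_terminal` descriptions encode two counting rules: after unnesting `results`, count events with `COUNT(DISTINCT event_id)`, and count search sessions by filtering on `is_terminal = TRUE`. The Python sketch below emulates both on a few in-memory rows to show why the first rule matters: unnesting yields one row per displayed result, so a plain row count overcounts events. `UrlbarEvent` and the helper functions are hypothetical stand-ins (not part of the table or of bigquery-etl), and the sample rows are made up.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class UrlbarEvent:
    """Hypothetical stand-in for one urlbar_events row (only the columns used here)."""

    event_id: str
    is_terminal: bool
    results: list[dict] = field(default_factory=list)


def unnest_results(events: list[UrlbarEvent]) -> list[tuple[str, dict]]:
    """Emulate CROSS JOIN UNNEST(results): one row per (event, displayed result).

    Events with an empty results array produce no rows, as with an inner UNNEST join.
    """
    return [(event.event_id, result) for event in events for result in event.results]


def count_events(unnested: list[tuple[str, dict]]) -> int:
    """COUNT(DISTINCT event_id) over unnested rows: each event is counted once."""
    return len({event_id for event_id, _ in unnested})


def count_search_sessions(events: list[UrlbarEvent]) -> int:
    """Count rows with is_terminal = TRUE, i.e. events that ended a search session."""
    return sum(1 for event in events if event.is_terminal)


if __name__ == "__main__":
    events = [
        UrlbarEvent("e1", is_terminal=False, results=[{"position": 1}, {"position": 2}, {"position": 3}]),
        UrlbarEvent("e2", is_terminal=True, results=[{"position": 1}, {"position": 2}]),
    ]
    rows = unnest_results(events)
    print(len(rows), count_events(rows), count_search_sessions(events))  # 5 2 1
```

The unnested row count (5) reflects displayed results, not events; only the distinct `event_id` count (2) matches the number of events, and the session count comes from the un-unnested rows.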