Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add gclid_conversions table & view #4558

Merged
merged 8 commits into from
Nov 20, 2023
Merged
Show file tree
Hide file tree
Changes from 6 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
CREATE OR REPLACE VIEW
`moz-fx-data-shared-prod.mozilla_org.gclid_conversions`
AS
SELECT
*
FROM
`moz-fx-data-shared-prod.mozilla_org_derived.gclid_conversions_v1`
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
#fail
{{ is_unique(["gclid", "activity_date"]) }}

Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
friendly_name: Gclid Conversions
description: |-
All conversions for GCLIDs.
Partitioned by activity_date; a single GCLID
can have multiple entries in this table,
if they perform different conversion actions
on different days.
owners:
- frank@mozilla.com
labels:
incremental: true
owner1: frank@mozilla.com
scheduling:
dag_name: bqetl_mozilla_org_derived
fbertsch marked this conversation as resolved.
Show resolved Hide resolved
date_partition_parameter: activity_date
parameters: ["conversion_window:INT64:30", "activity_date:DATE:{{ds}}"]
bigquery:
time_partitioning:
type: day
field: 'activity_date'
require_partition_filter: false
expiration_days: null
references: {}
deprecated: false
Original file line number Diff line number Diff line change
@@ -0,0 +1,73 @@
WITH gclids_to_ga_ids AS (
SELECT
gclid,
ga_client_id,
stub_session_id,
FROM
mozilla_org_derived.ga_sessions_v1
CROSS JOIN
UNNEST(all_reported_stub_session_ids) AS stub_session_id
WHERE
session_date >= DATE_SUB(@activity_date, INTERVAL @conversion_window DAY)
AND gclid IS NOT NULL
),
ga_ids_to_dl_token AS (
SELECT
ga_client_id,
stub_session_id,
dl_token,
FROM
stub_attribution_service_derived.dl_ga_triplets_v1
WHERE
ga_client_id IS NOT NULL
AND stub_session_id IS NOT NULL
Comment on lines +14 to +23
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: for clarity, I'd probably move the joins up from the final SELECT to here instead of doing multiple lookups at the end.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to ask you about this during our 1:1, I'm not 100% sure what you mean

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking it over, I'd prefer to keep as-is (even though it's not required here due to inner joins, I do prefer to keep the filtering logic separate from the joins).

),
dl_token_to_telemetry_id AS (
SELECT
client_id AS telemetry_client_id,
first_seen_date,
attribution_dltoken AS dl_token,
FROM
telemetry_derived.clients_first_seen_v2
),
telemetry_id_to_activity AS (
SELECT
client_id AS telemetry_client_id,
@activity_date AS activity_date,
search_count_all > 0 AS did_search,
ad_clicks_count_all > 0 AS did_click_ad,
TRUE AS was_active,
FROM
telemetry_derved.clients_daily_v6
WHERE
submission_date = @activity_date
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea here is that we want to filter based on activity date: if an activity happens that we care about (i.e. a conversion event), then we find it here. We look back 30 days to see if that event has an associated GCLID - after 30 days, conversion events don't count! (that's the conversion_window parameter). This is standard practice for marketing campaigns.

)
SELECT
activity_date,
gclid,
COALESCE(LOGICAL_OR(did_search), FALSE) AS did_search,
COALESCE(LOGICAL_OR(did_click_ad), FALSE) AS did_click_ad,
COALESCE(
LOGICAL_OR(was_active AND activity_date > first_seen_date),
FALSE
) AS did_returned_second_day
FROM
gclids_to_ga_ids
INNER JOIN
ga_ids_to_dl_token
USING
(ga_client_id, stub_session_id)
INNER JOIN
dl_token_to_telemetry_id
USING
(dl_token)
INNER JOIN
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inner join by design: We only care about gclids who have activity

telemetry_id_to_activity
USING
(telemetry_client_id)
GROUP BY
gclid
HAVING
did_search
OR did_click_ad
OR did_returned_second_day
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
fields:
- name: activity_date
mode: NULLABLE
type: DATE
description: "The date that the conversion action took place."
- name: gclid
mode: NULLABLE
type: STRING
description: "A Google Click ID, which uniquely represent an ad impression for Google ads."
- name: did_search
mode: NULLABLE
type: BOOLEAN
description: "Whether the GCLID did a search on this day."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super nit: I'd write whether the client that came from the GCLID did the thing. It's subtle but the GCLID is an impression but not a "user"

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, thanks for pointing this out

fbertsch marked this conversation as resolved.
Show resolved Hide resolved
- name: did_click_ad
mode: NULLABLE
type: BOOLEAN
description: "Whether the GCLID clicked an ad on this day."
fbertsch marked this conversation as resolved.
Show resolved Hide resolved
- name: did_returned_second_day
mode: NULLABLE
type: BOOLEAN
description: "Whether the GCLID returned for a second (or later) day."
fbertsch marked this conversation as resolved.
Show resolved Hide resolved