-
Notifications
You must be signed in to change notification settings - Fork 100
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DS-3326] Add state to subplat subscriptions v2 #5836
base: main
Are you sure you want to change the base?
Changes from 3 commits
97a0ed7
6d37825
11a5e36
0467bfa
931c938
4346584
eaba223
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -8,7 +8,7 @@ description: |- | |||||
|
||||||
This table's schema closely mirrors Stripe's customers API (https://stripe.com/docs/api/customers/object). | ||||||
Fields which Fivetran doesn't sync and fields we've specifically chosen not to include (e.g. PII) have been omitted. | ||||||
In particular, all address fields except country have been omitted because Firefox Account user IDs are present and this is in a Mozilla-confidential dataset. | ||||||
In particular, postal codes have been omitted from address because Firefox Account user IDs are present and this is in a Mozilla-confidential dataset. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is this line still relevant? From what I can tell, the source tables are moz-confidential There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Yes, it's still relevant, and I'd recommend saying "all address fields except country and state have been omitted", because it's referring to not just what's omitted in the ETL but also what's omitted from the Fivetran sync (e.g. we've configured Fivetran to not sync street addresses).
Suggested change
(the same applies to similar changes made in other
The ultimate source tables for these ETLs are in There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ah ok I saw that the redash service account had access to |
||||||
owners: | ||||||
- srose@mozilla.com | ||||||
labels: | ||||||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -38,6 +38,10 @@ fields: | |
type: STRING | ||
mode: NULLABLE | ||
description: Two-letter country code (ISO 3166-1 alpha-2). | ||
- name: state | ||
type: STRING | ||
mode: NULLABLE | ||
description: Two-letter state/province code (only for US and CA). | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
|
||
- name: created | ||
type: TIMESTAMP | ||
mode: NULLABLE | ||
|
@@ -200,6 +204,10 @@ fields: | |
type: STRING | ||
mode: NULLABLE | ||
description: Two-letter country code (ISO 3166-1 alpha-2). | ||
- name: state | ||
type: STRING | ||
mode: NULLABLE | ||
description: Two-letter state/province code (only for US and CA). | ||
- name: tax_exempt | ||
type: STRING | ||
mode: NULLABLE | ||
|
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -40,18 +40,64 @@ WITH original_changelog AS ( | |||||
JSON_VALUE(customer.metadata.userid) AS userid, | ||||||
TO_HEX(SHA256(JSON_VALUE(customer.metadata.userid))) AS userid_sha256 | ||||||
) AS metadata, | ||||||
-- Limit address data to just country since the metadata includes FxA user IDs | ||||||
-- and this is in a Mozilla-confidential dataset. | ||||||
STRUCT(customer.address.country) AS address, | ||||||
-- Limit address data to country and state since the metadata includes FxA user IDs | ||||||
-- and this is in a Mozilla-confidential dataset. State is only for US and CA. | ||||||
STRUCT( | ||||||
customer.address.country, | ||||||
IF( | ||||||
customer.address.country IN ("US", "CA"), | ||||||
COALESCE( | ||||||
NULLIF(customer.address.state, ""), | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
A relevant comment when I added equivalent logic for US & Canadian state codes in the older VPN ETLs:
|
||||||
us_zip_code_prefixes.state_code, | ||||||
ca_postal_districts.province_code | ||||||
), | ||||||
NULL | ||||||
) AS state | ||||||
) AS address, | ||||||
( | ||||||
SELECT AS STRUCT | ||||||
customer.shipping.* REPLACE (STRUCT(customer.shipping.address.country) AS address) | ||||||
customer.shipping.* REPLACE ( | ||||||
STRUCT( | ||||||
customer.shipping.address.country, | ||||||
IF( | ||||||
customer.shipping.address.country IN ("US", "CA"), | ||||||
COALESCE( | ||||||
NULLIF(customer.shipping.address.state, ""), | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
While this isn't technically required at the moment since all existing customer shipping address state values for the US and Canada are two-letter codes like we want, I figure it's wise to guard against non-two-letter values possibly appearing in the future. |
||||||
us_shipping_zip_code_prefixes.state_code, | ||||||
ca_shipping_postal_districts.province_code | ||||||
), | ||||||
NULL | ||||||
) AS state | ||||||
) AS address | ||||||
) | ||||||
) AS shipping | ||||||
) | ||||||
) AS customer, | ||||||
ROW_NUMBER() OVER customer_changes_asc AS customer_change_number | ||||||
FROM | ||||||
`moz-fx-data-shared-prod`.stripe_external.customers_changelog_v1 | ||||||
-- try to get state using postal code if state field is unavailable | ||||||
LEFT JOIN | ||||||
`moz-fx-data-shared-prod.static.us_zip_code_prefixes_v1` AS us_shipping_zip_code_prefixes | ||||||
ON customer.shipping.address.country = "US" | ||||||
AND LEFT( | ||||||
customer.shipping.address.postal_code, | ||||||
3 | ||||||
) = us_shipping_zip_code_prefixes.zip_code_prefix | ||||||
LEFT JOIN | ||||||
`moz-fx-data-shared-prod.static.us_zip_code_prefixes_v1` AS us_zip_code_prefixes | ||||||
ON customer.address.country = "US" | ||||||
AND LEFT(customer.address.postal_code, 3) = us_zip_code_prefixes.zip_code_prefix | ||||||
LEFT JOIN | ||||||
`moz-fx-data-shared-prod.static.ca_postal_districts_v1` AS ca_shipping_postal_districts | ||||||
ON customer.shipping.address.country = "CA" | ||||||
AND UPPER( | ||||||
LEFT(customer.shipping.address.postal_code, 1) | ||||||
) = ca_shipping_postal_districts.postal_district_code | ||||||
LEFT JOIN | ||||||
`moz-fx-data-shared-prod.static.ca_postal_districts_v1` AS ca_postal_districts | ||||||
ON customer.address.country = "CA" | ||||||
AND UPPER(LEFT(customer.address.postal_code, 1)) = ca_postal_districts.postal_district_code | ||||||
WINDOW | ||||||
customer_changes_asc AS ( | ||||||
PARTITION BY | ||||||
|
@@ -68,7 +114,7 @@ pre_fivetran_changelog AS ( | |||||
TIMESTAMP '2022-03-25 00:02:29' AS `timestamp`, | ||||||
STRUCT( | ||||||
id, | ||||||
STRUCT(address.country) AS address, | ||||||
STRUCT(address.country, address.state) AS address, | ||||||
created, | ||||||
CAST(NULL AS STRING) AS default_source_id, | ||||||
CAST( | ||||||
|
@@ -103,7 +149,7 @@ pre_fivetran_changelog AS ( | |||||
CAST(NULL AS STRING) AS userid, | ||||||
JSON_VALUE(metadata.userid_sha256) AS userid_sha256 | ||||||
) AS metadata, | ||||||
CAST(NULL AS STRUCT<address STRUCT<country STRING>>) AS shipping, | ||||||
CAST(NULL AS STRUCT<address STRUCT<country STRING, state STRING>>) AS shipping, | ||||||
CAST(NULL AS STRING) AS tax_exempt | ||||||
) AS customer, | ||||||
1 AS customer_change_number | ||||||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Somewhat awkwardly named because I don't want it to be ambiguous and interpreted as the state of a subscription
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it's less than ideal either way, but personally I'd lean toward naming these
state_code
and we can clarify the meaning in the column description (FYI "status" is the term being used elsewhere in the ETLs for the state/status of a subscription).Also, even when the other columns don't have descriptions please go ahead and add descriptions for these columns being added, as we should consistently clarify the meaning of the column and note that it's only populated for the US and Canada.