Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DS-3326] Add state to subplat subscriptions v2 #5836

Open
wants to merge 7 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -51,6 +51,9 @@ fields:
- name: country_name
type: STRING
mode: NULLABLE
- name: country_state_code
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somewhat awkwardly named because I don't want it to be ambiguous and interpreted as the state of a subscription

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's less than ideal either way, but personally I'd lean toward naming these state_code and we can clarify the meaning in the column description (FYI "status" is the term being used elsewhere in the ETLs for the state/status of a subscription).

Also, even when the other columns don't have descriptions please go ahead and add descriptions for these columns being added, as we should consistently clarify the meaning of the column and note that it's only populated for the US and Canada.

type: STRING
mode: NULLABLE
- name: services
type: RECORD
mode: REPEATED
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -57,6 +57,9 @@ fields:
- name: country_name
type: STRING
mode: NULLABLE
- name: country_state_code
type: STRING
mode: NULLABLE
- name: services
type: RECORD
mode: REPEATED
Expand Down Expand Up @@ -241,6 +244,9 @@ fields:
- name: country_name
type: STRING
mode: NULLABLE
- name: country_state_code
type: STRING
mode: NULLABLE
- name: services
type: RECORD
mode: REPEATED
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -146,6 +146,7 @@ SELECT
) AS customer_subscription_number,
history.subscription.country_code,
COALESCE(countries.name, history.subscription.country_code, 'Unknown') AS country_name,
history.subscription.country_state_code,
history.subscription.services,
history.subscription.provider_product_id,
history.subscription.product_name,
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -54,6 +54,9 @@ fields:
- name: country_name
type: STRING
mode: NULLABLE
- name: country_state_code
type: STRING
mode: NULLABLE
- name: services
type: RECORD
mode: REPEATED
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -54,6 +54,9 @@ fields:
- name: country_name
type: STRING
mode: NULLABLE
- name: country_state_code
type: STRING
mode: NULLABLE
- name: services
type: RECORD
mode: REPEATED
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@ description: |-

This table's schema closely mirrors Stripe's customers API (https://stripe.com/docs/api/customers/object).
Fields which Fivetran doesn't sync and fields we've specifically chosen not to include (e.g. PII) have been omitted.
In particular, all address fields except country have been omitted because Firefox Account user IDs are present and this is in a Mozilla-confidential dataset.
In particular, postal codes have been omitted from address because Firefox Account user IDs are present and this is in a Mozilla-confidential dataset.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this line still relevant? From what I can tell, the source tables are moz-confidential

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this line still relevant?

Yes, it's still relevant, and I'd recommend saying "all address fields except country and state have been omitted", because it's referring to not just what's omitted in the ETL but also what's omitted from the Fivetran sync (e.g. we've configured Fivetran to not sync street addresses).

Suggested change
In particular, postal codes have been omitted from address because Firefox Account user IDs are present and this is in a Mozilla-confidential dataset.
In particular, all address fields except country and state have been omitted because Firefox Account user IDs are present and this is in a Mozilla-confidential dataset.

(the same applies to similar changes made in other metadata.yaml files)

From what I can tell, the source tables are moz-confidential

The ultimate source tables for these ETLs are in moz-fx-data-shared-prod.stripe_external and moz-fx-data-bq-fivetran.stripe, which aren't Moz-confidential.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok I saw that the redash service account had access to stripe_external so I assumed it was accessible from stmo but that role doesn't include data viewer permissions. I also didn't see that data platform group has special access

owners:
- srose@mozilla.com
labels:
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -38,6 +38,10 @@ fields:
type: STRING
mode: NULLABLE
description: Two-letter country code (ISO 3166-1 alpha-2).
- name: state
type: STRING
mode: NULLABLE
description: Two-letter state/province code (only for US and CA).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • This statement isn't entirely true as we aren't enforcing that they be two-letter codes, but I feel like we should, since the non-two-letter values generally look to be ones we don't want. I'll add suggestions in the relevant ETLs for how I think that should be done.
  • I'd suggest spelling out "Canada" here and in other descriptions/comments to remove ambiguity (those of us in the US tend to most associate "CA" with California as that's its state code).

- name: created
type: TIMESTAMP
mode: NULLABLE
Expand Down Expand Up @@ -200,6 +204,10 @@ fields:
type: STRING
mode: NULLABLE
description: Two-letter country code (ISO 3166-1 alpha-2).
- name: state
type: STRING
mode: NULLABLE
description: Two-letter state/province code (only for US and CA).
- name: tax_exempt
type: STRING
mode: NULLABLE
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@ description: |-

This table's schema closely mirrors Stripe's customers API (https://stripe.com/docs/api/customers/object).
Fields which Fivetran doesn't sync and fields we've specifically chosen not to include (e.g. PII) have been omitted.
In particular, all address fields except country have been omitted because Firefox Account user IDs are present and this is in a Mozilla-confidential dataset.
In particular, postal codes have been omitted from address because Firefox Account user IDs are present and this is in a Mozilla-confidential dataset.
owners:
- srose@mozilla.com
labels:
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -40,18 +40,64 @@ WITH original_changelog AS (
JSON_VALUE(customer.metadata.userid) AS userid,
TO_HEX(SHA256(JSON_VALUE(customer.metadata.userid))) AS userid_sha256
) AS metadata,
-- Limit address data to just country since the metadata includes FxA user IDs
-- and this is in a Mozilla-confidential dataset.
STRUCT(customer.address.country) AS address,
-- Limit address data to country and state since the metadata includes FxA user IDs
-- and this is in a Mozilla-confidential dataset. State is only for US and CA.
STRUCT(
customer.address.country,
IF(
customer.address.country IN ("US", "CA"),
COALESCE(
NULLIF(customer.address.state, ""),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
NULLIF(customer.address.state, ""),
IF(LENGTH(customer.address.state) = 2, customer.address.state, NULL)

A relevant comment when I added equivalent logic for US & Canadian state codes in the older VPN ETLs:

Most billing address state values for US and Canadian customers are two-letter codes, but there are some with state names instead, including some which are clearly not US/Canadian states. So although this excludes somewhat legit values like "Colorado" for a few customers, I felt it was worth it to exclude clearly wrong values like "Bayamón" and keep the data consistent for BI.

us_zip_code_prefixes.state_code,
ca_postal_districts.province_code
),
NULL
) AS state
) AS address,
(
SELECT AS STRUCT
customer.shipping.* REPLACE (STRUCT(customer.shipping.address.country) AS address)
customer.shipping.* REPLACE (
STRUCT(
customer.shipping.address.country,
IF(
customer.shipping.address.country IN ("US", "CA"),
COALESCE(
NULLIF(customer.shipping.address.state, ""),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
NULLIF(customer.shipping.address.state, ""),
IF(LENGTH(customer.shipping.address.state) = 2, customer.shipping.address.state, NULL)

While this isn't technically required at the moment since all existing customer shipping address state values for the US and Canada are two-letter codes like we want, I figure it's wise to guard against non-two-letter values possibly appearing in the future.

us_shipping_zip_code_prefixes.state_code,
ca_shipping_postal_districts.province_code
),
NULL
) AS state
) AS address
)
) AS shipping
)
) AS customer,
ROW_NUMBER() OVER customer_changes_asc AS customer_change_number
FROM
`moz-fx-data-shared-prod`.stripe_external.customers_changelog_v1
-- try to get state using postal code if state field is unavailable
LEFT JOIN
`moz-fx-data-shared-prod.static.us_zip_code_prefixes_v1` AS us_shipping_zip_code_prefixes
ON customer.shipping.address.country = "US"
AND LEFT(
customer.shipping.address.postal_code,
3
) = us_shipping_zip_code_prefixes.zip_code_prefix
LEFT JOIN
`moz-fx-data-shared-prod.static.us_zip_code_prefixes_v1` AS us_zip_code_prefixes
ON customer.address.country = "US"
AND LEFT(customer.address.postal_code, 3) = us_zip_code_prefixes.zip_code_prefix
LEFT JOIN
`moz-fx-data-shared-prod.static.ca_postal_districts_v1` AS ca_shipping_postal_districts
ON customer.shipping.address.country = "CA"
AND UPPER(
LEFT(customer.shipping.address.postal_code, 1)
) = ca_shipping_postal_districts.postal_district_code
LEFT JOIN
`moz-fx-data-shared-prod.static.ca_postal_districts_v1` AS ca_postal_districts
ON customer.address.country = "CA"
AND UPPER(LEFT(customer.address.postal_code, 1)) = ca_postal_districts.postal_district_code
WINDOW
customer_changes_asc AS (
PARTITION BY
Expand All @@ -68,7 +114,7 @@ pre_fivetran_changelog AS (
TIMESTAMP '2022-03-25 00:02:29' AS `timestamp`,
STRUCT(
id,
STRUCT(address.country) AS address,
STRUCT(address.country, address.state) AS address,
created,
CAST(NULL AS STRING) AS default_source_id,
CAST(
Expand Down Expand Up @@ -103,7 +149,7 @@ pre_fivetran_changelog AS (
CAST(NULL AS STRING) AS userid,
JSON_VALUE(metadata.userid_sha256) AS userid_sha256
) AS metadata,
CAST(NULL AS STRUCT<address STRUCT<country STRING>>) AS shipping,
CAST(NULL AS STRUCT<address STRUCT<country STRING, state STRING>>) AS shipping,
CAST(NULL AS STRING) AS tax_exempt
) AS customer,
1 AS customer_change_number
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -48,6 +48,10 @@ fields:
type: STRING
mode: NULLABLE
description: Two-letter country code (ISO 3166-1 alpha-2).
- name: state
type: STRING
mode: NULLABLE
description: Two-letter state/province code (only for US and CA).
- name: created
type: TIMESTAMP
mode: NULLABLE
Expand Down Expand Up @@ -210,6 +214,10 @@ fields:
type: STRING
mode: NULLABLE
description: Two-letter country code (ISO 3166-1 alpha-2).
- name: state
type: STRING
mode: NULLABLE
description: Two-letter state/province code (only for US and CA).
- name: tax_exempt
type: STRING
mode: NULLABLE
Expand Down
Loading