Improve US subscription state mappings (DENG-2099) #4701

Merged
sean-rose merged 3 commits into main from DENG-2099-improve-state-mapping
Dec 14, 2023

@sean-rose sean-rose commented Dec 14, 2023

DENG-2099: Add the geo information (state for US) to the VPN subscription data

The US ZIP codes public data source I started using in #4675 turned out to be missing at least 2,656 ZIP codes (maybe because it was last updated in February 2020?), and unfortunately I haven't been able to find any other good data source for ZIP codes:

  • It seems like to get all ZIP codes from USPS you have to pay them money.
  • This public list from USPS is missing over 5k ZIP codes.
  • In the past we've used Stripe US customer billing addresses that have both ZIP codes and states specified to fill in states elsewhere, but those are missing at least 16k ZIP codes and are generally no longer being set in Stripe going forward.

This PR instead maps three-digit ZIP code prefixes to states. That may be less precise in some cases, but it is more comprehensive (4,386 more US subscriptions will get their state location mapped), and hopefully won't be too much hassle to maintain.
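As a rough illustration of the prefix approach (not the query code in this PR), a lookup keyed on the first three digits might look like the sketch below. The function name `state_from_zip` and the three sample rows are hypothetical; the real mapping file is not reproduced here.

```python
from typing import Optional

# Hypothetical sample rows for illustration only; the actual mapping file
# added by this PR covers far more prefixes.
ZIP_PREFIX_TO_STATE = {
    "100": "NY",
    "606": "IL",
    "941": "CA",
}


def state_from_zip(zip_code: Optional[str]) -> Optional[str]:
    """Return the state code for a US ZIP code based on its three-digit prefix."""
    if not zip_code:
        return None
    cleaned = zip_code.strip()
    # Works for both five-digit ZIP and ZIP+4 values; anything that does not
    # start with three digits is left unmapped.
    if len(cleaned) < 3 or not cleaned[:3].isdigit():
        return None
    return ZIP_PREFIX_TO_STATE.get(cleaned[:3])
```

Because only the prefix is consulted, a ZIP code that is missing from any full ZIP code list can still resolve to a state, which is where the extra coverage comes from.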


Checklist for reviewer:

  • Commits should reference a bug or github issue, if relevant (if a bug is referenced, the pull request should include the bug number in the title).
  • If the PR comes from a fork, trigger integration CI tests by running the Push to upstream workflow and provide the <username>:<branch> of the fork as parameter. The parameter will also show up in the logs of the manual-trigger-required-for-fork CI task together with more detailed instructions.
  • If adding a new field to a query, ensure that the schema and dependent downstream schemas have been updated.
  • When adding a new derived dataset, ensure that data is not available already (fully or partially) and recommend extending an existing dataset in favor of creating new ones. Data can be available in the bigquery-etl repository, looker-hub or in looker-spoke-default.

For modifications to schemas in restricted namespaces (see CODEOWNERS):

┆Issue is synchronized with this Jira Task

@@ -0,0 +1,3 @@
Mappings from three-digit US ZIP code prefixes to US state codes.

Data source: https://simple.wikipedia.org/wiki/List_of_ZIP_Code_prefixes
Contributor Author

This was the first time I've come across Simple English Wikipedia. I cited the Simple English Wikipedia version because it was the easiest place to quickly get the data in a reasonable format, compared to the equivalent page on normal Wikipedia which has a highly formatted table that'd be much harder to copy the data from.

S,SK
T,AB
V,BC
X,
Contributor Author

District X covers both NT and NU, so we can't tell which one.
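The rows in this excerpt appear to map the first letter of a Canadian postal code to a province code, with the value left blank where the letter is ambiguous. A minimal sketch of reading such rows, assuming headerless two-column CSV and using hypothetical helper names, could treat the blank value as "unknown":

```python
import csv
import io
from typing import Dict, Optional

# The four rows visible in the diff excerpt; the rest of the file is not shown.
SAMPLE_CSV = "S,SK\nT,AB\nV,BC\nX,\n"


def load_district_mapping(text: str) -> Dict[str, Optional[str]]:
    """Parse 'letter,province' rows; a blank province becomes None."""
    mapping: Dict[str, Optional[str]] = {}
    for district, province in csv.reader(io.StringIO(text)):
        mapping[district] = province or None
    return mapping


def province_from_postal_code(
    postal_code: Optional[str], mapping: Dict[str, Optional[str]]
) -> Optional[str]:
    """Look up the province from the first letter of the postal code."""
    if not postal_code:
        return None
    return mapping.get(postal_code.strip()[:1].upper())


mapping = load_district_mapping(SAMPLE_CSV)
```

With this sample, a postal code starting with `X` returns `None` rather than a guess between NT and NU.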

@sean-rose sean-rose enabled auto-merge (squash) December 14, 2023 16:51
@sean-rose sean-rose merged commit 314a394 into main Dec 14, 2023
15 of 18 checks passed
@sean-rose sean-rose deleted the DENG-2099-improve-state-mapping branch December 14, 2023 17:10