
Investigate Firestore ingestion optimizations #11

Open
rviscomi opened this issue Oct 25, 2023 · 3 comments
@rviscomi (Member)

We're using Firestore as the intermediary storage layer for the API. The problem is that we're only able to import 500 rows of data at a time, so it's taking a very long time and creating issues with the initial backfill.

Investigate whether it's possible to import the entire table in one go, or at least in larger batches. This will speed up the backfill and monthly import jobs and also simplify the pipeline.

@tunetheweb (Member)

Hmmm, from a quick Google it does look like batched writes are limited to 500 "operations".
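
Given that limit, a common workaround (not necessarily what this pipeline ended up doing) is to split the rows into chunks of at most 500 and commit one batch per chunk. A minimal sketch; the collection name, document IDs, and `rows` shape in the usage comment are illustrative assumptions, not from this issue:

```python
from itertools import islice

FIRESTORE_BATCH_LIMIT = 500  # per-batch write limit discussed above

def chunked(rows, size=FIRESTORE_BATCH_LIMIT):
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while chunk := list(islice(it, size)):
        yield chunk

# Hypothetical usage with the google-cloud-firestore client:
#
#   from google.cloud import firestore
#   db = firestore.Client()
#   for chunk in chunked(rows):
#       batch = db.batch()
#       for row in chunk:
#           batch.set(db.collection("records").document(row["id"]), row)
#       batch.commit()
```

This keeps every commit within the 500-operation cap but still issues the commits serially, which is why the backfill stays slow without parallelism.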

@maceto (Collaborator) commented Dec 4, 2023

Hi @rviscomi @tunetheweb,

I think we can close this issue. With Giancarlo's help we were able to incorporate the process into the Dataflow pipeline. The full historical backfill takes several hours, and the monthly update takes under 25 minutes.

Dataflow does a great job processing in parallel and inserting into Firestore.
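
The batching logic inside such a pipeline can be sketched independently of Beam. A minimal, hypothetical buffer that flushes in groups of at most 500; `commit_fn` stands in for building and committing a Firestore `WriteBatch`, and in the actual Dataflow job this logic would live inside a Beam `DoFn`, with `flush()` called from `finish_bundle()`:

```python
class FirestoreBatchBuffer:
    """Buffers rows and flushes them via commit_fn in groups of at most
    batch_size, matching Firestore's 500-operation batch limit.

    commit_fn is a placeholder for a real Firestore batch commit; each
    parallel worker would own its own buffer, which is what lets Dataflow
    insert many batches concurrently."""

    def __init__(self, commit_fn, batch_size=500):
        self.commit_fn = commit_fn
        self.batch_size = batch_size
        self._buffer = []

    def add(self, row):
        """Buffer one row, committing automatically when the batch fills."""
        self._buffer.append(row)
        if len(self._buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        """Commit any remaining buffered rows (e.g. at end of bundle)."""
        if self._buffer:
            self.commit_fn(list(self._buffer))
            self._buffer.clear()
```

This is a sketch under the assumptions above, not the pipeline's actual code; the real implementation lives wherever the Dataflow job is hosted.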

@rviscomi (Member, Author) commented Dec 4, 2023

Awesome! Where does that Dataflow code live, and are there any remaining documentation tasks worth tracking?
