feat(gatsby-source-drupal): Use the collection count from JSON:API extras to enable parallel API requests for cold builds #32883

Merged
KyleAMathews merged 8 commits into master from parallel-fetches-drupal
Aug 26, 2021

Contributor

@KyleAMathews KyleAMathews commented Aug 23, 2021

Without a collection count, we have to wait for each page to finish before we can start querying the next one, because the only way to discover the next page's URL is the "next" link in the previous response. This change lets us query all collection pages in parallel: instead of fetching one collection page at a time, we can fetch up to the maximum concurrency (default 20) at once.
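
As a rough illustration of the idea (not the code in this PR), the remaining page URLs can be derived up front once the total is known. The sketch below assumes the first response exposes the total as `meta.count` and that pages are addressed with the `page[offset]` and `page[limit]` query parameters. `buildPageUrls`, the example URL, and the numbers are hypothetical.

```javascript
// Sketch only: `buildPageUrls` is a hypothetical helper, not the plugin's API.
// Assumes the total comes from `meta.count` on the first response and that
// Drupal paginates with `page[offset]` / `page[limit]`.
function buildPageUrls(firstPageUrl, totalCount, pageSize) {
  const pageCount = Math.ceil(totalCount / pageSize)
  const urls = []
  // Page 0 was already fetched (that is where the count came from),
  // so only the remaining pages need URLs.
  for (let page = 1; page < pageCount; page++) {
    const url = new URL(firstPageUrl)
    url.searchParams.set(`page[offset]`, String(page * pageSize))
    url.searchParams.set(`page[limit]`, String(pageSize))
    urls.push(url.toString())
  }
  return urls
}

// Illustrative numbers: 3200 entities at 50 per page is 64 pages in total.
const urls = buildPageUrls(`https://example.com/jsonapi/node/article`, 3200, 50)
console.log(urls.length) // 63 (every page after the first)
```

Because none of these URLs depends on a previous response, they can all be queued immediately.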

For a test site with ~3200 entities and a warm Drupal cache (and no CDN cache), this PR dropped sourcing time from ~14s to 4s.

On a very large production Drupal site (~600k entities), fetching time for a cold build dropped from 2 hours to 30 minutes 🚀
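
The "fetch up to the maximum concurrency" behaviour described above can be sketched as a small worker pool. This is a minimal sketch under assumptions, not the plugin's implementation: `runWithConcurrency` and `fakeFetch` are hypothetical names, and the real HTTP call is replaced by a timer.

```javascript
// Sketch only: `runWithConcurrency` is a hypothetical helper.
// It runs `worker` over `items` with at most `concurrency` calls in flight
// and returns the results in the original order.
async function runWithConcurrency(items, worker, concurrency = 20) {
  const results = new Array(items.length)
  let nextIndex = 0
  // Each runner claims the next unprocessed item until none remain.
  async function runner() {
    while (nextIndex < items.length) {
      const index = nextIndex++
      results[index] = await worker(items[index], index)
    }
  }
  const runnerCount = Math.min(concurrency, items.length)
  await Promise.all(Array.from({ length: runnerCount }, () => runner()))
  return results
}

// Stand-in for the real request: resolves after a short delay.
const fakeFetch = async url => {
  await new Promise(resolve => setTimeout(resolve, 5))
  return `body of ${url}`
}

const pages = Array.from({ length: 64 }, (_, i) => `page-${i}`)
runWithConcurrency(pages, fakeFetch).then(bodies => {
  console.log(`${bodies.length} pages fetched`)
})
```

With 64 pages and a limit of 20, the pool needs roughly four waves of requests instead of 64 sequential round trips, which is where the savings come from.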

TODOs

  • document in README
  • during the initial build, log the queue length and execution rate every 50 requests using reporter.verbose
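
One possible shape for the logging TODO, as a sketch only: `createProgressLogger` is a hypothetical helper, `log` stands in for `reporter.verbose`, and the message wording is made up.

```javascript
// Sketch only: `createProgressLogger` is hypothetical. `log` stands in for
// Gatsby's `reporter.verbose`; `now` is injectable to keep the rate testable.
function createProgressLogger({ log, every = 50, now = Date.now }) {
  const startedAt = now()
  let completed = 0
  // Call this once per finished request, passing the current queue length.
  return function onRequestDone(queueLength) {
    completed++
    if (completed % every !== 0) return
    const elapsedSeconds = (now() - startedAt) / 1000
    const rate = elapsedSeconds > 0 ? completed / elapsedSeconds : 0
    log(
      `${completed} requests done, ${queueLength} queued, ` +
        `${rate.toFixed(1)} requests/second`
    )
  }
}

// Usage: 120 simulated completions produce two log lines (at 50 and 100).
const onRequestDone = createProgressLogger({ log: console.log })
for (let i = 0; i < 120; i++) {
  onRequestDone(120 - i - 1)
}
```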

@gatsbot gatsbot bot added the "status: triage needed" label (Issue or pull request that need to be triaged and assigned to a reviewer) Aug 23, 2021
@KyleAMathews KyleAMathews changed the title from "feat(gatsby-source-drupal): Use the collection count from JSON:API extras to construct URLs" to "feat(gatsby-source-drupal): Use the collection count from JSON:API extras to enable parallel API requests" Aug 23, 2021
drupalninja previously approved these changes Aug 23, 2021
@KyleAMathews KyleAMathews added the "topic: source-drupal" label (Related to Gatsby's integration with Drupal) and removed the "status: triage needed" label (Issue or pull request that need to be triaged and assigned to a reviewer) Aug 23, 2021
@benrobertsonio
Contributor

@KyleAMathews - should we add a note to the README to help people discover this update? i.e., push people to enable the proper JSON:API Extras settings?

Contributor

@wardpeet wardpeet left a comment

I've asked a few questions here

packages/gatsby-source-drupal/src/gatsby-node.js (outdated, resolved)
packages/gatsby-source-drupal/src/gatsby-node.js (outdated, resolved)
packages/gatsby-source-drupal/src/gatsby-node.js (outdated, resolved)
@KyleAMathews
Contributor Author

Yeah definitely. I'll add it once we do some more real-world testing to verify this change.

@KyleAMathews KyleAMathews changed the title from "feat(gatsby-source-drupal): Use the collection count from JSON:API extras to enable parallel API requests" to "feat(gatsby-source-drupal): Use the collection count from JSON:API extras to enable parallel API requests for cold builds" Aug 23, 2021
smthomas previously approved these changes Aug 26, 2021
Contributor

@smthomas smthomas left a comment

This looks great to me! Just one small comment nit.

packages/gatsby-source-drupal/src/gatsby-node.js (outdated, resolved)
@KyleAMathews KyleAMathews merged commit 568d4ce into master Aug 26, 2021
@KyleAMathews KyleAMathews deleted the parallel-fetches-drupal branch August 26, 2021 04:37
wardpeet pushed a commit that referenced this pull request Aug 27, 2021
…tras to enable parallel API requests for cold builds (#32883)

* feat(gatsby-source-drupal): Use the collection count from JSON:API extras to construct URLs

* use new browser-based URL parser

* Comment code more

* Use the page size the site has set instead of assuming 50

* Use the original type that's set as that's always there

* Log out updates while sourcing

* Encourage people to enable this setting in the README

* Update gatsby-node.js
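
The "Use the page size the site has set instead of assuming 50" and "use new browser-based URL parser" commits above can be illustrated together. This is a sketch under assumptions rather than the merged code: `getPageSize` is a hypothetical helper, and it assumes the first page was requested at offset 0, so the `page[offset]` value in its `links.next.href` equals the page size the site serves.

```javascript
// Sketch only: `getPageSize` is hypothetical. It reads the page size from the
// first response's `next` link using the WHATWG `URL` parser, and falls back
// to a default when there is no usable `next` link.
function getPageSize(firstPageBody, fallback = 50) {
  const nextHref = firstPageBody.links?.next?.href
  if (!nextHref) return fallback
  const offset = Number(new URL(nextHref).searchParams.get(`page[offset]`))
  return Number.isInteger(offset) && offset > 0 ? offset : fallback
}

// Example response fragment for a site configured to serve 25 items per page.
const firstPage = {
  links: {
    next: {
      href: `https://example.com/jsonapi/node/article?page%5Boffset%5D=25&page%5Blimit%5D=25`,
    },
  },
}
console.log(getPageSize(firstPage)) // 25
```

When there is no `next` link the collection fits on one page, so the fallback value is never used to build further URLs.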
Labels
topic: source-drupal Related to Gatsby's integration with Drupal

5 participants