Automate crawler (#4385)
## What are you changing in this pull request and why?

[Asana
task](https://app.asana.com/0/1200099998847559/1205223844458111/f)

Adds a GitHub Actions workflow that triggers an Algolia crawl when a PR is merged.

To trigger the crawler, add the `trigger-crawl` label to a PR. Once the
PR is merged, the GitHub Action will:
- Check if the `trigger-crawl` label is set
- If so, wait 8 minutes to allow time for the production build to complete
- Start the Algolia crawl

[Example run from this
PR](https://github.com/dbt-labs/docs.getdbt.com/actions/runs/6723222857/job/18272826476)

## Web Team Testing

To test:
- Open the sandbox PR (dbtlabs-sandbox/react-app-test#10)
- This PR has the `trigger-crawl` label set.
- Open the [workflow
runs](https://github.com/dbtlabs-sandbox/react-app-test/actions/workflows/crawler.yml)
in a separate tab
- Open the Algolia crawler dashboard
- Merge the PR, and verify that the workflow starts and finishes successfully
- Once the workflow is complete, a new crawl should be running for the
docs site (this is set to crawl the live site, so there is no issue with
letting it run to completion).

## Notes

Rather than using the `sleep 480` command to wait 8 minutes before
triggering the crawl, I looked into using this [GitHub
Action](https://github.com/dorshinar/get-deployment-url) to watch for a
Vercel deployment. However, it watches for preview deploys, so if a
deploy preview was already built from an earlier commit, the step
completes instantly and allows the crawl to start before the live
docs site is rebuilt.
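
For comparison, a bounded polling step is one possible middle ground between a fixed `sleep` and a deployment-watching action. The fragment below is only a sketch and is not part of this PR: it assumes the live site publishes the deployed commit SHA at a `version.txt` path, which is a hypothetical endpoint that does not exist today. It caps the wait at the same 8 minutes.

```yaml
# Sketch only: VERSION_URL is a hypothetical endpoint, not something the docs site exposes.
- name: Wait for production deploy (hypothetical)
  env:
    VERSION_URL: https://docs.getdbt.com/version.txt   # assumption for illustration
    EXPECTED_SHA: ${{ github.event.pull_request.merge_commit_sha }}
  run: |
    # Poll every 10 seconds, up to 48 attempts (about 8 minutes in total)
    for attempt in $(seq 1 48); do
      if [ "$(curl -fsS "$VERSION_URL" || true)" = "$EXPECTED_SHA" ]; then
        echo "Production deploy is live after $attempt attempt(s)"
        exit 0
      fi
      sleep 10
    done
    echo "Timed out waiting for production deploy" >&2
    exit 1
```

A step like this would finish as soon as the new build is live and fail loudly on timeout, but it depends on the site exposing its build SHA, so the fixed `sleep 480` remains the simpler option.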

There's another [GitHub
Action](https://github.com/UnlyEd/github-action-await-vercel#2-dynamically-resolve-the-vercel-deployment-url)
for watching Vercel deployments. However, its section on dynamically
resolving a deployment URL uses an example workflow that I'd rather not
adopt because of its complexity.
JKarlavige authored Nov 3, 2023
2 parents 4e1244a + bcb6ded commit d08a2d4
Showing 1 changed file with 33 additions and 0 deletions.
33 changes: 33 additions & 0 deletions .github/workflows/crawler.yml
@@ -0,0 +1,33 @@
name: Algolia Crawler
on:
  pull_request:
    types:
      - closed

jobs:
  algolia_recrawl:
    # Comment out the if check below if running on every merge to current branch
    if: |
      contains(github.event.pull_request.labels.*.name, 'trigger-crawl')
      && github.event.pull_request.merged == true
    name: Trigger Algolia Crawl
    runs-on: ubuntu-latest
    steps:
      # Checkout repo
      - name: Checkout Repo
        uses: actions/checkout@v3

      # Wait 8 minutes to allow Vercel build to complete
      - run: sleep 480

      # Once deploy URL is found, trigger Algolia crawl
      - name: Run Algolia Crawler
        uses: algolia/algoliasearch-crawler-github-actions@v1
        id: crawler_push
        with:
          crawler-user-id: ${{ secrets.CRAWLER_USER_ID }}
          crawler-api-key: ${{ secrets.CRAWLER_API_KEY }}
          algolia-app-id: ${{ secrets.ALGOLIA_APP_ID }}
          algolia-api-key: ${{ secrets.ALGOLIA_API_KEY }}
          site-url: 'https://docs.getdbt.com'
          crawler-name: ${{ secrets.CRAWLER_NAME }}
