News Corp Australia: Scraping updates for more recent website editions #3071

Merged (2 commits) on Jul 14, 2023

Commits on Jul 3, 2023

  1. News Corp Australia: Scraping updates for more recent website editions

    - Update the selectors for breadcrumb navigation (used for the section
      title), and stop using lastChild, which can be a text node.
    - Use the page URL directly as the URL field, because the URL in the JSON
      data may point to a different domain.
    - Prefer the title scraped from the page body, because the one from the
      JSON data may not match it, possibly due to social media SEO.
    - Determine authors more robustly, accounting for different data
      formats across different sites.
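
The last point, handling author data that arrives in different shapes, could be sketched roughly as below. This is a minimal illustration and not the code from this PR: the helper name `normalizeAuthors` is hypothetical, and the input shapes (a plain string, an object with a `name` field, or an array of either) are assumptions about what embedded JSON metadata might contain.

```javascript
// Hypothetical helper, not taken from the actual translator.
// Assumed input shapes: a string, an object with a `name` string,
// or an array mixing both. Anything else is ignored.
function normalizeAuthors(raw) {
  if (raw === null || raw === undefined) {
    return [];
  }
  const items = Array.isArray(raw) ? raw : [raw];
  const names = [];
  for (const item of items) {
    let value = "";
    if (typeof item === "string") {
      value = item;
    } else if (item && typeof item.name === "string") {
      value = item.name;
    }
    // Assumption: one field may hold several people joined by
    // a comma, "&", or the word "and".
    for (const part of value.split(/\s*(?:,|&|\band\b)\s*/)) {
      const name = part.trim();
      // Skip empty fragments and duplicates.
      if (name && !names.includes(name)) {
        names.push(name);
      }
    }
  }
  return names;
}
```

Splitting on commas would break names written as "Surname, Given", so a real translator would need to check which convention each site uses before applying a rule like this.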
    
    See also:
    https://forums.zotero.org/discussion/105950/attempt-to-save-using-embedded-metadata
    zoe-translates committed Jul 3, 2023
    054146e

Commits on Jul 14, 2023

  1. da7e3b1