News Corp Australia: Scraping updates for more recent website editions #3071

- Update the selectors for breadcrumb navigation (for section title), and stop using lastChild which can be a text node.
- Use page URL directly as the URL field, because URL in JSON data may point to a different domain.
- Prefer title scraped from page body, because the one from JSON data may fail to match the former, possibly due to social media SEO.
- More robust determination of authors, accounting for different data formats across different sites.

See also: https://forums.zotero.org/discussion/105950/attempt-to-save-using-embedded-metadata
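As an illustration of the author handling described above, here is a minimal sketch of normalizing an author field that may arrive in several shapes. The function name `normalizeAuthors` and the specific shapes handled (a plain string, an object with a `name` property, or an array of either) are assumptions made for this example; they are not taken from the translator code in this PR.

```javascript
// Hypothetical helper (not the translator's actual code): turn an "author"
// value from embedded JSON metadata into a flat array of name strings.
function normalizeAuthors(author) {
  if (author === null || author === undefined) return [];
  // An array may mix strings and objects, so flatten it recursively.
  if (Array.isArray(author)) {
    return author.flatMap(item => normalizeAuthors(item));
  }
  // Object form, e.g. { "@type": "Person", "name": "Jane Doe" }.
  if (typeof author === "object") {
    return normalizeAuthors(author.name);
  }
  if (typeof author !== "string") return [];
  // A single string may pack several names: "Jane Doe, John Roe and Ann Poe".
  return author
    .split(/\s*,\s*|\s+and\s+/i)
    .map(name => name.trim())
    .filter(name => name.length > 0);
}
```

Each resulting name could then be passed to whatever creator-cleaning routine the translator uses. Splitting on commas assumes names are written in "First Last" order, which may not hold for every site.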

Commits on Jul 3, 2023

Commits on Jul 14, 2023