Web crawler and scraper based on Scrapy and Playwright's headless browser.
Updated Sep 28, 2024 - Python
🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...
Scrapfly Python SDK for headless browsers and proxy rotation
🎭 Playwright integration for Scrapy
A package acting as a wrapper around the headless mode of existing web browsers to generate images from URLs and from HTML+CSS strings or files.
📸 GitHub Action for web screenshots
An embeddable headless browser package for Python that provides a simplified interface for interacting with web pages using Selenium and Selenium Hub.
💯 Teach puppeteer new tricks through plugins.
PageFlash is a powerful headless browser WordPress plugin designed to provide you with a fast and efficient web browsing experience within your WordPress site. Say goodbye to page reloads and enjoy seamless navigation through web content with this plugin.
Example of username and password proxy authentication for use in Selenium
Job posting data scraped from Indeed.com. This data is used in a Django web app for testing purposes.
Run Selenium with Python via GitHub Actions using headless or non-headless browsers!
Use this tutorial to learn how to perform web scraping using a headless browser.
Automated Selenium-based scraper for extracting and analyzing job listings from Glassdoor
Automated Selenium-based scraper for extracting data from Myntra
An example that shows how to use the Nightmare headless browser to capture web-based visualizations under Node.js.
Dare2024.com Solver is a Python automation script for seamlessly solving Dare2024.com quizzes. Impress your friends with correct answers effortlessly. Compatible with all dare2024.com versions and future updates.
DigestBox takes any webpage URL (news article, video link, comment thread, etc.) and gives you just the raw content. It's powered by ArchiveBox.io under the hood.
See the official PhantomJS GitHub repository at https://github.com/ariya/phantomjs; see https://dustinblackman.com/posts/stop-using-phantomjs for the original author of the Dockerized PhantomJS.