The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Scala · Updated Feb 27, 2024
Squidwarc is a high-fidelity, user-scriptable archival crawler that drives Chrome or Chromium in either headless or headful mode.
A Rails engine supporting the discovery of web archives.
A dockerized, queued, high-fidelity web archiver based on Squidwarc.
Save web pages as Safari webarchive files from the command line.
Links on the web break all the time, robustify them!
Rails application for the Archives Unleashed Cloud.
A toolkit for developing algorithms that sample mementos from a web archive collection.
An add-on for Google Sheets to help those working with web archives.
A Python utility for publishing a social media story built from archived web pages to multiple services.
A parser for WARC (Web ARChive) files.
A Docker image for the Archives Unleashed Toolkit.
Seeder: a Czech web archive curation tool and public site.
A library for interacting with web archive collections at Archive-It, Trove, Pandora, and more.
Create web archive entries on archive.org from your raindrop.io bookmarks list using waybackpy.
A repository of scripts to help capture press releases from MyConvento newsrooms built with the MyConvento PR management suite. The README analyzes the MyConvento URL architecture for users who want to develop their own solution.