Getting Started

Overview

The Archives Unleashed team has developed web archive search and data analysis tools, including the Toolkit, the Cloud, the Jupyter Notebooks, and Warclight. All of our tools are open-source and free to use.

Archives Unleashed Cloud
  Skill level: Beginner
  What it does: The Archives Unleashed Cloud is now closed. It was a web-based platform for working with Archive-It collections.

Archives Unleashed Notebooks
  Skill level: Beginner/Intermediate
  What it does: The Archives Unleashed Notebooks are Jupyter Notebooks that help you work with the output of the Archives Unleashed Toolkit. Once they are up and running, you can use your web browser to work through interactive tutorials and explore your data through rich visualizations.
  What you need: You need to install the notebooks' dependencies. Although you can follow step-by-step instructions, installation requires running commands on the command line, so an intermediate level of technical knowledge is recommended. We suggest this tutorial.
  Ideal for: Researchers who want to explore their web archive collections.

Warclight
  Skill level: Advanced
  What it does: Warclight is a search engine that lets users discover web archives. Think of it as the library catalogue meeting the WARC file. While it is easy to use, setting it up on your own collections requires advanced technical knowledge.
  What you need: A large number of WARC files that would benefit from this search engine. If you don't know what WARCs are, this is not the tool for you.
  Ideal for: Librarians and archivists who have been collecting web archives and want to improve the discoverability of their collections.

Archives Unleashed Toolkit
  Skill level: Advanced
  What it does: The Archives Unleashed Toolkit is an Apache Spark-based platform for analyzing web archives at scale. The Archives Unleashed Cloud ran the Toolkit in the back end. As the documentation page shows, the Toolkit is very powerful. However, it is an advanced tool that requires a high level of technical knowledge to use, or at least patience and effort. We do have a hands-on walkthrough here.
  What you need: WARC files that would benefit from computational exploration at scale.
  Ideal for: Researchers who want to explore their WARCs at scale and need more flexibility than the Cloud provided.

Tools in Action: The Notebook

Check out the full page here.

Archives Unleashed Jupyter Notebooks are a prototype method for working with the derivatives generated by the Archives Unleashed Cloud. They allow you to interactively explore and filter the domain count information, extracted full text, and network visualization data generated by the Cloud.
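To illustrate the kind of filtering the notebooks make interactive, here is a minimal sketch in plain Python. It assumes a domain count derivative stored as CSV with two columns, domain and count. That file layout, the made-up sample rows, and the helper name `top_domains` are all hypothetical. This is not the notebooks' actual code.

```python
import csv
import io

# Hypothetical, made-up sample of a domain count derivative (domain,count per row).
SAMPLE = """domain,count
example.org,1200
example.com,850
archive.org,40
"""

def top_domains(csv_text, min_count=0, limit=10):
    """Return (domain, count) pairs with count >= min_count, largest count first."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(r["domain"], int(r["count"])) for r in reader]
    kept = [row for row in rows if row[1] >= min_count]
    kept.sort(key=lambda row: row[1], reverse=True)
    return kept[:limit]

if __name__ == "__main__":
    # Keep only domains that appear at least 100 times.
    for domain, count in top_domains(SAMPLE, min_count=100):
        print(f"{domain}\t{count}")
```

In a notebook, the threshold and limit would typically be tied to interactive controls so you can narrow the view while exploring.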

Tools in Action: Warclight

Check out the full page here.

Warclight is an open-source search engine that supports the discovery of web archives held in the WARC and ARC formats. It allows faceted full-text search, record view, and other advanced discovery options.

Tools in Action: The Toolkit

Check out the full page here.

The Archives Unleashed Toolkit is an open-source toolkit, built around Apache Spark, for analyzing web archives.
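The Toolkit uses Spark to run analyses such as domain counts across many WARC records in parallel. The sketch below shows only the underlying map-then-count idea, in plain Python over a short list of URLs. It does not use Spark or the Toolkit's own API. The URL list and the function names `extract_domain` and `count_domains` are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse

def extract_domain(url):
    """Return the URL's host (urlparse lowercases it), dropping a leading 'www.'."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host

def count_domains(urls):
    """Map each URL to its domain, skip URLs with no host, then count occurrences."""
    domains = (extract_domain(u) for u in urls)
    return Counter(d for d in domains if d)

if __name__ == "__main__":
    urls = [
        "https://www.example.org/about",
        "http://example.org/news/1",
        "https://example.com/",
        "not a url",  # has no host, so it is skipped
    ]
    for domain, n in count_domains(urls).most_common():
        print(domain, n)
```

With Spark, the same map step runs on partitions of the archive in parallel, and the counts are combined afterwards. That combining step is what makes the approach practical at the scale of whole collections.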