The Archives Unleashed Toolkit

The Archives Unleashed Toolkit is an open-source platform for analyzing web archives. It is built on Apache Spark, which provides powerful tools for analytics and data processing.

Check out the code on GitHub, along with user documentation to help you get started with the Archives Unleashed Toolkit. If you want to hack on the Archives Unleashed Toolkit, check out our Java and Scala API documentation.

The Archives Unleashed Toolkit can also be used in conjunction with Spark Notebooks and Apache Zeppelin.

If you want to learn more about Apache Spark, we highly recommend Spark: The Definitive Guide.