Archive Research Compute Hub (ARCH)

Introduction

The Archive Research Compute Hub, or ARCH, is a new interface for web archive analysis that we are building with the Internet Archive.

ARCH unlocks the research potential of web archive collections by generating over a dozen datasets, including domain frequency statistics, hyperlink network graphs, and extracted full text. Users can select their Archive-It collections for exploration and start dataset creation with the push of a button. ARCH also offers several in-browser visualizations for exploring collection content and potential data outputs.

We are entering the final round of our UX testing in January/February 2022. Along with sharing and verifying new implementations with testers, this final round will also serve as an opportunity to stress test ARCH with a wider audience before our public launch in the spring of 2022.

As always, if you want to keep updated on the status of our project, there are many ways to get involved.

Citing Archives Unleashed

Your citations help further the recognition of open-source tools for scientific inquiry, assist in growing the web archiving community, and acknowledge the efforts of contributors to this project.

How to cite the Archives Unleashed Toolkit or Cloud in your research:

Nick Ruest, Jimmy Lin, Ian Milligan, and Samantha Fritz. 2020. The Archives Unleashed Project: Technology, Process, and Community to Improve Scholarly Access to Web Archives. In Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020 (JCDL ’20). Association for Computing Machinery, New York, NY, USA, 157–166. DOI: https://doi.org/10.1145/3383583.3398513