Archive-It upgrades to version 7.0

by the Archive-It team

We are pleased to announce the release of Archive-It 7.0 to our community of web archiving partners. This release includes several critical upgrades to web crawling, archival replay, and data reporting tools, and lays the foundation for further developments on the roadmap. Read the full Archive-It 7.0 Release Notes for details and documentation on each new feature.

For version 7.0, Archive-It engineers developed Crawlboss, a back-end manager for all information about web crawls and their configurations. All Archive-It partners will see the benefits in their crawl reports, which now include more information about seeds and their crawl settings. The same technical and provenance information can also be retrieved and repurposed through the Archive-It Partner API. Paired with Archive-It 7.0's introduction of seed-specific data de-duplication for all web crawls, these enhancements make the sites that constitute a partner's collection more discrete and portable: easier to manage, move, and preserve individually.
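Archive-It's own de-duplication code is not described in this post. As a hypothetical illustration of what "seed-specific" means, the Python sketch below keys each duplicate check on the seed as well as the URL and payload digest. The class and method names are invented for this example; the "response" and "revisit" labels borrow the names of WARC record types.

```python
import hashlib


class SeedScopedDeduplicator:
    """Hypothetical sketch of seed-scoped de-duplication.

    A capture counts as a duplicate only of an earlier capture made under
    the *same* seed, so revisit records never point across seeds.
    """

    def __init__(self) -> None:
        # (seed, url, payload digest) triples already stored as full records
        self._stored: set[tuple[str, str, str]] = set()

    def record_type(self, seed: str, url: str, payload: bytes) -> str:
        """Return "revisit" for a duplicate within this seed, else "response"."""
        digest = "sha1:" + hashlib.sha1(payload).hexdigest()
        key = (seed, url, digest)
        if key in self._stored:
            return "revisit"
        self._stored.add(key)
        return "response"
```

In this sketch, the same logo fetched under two different seeds is stored in full under each seed, so either seed's data can be moved or preserved without depending on the other. The trade-off is some extra storage in exchange for that portability.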

Crawl reports now include seed types and settings

Crawlboss also brings the audio/video download utility youtube-dl to all web crawls. Previously exclusive to Brozzler, Archive-It's browser-based web capture tool, youtube-dl now also improves the traditional Heritrix web crawler's ability to archive challenging audio and video elements. Using the additional A/V data and metadata captured this way, Archive-It Wayback can retrieve more time-based media from the archives for front-end access.

Archive-It Wayback now includes a lightbox viewer for time-based media

To improve access to the many formats of web data that partners collect, Archive-It 7.0 uses OutbackCDX, a widely supported engine for indexing the contents of web archive collections. OutbackCDX generates and updates these indexes faster and with more automation than Archive-It's legacy server could, meaning less wait time between collecting and sharing.
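CDX indexes are conventionally sorted by a canonicalized (SURT-style) URL key and then by capture timestamp, so all captures of one URL sit next to each other and can be found with a binary search. The Python sketch below is a toy, in-memory illustration of that idea, not OutbackCDX's implementation or API. `surt_key`, `CdxIndex`, and the "file:offset" location strings are hypothetical, and the key function skips much of real SURT canonicalization (ports, schemes, query ordering).

```python
import bisect
from urllib.parse import urlsplit


def surt_key(url: str) -> str:
    """Simplified SURT-style key: reversed host labels, ')' and the path.

    Hypothetical helper; real canonicalizers handle far more cases.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""  # urlsplit already lowercases the hostname
    if host.startswith("www."):
        host = host[4:]
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return (",".join(reversed(host.split("."))) + ")" + path).lower()


class CdxIndex:
    """Toy index kept sorted by (urlkey, timestamp), as CDX files are."""

    def __init__(self) -> None:
        self._rows: list[tuple[str, str, str]] = []  # (urlkey, timestamp, location)

    def add(self, url: str, timestamp: str, location: str) -> None:
        # Insert in sorted position, so new captures are searchable at once
        bisect.insort(self._rows, (surt_key(url), timestamp, location))

    def captures(self, url: str) -> list[tuple[str, str]]:
        """Return (timestamp, location) pairs for every capture of url."""
        key = surt_key(url)
        i = bisect.bisect_left(self._rows, (key,))  # first row with this key
        found = []
        while i < len(self._rows) and self._rows[i][0] == key:
            found.append(self._rows[i][1:])
            i += 1
        return found
```

Because `www.example.com/about` and `example.com/about` map to the same key in this sketch, a lookup for either URL returns both captures in timestamp order.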

Together, Crawlboss and OutbackCDX lay the foundation for further upgrades to Archive-It's web archiving stack, including high partner priorities such as more direct access controls over web captures, options for moving captures between collections, a Python-based Wayback replay tool, and more opportunities to integrate with other software systems. Watch this space and the Archive-It development roadmap for these and similar updates.