[nznog] Whole of Domain (.nz) Web Harvest 2016

4 Nov 2015

Hi All,

It’s that time of year again. The National Library of New Zealand is in the final planning stages for the 2016 Whole of Domain web harvest.

As in previous years, we’ll be walking through the .nz domain, pulling back a ~20 TB snapshot of the live web.

The work will be contracted out to the Internet Archive, and I’ll contact this list with the final details once we’ve finished the various planning parts.

At the moment it looks like we’ll undertake the crawl around the middle of January; from past experience, it usually takes 3 to 4 weeks to complete.

We expect to try to grab large embedded binaries (video, audio, etc.), but otherwise we will generally respect robots.txt.

We will have a crawl agent string, which I will share with this list, so if you have any concerns about our crawling activity you can block us. :(
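For anyone who wants to check in advance how their robots.txt would treat a given agent, here is a rough sketch using Python's standard `urllib.robotparser`. The agent token `example-nz-harvest-bot`, the sample rules, and the `example.nz` URLs are placeholders made up for illustration; the real crawl agent string has not been announced yet, and this only shows how a standards-following parser reads the rules, not how the crawler itself will behave.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical agent token, for illustration only.
# The real crawl agent string will be announced separately.
AGENT = "example-nz-harvest-bot"

# Sample robots.txt: block the hypothetical harvest agent entirely,
# and keep /private/ off limits for every other crawler.
ROBOTS_TXT = """\
User-agent: example-nz-harvest-bot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for agent, url in [
        (AGENT, "https://example.nz/index.html"),
        ("other-bot", "https://example.nz/index.html"),
        ("other-bot", "https://example.nz/private/report.pdf"),
    ]:
        print(agent, url, is_allowed(ROBOTS_TXT, agent, url))
```

Swapping in your own robots.txt text and, once it is known, the announced agent string should show whether your current rules would admit or exclude the harvest.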

I wanted to give you early sight of the crawl, and invite any questions or thoughts from the mailing list.

We would also like to acknowledge the assistance and support of dnc.org.nz and nzrs.net.nz – without their efforts we would not be able to make heritage snapshots of the live web for research use today and into the future.

Best,

Jay

Jay Gattuso | Digital Preservation Analyst | Preservation, Research and Consultancy
National Library of New Zealand | Te Puna Mātauranga o Aotearoa
PO Box 1467 Wellington 6140 New Zealand | +64 (0)4 474 3064
jay.gattuso(a)dia.govt.nz<mailto:jay.gattuso(a)natlib.govt.nz>