Whole of Domain Web Harvest 2015

23 Dec 2014

Hi NZNoggers,

I write to let you know of our intent to commence the next NZ Whole of Domain web harvest on the 12th of Jan 2015.

The crawl will take approximately 3 weeks to complete.

We will be following the same model as previous years, with the Internet Archive acting on our behalf to undertake the crawl.

We will be adhering to robots.txt in the main. However, when content blocked by robots.txt is embedded in a page that is not blocked by robots.txt, we will attempt to capture that content.
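For site operators who want to check how this policy might treat their own robots.txt, the rule can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the Internet Archive's crawler logic: it uses Python's standard `urllib.robotparser`, assumes robots.txt rules are matched against the token `NLNZ_IAHarvester2015`, assumes the embedding page is on the same site, and the `should_fetch` helper with its `embedded_in` parameter is hypothetical.

```python
from typing import Optional
from urllib.robotparser import RobotFileParser

# Assumed token that robots.txt User-agent lines would be matched against.
AGENT = "NLNZ_IAHarvester2015"

# Example robots.txt for a hypothetical site: /images/ is blocked for everyone.
ROBOTS_TXT = """\
User-agent: *
Disallow: /images/
"""

def make_parser(text: str) -> RobotFileParser:
    """Build a robots.txt parser from the file's text rather than fetching it."""
    rp = RobotFileParser()
    rp.parse(text.splitlines())
    return rp

def should_fetch(rp: RobotFileParser, url: str,
                 embedded_in: Optional[str] = None) -> bool:
    """Hypothetical policy: obey robots.txt, except that a blocked resource
    embedded in a page robots.txt allows is still captured.
    Assumes the embedding page is on the same site (same robots.txt)."""
    if rp.can_fetch(AGENT, url):
        return True
    # The resource itself is blocked: capture it only if the page
    # embedding it is allowed.
    return embedded_in is not None and rp.can_fetch(AGENT, embedded_in)

if __name__ == "__main__":
    rp = make_parser(ROBOTS_TXT)
    print(should_fetch(rp, "https://example.org/images/logo.png"))
    print(should_fetch(rp, "https://example.org/images/logo.png",
                       embedded_in="https://example.org/index.html"))
```

In this sketch a blocked image is skipped when discovered on its own, but captured when an allowed page such as `/index.html` embeds it.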

The metadata the crawler will be using is:

metadata.userAgentTemplate=Mozilla/5.0 (compatible; NLNZ_IAHarvester2015 +@OPERATOR_CONTACT_URL@)

metadata.operatorContactUrl=https://natlib.govt.nz/publishers-and-authors/web-harvesting/domain-harvest
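The `OPERATOR_CONTACT_URL` token in the user-agent template appears to be a placeholder that the crawler fills in from `metadata.operatorContactUrl`. The sketch below shows that substitution and a simple check you might use to pick the harvester out of your web server logs. It assumes the template reads `+@OPERATOR_CONTACT_URL@` (with any "(a)" in the archived copy being the mailing-list archive's obfuscation of "@") and that expansion is a plain string replacement; `expand_user_agent` and `is_harvester` are hypothetical helpers.

```python
# Template and contact URL as given in the crawler metadata above
# (assumes "(a)" in the archived text stands for "@").
TEMPLATE = "Mozilla/5.0 (compatible; NLNZ_IAHarvester2015 +@OPERATOR_CONTACT_URL@)"
CONTACT_URL = "https://natlib.govt.nz/publishers-and-authors/web-harvesting/domain-harvest"

def expand_user_agent(template: str, contact_url: str) -> str:
    """Substitute the contact URL into the template (assumed plain replacement)."""
    return template.replace("@OPERATOR_CONTACT_URL@", contact_url)

def is_harvester(user_agent_header: str) -> bool:
    """Hypothetical log filter: match on the harvester's identifying token."""
    return "NLNZ_IAHarvester2015" in user_agent_header

if __name__ == "__main__":
    ua = expand_user_agent(TEMPLATE, CONTACT_URL)
    print(ua)
    print(is_harvester(ua))
```

Matching on the `NLNZ_IAHarvester2015` token rather than the full string keeps the filter working even if the contact URL portion differs slightly in practice.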

The contact URL describes what we are doing and why.

If you have any questions or concerns, please drop me a line – I’ll be monitoring this email address over the Xmas period.

Many thanks,

Jay

Jay Gattuso | Digital Preservation Analyst | Preservation, Research and Consultancy
National Library of New Zealand | Te Puna Mātauranga o Aotearoa
PO Box 1467 Wellington 6140 New Zealand | +64 (0)4 474 3064
jay.gattuso(a)dia.govt.nz
