Hi all:

Many of you were affected by the 2008 web harvest or have expressed an interest in the 2010 harvest, so I am sending you the message below about the outcome of our recent consultation on the Options paper on the 2010 whole of domain web harvest.

The decisions on the key issues raised are:

Notification
- The harvest is scheduled to begin on 12 May 2010. There will be a five-week notification period.
- The Library will use several channels to communicate about the harvest, including its corporate website, the LibraryTechNZ blog, a Twitter account, various mailing lists and forums, and media releases.

Robots.txt
- In 2008 the Library made the decision to ignore the robots.txt convention.
- For the 2010 harvest, where a robots.txt file exists, the harvester will honour it, except when downloading images and other elements that are embedded in other web pages.
- Website owners can set specific rules for the Library’s harvester, which will have the user agent string: NLNZHarvester2010
- If you already have a very restrictive robots.txt file in place, we would appreciate it if you could provide a more permissive rule for NLNZHarvester2010 to help us capture a complete copy of your website.

Location of harvester
- After consultation with New Zealand telecom vendors, we have decided to run the harvest from the United States using the Internet Archive’s hardware and network infrastructure, as we did in 2008.

More information about these decisions is available from our website:
http://www.natlib.govt.nz/about-us/current-initiatives/web-harvest-2010
http://www.natlib.govt.nz/catalogues/library-documents/web-harvest-consultat...

Thanks to all of you who have offered us advice in various fora over the last few months (and years).

Gordon

......................
New Zealand web harvest 2010
More information at http://www.natlib.govt.nz/about-us/current-initiatives/web-harvest-2010
This account is run by Courtney Johnston (Web Manager) and Gordon Paynter (Programme Manager Digitisation)
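
For website owners who want to act on the robots.txt request in the message above, here is a minimal sketch of a restrictive robots.txt that still gives NLNZHarvester2010 full access. It checks the rules with Python's standard urllib.robotparser module. The user agent string comes from the announcement. The example URL, the second crawler name and the can_fetch helper are hypothetical. Python's parser is only a stand-in for checking the file: the Library's harvester may match rules differently, and this sketch does not model its stated exception for embedded images.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block every crawler by default, but give the
# Library's harvester (user agent string from the announcement) full access.
# An empty "Disallow:" line means nothing is disallowed for that agent.
ROBOTS_TXT = """\
User-agent: NLNZHarvester2010
Disallow:

User-agent: *
Disallow: /
"""


def can_fetch(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical page on a hypothetical .nz site.
    url = "http://www.example.org.nz/about/index.html"
    print(can_fetch("NLNZHarvester2010", url))  # True
    print(can_fetch("SomeOtherBot", url))       # False
```

If your existing file already has a "User-agent: *" section, adding the NLNZHarvester2010 section alongside it should be enough, since a crawler that honours robots.txt normally follows the section naming it in preference to the wildcard section.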