Fwd: Notification of web harvest & consultation report
Hi all: Many of you were affected by the 2008 web harvest, or have expressed an interest in the 2010 harvest, so I am sending you the message below about the outcome of our recent consultation on the Options paper on the 2010 whole-of-domain web harvest.

The decisions on the key issues raised are:

Notification
- The harvest is scheduled to begin on 12 May 2010. There will be a five-week notification period.
- The Library will use several channels to communicate about the harvest, including its corporate website, the LibraryTechNZ blog, a Twitter account, various mailing lists and forums, and media releases.

Robots.txt
- In 2008 the Library made the decision to ignore the robots.txt convention.
- For the 2010 harvest, where a robots.txt file exists the harvester will honour it, except when downloading images and other elements that are embedded in other web pages.
- Website owners can set specific rules for the Library's harvester, which will have the user agent string: NLNZHarvester2010
- If you have a very restrictive robots.txt file in place already, we would appreciate it if you could provide a more permissive rule for NLNZHarvester2010 to help us capture a complete copy of your website.

Location of harvester
- After consultation with New Zealand telecom vendors we have decided to run the harvest from the United States using the Internet Archive's hardware and network infrastructure, as we did in 2008.

More information about these decisions is available from our website:
http://www.natlib.govt.nz/about-us/current-initiatives/web-harvest-2010
http://www.natlib.govt.nz/catalogues/library-documents/web-harvest-consultat...

Thanks to all of you who have offered us advice in various fora over the last few months (and years).

Gordon

......................
New Zealand web harvest 2010
More information at http://www.natlib.govt.nz/about-us/current-initiatives/web-harvest-2010
This account is run by Courtney Johnston (Web Manager) and Gordon Paynter (Programme Manager Digitisation)
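[For illustration: a site owner whose robots.txt currently locks out all crawlers can add a specific, more permissive group for the published user-agent string, as the message above requests. A sketch only; the blanket block on other robots is an assumed starting point, not part of the Library's instructions.]

    # robots.txt at the site root
    # Let the National Library harvester take a complete copy
    # (an empty Disallow means "nothing is disallowed")
    User-agent: NLNZHarvester2010
    Disallow:

    # ...while continuing to exclude all other robots
    User-agent: *
    Disallow: /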
On 8/04/2010, at 4:30 PM, Gordon Paynter wrote:
- After consultation with New Zealand telecom vendors we have decided to run the harvest from the United States using the Internet Archive’s hardware and network infrastructure, as we did in 2008.
Can you elaborate on this? -- Nathan Ward
On 8/04/2010, at 4:30 PM, Gordon Paynter wrote:
- After consultation with New Zealand telecom vendors we have decided to run the harvest from the United States using the Internet Archive's hardware and network infrastructure, as we did in 2008.
Can you elaborate on this?
I think you will find it elaborated on in the document at the URL linked in the original post: http://www.natlib.govt.nz/catalogues/library-documents/web-harvest-consultat...

"The next best option, hosting with a telecommunications provider in New Zealand, was not viable for the 2010 harvest on grounds of value-for-money and increased technical complexity and risk. Similar issues arise with the possibility of hosting at the Library itself, or routing from the USA via New Zealand."
On 8/04/2010, at 9:42 PM, Regan Murphy wrote:
On 8/04/2010, at 4:30 PM, Gordon Paynter wrote:
- After consultation with New Zealand telecom vendors we have decided to run the harvest from the United States using the Internet Archive's hardware and network infrastructure, as we did in 2008.
Can you elaborate on this?
I think you will find it elaborated on in the document at the URL linked in the original post: http://www.natlib.govt.nz/catalogues/library-documents/web-harvest-consultat...
"The next best option, hosting with a telecommunications provider in New Zealand, was not viable for the 2010 harvest on grounds of value-for-money and increased technical complexity and risk. Similar issues arise with the possibility of hosting at the Library itself, or routin from the USA via New Zealand."
Yeah, someone linked me to that.

So essentially the argument is: we don't want to pay a small amount for it, so we'll push that (larger) cost on to NZ businesses instead? Was there even any research done into finding out what the cost would be to NZ businesses? Should a govt. thing like natlib care about that sort of thing?

Last I looked at such things, the public rate card for colo is 1000/mbit for international capacity. Let's assume most colo customers don't really know how to negotiate that down and are paying that. Most of my customers whose kit I look after are (or were before I came along :-). Domestic is what, 100/mbit? Plus I'm sure they'd just build into APE to reduce their costs a bunch more.

If they don't want to pay for hardware or have to administer it, why don't they do a deal with a certain provider that gives away outbound international transit (and also builds into as many .nz exchanges as possible) and build a tunnel or HTTP proxy so they reach hosts in NZ over that? It really isn't hard to make this stuff cheaper for everyone.

I haven't seen any appealing to the NZNOG list for ideas[1] on how to do this stuff better, and I'm *sure* we'd all have a load[2]. Sure, some might be rubbish, but I'm sure you'd get at least a few clever ideas and likely even some people offering to donate things.

-- Nathan Ward

[1] Not that there hasn't been any, but I can't see any after a quick search of my local archive either.
[2] Mail.app thinks I mis-spelled this word. Onya, Mail.app.
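[For rough scale, a sketch using the rate-card figures Nathan quotes above, which are his estimates rather than published prices. If transit is billed per Mbit/s, the marginal cost of crawler traffic depends on sustained rate, not total bytes; the 1 Mbit/s crawl figure below is a purely hypothetical assumption.]

    # Sketch: what a sustained crawl moves, and its marginal cost, under the
    # quoted rate card. All figures are illustrative assumptions.
    INTL_NZD_PER_MBPS = 1000    # international transit, NZD per Mbit/s per month
    DOM_NZD_PER_MBPS = 100      # domestic transit, NZD per Mbit/s per month

    SECONDS_PER_MONTH = 30 * 24 * 3600
    crawl_rate_mbps = 1.0       # hypothetical sustained crawler draw

    # Mbit/s * seconds = Mbit; /8 = MByte; /1000 = GByte
    gigabytes_moved = crawl_rate_mbps * SECONDS_PER_MONTH / 8 / 1000
    print(f"~{gigabytes_moved:.0f} GB transferred at {crawl_rate_mbps} Mbit/s sustained")
    print(f"marginal cost: ~${crawl_rate_mbps * INTL_NZD_PER_MBPS:.0f} international "
          f"vs ~${crawl_rate_mbps * DOM_NZD_PER_MBPS:.0f} domestic per month")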
So essentially the argument is: we don't want to pay a small amount for it, so we'll push that (larger) cost on to NZ businesses instead? Was there even any research done into finding out what the cost would be to NZ businesses? Should a govt. thing like natlib care about that sort of thing?
Last I looked at such things, the public rate card for colo is 1000/mbit for international capacity. Let's assume most colo customers don't really know how to negotiate that down and are paying that. Most of my customers whose kit I look after are (or were before I came along :-). Domestic is what, 100/mbit?
Is the data cost to site owners really such a big issue? More than 97.5% of the harvested sites had less than 100MB of data downloaded, and only 477 sites had more than 1GB downloaded. Of the larger sites, I wonder how many are paying per MByte transferred instead of per Mbit/s of capacity.

Refer to the table from the original options paper linked at http://bit.ly/nlnzwebharvest :

Data downloaded    Number of hosts    Percent of hosts
< 1MB              322,951            81.3%
1 to 10MB          43,226             10.9%
10 to 100MB        22,082             5.6%
100 to 1000MB      8,365              2.1%
1 to 10 GB         455                0.1%
10 to 100 GB       22                 0.006%
Total              397,101            100%

-- Regan
Regan Murphy wrote:
So essentially the argument is: we don't want to pay a small amount for it, so we'll push that (larger) cost on to NZ businesses instead? Was there even any research done into finding out what the cost would be to NZ businesses? Should a govt. thing like natlib care about that sort of thing?
Last I looked at such things, the public rate card for colo is 1000/mbit for international capacity. Let's assume most colo customers don't really know how to negotiate that down and are paying that. Most of my customers whose kit I look after are (or were before I came along :-). Domestic is what, 100/mbit?
Is the data cost to site owners really such a big issue? More than 97.5% of the harvested sites had less than 100MB of data downloaded, and only 477 sites had more than 1GB downloaded. Of the larger sites, I wonder how many are paying per MByte transferred instead of per Mbit/s of capacity.

Refer to the table from the original options paper linked at http://bit.ly/nlnzwebharvest :

Data downloaded    Number of hosts    Percent of hosts
< 1MB              322,951            81.3%
1 to 10MB          43,226             10.9%
10 to 100MB        22,082             5.6%
100 to 1000MB      8,365              2.1%
1 to 10 GB         455                0.1%
10 to 100 GB       22                 0.006%
Total              397,101            100%
"really such a big issue?" Well, this was last time... http://treenet.co.nz/natlib.png This graph is taken on traffic to the back-end shard server *behind* a CDN buffer cloud/cluster. It is just one of those 92.1% of servers on a <10MB link. NP: for comparison, the Sept spike is a site replication. I image most web hosts have similar piles of deadweight site data that nobody but robots ever visit. I have crossed fingers that the new harvest will at least do If-Modified-Since on the old URLs with last harvests date on stuff like images? AYJ
Um, if it's not supposed to be publicly reachable, or it's not supposed to be trawled, turn it off / firewall it / use robots.txt, etc.?

It seems, again, there's plenty of warning it's coming; it looks like a silly excuse to bluster, from what I can see. The only thing I'd request from natlib is the IPs / user-agents that will be doing the actual querying, ahead of time.

On 14/04/10 18:28, TreeNet Admin wrote:
"really such a big issue?" Well, this was last time... http://treenet.co.nz/natlib.png
This graph is taken on traffic to the back-end shard server *behind* a CDN buffer cloud/cluster. It is just one of those 92.1% of servers on a <10MB link.
NP: for comparison, the Sept spike is a site replication.
I image most web hosts have similar piles of deadweight site data that nobody but robots ever visit. I have crossed fingers that the new harvest will at least do If-Modified-Since on the old URLs with last harvests date on stuff like images?
AYJ

--
Leon Strong | Technical Engineer
DDI: +64 9 950 2203 | Mobile: +64 21 0202 8870 | Freephone: 0800 SMX SMX (769 769)
SMX Ltd | Business Email Specialists | Level 15, 19 Victoria Street, Auckland, New Zealand | smx.co.nz
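[On Leon's "firewall / robots" point: a host that wants no part of the harvest can also refuse the crawler at the web server rather than relying on robots.txt. A sketch for nginx, keyed to the published user-agent string; it belongs inside a server block, and you would adjust it for your own setup.]

    # Refuse the National Library harvester by user agent
    if ($http_user_agent ~* "NLNZHarvester2010") {
        return 403;
    }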
Hi all: Information about the user agent is available from http://bit.ly/nlnzwebharvest and we don't yet know what IP addresses will be used. We have no plans to use If-Modified-Since (or ETag, or similar approaches) for comparison with the 2008 harvest.

If you have concerns about how the crawler may behave on specific websites, feel free to email us directly at web-harvest-2010@natlib.govt.nz or get in touch via our feedback form.

Thanks, Gordon
>>> Leon Strong 15/04/10 11:51 a.m. >>>
Um, if it's not supposed to be publicly reachable, or it's not supposed to be trawled, turn it off / firewall it / use robots.txt, etc.?

It seems, again, there's plenty of warning it's coming; it looks like a silly excuse to bluster, from what I can see. The only thing I'd request from natlib is the IPs / user-agents that will be doing the actual querying, ahead of time.
participants (5)
- Gordon Paynter
- Leon Strong
- Nathan Ward
- Regan Murphy
- TreeNet Admin