National Library of New Zealand Whole of Domain Harvest Consultation Meeting
Hi Everyone,

As promised, here are the salient points and outcomes from the Whole of Domain Harvest 2008 meeting with the National Library. If anyone has specific questions about the meeting or the topics discussed, please feel free to contact me offline.

Date: 11 November 2008, National Library of New Zealand
Subject: National Library of New Zealand Whole of Domain Harvest Consultation Meeting
Present: National Library of New Zealand, DNC, Internet NZ, Citylink, LIAC, NZNOG, MED
Apologies: ISPANZ

Actions

Future Harvests: Technical Solutions/Process

- Present harvest proposals regarding robots.txt policy at a future date.
- National Library of New Zealand and DNC to discuss the request for the .nz zone files in future harvests, and also to consider an archive opt-in clause as part of registration.
- Discuss with the Internet Archive the issue of the collector being added to blacklists.
- Consider the possibility of content providers submitting their websites for legal deposit in future.
- With regard to the Internet Archive using two IP addresses, (1) one honouring robots.txt and (2) one not honouring robots.txt: ask the NZNOG community whether they noticed significant differences in behaviour between the two crawlers.

Future Harvests: Communications

- In future it is crucial that people are informed before any harvest. The discussion group includes key people who will help disseminate that message and are willing to inform their customers.
- The National Library to develop a follow-up plan covering technical issues and communication plans for future web harvests, to be shared with the group in the coming months.
- Present at the Network Operators Conference in January.
- Share the analysis of the harvest details once it is available.

The National Library would like to pass on their thanks to all attendees for their participation in the meeting and for the excellent contributions to the discussion, which will inform future web harvest activity.
Dean Pemberton wrote:
- With regard to the Internet Archive using two IP addresses, (1) one honouring robots.txt and (2) one not honouring robots.txt: ask the NZNOG community whether they noticed significant differences in behaviour between the two crawlers.
I know it's bad form to reply to your own posts, but there is a point to this one. It turns out that the two harvest IPs NLNZ was using had different behaviours. One honoured robots.txt and the other... well, not so much =)

For those of you who noticed the harvest, were you being hit by one of these more than the other? The answer, and the ratio of hits between the two, might help with feedback on how NLNZ can do this better next time.

Please send answers on-list if they are relevant to the list, off-list to me if they are not, or in person to Gordon, who will be presenting on the harvest at the NZNOG conference =)

Thanks,
Dean
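For anyone wanting to answer the question about which harvest IP hit them harder, here is a minimal Python sketch for working it out from your own web server logs. It is not a tool provided by NLNZ or the Internet Archive. It assumes Apache/NCSA Common Log Format access logs and uses the hypothetical placeholder addresses 192.0.2.1 and 192.0.2.2 (a documentation range) in place of the real harvest IPs. It also assumes the user agent "*" when checking each request against your robots.txt with the standard library's urllib.robotparser; the crawler's real user agent might match a more specific rule. For each crawler IP it counts total requests, requests to paths your robots.txt disallows, and that IP's share of all crawler traffic.

```python
import re
from collections import Counter
from urllib.robotparser import RobotFileParser

# Hypothetical placeholder addresses (RFC 5737 documentation range);
# replace these with the actual harvest IPs seen in your logs.
CRAWLER_IPS = {"192.0.2.1", "192.0.2.2"}

# Captures the client IP and request path from a Common Log Format line.
LOG_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(?:GET|HEAD|POST) (\S+)')


def summarise(log_lines, robots_txt, agent="*"):
    """Count total and robots.txt-disallowed requests per crawler IP."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    hits, disallowed = Counter(), Counter()
    for line in log_lines:
        m = LOG_RE.match(line)
        if not m or m.group(1) not in CRAWLER_IPS:
            continue  # unparseable line, or not from a crawler IP
        ip, path = m.groups()
        hits[ip] += 1
        # can_fetch() is False when robots.txt disallows this path for agent
        if not rp.can_fetch(agent, path):
            disallowed[ip] += 1
    return hits, disallowed


def hit_share(hits):
    """Fraction of all crawler requests made by each IP."""
    total = sum(hits.values())
    return {ip: n / total for ip, n in hits.items()} if total else {}
```

To use it, pass the open log file and the text of your robots.txt to summarise(), then call hit_share() on the first result. The check runs against whatever robots.txt you supply, which may not be the one that was in place during the harvest.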