I would be curious who has actually tracked this traffic (themselves or via their upstream ISP) and come up with a cost to their organisation. Are they invoicing natlib? If not, why not?

If I were to do this sort of trolling, personally I would resolve addresses to IPs, sort out which are local NZ IPs and which are offshore, and troll from the appropriate source... as much as I feel that ignoring robots.txt is against the goodwill and security of the big bad world :\

Is there an ombudsman in the NZ govt that has this sort of thing in their portfolio?

Just my tuppence worth.

Russell Sharpe

_____

From: Philip Seccombe [mailto:philip(a)turnstone.co.nz]
Sent: Monday, 27 October 2008 10:14
To: TreeNet Admin
Cc: nznog(a)list.waikato.ac.nz
Subject: Re: [nznog] NLNZHarvester2008

I'm guessing multiple domains pointing at the same data meant the 80gig was replicated to 260gig?

At what point can you say that international standards (ie robots.txt) were not observed, and thus they should reimburse you for bandwidth costs?

260gig of spam is going to be a fantastic thing for someone to look through in the future, it really does show the true culture of New Zealand....</sarcasm>

Philip Seccombe

-----Original Message-----
From: TreeNet Admin [mailto:admin(a)treenetnz.com]
Sent: Sun 10/26/2008 7:08 PM
To: Philip Seccombe
Cc: nznog(a)list.waikato.ac.nz
Subject: Re: [nznog] NLNZHarvester2008

Philip Seccombe wrote:
I would be very interested in how much data was collected in this.
I'd also be interested to know if it was a basic harvest or whether there was some smart archiving done for duplicate files etc. E.g. a couple of customers have between 4 and 8 or so sites all pointing at the same place.
Also, as a computer tech I'll put up a download directory to grab programs from for cleaning customers' PCs (e.g. spyware utils, general apps, service packs). A quick look is showing 1 gig of data there, and we have that as .co.nz, .net.nz and as different domains, just so that if I tell a customer over the phone to download something to fix a problem they won't make a mistake. Funny, you'll probably be using 4gig just on my spyware apps and service packs because somehow it's documentary heritage to New Zealand... of programs made mostly in the States :)
Ah well, in 30 years I guess someone will be interested to see what the internet looked like back in 2008. It's also probably a cheaper option than our government spending $100 million to hire people to decide what should and shouldn't be kept
Cheaper? Only for NatLib. We who host are paying the bill for this. And no, they are not doing any smart filtering for duplication. They managed to download 260GB+ of international WHOIS and spam archives from an 80GB disk drive here before the harvest IPs got firewalled. I'm not pleased.

PS. natlib: robots.txt is often expressly set up to prevent this type of 'accident'.

AYJ
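
For anyone wanting to see what honouring robots.txt looks like from the crawler's side, here is a minimal sketch using Python's standard urllib.robotparser. The robots.txt contents, the example.co.nz URLs and the use of "NLNZHarvester2008" as the user-agent token are all assumptions made up for illustration (the token is simply taken from this thread's subject line); this is not a description of how the harvester actually behaves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only. The user-agent token is
# borrowed from the thread subject and may not match the real harvester.
ROBOTS_TXT = """\
User-agent: NLNZHarvester2008
Disallow: /

User-agent: *
Disallow: /archive/
"""


def allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("NLNZHarvester2008", "http://example.co.nz/index.html"),
        ("OtherBot", "http://example.co.nz/archive/spam.mbox"),
        ("OtherBot", "http://example.co.nz/index.html"),
    ]:
        print(agent, url, allowed(agent, url))
```

A polite crawler would run a check like this before every request and skip anything that comes back False. Note that robots.txt is purely advisory: nothing in it stops a crawler that chooses not to look, which is why firewalling the harvest IPs was the only thing that worked here.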