Philip Seccombe wrote:
I would be very interested in how much data was collected in this harvest.
I'd also be interested to know whether it was a basic harvest or whether some smart archiving was done for duplicate files, etc. For example, a couple of my customers have between 4 and 8 or so sites all pointing at the same place.
Also, as a computer tech I'll put up a download directory to grab programs from for cleaning customers' PCs (e.g. spyware utils, general apps, service packs). A quick look shows about 1 GB of data there, and we have that as .co.nz, .net.nz and as different domains, just so that if I tell a customer over the phone to download something they won't make a mistake. Funnily enough, you'll be using probably 4 GB just on my spyware apps and service packs, because somehow that's documentary heritage of New Zealand... of programs made mostly in the States :)
Ah well, in 30 years I guess someone will be interested to see what the internet looked like back in 2008. It's also probably a cheaper option than our government spending $100 million to hire people to decide what should and shouldn't be kept.
Philip
Cheaper? Only for NatLib. Those of us who host are paying the bill for this. And no, they are not doing any smart filtering for duplication. They managed to download 260 GB+ of international WHOIS and spam archives from an 80 GB disk drive here before the harvest IPs got firewalled. I'm not pleased.

PS. natlib: robots.txt is often expressly set up to prevent this type of 'accident'.

AYJ
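For anyone wanting to opt out in advance, a robots.txt along these lines is the usual approach. This is only a sketch: the "ExampleHarvester" user-agent token is a placeholder, since I don't have the exact string the NatLib crawler announces itself with; check your access logs and substitute the real one, or use * to keep all well-behaved crawlers out of the bulky paths.

    # Block the harvester (placeholder user-agent string) from the whole site
    User-agent: ExampleHarvester
    Disallow: /

    # Or keep every crawler out of the large download/archive areas only
    User-agent: *
    Disallow: /downloads/
    Disallow: /archives/

Of course this only helps if the crawler honours robots.txt in the first place, which was rather the point above.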