Re: [nznog] NLNZHarvester2008
Hi all: Thanks to those of you who have contacted me after reading this thread. If you have not read our FAQ, we have addressed several of the issues that were raised here in that document: http://www.natlib.govt.nz/about-us/news/20-october-2008-web-harvest-faqs

There are also a few new issues raised here that I'll be adding to the FAQ shortly, particularly around notifying webmasters in advance and harvesting from an international service, both of which we could have handled better:

1. Several people have asked about notification, and some of you have made practical and workable suggestions about how we can handle this better next time. In the current crawl, we could not see a good way to do so without effectively becoming spammers. In hindsight we could have communicated better with webmasters. When we decide to run the harvest again, we will make more of an effort to publicise the harvest in mailing lists and groups frequented by webmasters (such as this one).

2. Others ask why we are harvesting from the USA and not New Zealand. We have contracted the Internet Archive to conduct the harvest because they are the single most experienced provider of large-scale crawling services in the world. An unfortunate offshoot of this is that their servers are based in the USA. We hope that after observing the experts at work we'll be able to manage future harvests from within New Zealand. At the very least we have learned that we should locate some of the harvest servers in New Zealand.

3. Bmanning asks whether this is a recurring or once-off harvest. While we have not planned any further harvests at this time, it is likely that domain harvests will become a feature of the Library's overall web harvesting programme. Analysis of the current harvest and research into various access issues will help determine frequency.

Finally, as we have seen in other forums, a lot of you object to our robots.txt policy. Again, I can only say we understand your point of view, but in the context of this crawl we believe it is important that we harvest as much of the domain as possible, in order to preserve the web as it is today for the New Zealanders and researchers of the future. Simon Lyall illustrates our dilemma exactly above: people use robots.txt for different reasons, and in a perfect world (or possibly even an imperfect one where robots.txt developed into a standard) we would know why each robots.txt rule was written, and crawl more appropriately.

Thanks,
Gordon

--
Gordon Paynter
Technical Analyst
National Digital Library
National Library of New Zealand
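To make the robots.txt dilemma above concrete, consider two hypothetical files (the site layouts and paths are invented for illustration). Both use exactly the same directive, but one was written to keep an unfinished section out of search results and the other to protect bandwidth on a large mirror; nothing in the file itself tells an archiving crawler which intent it is dealing with.

# Site A: keep a half-finished section out of search engines
User-agent: *
Disallow: /drafts/

# Site B: stop crawlers from pulling down a multi-gigabyte mirror
User-agent: *
Disallow: /mirror/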
On 21/10/2008, at 1:10 PM, Gordon Paynter wrote:
1. Several people have asked about notification, and some you have made practical and workable suggestions about how we can handle this better next time.
In the current crawl, we could not see a good way to do so without effectively becoming spammers. In hindsight we could have communicated better with webmasters. When we decide to run the harvest again, we will make more of an effort to publicise the harvest in mailing lists and groups frequented by webmasters (such as this one).
Can I suggest something here? Speak to the Most Evil Media too about the crawling. This is neither foolproof nor fail-safe, but the tech media should be interested (and I can see a few stories about the archive already) and could help publicise your intentions to those who don't participate in mailing lists.
2. Others ask why we are harvesting from the USA and not New Zealand?
We have contracted the Internet Archive to conduct the harvest because they are the single most experienced provider of large-scale crawling services in the world.
An unfortunate offshoot of this is that their servers are based in the USA.
We hope that after observing the experts at work we'll be able to manage future harvests from within New Zealand. At the very least we have learned that we should locate some of the harvest servers in New Zealand.
Umm, yes. :)
--
Juha Saarinen juha(a)saarinen.org http://www.techsploder.com
On Tue, Oct 21, 2008 at 2:10 PM, Gordon Paynter <Gordon.Paynter(a)natlib.govt.nz> wrote:
2. Others ask why we are harvesting from the USA and not New Zealand?
We have contracted the Internet Archive to conduct the harvest because they are the single most experienced provider of large-scale crawling services in the world.
Is there a particular reason why a NZ-based proxy cannot be used? I imagine this is not a short-term but a continuing project, and constant use of international bandwidth could be expensive for some NZ-based sites.

Second, for non-NZ-based sites with NZ-oriented content, what is Natlib's policy on archiving this material? Is access to this material covered by NZ copyright law, thus allowing Natlib to archive it? Or is it covered by the laws of the country in which it is hosted, making this archival process a legal question? Particularly material in the US, which has pretty strict (and stupid, IMO) copyright laws.

Thirdly, what is the legal status of NZ copyright material which is hosted overseas? Has Natlib read the recent Government ICT bulletin on use of offshore hosting services?

Nicholas
While I'm sure the good people at Natlib are on top of it, seeing as this is being discussed to the nth degree, my question would be: who will hold this archived material long term? I sincerely hope it will be physically (and diversely) stored and served in New Zealand, not at a faceless overseas company that can change ownership, collapse or be otherwise invaded. If we are paying to have this material preserved (which IMHO is entirely appropriate), it needs to be here in a government owned (or suitably contracted and legally covered private) repository. This is the only way to ensure that it will survive long term.
"Tony Wicks"
21/10/08 8:07 p.m. >>> While I'm sure the good people at Natlib are on top of it, but seeing as
Hi Tony: The archived material will be hosted at the National Library in New Zealand (i.e. in our machine room). I'll update the FAQ with this information when I get a chance. Gordon -- Gordon Paynter Technical Analyst National Digital Library National Library of New Zealand +64 4 474 3114 this is being discussed to the n'th degree. My question would be, who will hold this archived material long term ? I sincerely hope it will be physically (and diversely) stored and served in New Zealand not stored at a faceless overseas company that can change ownership, collapse or be otherwise invaded. If we are paying to have this material preserved (which IMHO is entirely appropriate), it needs to be here in a government owned (or suitably contracted and legally covered private) repository. This is the only way to ensure that it will survive long term.
Gordon,

Are you guys going to tell us how much data was collected in the end?

I also note that the crawler stepped off .nz and went after links that were in .nz pages. E.g. this page has links in it to pointclark.net which were also crawled: http://www.crra.org.nz/content/view/17/7/

Cheers
Don

Gordon Paynter wrote:
Hi Tony:
The archived material will be hosted at the National Library in New Zealand (i.e. in our machine room).
I'll update the FAQ with this information when I get a chance.
Gordon
-- Gordon Paynter Technical Analyst National Digital Library National Library of New Zealand +64 4 474 3114
"Tony Wicks"
21/10/08 8:07 p.m. >>> While I'm sure the good people at Natlib are on top of it, but seeing as this is being discussed to the n'th degree. My question would be, who will hold this archived material long term ? I sincerely hope it will be physically (and diversely) stored and served in New Zealand not stored at a faceless overseas company that can change ownership, collapse or be otherwise invaded. If we are paying to have this material preserved (which IMHO is entirely appropriate), it needs to be here in a government owned (or suitably contracted and legally covered private) repository. This is the only way to ensure that it will survive long term.
On 22/10/2008, at 1:17 PM, Don Gould wrote:
Gordon,
Are you guys going to tell us how much data was collected in the end?
I also note that the crawler stepped off .nz and went after links that were in .nz pages.
eg. This page has links in it to pointclark.net which were also crawled. http://www.crra.org.nz/content/view/17/7/
Why this happens is explained in the FAQ. -- Nathan Ward
I would be very interested in how much data was collected in this. I'd also be interested whether it was a basic harvest or there was some smart archiving done for duplicate files etc. E.g. a couple of customers have between 4 and 8 or so sites all pointing at the same place.

Also, as a computer tech I'll put up a download directory to grab programs from for cleaning customers' PCs (e.g. spyware utils, general apps, service packs). A quick look is showing 1 gig of data there, and we have that as .co.nz, .net.nz and as different domains, just so if I tell a customer over the phone to download something to fix it they won't make a mistake. Funny, you'll be using probably 4 gig just on my spyware apps and service packs because somehow it's documentary heritage to New Zealand... of programs made mostly in the States :)

Ah well, in 30 years I guess someone will be interested to see what the internet looked like back in 2008. It's also probably a cheaper option than our government spending $100 million to hire people to decide what should and shouldn't be kept.

Philip

-----Original Message-----
From: Don Gould [mailto:don(a)bowenvale.co.nz]
Sent: Wednesday, 22 October 2008 1:17 p.m.
To: Gordon Paynter
Cc: nznog(a)list.waikato.ac.nz
Subject: Re: [nznog] NLNZHarvester2008

Gordon,

Are you guys going to tell us how much data was collected in the end?

I also note that the crawler stepped off .nz and went after links that were in .nz pages. eg. This page has links in it to pointclark.net which were also crawled. http://www.crra.org.nz/content/view/17/7/

Cheers
Don

Gordon Paynter wrote:
Hi Tony:
The archived material will be hosted at the National Library in New Zealand (i.e. in our machine room).
I'll update the FAQ with this information when I get a chance.
Gordon
-- Gordon Paynter Technical Analyst National Digital Library National Library of New Zealand +64 4 474 3114
"Tony Wicks"
21/10/08 8:07 p.m. >>> While I'm sure the good people at Natlib are on top of it, but seeing as this is being discussed to the n'th degree. My question would be, who will hold this archived material long term ? I sincerely hope it will be physically (and diversely) stored and served in New Zealand not stored at a faceless overseas company that can change ownership, collapse or be otherwise invaded. If we are paying to have this material preserved (which IMHO is entirely appropriate), it needs to be here in a government owned (or suitably contracted and legally covered private) repository. This is the only way to ensure that it will survive long term.
Philip Seccombe wrote:
I would be very interested in how much data was collected in this.
I'd also be interested if it was a basic harvest or there was some smart archiving done for duplicate files etc Eg for a couple of customers they have between 4 and 8 or so sites all pointing at the same place
Also as a computer tech I'll put up a download directory to grab programs from for cleaning customers pc (eg spyware utils, general apps, service packs), a quick look is showing 1 gig of data there, and we have that as .co.nz .net.nz and as different domains just so if I tell a customer over the phone to download something to fix they won't make a mistake. Funny you'll be using probably 4gig just on my spyware apps and service packs because somehow its document heritage to New Zealand...of programs made mostly in the states :)
Ah well, in 30 years I guess someone will be interested to see what the internet looked like back in 2008. It's also probably a cheaper option than our government spending $100 million to hire people to decide what should and shouldn't be kept
Philip
Cheaper? Only for NatLib. We who host are paying the bill for this.

And no, they are not doing any smart filtering for duplication. They managed to download 260GB+ of international WHOIS and spam archives from an 80GB disk drive here before the harvest IPs got firewalled. I'm not pleased.

PS. natlib: robots.txt is often expressly set up to prevent this type of 'accident'.

AYJ
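To spell out the "smart filtering for duplication" idea raised above: a crawler that deduplicates by content hash would store the same bytes only once, however many domains point at them, though it would still consume the host's bandwidth for every copy unless it also deduplicated before downloading. The sketch below is only an illustration of the idea, not a description of what the Internet Archive's harvester actually does; store_record is a hypothetical placeholder for the archive-writing step.

import hashlib
import urllib.request

seen_digests = set()

def fetch_if_new(url):
    """Fetch a URL and keep the body only if its content hash has not been seen before."""
    with urllib.request.urlopen(url) as response:
        body = response.read()
    digest = hashlib.sha256(body).hexdigest()
    if digest in seen_digests:
        return None  # same bytes already archived under another URL
    seen_digests.add(digest)
    # store_record(url, body)  # hypothetical WARC-style storage step
    return digest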
I'm guessing multiple domains pointing at the same data meant the 80gig was replicated to 260gig?

At what point can you say that international standards (i.e. robots.txt) were not observed, thus they should reimburse you for bandwidth costs?

260gig of spam is going to be a fantastic thing for someone to look through in the future, it really does show the true culture of New Zealand....</sarcasm>

Philip Seccombe

-----Original Message-----
From: TreeNet Admin [mailto:admin(a)treenetnz.com]
Sent: Sun 10/26/2008 7:08 PM
To: Philip Seccombe
Cc: nznog(a)list.waikato.ac.nz
Subject: Re: [nznog] NLNZHarvester2008

Philip Seccombe wrote:
I would be very interested in how much data was collected in this.
I'd also be interested if it was a basic harvest or there was some smart archiving done for duplicate files etc Eg for a couple of customers they have between 4 and 8 or so sites all pointing at the same place
Also as a computer tech I'll put up a download directory to grab programs from for cleaning customers pc (eg spyware utils, general apps, service packs), a quick look is showing 1 gig of data there, and we have that as .co.nz .net.nz and as different domains just so if I tell a customer over the phone to download something to fix they won't make a mistake. Funny you'll be using probably 4gig just on my spyware apps and service packs because somehow its document heritage to New Zealand...of programs made mostly in the states :)
Ah well, in 30 years I guess someone will be interested to see what the internet looked like back in 2008. It's also probably a cheaper option than our government spending $100 million to hire people to decide what should and shouldn't be kept
Philip
Cheaper? only for NatLib. We who host are paying the bill for this. And no, they are not doing any smart filtering for duplication. They managed to download 260GB+ of international WHOIS and spam archives from an 80GB disk drive here before the harvest IPs got firewalled. I'm not pleased. PS. natlib: robots.txt is often expressly setup to prevent this type of 'accident'. AYJ
I would be curious who has actually tracked this traffic (themselves or via upstream ISP) and come up with a cost to their organisation? Are they invoicing natlib? If not.. why not?

If I were to do this sort of trolling.. personally, I would resolve addresses to IPs, sort out which are local NZ IPs and which are offshore, and troll from the appropriate source... as much as I feel the ignoring of robots.txt is against the goodwill and security of the big bad world :\

Is there an ombudsman in the NZ govt that has this sort of thing in their portfolio?

Just my tuppence worth

Russell Sharpe

_____

From: Philip Seccombe [mailto:philip(a)turnstone.co.nz]
Sent: Monday, 27 October 2008 10:14
To: TreeNet Admin
Cc: nznog(a)list.waikato.ac.nz
Subject: Re: [nznog] NLNZHarvester2008

I'm guessing multiple domains pointing at the same data meant the 80gig was replicated to 260gig? At what point can you say that international standards (ie robots.txt) were not observed thus they should reimburse you for bandwidth costs? 260gig of spam is going to be a fantastic thing for someone to look through in the future, it really does show the true culture of New Zealand....</sarcasm>

Philip Seccombe

-----Original Message-----
From: TreeNet Admin [mailto:admin(a)treenetnz.com]
Sent: Sun 10/26/2008 7:08 PM
To: Philip Seccombe
Cc: nznog(a)list.waikato.ac.nz
Subject: Re: [nznog] NLNZHarvester2008

Philip Seccombe wrote:
I would be very interested in how much data was collected in this.
I'd also be interested if it was a basic harvest or there was some smart archiving done for duplicate files etc Eg for a couple of customers they have between 4 and 8 or so sites all pointing at the same place
Also as a computer tech I'll put up a download directory to grab programs from for cleaning customers pc (eg spyware utils, general apps, service packs), a quick look is showing 1 gig of data there, and we have that as .co.nz .net.nz and as different domains just so if I tell a customer over the phone to download something to fix they won't make a mistake. Funny you'll be using probably 4gig just on my spyware apps and service packs because somehow its document heritage to New Zealand...of programs made mostly in the states :)
Ah well, in 30 years I guess someone will be interested to see what the internet looked like back in 2008. It's also probably a cheaper option than our government spending $100 million to hire people to decide what should and shouldn't be kept
Philip
Cheaper? only for NatLib. We who host are paying the bill for this. And no, they are not doing any smart filtering for duplication. They managed to download 260GB+ of international WHOIS and spam archives from an 80GB disk drive here before the harvest IPs got firewalled. I'm not pleased. PS. natlib: robots.txt is often expressly setup to prevent this type of 'accident'. AYJ
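For what it's worth, the "resolve addresses and sort local from offshore" step suggested above is easy enough to sketch. The prefixes below are RFC 5737 documentation placeholders, not real New Zealand allocations; a real split would be driven by the APNIC delegation data for NZ.

import ipaddress
import socket

# Placeholder prefixes standing in for the real list of NZ-delegated address ranges.
NZ_PREFIXES = [ipaddress.ip_network(p) for p in ("203.0.113.0/24", "198.51.100.0/24")]

def is_nz_hosted(hostname):
    """Resolve a hostname and report whether its address falls inside a 'NZ' prefix."""
    try:
        address = ipaddress.ip_address(socket.gethostbyname(hostname))
    except (socket.gaierror, ValueError):
        return False  # treat unresolvable hosts as offshore for this sketch
    return any(address in prefix for prefix in NZ_PREFIXES)

# Crawl NZ-hosted sites from a local harvester and everything else from offshore.
for host in ("example.co.nz", "example.net.nz"):
    print(host, "->", "nz-harvester" if is_nz_hosted(host) else "offshore-harvester")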
On Mon, 27 Oct 2008, Russell Sharpe wrote:
I would be curious who has actually tracked this traffic (themselves or via upstream ISP) and come up with a cost to their organisation? Are they invoicing natlib? if not.. why not?
If I were to do this sort of trolling.. personally, I would resolve addresses to ip's and sort out what is local NZ IP's and off shore and troll from the appropriate source... as much as I feel the ignorance of robots.txt is against the goodwill and security of the big bad world :\
If you were serious about trolling you would of course be looking at sections 250 (2) and 252 (1) of the Crimes Act [1] and trying to work out if certain people at the library are liable for imprisonment for a term "not exceeding 2 years" or "not exceeding 7 years". This email does not constitute legal advice :) [1] - http://www.legislation.govt.nz/ -- Simon Lyall | Very Busy | Web: http://www.darkmere.gen.nz/ "To stay awake all night adds a day to your life" - Stilgar | eMT.
Part 10 (comprising sections 217 to 305) was substituted by a new Part 10 (comprising sections 217 to 272), as from 1 October 2003, by section 15 Crimes Amendment Act 2003 (2003 No 39). I haven't looked at these.. Yet... Russell Sharpe -----Original Message----- From: Simon Lyall [mailto:simon(a)darkmere.gen.nz] Sent: Monday, 27 October 2008 21:25 To: nznog Subject: Re: [nznog] NLNZHarvester2008 On Mon, 27 Oct 2008, Russell Sharpe wrote:
I would be curious who has actually tracked this traffic (themselves or via upstream ISP) and come up with a cost to their organisation? Are they invoicing natlib? if not.. why not?
If I were to do this sort of trolling.. personally, I would resolve addresses to ip's and sort out what is local NZ IP's and off shore and troll from the appropriate source... as much as I feel the ignorance of robots.txt is against the goodwill and security of the big bad world :\
If you were serious about trolling you would of course be looking at sections 250 (2) and 252 (1) of the Crimes Act [1] and trying to work out if certain people at the library are liable for imprisonment for a term "not exceeding 2 years" or "not exceeding 7 years". This email does not constitute legal advice :) [1] - http://www.legislation.govt.nz/ -- Simon Lyall | Very Busy | Web: http://www.darkmere.gen.nz/ "To stay awake all night adds a day to your life" - Stilgar | eMT.
"252 Accessing computer system without authorisation"(1) Every one is liable to imprisonment for a term not exceeding 2 years who intentionally accesses, directly or indirectly, any computer system without authorisation, knowing that he or she is not authorised to access that computer system, or being reckless as to whether or not he or she is authorised to access that computer system. "(2) To avoid doubt, subsection (1) does not apply if a person who is authorised to access a computer system accesses that computer system for a purpose other than the one for which that person was given access. "(3) To avoid doubt, subsection (1) does not apply if access to a computer system is gained by a law enforcement agency- "(a) under the execution of an interception warrantor search warrant; or "(b) under the authority of any Act or rule of the common law. That is it.. Did any get any request for authourisation??? Natlib is not a law enforcement agency... But under "(b) under the authority of any Act or rule of the common law. That's what we need to look for.. Anyone??? Russell Sharpe -----Original Message----- From: Simon Lyall [mailto:simon(a)darkmere.gen.nz] Sent: Monday, 27 October 2008 21:25 To: nznog Subject: Re: [nznog] NLNZHarvester2008 On Mon, 27 Oct 2008, Russell Sharpe wrote:
I would be curious who has actually tracked this traffic (themselves or via upstream ISP) and come up with a cost to their organisation? Are they invoicing natlib? if not.. why not?
If I were to do this sort of trolling.. personally, I would resolve addresses to ip's and sort out what is local NZ IP's and off shore and troll from the appropriate source... as much as I feel the ignorance of robots.txt is against the goodwill and security of the big bad world :\
If you were serious about trolling you would of course be looking at sections 250 (2) and 252 (1) of the Crimes Act [1] and trying to work out if certain people at the library are liable for imprisonment for a term "not exceeding 2 years" or "not exceeding 7 years". This email does not constitute legal advice :) [1] - http://www.legislation.govt.nz/ -- Simon Lyall | Very Busy | Web: http://www.darkmere.gen.nz/ "To stay awake all night adds a day to your life" - Stilgar | eMT.
Section 3 seems pretty limited in its application and specifies a law enforcement agency - as Russell states, NatLib is not a law enforcement agency, therefore it is not covered by this clause nor its subsections.

Ian Cousins (I'm not a lawyer either!)

Russell Sharpe wrote:
"252 Accessing computer system without authorisation"(1) Every one is liable to imprisonment for a term not exceeding 2 years who intentionally accesses, directly or indirectly, any computer system without authorisation, knowing that he or she is not authorised to access that computer system, or being reckless as to whether or not he or she is authorised to access that computer system.
"(2) To avoid doubt, subsection (1) does not apply if a person who is authorised to access a computer system accesses that computer system for a purpose other than the one for which that person was given access.
"(3) To avoid doubt, subsection (1) does not apply if access to a computer system is gained by a law enforcement agency-
"(a) under the execution of an interception warrantor search warrant; or
"(b) under the authority of any Act or rule of the common law.
That is it.. Did any get any request for authourisation??? Natlib is not a law enforcement agency... But under "(b) under the authority of any Act or rule of the common law.
That's what we need to look for.. Anyone???
Russell Sharpe
-----Original Message----- From: Simon Lyall [mailto:simon(a)darkmere.gen.nz] Sent: Monday, 27 October 2008 21:25 To: nznog Subject: Re: [nznog] NLNZHarvester2008
On Mon, 27 Oct 2008, Russell Sharpe wrote:
I would be curious who has actually tracked this traffic (themselves or via upstream ISP) and come up with a cost to their organisation? Are they invoicing natlib? if not.. why not?
If I were to do this sort of trolling.. personally, I would resolve addresses to ip's and sort out what is local NZ IP's and off shore and troll from the appropriate source... as much as I feel the ignorance of robots.txt is against the goodwill and security of the big bad world :\
If you were serious about trolling you would of course be looking at sections 250 (2) and 252 (1) of the Crimes Act [1] and trying to work out if certain people at the library are liable for imprisonment for a term "not exceeding 2 years" or "not exceeding 7 years".
This email does not constitute legal advice :)
[1] - http://www.legislation.govt.nz/
-- Simon Lyall | Very Busy | Web: http://www.darkmere.gen.nz/ "To stay awake all night adds a day to your life" - Stilgar | eMT.
On Mon, 2008-10-27 at 22:10 +1300, Russell Sharpe wrote:
"252 Accessing computer system without authorisation"(1) Every one is liable to imprisonment for a term not exceeding 2 years who intentionally accesses, directly or indirectly, any computer system without authorisation, knowing that he or she is not authorised to access that computer system, or being reckless as to whether or not he or she is authorised to access that computer system.
"(2) To avoid doubt, subsection (1) does not apply if a person who is authorised to access a computer system accesses that computer system for a purpose other than the one for which that person was given access.
"(3) To avoid doubt, subsection (1) does not apply if access to a computer system is gained by a law enforcement agency-
"(a) under the execution of an interception warrantor search warrant; or
"(b) under the authority of any Act or rule of the common law.
That is it.. Did any get any request for authourisation??? Natlib is not a law enforcement agency... But under "(b) under the authority of any Act or rule of the common law.
That's what we need to look for.. Anyone???
Doesn't (2) apply? If it's a public website, everyone has access (is authorised), so they're merely accessing it for a different purpose. IANAL also :-) Richard
Hi, I suggest you boys join some sort of law list, this has no operational value, please take it offlist. Thanks, Patrick On 27/10/2008, at 10:28 PM, Richard Hector wrote:
On Mon, 2008-10-27 at 22:10 +1300, Russell Sharpe wrote:
"252 Accessing computer system without authorisation"(1) Every one is liable to imprisonment for a term not exceeding 2 years who intentionally accesses, directly or indirectly, any computer system without authorisation, knowing that he or she is not authorised to access that computer system, or being reckless as to whether or not he or she is authorised to access that computer system.
"(2) To avoid doubt, subsection (1) does not apply if a person who is authorised to access a computer system accesses that computer system for a purpose other than the one for which that person was given access.
"(3) To avoid doubt, subsection (1) does not apply if access to a computer system is gained by a law enforcement agency-
"(a) under the execution of an interception warrantor search warrant; or
"(b) under the authority of any Act or rule of the common law.
That is it.. Did any get any request for authourisation??? Natlib is not a law enforcement agency... But under "(b) under the authority of any Act or rule of the common law.
That's what we need to look for.. Anyone???
Doesn't (2) apply? If it's a public website, everyone has access (is authorised), so they're merely accessing it for a different purpose.
IANAL also :-)
Richard
I haven't seen any mention of beer or networking in the last few posts. Stick a fork in it, I think this thread is done.

Dean

Patrick Jordan-Smith wrote:
Hi,
I suggest you boys join some sort of law list, this has no operational value, please take it offlist.
Thanks, Patrick
On 27/10/2008, at 10:28 PM, Richard Hector wrote:
On Mon, 2008-10-27 at 22:10 +1300, Russell Sharpe wrote:
"252 Accessing computer system without authorisation"(1) Every one is liable to imprisonment for a term not exceeding 2 years who intentionally accesses, directly or indirectly, any computer system without authorisation, knowing that he or she is not authorised to access that computer system, or being reckless as to whether or not he or she is authorised to access that computer system.
"(2) To avoid doubt, subsection (1) does not apply if a person who is authorised to access a computer system accesses that computer system for a purpose other than the one for which that person was given access.
"(3) To avoid doubt, subsection (1) does not apply if access to a computer system is gained by a law enforcement agency-
"(a) under the execution of an interception warrantor search warrant; or
"(b) under the authority of any Act or rule of the common law.
That is it.. Did any get any request for authourisation??? Natlib is not a law enforcement agency... But under "(b) under the authority of any Act or rule of the common law.
That's what we need to look for.. Anyone???
Doesn't (2) apply? If it's a public website, everyone has access (is authorised), so they're merely accessing it for a different purpose.
IANAL also :-)
Richard
One would argue that by sticking something online / making something available to the public, you implicitly give permission to all and sundry to access the information.

Hate to play devil's advocate, but if you're "charged by the meg" and you don't have alerts set up over certain thresholds, then if you get a bill that slams you, it's your own fault. robots.txt has (AFAIK) always been "optional"; if you don't want someone to access something, filters and restrictions are available to you.

Posturing and blowing wind isn't going to change anything; chalk it up to a lesson. If you got hammered with a nasty bill, reconsider your hosting options.

Russell Sharpe wrote:
"252 Accessing computer system without authorisation"(1) Every one is liable to imprisonment for a term not exceeding 2 years who intentionally accesses, directly or indirectly, any computer system without authorisation, knowing that he or she is not authorised to access that computer system, or being reckless as to whether or not he or she is authorised to access that computer system.
"(2) To avoid doubt, subsection (1) does not apply if a person who is authorised to access a computer system accesses that computer system for a purpose other than the one for which that person was given access.
"(3) To avoid doubt, subsection (1) does not apply if access to a computer system is gained by a law enforcement agency-
"(a) under the execution of an interception warrantor search warrant; or
"(b) under the authority of any Act or rule of the common law.
That is it.. Did any get any request for authourisation??? Natlib is not a law enforcement agency... But under "(b) under the authority of any Act or rule of the common law.
That's what we need to look for.. Anyone???
Russell Sharpe
-----Original Message----- From: Simon Lyall [mailto:simon(a)darkmere.gen.nz] Sent: Monday, 27 October 2008 21:25 To: nznog Subject: Re: [nznog] NLNZHarvester2008
On Mon, 27 Oct 2008, Russell Sharpe wrote:
I would be curious who has actually tracked this traffic (themselves or via upstream ISP) and come up with a cost to their organisation? Are they invoicing natlib? if not.. why not?
If I were to do this sort of trolling.. personally, I would resolve addresses to ip's and sort out what is local NZ IP's and off shore and troll from the appropriate source... as much as I feel the ignorance of robots.txt is against the goodwill and security of the big bad world :\
If you were serious about trolling you would of course be looking at sections 250 (2) and 252 (1) of the Crimes Act [1] and trying to work out if certain people at the library are liable for imprisonment for a term "not exceeding 2 years" or "not exceeding 7 years".
This email does not constitute legal advice :)
[1] - http://www.legislation.govt.nz/
-- Simon Lyall | Very Busy | Web: http://www.darkmere.gen.nz/ "To stay awake all night adds a day to your life" - Stilgar | eMT.
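The point above that robots.txt has "always been optional" is worth spelling out: the file only has any effect when the client chooses to consult it. A minimal sketch using Python's standard urllib.robotparser, with a hypothetical site and the harvest's name from the subject line standing in for a user-agent string:

import urllib.robotparser

# robots.txt is advisory: it only takes effect when the client decides to check it.
rp = urllib.robotparser.RobotFileParser()
rp.set_url("http://www.example.co.nz/robots.txt")  # hypothetical site
rp.read()

url = "http://www.example.co.nz/mirror/big-file.iso"  # hypothetical large mirrored file
if rp.can_fetch("NLNZHarvester2008", url):
    print("a crawler that opts in would fetch", url)
else:
    print("a crawler that opts in would skip", url)

A crawler that never calls can_fetch (or deliberately ignores its answer) sees no difference at all, which is exactly the situation being complained about in this thread.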
[ my own personal opinion, and quite probably not that of my employers ]
Leon Strong wrote:
One would argue that by sticking something online / making something available to the public, you implicitly give permission to all and sundry to access the information.
That's what the lawyers would say, the judge would agree, and the case would be thrown out. A website is an implicit invitation to view. If a person with a browser and no login credentials can access the content, you've made it available to the world. People suggesting that maybe NLNZ has breached the Crimes Act by crawling while ignoring robots.txt (as opposed to finding ways around authorisation restrictions) have a *VERY* tenuous grip on reality.

--
Matthew Poole
"Don't use force. Get a bigger hammer."
Subscribers to this list should please note paragraph seven of the Acceptable Use Policy:

7. Postings of a political, philosophical or legal nature are discouraged.

Discussion of the role of robots.txt is appropriate to this list. Would-be legal opinions are not, especially from people who are no more lawyers than I am.

- Donald Neal
NZNOG List Administrator

--
Donald Neal | "Go to the bloody lectures! ... if you
Research Officer | don't turn up, you will get further behind
WAND | ... why lay out the cash and throw away the
The University of Waikato | product?" - Prof. Steve Jones, UCL
Apologies Donald... Should have included that. IANMALTDN is my new default... ;-)

But surely this is an issue that will have operational impact on network operators?

Cheers
Paul

-----Original Message-----
From: Donald Neal [mailto:dmneal(a)wand.net.nz]
Sent: Tuesday, 28 October 2008 9:55 a.m.
To: 'nznog'
Subject: Re: [nznog] NLNZHarvester2008

Subscribers to this list should please note paragraph seven of the Acceptable Use Policy:

7. Postings of a political, philosophical or legal nature are discouraged.

Discussion of the role of robots.txt is appropriate to this list. Would-be legal opinions are not, especially from people who are no more lawyers than I am.

- Donald Neal
NZNOG List Administrator

Unless otherwise stated, any views or opinions expressed are solely those of the author and do not represent those of Vodafone New Zealand Limited.
I would be curious who has actually tracked this traffic (themselves or via upstream ISP) and come up with a cost to their organisation? Are they invoicing natlib? if not.. why not? If I were to do this sort of trolling.. personally, I would resolve addresses to ip's and sort out what is local NZ IP's and off shore and troll from the appropriate source... as much as I feel the ignorance of robots.txt is against the goodwill and security of the big bad world :\

________________________________

I do find it quite amusing how many people in this country seem to have such a simplistic view of contracts that they think they can randomly invoice anyone who may have caused them trouble for their time/expenses. You can invoice people with whom you have a service/supply contract; for anyone else your only recourse is court action. While the actions of NATLIB in this case may or may not be questionable given the ignoring of the robots.txt file, this does not give anyone the right to think that they can invoice them for accessing their publicly available website.

If YOU as a web hoster have decided to serve websites, the basis on which you have contracted your bandwidth is YOUR problem. If you want to be protected against unforeseen spikes in traffic, get flat-rate hosting, not data-charged hosting. Frankly, I have always thought hosting websites on a data-charged basis is a very risky and short-sighted option; anyone in the world is quite entitled to drag whatever traffic they want off your site as much as they wish and cost you money. It makes no difference who or what is sucking traffic off your website: if you have chosen to host websites on a data-charged basis, that's your choice and you need to live with the consequences. If you want peace of mind, get a flat-rate option (yes, they are available in NZ); otherwise stop whinging.

My 2c
On 28/10/2008, at 8:02 AM, Tony Wicks wrote:
[Words] ... stop winging.
Best bit of advice yet in this thread. Natlib could obviously have handled this in a smoover manner[1], but seriously, people are getting a bit carried away here.

Robots.txt is not an authentication mechanism, and threatening to sue/invoice people who view information from your publicly accessible website (regardless of whether they did so over a domestic or international link), when the whole point of putting that information on said public website and then advertising it domestically and internationally is to present that very information for people to view... complaining about it when someone does that seems a bit precious.

I think your reaction should maybe be to roll your eyes, possibly even mutter something like "Dammit!", and maybe then reconsider your hosting options if someone doing one complete scan of your site can influence your charges so much that it hurts.

JSR

[1] They could have said things like "Damn, girl!" and "Aw-w Yeah-h-h!" and maybe worn a cool hat.

--
John S Russell
Big Geek. Doing Geek Stuff.
John Russell wrote:
On 28/10/2008, at 8:02 AM, Tony Wicks wrote:
[Words] ... stop winging.
Best bit of advice yet in this thread.
http://www.youtube.com/watch?v=mh6pZQX22CQ Warning: may contain language that non-beer drinkers may find offensive. Also as it's a downloaded video some people may find the cost of viewing this excessive.
participants (21)
- Andy Linton
- Brislen, Paul, VF-NZ
- Dean Pemberton
- Don Gould
- Donald Neal
- Gordon Paynter
- John Russell
- Juha Saarinen
- Leon Strong
- Mark Harris
- Matthew Poole
- Nathan Ward
- Nicholas Lee
- Patrick Jordan-Smith
- Philip Seccombe
- Phonenet
- Richard Hector
- Russell Sharpe
- Simon Lyall
- Tony Wicks
- TreeNet Admin