On Tue, 21 Oct 2008 14:29:44 +1400, "Alex Hague" wrote:
> I'm from a small (but growing) Kiwi website with community-generated content. I think there is an additional point that has been missed so far: the web is no longer static. The uncertainty principle begins to apply: by crawling entire sites, crawlers may inadvertently begin to interact with the content on those sites.
As far as I'm aware, the big crawlers don't issue POST, PUT, or DELETE requests. Since HTTP defines GET as a safe method, one that must not take any action other than retrieval (it happens to be idempotent as well), crawlers won't "interact" with well-designed websites, if by "interact" you mean "change stuff".
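To make that concrete, here is a minimal sketch, using Python's built-in http.server, of what "well-designed" means here (the /flag/ path and the in-memory store are made up for illustration): retrieval lives behind GET and the state change lives behind POST, so a GET-only crawler can fetch every URL on the site without flagging anything.

    from http.server import BaseHTTPRequestHandler, HTTPServer

    FLAGGED = set()  # hypothetical in-memory store of flagged item ids

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            # GET is "safe": retrieval only, no side effects, so a
            # crawler that fetches every URL cannot change any state.
            body = repr(sorted(FLAGGED)).encode()
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def do_POST(self):
            # The state change lives behind POST, which the big
            # crawlers don't send, so content can only be flagged
            # by a deliberate form submission.
            if self.path.startswith("/flag/"):
                FLAGGED.add(self.path[len("/flag/"):])
            self.send_response(204)
            self.end_headers()

    HTTPServer(("", 8000), Handler).serve_forever()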
> For example, there can be links to flag content as inappropriate. We use robots.txt to prevent crawlers from hitting that kind of link, as well as from indexing our APIs, which return XML or JSON and are of no use to a crawler (but which they seem to love indexing).
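For what it's worth, that kind of exclusion is only a few lines of robots.txt, something like the following (the /flag/ and /api/ prefixes are guesses; substitute whatever paths your flag links and API endpoints actually live under):

    User-agent: *
    Disallow: /flag/
    Disallow: /api/

Bear in mind that robots.txt is advisory: the well-behaved crawlers honour it, but it's not access control.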
If the APIs return an appropriate Content-Type and the crawlers still retrieve them, then the crawlers are either genuinely interested in indexing the content retrieved by those APIs, or they're buggy and you should report the issue.

Cheers,

--
Jasper Bryant-Greene
Network Engineer, Unleash
ddi: +64 3 978 1222
mob: +64 21 129 9458