> have unanticipated consequences, as well. For instance, if a site
> makes available a notification scheme to which "anyone" can hook
> themselves as a dependent (receive notifications), that site may find
> that lots of non-crawlers are hooking themselves. A good thing for
> the non-crawler site wanting change notification, but it does raise the
> notification load, to the point where it may exceed the original
> crawler load. (sort of like highway maintenance around here -- build a road,
> and it encourages more houses, ergo traffic gets worse than it was
> before the road).
>
> This creates a tension between sending an "i'm changed" note,
> when the administrator knows that it will create a flood of download
> requests, and wanting to keep constituents apprised of the current
> state of the site. If you don't know your constituents, where's the
> value proposition?
whoever hooks in, e.g. with HTTP LINK, can be notified, e.g. with HTTP
UNLINK, according to taste. with reverse-path and target relative-urli+
arguments to LINK, the service providing the HTTP GET resource need call
UNLINK only when it renames or deletes the URI. so there's no extreme
overhead involved, because the coarser URI-space changes (which imply larger
numbers of links affected by the resulting unlinks) occur infrequently.
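for concreteness, a rough sketch of the hook-in step in python, assuming
LINK/UNLINK roughly as in the early HTTP drafts; the host names, the urli,
and the use of a Link header to carry the reverse-path are all invented for
illustration, not a statement of the actual protocol:

    import http.client

    # hypothetical: a dependent at notify.example.org hooks itself to one
    # document on www.example.com by sending an HTTP LINK request. the Link
    # header carrying the reverse-path and the rel value are assumptions
    # about how the reverse-path / target relative-urli arguments might be
    # expressed; they are illustrative only.
    conn = http.client.HTTPConnection("www.example.com")
    conn.request(
        "LINK",
        "/reports/summary.html",  # target relative-urli (made up)
        headers={"Link": '<http://notify.example.org/hook>; rel="dependent"'},
    )
    resp = conn.getresponse()
    print(resp.status, resp.reason)
    conn.close()

    # if www.example.com later renames or deletes that URI, it would send a
    # single UNLINK back to the registered reverse-path -- the only traffic
    # the dependent sees, which is why the notification load stays small.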
a wanderer is controlled by "/robots.txt" and not necessarily via
notification, except to perform unlink calls, which are, on average in both
number and frequency, infrequent.
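as a sketch of that control step, using python's standard robots.txt parser
(the host, paths, and user-agent string are made up):

    from urllib.robotparser import RobotFileParser

    # the wanderer reads "/robots.txt" once and checks each candidate URL
    # against it before fetching; no notification machinery is involved.
    rp = RobotFileParser("http://www.example.com/robots.txt")
    rp.read()

    for path in ("/reports/summary.html", "/private/logs.html"):
        url = "http://www.example.com" + path
        if rp.can_fetch("ExampleWanderer/1.0", url):
            print("allowed:", url)
        else:
            print("excluded by /robots.txt:", url)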
+ (an urli is a URI whose content does not change relative to user input,
and so it locates a resource in a universal syntax)
the value proposition is that links into your subweb won't get lost and
people can find your documents. dangling links into your domain are
necessarily less likely to be manually stemmed and retried than good links
are to be followed once tried. retrieval occurs when links are successful
(a "hit" from the resource server's perspective), which is link attempts
minus the bad links that do not hit.
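spelled out as arithmetic, with invented numbers:

    # toy figures only: hits, as the resource server sees them, are the
    # link attempts minus the dangling links that do not hit.
    link_attempts = 1200
    dangling_misses = 150
    hits = link_attempts - dangling_misses
    print(hits)  # 1050 successful retrievals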
-john
_________________________________________________
This message was sent by the robots mailing list. To unsubscribe, send mail
to robots-request@webcrawler.com with the word "unsubscribe" in the body.
For more info see http://info.webcrawler.com/mak/projects/robots/robots.html