The other issue is how to notify remote spiders that the
information (in whatever format) is available for collection. Is there
a need for something like a "gatherer.txt" file in the web server's root
directory, containing details of the indexing data that is available, the
format it is in, and where to get it from?
For instance, to notify a web robot that a SOIF stream was available
from a Harvest Gatherer, you could use a file containing something along
the lines of:
    SOIF Harvest www.tardis.ed.ac.uk:8501
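A robot encountering such a file would need to split each line into its fields. As a rough sketch (the three-field layout of format, gatherer type, and host:port is my reading of the example above, not an established standard), a parser might look like:

```python
def parse_gatherer_txt(text):
    """Parse hypothetical gatherer.txt lines: FORMAT SOURCE HOST:PORT."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fmt, source, location = line.split()
        host, port = location.rsplit(":", 1)
        entries.append({"format": fmt, "source": source,
                        "host": host, "port": int(port)})
    return entries

sample = "SOIF Harvest www.tardis.ed.ac.uk:8501\n"
print(parse_gatherer_txt(sample))
# [{'format': 'SOIF', 'source': 'Harvest',
#   'host': 'www.tardis.ed.ac.uk', 'port': 8501}]
```

A robot could then check the format field against the formats it understands before fetching anything.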
Another field you might want to supply is the level of detail of the indexing
information, so a web crawler could decide whether it wants the most detailed
information the site makes available, or just a brief overview. Sites
could then provide these differing levels themselves, whilst still having
complete control over exactly what appears in each index.
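To make that concrete, suppose each line carried a fourth "detail" field (say, "brief" or "full" - both the extra field and those values are illustrative assumptions, not part of the example above). A crawler could then pick the entry matching its own preference:

```python
def choose_entry(entries, preference=("full", "brief")):
    """Pick the entry whose detail level ranks highest in `preference`.

    `entries` is a list of dicts, each with a hypothetical "detail" key;
    earlier items in `preference` are preferred. Returns None if no
    entry advertises a detail level the crawler understands.
    """
    ranked = [e for e in entries if e["detail"] in preference]
    if not ranked:
        return None
    return min(ranked, key=lambda e: preference.index(e["detail"]))

# Two hypothetical listings for the same site at different detail levels.
entries = [
    {"format": "SOIF", "source": "Harvest",
     "host": "www.tardis.ed.ac.uk", "port": 8501, "detail": "brief"},
    {"format": "SOIF", "source": "Harvest",
     "host": "www.tardis.ed.ac.uk", "port": 8502, "detail": "full"},
]
print(choose_entry(entries)["port"])  # prints 8502, the "full" entry
```

A crawler that only wants summaries would simply pass preference=("brief",) instead.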
The question is whether anyone would actually make any use of this if
it were available...
Comments?
Simon
_________________________________________________
This message was sent by the robots mailing list. To unsubscribe, send mail
to robots-request@webcrawler.com with the word "unsubscribe" in the body.
For more info see http://info.webcrawler.com/mak/projects/robots/robots.html