Since this list started I've only ever seen one suggestion
for an extension to robots.txt. That, from Tim Bray,
http://info.webcrawler.com/mailing-lists/robots/0001.html
seemed sensible enough -- to add expiry information for the
robots.txt file itself. No response appears to have been given
-- did people not think it worthwhile? Did people think the
HTTP response field, Expires, should be used for that?
I don't know if this was discussed to death somewhere -- but
are people still considering extensions to robots.txt? I'd be
interested in any pointers to an archive of such a discussion.
If there is any point in discussing additions, please read on --
otherwise bin this mail.
MinRequestInterval: X
Minimum request interval in seconds (0 = no minimum),
with a default, if missing, of 60.
This is for those of us lowly enough not to have huge
gathering tasks and the luxury ;-) of a backlog of URLs
over distributed sites. (I.e. those of us doing a
sequential search, exhausting our interest in a site in
one slurp.) Additionally, local admins would have more
control over wanderers that visited.
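As a rough sketch of how a robot might honour this, assuming the proposed MinRequestInterval line is written as an ordinary "Name: value" record in robots.txt. The field itself is only a proposal, and the helper names parse_min_request_interval and seconds_to_wait are made up for illustration:

```python
DEFAULT_INTERVAL = 60.0  # proposed default (seconds) when the field is missing


def parse_min_request_interval(robots_txt, default=DEFAULT_INTERVAL):
    """Return the proposed MinRequestInterval value, in seconds.

    Assumption: the field is a plain "Name: value" line; '#' starts a comment.
    """
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "minrequestinterval":
            try:
                interval = float(value.strip())
            except ValueError:
                continue  # malformed value: ignore this line
            if interval >= 0:
                return interval  # 0 means "no minimum"
    return default


def seconds_to_wait(last_request, now, interval):
    """How long to pause before the next request to the same site."""
    return max(0.0, interval - (now - last_request))
```

A sequential robot would call seconds_to_wait() with its own clock readings before each fetch and sleep for the result.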
DefaultIndex: index.html
Stating that XXXX/ and XXXX/index.html are identical.
You can argue that this is lamely inadequate -- or that it
makes a saving. I know the bigger issue is recursion. Here
I am merely hoping to save those single-page recursions.
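A minimal sketch of the saving, assuming the robot has already read the proposed DefaultIndex value from robots.txt. The function name canonical_url is hypothetical; the idea is just to fold XXXX/index.html onto XXXX/ before checking the list of pages already visited:

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url, default_index="index.html"):
    """Map XXXX/index.html onto XXXX/ so both count as one page.

    default_index stands in for the value of the proposed DefaultIndex field.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    head, _, last = path.rpartition("/")
    if last == default_index:
        path = head + "/"  # strip the index file name, keep the directory
    return urlunsplit(
        (parts.scheme, parts.netloc, path, parts.query, parts.fragment)
    )
```

Both forms of the URL then produce the same key, so the page is fetched only once.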
CGIMask: *.cgi
Rather than guessing at CGI URLs -- why not get the local
admin to answer it? I know that the WN server uses a file
extension to indicate a CGI script -- not /cgi-bin/.
Q: Are CGI scripts universally avoided in advance -- or do
robots look at the HTTP headers of results to try to work
out whether some content is dynamically generated?
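To illustrate the CGIMask idea, here is a small sketch that tests a URL against shell-style masks. It assumes the mask applies to the last path segment only, which the proposal above does not actually specify, and looks_like_cgi is a made-up name:

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit


def looks_like_cgi(url, cgi_masks=("*.cgi",)):
    """True if the file name in the URL matches any CGIMask pattern.

    Assumption: masks are shell-style globs matched against the last
    path segment, ignoring any query string.
    """
    path = urlsplit(url).path
    filename = path.rsplit("/", 1)[-1]
    return any(fnmatchcase(filename, mask) for mask in cgi_masks)
```

A robot could then skip such URLs up front instead of guessing from /cgi-bin/ in the path.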
Finally -- I never understood why robots.txt was exclusion only.
Why does it not have some positive hints added? I.e. you are
allowed & welcome to browse XXXX/fred.html. Was this a choice
built upon pragmatism -- thinking that this would open a can of
worms?
Thanks for any feedback,
Adam
-- +1-203-730-5437 | http://www.micrognosis.com/~ajack/index.html