Re: robot.polite

Terry O'Neill (toneill@mariner.com)
Tue, 18 Jun 96 00:30:31 PDT


--- On Mon, 17 Jun 1996 21:20:19 -0700 Martijn Koster <m.koster@webcrawler.com>
wrote:

>At 11:52 AM 6/17/96, Joe Nieten wrote:
>
>>I agree. Why can't we define a "configuration" that robots could download
>>and use whenever they access that particular web site? This would allow
>>administrators to set AccessInterval or something like that. In fact, you
>>could specify AccessHours 8a - 5p: 15 seconds or something like this to say
>>that you don't want a robot to access your site more frequently than every 15
>>seconds between the hours of 8am and 5pm. You could even specify what days
>>of the week this would apply to.
>>
>>Eh ... just a thought.
>
>And a recurring one. The questions are:
>

>- what problem does this solve? Show me the multitudes of people complaining
> that they are extremely unhappy having robots visit at certain hours;

Without regard to the implementability of the proposed configuration
stuff, I think that while perhaps there aren't multitudes
of people who are *extremely* unhappy about having robots
visit at certain hours, there are lots of people who would
sleep better at night if they knew that robots were confined
to the wee hours of the morning. For example, anyone with a
small bandwidth connection (e.g. a site connected by ISDN;
I was one of these once :-) ) would prefer that any robots
process the site at the times when real people are least
likely to visit their site, in order to minimize person-robot
contention for the line. Bigger sites would not care of course,
but I know that this used to be a concern for us when our site
was still incubating and robots showed up to give us the once over.

Another very good reason for implementing some sort of time of
day limiter is of course to help the robot avoid times of day
when the site may be down. I know that we lived in constant
terror that the much anticipated visits from certain big
Internet indexes would come just at the wrong time and go
away empty handed, leaving us unreferenced (and therefore
non-existent) on the Internet.
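
As a rough illustration of what a time-of-day limiter might look like
on the robot side, here is a minimal Python sketch. The rule layout,
the RULES table and the min_interval_seconds name are all hypothetical;
nothing like this exists in the current robot exclusion standard. It
encodes the "8am to 5pm: 15 seconds" example quoted above, restricted to
weekdays, and falls back to an assumed default delay otherwise.

```python
from datetime import datetime, time

# Hypothetical site rules (not part of any existing standard): on the
# listed weekdays (Monday = 0), between `start` and `end`, a robot must
# wait at least `min_interval` seconds between requests.
RULES = [
    {"days": {0, 1, 2, 3, 4}, "start": time(8, 0), "end": time(17, 0),
     "min_interval": 15},
]

def min_interval_seconds(now, rules=RULES, default=1):
    """Return the minimum delay between requests that applies at `now`."""
    for rule in rules:
        in_days = now.weekday() in rule["days"]
        in_hours = rule["start"] <= now.time() < rule["end"]
        if in_days and in_hours:
            return rule["min_interval"]
    # No rule matched: use the robot's own (assumed) default delay.
    return default
```

A robot would call min_interval_seconds(datetime.now()) before each
request and sleep for at least that long. The same table could just as
easily mark hours when the site is down by treating a very large
interval as "do not visit now".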

Additionally, (without any thought for the poor sot building
the robot), my true desire as the operator of a web site would
be to have some parameter to tell a robot that it should
visit sometime after next Tuesday (or whatever) since that's
when we make changes (or whatever), and perhaps further that
the robot should visit every Tuesday since that's when we
always plug in the new stuff...etc.

Further, I'd like to tell the robot that certain new stuff
replaces some old stuff, whereas other new stuff is in fact
new. I'd also love it if there was some way for me to tell a
robot some things about my pages, such as which page should
be considered the top-level page in my site for indexing
purposes.

The list of nice-to-haves is endless, but the point is that
I don't think the desirability of lots of new robot controlling
configuration parameters for those of us who operate web sites
is really in question. I think most Webmasters would love
to be able to exert a lot more control over robots than they
have now. Further, even the robot would benefit from such
control since sites would be sure of being properly prepared
for robot visits, and be able to provide higher quality
indexable material.

But wishes are not fishes, and I understand there are issues...

>- are you going to be able to convince robot-authors to comply? They have
> lots of more interesting things they also don't get to :-)

I guess we should be clear about the difficulty of compliance here.
It wouldn't be too difficult to have a robot comply with the idea
that it should not index a site at a given point in time ("now")
for a certain reason (for example, it's not midnight yet). The hard
part for the robot constructor is building the code that plans the time
when the robot should try the site again. Imagine the difficulty
if every site has a different idea of when a robot should visit.
Think too about what happens if fifteen robots leave a site thinking
they should come back on Tuesday at midnight... robot fight! Might
be fun to watch... :-)
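
One common way to keep fifteen robots from all arriving at the same
instant is to add a random delay to the planned revisit time. The
Python sketch below assumes a site has somehow asked for "weekday W at
hour H" (a hypothetical request, not something any current standard
provides) and picks the next such slot plus some jitter. The next_visit
name and its parameters are illustrative only.

```python
import random
from datetime import datetime, timedelta

def next_visit(now, weekday, hour, max_jitter_minutes=180, rng=random):
    """Return the next `weekday` (Monday = 0) at `hour`:00 strictly
    after `now`, plus a random delay of up to `max_jitter_minutes`."""
    days_ahead = (weekday - now.weekday()) % 7
    slot = (now + timedelta(days=days_ahead)).replace(
        hour=hour, minute=0, second=0, microsecond=0)
    if slot <= now:
        # Today's slot has already passed; use the same day next week.
        slot += timedelta(days=7)
    # Spread arrivals out so that robots given the same hint
    # do not all hit the site at once.
    jitter = timedelta(minutes=rng.uniform(0, max_jitter_minutes))
    return slot + jitter
```

This only handles a single site. The harder part described above,
merging many sites' preferences into one crawl schedule, is not
addressed here.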

>- are you going to be able to evangelise this to all webmasters out there?
> Or will this cause more confusion than comfort...

Doesn't seem like this would be required. Webmasters who want to
limit robots to certain times of day would take the trouble; those
who don't care wouldn't bother; seems like that's pretty much the
current status of the robot exclusion system anyway. But let's think
about this...

-------------

As the Internet gets bigger and better, I think it is inevitable that
there will be much greater cooperation between indexers and indexees.
Stuff such as what has been asked for here *will* be implemented, and
much more as well, precisely because robots are picking up too
much garbage using the existing shotgun approach.

Only a few robots will implement these more complex standards, and
these robots will belong to sites that have a real (read: financial)
interest in having their robots retrieve the highest quality of
information. The same goes for Webmasters. Those that have a real
interest in keeping the indexes behind the robots up to date will
make the effort to work with a far more complex standard for
robot-website interaction than we see today.

It is probable that this standard will not evolve from the existing
standard that is so much the focus of this group, largely because there
are too many robots and too many sites for any consensus to emerge
for extending the protocol. Instead, I think it is likely that some
large organisation with control over both ends of the process
(indexer and indexee) will create an internal protocol and extend
it to some external organisations with which it has relationships.
Others will then want to participate and begin to implement support,
and the new "standard" will emerge from there.

Terry O'Neill
mariner.com

======================================================
Sent by : Terry O'Neill <toneill@mariner.com>
Sent when: 6/18/96 / 12:30:31 AM
------------------------------------------------------
Never be insulted by an e-mail; usually the author
didn't mean to offend and even if he did, life is
full of many things far more worthy of your concern.
======================================================