Re: Possible robots.txt addition

Klaus Johannes Rusch (e8726057@student.tuwien.ac.at)
Wed, 6 Nov 1996 21:59:00 CET


In <08QlzGAc+CgyEwVU@webfeet.co.uk>, Martin Kiff <mgk@webfeet.co.uk> writes:
> I still had to email the maintainers of the search engines to drop the
> old pages. I did have my mail address in the header of each page so that
> the maintainers could mail and confirm that the 'delete' request was not
> spoofed email... It still took 6 to 9 months to effectively complete the
> move.

That is probably also because search engines are compared by the number of
documents they have indexed, so most maintainers are eager to add new pages but
rather slow to remove outdated information.

> Anybody care to comment on these (or other) techniques? The Web
> does seem to need a way of handling moves and renames... What will HTTP
> 1.1 bring?

I think it was TBL at WWW3 who said "URLs don't change, people change them".
How true. Throwing technology at organizational problems is not a good
solution. Rather than finding ways of getting changed URLs to search engines,
we should probably aim at

- adding reasonable expiry information to documents, so that search engines
have an idea of how often to visit a document and check for updates or
deletions
- not changing document locations at all.
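
As a rough illustration of the first point, here is a minimal Python sketch
that builds the Date and Expires headers a server could send with a document.
The function name and the one-week lifetime in the usage note are my own
assumptions for the example, not something any particular server or search
engine requires.

```python
from email.utils import formatdate


def expiry_headers(now, lifetime_seconds):
    """Return (name, value) pairs for the Date and Expires headers.

    `now` is a Unix timestamp; `lifetime_seconds` is how long the document
    is expected to stay unchanged (an assumption made by the publisher).
    """
    return [
        # HTTP dates are always expressed in GMT.
        ("Date", formatdate(now, usegmt=True)),
        ("Expires", formatdate(now + lifetime_seconds, usegmt=True)),
    ]
```

Calling expiry_headers(time.time(), 7 * 24 * 3600) would mark a document as
good for one week; a robot that honours Expires then has a hint about when a
revisit is worthwhile.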

Using a department or group name as part of the URL is a bad idea. Grouping by
organizational hierarchies in the development process is obviously a must, yet
once the documents go on a web server they should rather be viewer-centric,
e.g. /Services, /Products, /CorporateInformation, not /ServiceUnitInternet,
/ModemDivision and /CommunicationsDepartment.

/Services, /Products and /CorporateInformation are very unlikely ever to become
obsolete, whereas "Service Unit Internet" may well decide to rename itself
"Multimedia Group".

Specifying appropriate response codes to search engines (or basically, to
anybody, as you can safely redirect human browsers to the new location as well)
is the way to get the information updated in search engines.

BTW you should send a response code "301 Moved Permanently" rather than a
"302 Moved Temporarily", as the 302 implies the redirection may change to a
different location later, so engines are likely to check your site over and
over again.
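
To make the redirect idea concrete, here is a small sketch using Python's
standard http.server module. The MOVED table is hypothetical: it simply pairs
the department-style paths from my example above with the viewer-centric ones,
and lookup_redirect and RedirectHandler are names invented for this sketch.
A real site would more likely configure this in its existing web server.

```python
from http.server import BaseHTTPRequestHandler

# Hypothetical mapping from old, department-based paths to new ones.
MOVED = {
    "/ServiceUnitInternet": "/Services",
    "/ModemDivision": "/Products",
    "/CommunicationsDepartment": "/CorporateInformation",
}


def lookup_redirect(path):
    """Return the new location for a moved path, or None if it has not moved."""
    for old, new in MOVED.items():
        # Match the directory itself or anything below it.
        if path == old or path.startswith(old + "/"):
            return new + path[len(old):]
    return None


class RedirectHandler(BaseHTTPRequestHandler):
    """Answer requests for moved paths with a permanent redirect."""

    def do_GET(self):
        target = lookup_redirect(self.path)
        if target is None:
            self.send_error(404)
            return
        # 301 tells robots and browsers alike that the move is permanent.
        self.send_response(301)
        # Older HTTP specifications expect an absolute URL here; a relative
        # one is used only to keep the sketch independent of a host name.
        self.send_header("Location", target)
        self.end_headers()

    # Robots often send HEAD requests; answer them the same way.
    do_HEAD = do_GET
```

To try it, something like HTTPServer(("", 8000), RedirectHandler).serve_forever()
(with HTTPServer imported from http.server) should serve the redirects on port
8000; the port is arbitrary.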

Klaus Johannes Rusch

--
e8726057@student.tuwien.ac.at, KlausRusch@atmedia.net
http://www.atmedia.net/KlausRusch/