Re: indexing via redirectors

Sigfrid Lundberg (siglun@gungner.ub2.lu.se)
Tue, 21 Jan 1997 15:20:28 +0100 (MET)


> Sigfrid Lundberg writes:
> >
> >
> > > My question is: What do spiders usually do when they come across such a
> > > link?
> > > Will they follow the redirection? If yes, which URL is then indexed?
> > > This seems to be important for commercial purposes (pay-per-hits). If
> > > the final (destination) URL is indexed, my program will be useless.
> > >
> >
> > If a spider follows redirects at all, it will delete your URL, and index the
> > destination. That's the only sensible way to handle redirects in general,
> > isn't it? Your URL does not deliver any content. This is indeed one of
> > the problems with the OCLC PURL schemes.
> >
> > Sigfrid
>
> Hi Sigfrid,
>
> I missed your concern about the PURL service (and redirection in
> general). Could you elaborate? Your point about *how* people use it

I wonder why I put that PURL line there; it was actually irrelevant. Sorry.
However, as someone who runs a robot that builds resource discovery
databases, I would love persistent URLs, and even more the real thing,
URNs. I have actually thought of distinguishing between 301 and 302
when the hostname contains the string "purl".
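
A minimal sketch of how such a rule might look, not the code of any
actual robot. The function name url_to_index and the exact policy are
assumptions: on a "purl" host a 302 keeps the source URL as the
persistent name, while every other redirect is indexed under its
destination.

```python
from urllib.parse import urlsplit

def url_to_index(source_url, status, location):
    """Pick the URL a robot would store for a fetched source_url.

    Hypothetical policy: only hosts whose name contains "purl" get the
    301/302 distinction; everywhere else both are treated alike.
    """
    host = urlsplit(source_url).hostname or ""
    if status not in (301, 302) or not location:
        # Not a redirect: the source URL itself is what gets indexed.
        return source_url
    if status == 302 and "purl" in host:
        # Temporary redirect from a resolver: keep the persistent name.
        return source_url
    # 301 anywhere, or 302 on an ordinary host: index the destination.
    return location
```

With this rule, a 302 from http://purl.oclc.org/... would stay in the
database under the PURL, while the same 302 from an ordinary host would
be replaced by its target.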

> is valid, however, it seems that the problem lies in the fact that
> browsers and robots interpret 301 "Moved Permanently" and 302 "Moved
> Temporarily" redirections as the same. A "Moved Permanently" redirection
> should delete the source URL and use the redirected target URL and
> target document. This indicates that there has been a permanent

Had I followed the RFCs in this context, my database would have been
full of rubbish. A common practice is the use of status 302 and redirection
when any error occurs, for example:

Location: http://someplace/403.html

Also, a LOT of sites redirect when the status should really have been
404. Not to mention those that silently deliver something
user-friendly with status 200 in place of the requested document.
Search for "Error" in the title field in, say, AltaVista, and you'll get
the point.
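
The kind of check an indexer might apply to such responses can be
sketched roughly as follows. Everything here is an assumption made for
illustration: the name is_soft_error, the patterns, and the idea of
looking only at the last path segment of the Location header and at the
page title. A real robot would need a longer list of patterns.

```python
import re
from urllib.parse import urlsplit

# Hypothetical hints that a redirect target is really an error page,
# e.g. "403.html", "error.html" or "notfound.html".
ERROR_HINT = re.compile(r"\b[45]\d\d\b|error|not[-_]?found", re.IGNORECASE)
TITLE_HINT = re.compile(r"\berror\b", re.IGNORECASE)

def is_soft_error(status, location=None, title=None):
    """Guess whether a response is an error in disguise."""
    if status in (301, 302) and location:
        # Look only at the last path segment of the redirect target.
        segment = urlsplit(location).path.rsplit("/", 1)[-1]
        return bool(ERROR_HINT.search(segment))
    if status == 200 and title:
        # A friendly page served with 200 whose title still says "Error".
        return bool(TITLE_HINT.search(title))
    return False
```

For the example above, is_soft_error(302, "http://someplace/403.html")
flags the redirect, so neither the source nor the destination would be
indexed.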

Don't blame the indexers! It is the webmasters' fault!

Sigfrid

_________________________________________________
This message was sent by the robots mailing list. To unsubscribe, send mail
to robots-request@webcrawler.com with the word "unsubscribe" in the body.
For more info see http://info.webcrawler.com/mak/projects/robots/robots.html