I've written a robot + search engine (www.il.ft.hse.nl/ilse (Dutch sites only))
and I just implemented the robots.txt 'stay away' rules, but they caused some
problems...
First of all, to be a good (Dutch) search engine, my database must contain
as many pages as possible, but some sysadmins on hse.nl (as well as at other
sites) have put up a robots.txt with:
User-Agent: *
Disallow: /
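For anyone unfamiliar with what those two lines do to a crawler: here is a small sketch (not the actual Ilse code) using Python's standard-library robots.txt parser to show that this file shuts out every robot from every path. The user-agent string "Ilse" is just an example.

```python
from urllib.robotparser import RobotFileParser

# Parse the robots.txt quoted above directly from its lines,
# instead of fetching it over HTTP.
rp = RobotFileParser()
rp.parse([
    "User-Agent: *",
    "Disallow: /",
])

# "Disallow: /" under "User-Agent: *" blocks every path for every
# crawler, so a well-behaved robot may not fetch anything at all.
print(rp.can_fetch("Ilse", "/index.html"))  # False
```

So a site with this file simply cannot appear in an index that honours the standard.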
Anyway, I think robots.txt, if used correctly, can be of great help, but
things like this made me think again: should I honour robots.txt or not?
Sysadmins probably want their sites to be as fast as possible, but the
data on those sites will not be in Ilse, so people will blame Ilse for
being incomplete...
Have others seen the same? If so, what should I do?
I can't go contacting all the webmasters/roots at these machines... And I
don't want to disregard a possible /robots.txt...
Grts
Wiebe
PS - I'm new to this mailing list, so if I've asked a FAQ, don't blame me :-)
--- wiebe@il.ft.hse.nl | Ilse - dutch searchengine http://www.il.ft.hse.nl/~wiebe/ | http://www.il.ft.hse.nl/ilse/