>
> >On Thu, 11 Jul 1996, Rob Hartill wrote:
> >
> >>
> >> A few people have suggested that sites should use POST to protect
> >> sites from unwanted attention from robots. Could those people take
> >> a few minutes to surf around "http://us.imdb.com/" and then come
> >> back here and admit that POST is NOT A SOLUTION.
> >
> >Nope. Because you are wrong. I looked. POST could be used on imdb.com.
>
> No it cannot. I've been running the site for 3 years and you think
> you know it better than me after a few minutes?
Funny - you thought I could *confirm* your opinion regarding POST there in
those 'few minutes'. Guess a few minutes is only long enough when it means
*agreeing* with you. I am
not impressed with your 'three years' figure. Your HTML is broken on the
*syntactic* level - and you expect me to accept your 'expert' opinion on
the capabilities of HTML and CGI? I could do it - other people could do it
- that you can't do it is your problem.
> >A link:imdb.com search on Alta-Vista returned 7,000 matches (600 or so of
> >these were internal to imdb.com - link:www.moviedatabase.com gave about
> >173 matches total).
>
> a) Alta-Vista's link counting system is a random number generator. Sit
> on the reload button over a period of time and you get results
> that show say 8,000 one minute and 40,000 the next. Alta-Vista
> is not a reliable source of statistics.
Funny - it gave me the same numbers that day, the next day, and
today - a week later. Pretty darn consistent for a 'random number
generator'. But then - I know how to write good search requests.
> b) AFAIK, Alta-Vista counts documents containing links and not links. The
> IMDb is a resource that generates lots of linking from individual
> documents.
True, but then - there were only 2,000 documents linked into your search
engine tree. You would have to have 30+ links per page to manage 60,000+.
Funny - very few of them seemed to have as many as 10 links... The
remaining 5,000 were to sections of your site that would not inspire
multiple linking - the top page mainly.
> c) You didn't count all the possible URLs that we have now and have used in
> the past which automatically bounce to new URLs.
Fine. Give me the domain names for those URLs and I will add them in. By
the way - you never did answer whether you are using a referer_log
analyzer to actually count links... Just what *is* your methodology for
determining that there are '60,000+' links?
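In case 'methodology' is unclear, here is the sort of referer_log
analysis I mean - a rough sketch only. The log file name, the
'referer -> target' line format, and the host list below are
illustrative assumptions, not your actual logs:

  # Rough sketch: count distinct external pages that link in, from a
  # referer_log of "http://referring/page -> /local/path" lines. The
  # file name, line format, and host list are illustrative guesses.
  from urllib.parse import urlparse

  OWN_HOSTS = {"us.imdb.com", "www.moviedatabase.com"}  # internal hosts

  external = set()
  with open("referer_log") as log:
      for line in log:
          parts = line.split(" -> ")
          if len(parts) != 2:
              continue  # skip malformed lines
          referer = parts[0].strip()
          host = urlparse(referer).netloc.lower()
          if host and host not in OWN_HOSTS:
              external.add(referer)  # one count per distinct page

  print(len(external), "distinct external pages link in")

Count distinct referring pages, discard your own hosts, and you get a
number somebody else can reproduce.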
>
> Anyway, this is pointless, if you don't believe my stats I don't care, but
> don't try and produce bogus ones of your own to disprove mine.
I take it you are not a big believer in *independent* audits, either...
> >Before you say something stupid like 'but that would take lots of
> >storage', the answer to that is 'so?'. Storage is dirt cheap. I
> >am looking at an ad right now that offers a fast 2.9 Gig SCSI-II drive
> >for $339. Your database reports aren't *that* big and they
> >don't vary that much from run to run. And your system performance
> >would improve to boot. Running CGI unnecessarily is evil.
>
> Your brief visit to my site now makes you an expert on the internal
> workings of the database. Your assumptions are wrong. This takes us
> back to a point that I have to keep making.. people should not assume
> that their generic ideas or assumptions apply to all sites.
Funny how you know how long my visit was. Just what was this magic
tool that gave you that information? Especially since you do not
even know which domain(s) I browse the web from...
I checked how your reports are generated. You have a series of menus if
there is more than one match and basically static pages for final results
- one match per final page. The final result pages (and their links) are
easily convertible to static HTML. And in fact you already *DO* this for
your cache. Haven't you noticed that this is exactly what your cache
does? You are 90% of the way to a dynamic database/static HTML site and
don't even seem to know it.
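To make that concrete, here is a minimal sketch of the pre-generation
step I mean. The record fields and output layout are invented for
illustration - I obviously do not have your schema:

  # Minimal sketch: pre-generate static final-result pages from
  # database records instead of running a CGI search on every hit.
  # The record fields and file layout are invented for illustration.
  import html
  import pathlib

  movies = [
      {"id": "0001", "title": "Example Title", "year": 1996},
      # ... one record per final-result page ...
  ]

  outdir = pathlib.Path("static")
  outdir.mkdir(exist_ok=True)

  for movie in movies:
      body = (
          "<html><head><title>{t}</title></head>\n"
          "<body><h1>{t} ({y})</h1></body></html>\n"
      ).format(t=html.escape(movie["title"]), y=movie["year"])
      (outdir / (movie["id"] + ".html")).write_text(body)

Re-run it after every database update and the server hands out plain
files; CGI is left doing nothing but the multiple-match menus.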
> >As for the legacy problem - it would be quite easy to write a special
> >purpose script to handle the 2000 or so direct links in existence. Since
> >they are not going to change - a simple hashed lookup table kicking to the
> >final *static* URL would work. Load on your system - minuscule compared
> >with actually doing the searches. Programming effort - minimal.
>
> Yawn. I know my system inside out, so please don't assume what's easy
> or practical based on little understanding of how things work.
<sarcasm>You're right. I've only run twenty-odd sites on a half dozen
different server software packages over two years. I have no idea how
simple things like server redirects work. After all - I've never had to
use them.</sarcasm>
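Since it apparently needs spelling out, here is a minimal CGI sketch of
the hashed lookup table I described. The table entries and static paths
are hypothetical placeholders - the real table would be built once from
the known inbound links:

  #!/usr/bin/env python3
  # Minimal CGI sketch: map a legacy dynamic URL's query string to
  # its final static URL and issue a permanent redirect. The table
  # entries and static paths are hypothetical placeholders.
  import os
  import sys

  LEGACY = {
      "Casablanca+(1942)": "/titles/casablanca-1942.html",
      # ... ~2,000 entries, one per known inbound link ...
  }

  target = LEGACY.get(os.environ.get("QUERY_STRING", ""))

  if target:
      sys.stdout.write("Status: 301 Moved Permanently\r\n")
      sys.stdout.write("Location: %s\r\n\r\n" % target)
  else:
      sys.stdout.write("Status: 404 Not Found\r\n")
      sys.stdout.write("Content-Type: text/plain\r\n\r\n")
      sys.stdout.write("No such title.\n")

One dictionary lookup and a 301 per request - that is the entire
'programming effort'.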
> >And the problem with imdb.com and hyperlinks is not the search engines -
> >but your interface to your database.
>
> This'll be good. Please elaborate. The interface is a showcase for
> hyperlinking.
<pointed_remark>That's funny, I thought it was a showcase for
movie-related trivia.</pointed_remark> If you fundamentally misunderstand
what you are doing as 'showcasing hyperlinking' rather than
'communicating information', there is little that can be done to help
you. Hyperlinking is only a *tool* for communicating information.
> >I could make it bookmarkable without
> >much effort at all
>
> Sure you could; you can do site analysis without even seeing what goes
> on behind the scenes. Your services must be in real demand.
They are. I bill $100 an hour and live comfortably.
> >> For every 'reasonable' reason to ignore robots.txt that people on this
> >> list can come up with, there's probably several counterexamples that
> >> would illustrate a potential problem.
> >
> >Good link validating robots like MOMSpider that check only one link at a
> >time with substantial pauses between successive hits on the same server.
>
> We're not talking about good robots. By definition they don't cause
> trouble. Their services are welcomed.
And so you concede that there *are* legitimate reasons for a _good_ robot
to ignore robots.txt? That was the question, after all...
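For the record, the sort of well-behaved checker I mean looks roughly
like this - a sketch only; the URL list, the timeout, and the 60 second
pause are illustrative values, not MOMSpider's actual code:

  # Sketch of a polite link validator: check one link at a time and
  # pause between successive hits on the same host. The URL list,
  # timeout, and 60-second pause are illustrative values.
  import time
  import urllib.request
  from urllib.parse import urlparse

  urls = ["http://us.imdb.com/", "http://www.example.com/page.html"]
  PAUSE = 60.0      # seconds between hits on one host
  last_hit = {}     # host -> time of last request

  for url in urls:
      host = urlparse(url).netloc
      wait = PAUSE - (time.time() - last_hit.get(host, 0.0))
      if wait > 0:
          time.sleep(wait)
      try:
          with urllib.request.urlopen(url, timeout=30) as resp:
              print(url, resp.status)
      except Exception as err:  # dead or unreachable link
          print(url, "FAILED:", err)
      last_hit[host] = time.time()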
-- Benjamin Franz