Sorry to flame you if you didn't mean to send it to the list.
Bonnie
>
> On Fri, 12 Jul 1996, Rob Hartill wrote:
>
> >
> > >On Thu, 11 Jul 1996, Rob Hartill wrote:
> > >
> > >>
> > >> A few people have suggested that sites should use POST to protect
> > >> sites from unwanted attention from robots. Could those people take
> > >> a few minutes to surf around "http://us.imdb.com/" and then come
> > >> back here and admit that POST is NOT A SOLUTION.
> > >
> > >Nope. Because you are wrong. I looked. POST could be used on imdb.com.
> >
> > No it cannot. I've been running the site for 3 years and you think
> > you know it better than me after a few minutes?
>
> Funny - you thought I could *confirm* your opinion regarding POST there in
> those 'few minutes'. Guess it only applies to *agreeing* with you. I am
> not impressed with your 'three years' figure. Your HTML is broken on the
> *syntactic* level - and you expect me to accept your 'expert' opinion on
> the capabilities of HTML and CGI? I could do it - other people could do it
> - that you can't do it is your problem.
>
> > >A link:imdb.com search on Alta-Vista returned 7,000 matches (600 or so of
> > >these were internal to imdb.com - link:www.moviedatabase.com gave about
> > >173 matches total).
> >
> > a) Alta-Vista's link counting system is a random number generator. Sit
> > on the reload button over a period of time and you get results
> > that show say 8,000 one minute and 40,000 the next. Alta-Vista
> > is not a reliable source of statistics.
>
> Funny - it gave me the the same numbers that day, the next day, and
> today - a week later. Pretty darn consistent for a 'random number
> generator'. But then - I know how to write good search requests.
>
> > b) AFAIK, Alta-Vista counts documents containing links and not links. The
> > IMDb is a resource that generates lots of linking from individual
> > documents.
>
> True, but then - there were only 2,000 documents linked into your search
> engine tree. You would have to have 30+ links per page to manage 60,000+.
> Funny - very few of them seemed to have as many as 10 links...The
> remaining 5,000 were to sections of your site that would not inspire
> multiple linking - the top page mainly.
>
> > c) You didn't count all the possible URLs that we have now and have used in
> > the past which automatically bounce to new URLs.
>
> Fine. Give me the domain names for those URLs and I will add them in. By
> the way - you never did answer if you were using a referer_log analyzer to
> actually count links...Just what *is* your methodology for determining
> that there are '60,000+' links?
>
> >
> > Anyway, this is pointless, if you don't believe my stats I don't care, but
> > don't try and produce bogus ones of your own to disprove mine.
>
> I take it you are not a big believer in *independant* audits, either...
>
> > >Before you say something stupid like 'but that would take lots of
> > >storage', the answer to that is 'so?'. Storage is dirt cheap. I
> > >am looking at an ad right now that offers a fast 2.9 Gig SCSI-II drive
> > >for $339 dollars. Your database reports aren't *that* big and they
> > >don't vary that much from run to run. And your system performance
> > >would improve to boot. Running CGI unnecessarily is evil.
> >
> > Your brief visit to my site now makes you an expert on the internal
> > workings of the database. Your assumptions are wrong. This takes us
> > back to a point that I have to keep making.. people should not assume
> > that their generic ideas or assumptions apply to all sites.
>
> Funny how you know how long my visit was. Just what was this magic
> tool that gave you that information? Espcially since you do not
> even know what domain(s) I web browse from...
>
> I checked how your reports are generated. You have a series of menus if
> there are more than one match and basically static pages for final results
> - one match per final page. The final result pages (and their links) are
> easily convertable to static HTML. And in fact you *DO* this for your
> cache. Haven't you even noticed that that is in fact what your cache does?
> You are 90% of the way to a dynamic database/static HTML site and don't
> even seem to know it.
>
> > >As for the legacy problem - it would be quite easy to write a special
> > >purpose script to handle the 2000 or so direct links in existance. Since
> > >they are not going to change - a simple hashed lookup table kicking to the
> > >final *static* URL would work. Load on your system - miniscule compared
> > >with actually doing the searches. Programming effort - minimal.
> >
> > Yawn. I know my system inside out, so please don't assume what's easy
> > or practical based on little understanding of how things work.
>
> <sarcasm>You're right. I've only run twenty odd sites on a half dozen
> different server softwares over two years. I have no idea how simple
> things like server redirects work. After all - I've never had to use
> them.</sarcasm>
>
> > >And the problem with imdb.com and hyperlinks is not the search engines -
> > >but your interface to your database.
> >
> > This'll be good. Please elaborate. The interface is a showcase for
> > hyperlinking.
>
> <pointed_remark>That's funny, I thought it was a showcase for movie
> related trivia.</pointed_remark> If you have the fundamental
> mis-understanding of what you are doing as 'showcasing hyperlinking' vs
> 'communicating information' there is little that can be done to help you.
> Hyperlinking is only a *tool* for communicating information.
>
> > >I could make it bookmarkable without
> > >much effort at all
> >
> > Sure you could; you can do site analysis without even seeing what goes
> > on behind the scenes. Your services must be in real demand.
>
> They are. I bill $100 an hour and live comfortably.
>
> > >> For every 'reasonable' reason to ignore robots.txt that people on this
> > >> list can come up with, there's probably several counterexamples that
> > >> would illustrate a potential problem.
> > >
> > >Good link validating robots like MOMSpider that check only one link at a
> > >time with substantial pauses between successive hits on the same server.
> >
> > We're not talking about good robots. By definition they don't cause
> > trouble. Their services are welcomed.
>
> And so you concede that there *are* legitimate reasons for a _good_ robot
> to ignore robots.txt? That was the question, after all..
>
> --
> Benjamin Franz
>
>