Re: Proposed URLs that robots should search

Andrew Daviel (andrew@andrew.triumf.ca)
Tue, 24 Oct 1995 02:48:19 -0700 (PDT)


Let's see if I can reply to everyone without getting in a tangle ... :)=

>>I'm trying to build a database of URLs for business...
>I can't quite contain the urge to say, "Isn't everyone?"
Know any good ones? Nothing jumped out at me from CUSI, or Submit-It, etc.

>I have seen few .. business sites that don't offer text-only versions
I seem to keep seeing sites that say "Works best with Netscape 1.2 - get
it!"

>Could we start .. standard way to set forth the name of the site?
Having it in the <title> of the document root is quite common, but you get
"BloggCo Home Page", "Welcome to BloggCo", and sometimes
"Welcome to B L O G G C O". I've tried looking for non-dictionary words
with some success.
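
To make the title problem concrete, here is a rough sketch of the kind of cleanup a robot might try on the three example titles above. The list of boilerplate phrases and the helper name site_name_from_title are my own assumptions for illustration, not anything a real robot is known to use.

```python
import re

# Boilerplate seen in the example titles; this phrase list is an assumption.
BOILERPLATE = re.compile(r"^\s*welcome to\s+|\s+home page\s*$", re.IGNORECASE)


def site_name_from_title(title):
    """Guess a site name from a <title> string (hypothetical helper)."""
    name = BOILERPLATE.sub("", title).strip()
    # Collapse letter-spaced names such as "B L O G G C O" into one word.
    if re.fullmatch(r"(?:\w )+\w", name):
        name = name.replace(" ", "")
    return name


for title in ("BloggCo Home Page", "Welcome to BloggCo", "Welcome to B L O G G C O"):
    print(title, "->", site_name_from_title(title))
```

Case still differs between the results (BloggCo vs. BLOGGCO), so a case-insensitive comparison would be needed before treating them as the same name.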

>>/linecard.txt - for commercial sites, a text file with comma-delimited
>> line items (brands) manufactured or stocked

>This will drown in details.
>Yup.
This was a suggestion from a professional buyer. Sure, collecting these
for the whole world would get out of control, but with a small enough
scope it might be manageable. The buyers look up brand names in a huge
12-volume book to find distributors or manufacturers. Finding who stocks
Motorcraft in Tipperary can't produce that many records.
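
For what it's worth, reading such a file would be trivial. A minimal sketch, assuming the file is nothing more than brand names separated by commas and/or newlines (the function names and the second brand are made up for the example):

```python
def parse_linecard(text):
    """Split a comma-delimited line card into brand names.

    Blank items and case-insensitive duplicates are dropped; the first
    spelling seen is kept.
    """
    seen = set()
    brands = []
    for item in text.replace("\n", ",").split(","):
        brand = item.strip()
        key = brand.lower()
        if brand and key not in seen:
            seen.add(key)
            brands.append(brand)
    return brands


def stocks(text, brand):
    """Return True if the line card lists the given brand (case-insensitive)."""
    return brand.strip().lower() in {b.lower() for b in parse_linecard(text)}


# "ExampleBrand" is a placeholder, not a real entry from anyone's line card.
card = "Motorcraft, ExampleBrand,\nmotorcraft"
print(parse_linecard(card))
print(stocks(card, "MOTORCRAFT"))
```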

>Well, I hate to repeat myself, but ALIWEB's /site.idx will give you ..

Didn't know about it. Looks like what I was thinking of. I see it has
keywords ( >..Disagree greatly. This opens a giant can ... )

> >/robots.htm - an HTML list of links
> Why HTML?

A simplistic idea. I figured that if existing robots are written to
traverse HTML, then giving them an HTML file to start from would be
fairly easy.
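
As a sketch of what I mean: any robot that already parses HTML needs only its usual link extraction to read such a file. The example below uses Python's standard html.parser module; the class name, function name and sample page are assumptions for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in the order encountered."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html_text):
    """Return all link targets found in an HTML document."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links


# A made-up /robots.htm body: just a list of links for the robot to follow.
page = '<UL><LI><A HREF="/products.html">Products</A><LI><a href="/contact.html">Contact</a></UL>'
print(extract_links(page))
```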

Re. site.idx, is this a fairly open-ended list of fields? I had in mind some
fields relevant to larger businesses, like Sales-Email, Info-Email,
Tech-Email, Sales-FaxBack, etc., covering voice, fax and email, since some
places may have separate hotlines for hardware, software, licenses, etc. How
to handle this for big concerns that have one website and hundreds of
regional offices is another problem.
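
To illustrate the sort of record I have in mind, here is a sketch that reads "Field-Name: value" lines and allows a field to repeat, so that separate hotlines can coexist. The line syntax is my assumption for the example and may not match what site.idx actually specifies; the addresses are placeholders.

```python
def parse_fields(text):
    """Parse 'Name: value' lines into a dict mapping each name to a list.

    Repeated field names accumulate, so one site can list several
    contacts of the same kind. Lines without a colon are ignored.
    """
    record = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        name, value = line.split(":", 1)
        name, value = name.strip(), value.strip()
        if name and value:
            record.setdefault(name, []).append(value)
    return record


# Placeholder addresses; the field names are the ones suggested above.
sample = """Sales-Email: sales@example.com
Tech-Email: hardware@example.com
Tech-Email: software@example.com
"""
print(parse_fields(sample))
```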

I find the Lat/Long format in IAFA a bit strange; I use the "standard"
navigational format from navigation books, GPS and Loran, etc., e.g. 49D14.7N
123D13.6W, except that as there isn't a degree symbol in ASCII I've used "D",
which makes it similar to the NMEA0182 format. The current NMEA0183 standard
for navigation equipment would use something like
$LCGLL,4001.74,N,07409.43,W for 40 degrees 1.74 minutes North, 74 degrees
9.43 minutes West. Anyway, it's just bits and easy enough to convert.
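
To show how easy the conversion is, here is a sketch that turns both notations into signed decimal degrees (North and East positive). The function names are my own, and the GLL handling covers only the four position fields shown above; it does not verify a checksum.

```python
import re


def parse_d_format(text):
    """Convert e.g. '49D14.7N' (degrees, 'D', decimal minutes, hemisphere)."""
    m = re.fullmatch(r"(\d{1,3})D(\d{1,2}(?:\.\d+)?)([NSEW])", text.strip().upper())
    if not m:
        raise ValueError("not in degrees-D-minutes form: %r" % text)
    value = int(m.group(1)) + float(m.group(2)) / 60.0
    return -value if m.group(3) in "SW" else value


def parse_nmea_field(field, hemisphere):
    """Convert an NMEA-style ddmm.mm / dddmm.mm field plus N/S/E/W letter."""
    raw = float(field)
    degrees = int(raw // 100)        # everything above the two minute digits
    minutes = raw - degrees * 100
    value = degrees + minutes / 60.0
    return -value if hemisphere.upper() in ("S", "W") else value


def parse_gll(sentence):
    """Return (lat, lon) from a GLL sentence such as '$LCGLL,4001.74,N,07409.43,W'."""
    body = sentence.strip().split("*")[0]   # drop a trailing checksum if present
    parts = body.split(",")
    if len(parts) < 5 or not parts[0].endswith("GLL"):
        raise ValueError("not a GLL sentence: %r" % sentence)
    return (parse_nmea_field(parts[1], parts[2]),
            parse_nmea_field(parts[3], parts[4]))


print(parse_d_format("49D14.7N"), parse_d_format("123D13.6W"))
print(parse_gll("$LCGLL,4001.74,N,07409.43,W"))
```

The first pair comes out at roughly 49.245 and -123.227, the second at roughly 40.029 and -74.157.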

>How are you going to get a system administrator to implement all these
>files?

Well, one might assume that a good many HTML authors and Webmasters read
comp.infosystems.author.html, or whatever it's called. Or one could
just send them all mail ... 50,000 returned mail messages wouldn't make
too much of a dent in my disk ... :)=

>I'd propose it be implemented into the HTTP protocol ..
I'd think it might take a while for everyone to update their
servers - say, at least 2 years...

Andrew Daviel email: advax@triumf.ca