Re: Looking for a spider

Xiaodong Zhang (xiaodong@softshell.com)
18 Oct 1995 15:13:44 -0700


Reply to: RE>>Looking for a spider

7/24/95 - Frontier Technologies licenses Lycos Internet Catalog software

MEQUON, WIS. (July 24) BUSINESS WIRE -July 24, 1995--Frontier
Technologies Corp. today announced it has signed an agreement to license
the Lycos(TM) Internet Catalog.

The Lycos Catalog has been incorporated into Frontier Technologies' new
SuperHighway Access product, called SuperHighway Access CyberSearch(TM),
which allows users to perform a Lycos search offline via CD-ROM,
connecting to the Internet only once relevant Internet resources have
been identified.

The Lycos technology was developed at Carnegie Mellon University, and was
recently transferred to Lycos Inc., a newly-created subsidiary of CMG
Information Services Inc. Lycos is a software system which contains a
robot that searches the World Wide Web and catalogs the documents it
finds. It also includes an information search engine that helps users
access information quickly and easily when they type in key words or
topics. The Lycos exploration robot locates new and changed documents and
builds abstracts, which consist of title, headings, subheadings, the 100
most significant words and the first 20 lines of the document. The
catalog is continually updated by the Lycos exploration agent. Frontier
will receive regular updates from Lycos Inc., allowing it to produce
monthly issues of SuperHighway Access CyberSearch.
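
As a rough illustration of the kind of abstract described above (title,
headings, subheadings, 100 most significant words, first 20 lines), here
is a minimal Python sketch. It is not Lycos's code. The names
AbstractParser and build_abstract are hypothetical, "most significant" is
approximated here by plain word frequency after dropping a tiny made-up
stop-word list, and "lines" is taken to mean non-empty lines of text
outside the title. The press release does not say how Lycos does any of
this.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical stop-word list; the source does not define "significant".
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "for"}


class AbstractParser(HTMLParser):
    """Collects the title, the h1-h6 headings and the body text of a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.headings = []    # list of (tag, text) pairs, in document order
        self.text_parts = []  # all text found outside <title>
        self._current = None  # tag whose text is currently being captured
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" or re.fullmatch(r"h[1-6]", tag):
            self._current = tag
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == self._current:
            text = " ".join("".join(self._buffer).split())
            if tag == "title":
                self.title = text
            else:
                self.headings.append((tag, text))
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)
        if self._current != "title":
            self.text_parts.append(data)


def build_abstract(html, max_words=100, max_lines=20):
    """Return a dict with title, headings, top words and first lines."""
    parser = AbstractParser()
    parser.feed(html)
    parser.close()
    text = "".join(parser.text_parts)
    words = [w for w in re.findall(r"[a-z]+", text.lower())
             if w not in STOP_WORDS]
    top_words = [w for w, _ in Counter(words).most_common(max_words)]
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    return {
        "title": parser.title,
        "headings": parser.headings,
        "top_words": top_words,
        "first_lines": lines[:max_lines],
    }
```

Calling build_abstract(page_source) on the HTML of a fetched page gives
one record per document; a catalog in this sense would be a collection of
such records keyed by URL.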

"It's now widely understood that one of the primary barriers to users'
productivity on the Internet is finding information," said Dennis
Freeman, Frontier Technologies' marketing director. "That's why Internet
search services like Lycos are among the Internet's most popular sites."

"Lycos Inc. is pleased to partner with Frontier as they contribute to our
continued position as the most widely used and most comprehensive catalog
product on the Web," said Bob Davis, CEO of Lycos Inc.

The product, now shipping, consists of a 608-megabyte subset of the Lycos
catalog, indexing about half a million web pages, integrated with
Frontier's multi-session, multi-protocol Internet browser software. The
product is shipped on CD-ROM and is available through Frontier's reseller
channel. The CD will be updated monthly (bi-monthly initially).

Frontier is offering the first issue of CyberSearch at $14.95. A charter
subscription for 6 issues is priced at $6.75 per month. Subscribers
should call 1-800/879-0075 (+1-414/571-0190 outside the U.S.) or access
Frontier's web server, http://www.frontiertech.com, for further
information.

Lycos Inc., with offices in Wilmington, Mass. and Pittsburgh, Penn., is
the newly formed corporation based upon technology developed at Carnegie
Mellon University.

Frontier Technologies Corp., based in Mequon, is a leading supplier of
TCP/IP and Internet-based products that make businesses more competitive
in a global market.

CONTACT:

Frontier Technologies Corp., Mequon
Nicole Rogers, 414/241-4555 x293
or
Lycos Inc.
Mike Olfe, 508/657-5050 x3124

------------------------------
Date: 10/18/95 3:01 PM
To: Zhang, Xiaodong
From: robots@webcrawler.com

A colleague of mine and I are also doing AI-based research and are in
need of a large corpus for our use. We would like to use anything that
is already available which keeps the structure of the real WWW and does
not take anything away, so that we can run realistic experiments with
our approaches.

Thanks in advance for any pointers,

--Alvaro
Computer science and engineering department
University of California, San Diego

>
> Dear spider developpers.
>
>
> My name is Alain Desilets. I am a researcher in the Interactive
> Information Group of the National Research Council of Canada.
>
> We are a small group (6 people) developing tools for interactive
> access to information. Our technological angle on this problem is AI
> based approaches, in particular Machine Learning and Agents. You can
> find more about our work at http://ai.iit.nrc.ca/II_public/.
>
> In order to test our methods we need to acquire a large corpus of
> full HTML files from the Web. We plan to use a spider for that task.
>
> We are aware of the controversy surrounding the creation of new
> spiders and therefore do not plan to develop one. That
> would not only be a duplication of effort but would also introduce a
> new, possibly buggy spider in Koster's already vast list of Web
> critters. Instead, we would like to use a publically available, well
> behaved and proven spider.
>
> Is there such spider available for serious research purpose?
>
> Or maybe the corpus we need already exists? Is there a CD-ROM or .zip
> file that would give us the whole of the web in full HTML?
>
>
> Thanks for your help.
>
> Alain Desilets
>
> Institute for Information Technology
> National Research Concil of Canada
> Building M-50
> Montreal Road
> Ottawa (Ont)
> K1A 0R6
>
> e-mail: alain@ai.iit.nrc.ca
> Tel: (613) 990-2813
> Fax: (613) 952-7151
>
>

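On the "well behaved" requirement in the quoted message: one common
reading is that the spider honors a site's /robots.txt file, the
robots exclusion convention. The sketch below shows that check using
Python's standard urllib.robotparser. The robots.txt contents, the
"research-spider" user-agent name, the example.com URLs and the helper
name is_allowed are all made up for illustration; nothing here comes
from the thread itself.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, as a site might serve them.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("http://example.com/index.html",
                "http://example.com/private/notes.html"):
        print(url, is_allowed(ROBOTS_TXT, "research-spider", url))
```

A spider collecting a corpus would make this check before every request
and skip any URL for which it returns False.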
------------------ RFC822 Header Follows ------------------
Received: by zazu.softshell.com with SMTP;18 Oct 1995 14:59:25 -0700
Received: by webcrawler.com (NX5.67f2/NX3.0M)
    id AA25902; Wed, 18 Oct 95 13:16:38 -0700
From: amonge@cs.ucsd.edu (Alvaro Monge)
Message-Id: <9510182013.AA10642@dino>
Subject: Re: Looking for a spider
To: robots@webcrawler.com
Date: Wed, 18 Oct 1995 13:13:55 -0700 (PDT)
In-Reply-To: <9510181831.AA06646@ai.iit.nrc.ca> from "Alain Desilets" at Oct 18, 95 02:31:39 pm
X-Mailer: ELM [version 2.4 PL23]
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Content-Length: 1865
Sender: owner-robots@webcrawler.com
Precedence: bulk
Reply-To: robots@webcrawler.com