robot-id: Acme.Spider
robot-name: Acme.Spider
robot-cover-url: http://www.acme.com/java/software/Acme.Spider.html
robot-details-url: http://www.acme.com/java/software/Acme.Spider.html
robot-owner-name: Jef Poskanzer - ACME Laboratories
robot-owner-url: http://www.acme.com/
robot-owner-email: jef@acme.com
robot-status: active
robot-purpose: indexing maintenance statistics
robot-type: standalone
robot-platform: java
robot-availability: source
robot-exclusion: yes
robot-exclusion-useragent: Due to a deficiency in Java it's not currently possible to set the User-Agent.
robot-noindex: no
robot-host: *
robot-from: no
robot-useragent: Due to a deficiency in Java it's not currently possible to set the User-Agent.
robot-language: java
robot-description: A Java utility class for writing your own robots.
robot-history:
robot-environment:
modified-date: Wed, 04 Dec 1996 21:30:11 GMT
modified-by: Jef Poskanzer

robot-id: activeagent
robot-name: ActiveAgent
robot-cover-url: http://www.hipcrime.com
robot-details-url: http://www.hipcrime.com
robot-owner-name: robert returned
robot-owner-url: http://www.hipcrime.com
robot-owner-email: agent@hipcrime.com
robot-status: active
robot-purpose: other
robot-type: applet
robot-platform: all
robot-availability: source
robot-exclusion: no
robot-exclusion-useragent: no
robot-noindex: no
robot-host: anywhere
robot-from: no
robot-useragent: no
robot-language: java
robot-description: crawling email robot and publicity engine
robot-history: hipcrime's Internet art project
robot-environment: research/hobby
modified-date: 10-13-96
modified-by: RR1563

robot-id: ahoythehomepagefinder
robot-name: Ahoy! The Homepage Finder
robot-cover-url: http://www.cs.washington.edu/research/ahoy/
robot-details-url: http://www.cs.washington.edu/research/ahoy/doc/home.html
robot-owner-name: Marc Langheinrich
robot-owner-url: http://www.cs.washington.edu/homes/marclang
robot-owner-email: marclang@cs.washington.edu
robot-status: active
robot-purpose: maintenance
robot-type: standalone
robot-platform: UNIX
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: ahoy
robot-noindex: no
robot-host: cs.washington.edu
robot-from: no
robot-useragent: 'Ahoy! The Homepage Finder'
robot-language: Perl 5
robot-description: Ahoy! is an ongoing research project at the University of Washington for finding personal Homepages.
robot-history: Research project at the University of Washington in 1995/1996
robot-environment: research
modified-date: Fri June 28 14:00:00 1996
modified-by: Marc Langheinrich

robot-id: Alkaline
robot-name: Alkaline
robot-cover-url: http://www.vestris.com/alkaline
robot-details-url: http://www.vestris.com/alkaline
robot-owner-name: Daniel Doubrovkine
robot-owner-url: http://cuiwww.unige.ch/~doubrov5
robot-owner-email: dblock@vestris.com
robot-status: development active
robot-purpose: indexing
robot-type: standalone
robot-platform: unix windows95 windowsNT
robot-availability: binary
robot-exclusion: yes
robot-exclusion-useragent: AlkalineBOT
robot-noindex: yes
robot-host: *
robot-from: no
robot-useragent: AlkalineBOT
robot-language: c++
robot-description: Unix/NT internet/intranet search engine
robot-history: Vestris Inc. search engine designed at the University of Geneva
robot-environment: commercial research
modified-date: Thu Dec 10 14:01:13 MET 1998
modified-by: Daniel Doubrovkine

robot-id: arachnophilia
robot-name: Arachnophilia
robot-cover-url:
robot-details-url:
robot-owner-name: Vince Taluskie
robot-owner-url: http://www.ph.utexas.edu/people/vince.html
robot-owner-email: taluskie@utpapa.ph.utexas.edu
robot-status:
robot-purpose:
robot-type:
robot-platform:
robot-availability:
robot-exclusion: yes
robot-exclusion-useragent:
robot-noindex: no
robot-host: halsoft.com
robot-from:
robot-useragent: Arachnophilia
robot-language:
robot-description: The purpose (undertaken by HaL Software) of this run was to collect approximately 10k html documents for testing automatic abstract generation
robot-history:
robot-environment:
modified-date:
modified-by:

robot-id: architext
robot-name: ArchitextSpider
robot-cover-url: http://www.excite.com/
robot-details-url:
robot-owner-name: Architext Software
robot-owner-url: http://www.atext.com/spider.html
robot-owner-email: spider@atext.com
robot-status:
robot-purpose: indexing, statistics
robot-type: standalone
robot-platform:
robot-availability:
robot-exclusion: yes
robot-exclusion-useragent:
robot-noindex: no
robot-host: *.atext.com
robot-from: yes
robot-useragent: ArchitextSpider
robot-language: perl 5 and c
robot-description: Its purpose is to generate a Resource Discovery database, and to generate statistics. The ArchitextSpider collects information for the Excite and WebCrawler search engines.
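For robots with robot-exclusion: yes, the robot-exclusion-useragent field gives the token a site should match in its /robots.txt file to control that robot. A minimal sketch, using the AlkalineBOT token from the Alkaline entry above (the path is illustrative):

```
User-agent: AlkalineBOT
Disallow: /private/
```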
robot-history:
robot-environment:
modified-date: Tue Oct 3 01:10:26 1995
modified-by:

robot-id: aretha
robot-name: Aretha
robot-cover-url:
robot-details-url:
robot-owner-name: Dave Weiner
robot-owner-url: http://www.hotwired.com/Staff/userland/
robot-owner-email: davew@well.com
robot-status:
robot-purpose:
robot-type:
robot-platform: Macintosh
robot-availability:
robot-exclusion:
robot-exclusion-useragent:
robot-noindex:
robot-host:
robot-from:
robot-useragent:
robot-language:
robot-description: A crude robot built on top of Netscape and Userland Frontier, a scripting system for Macs
robot-history:
robot-environment:
modified-date:
modified-by:

robot-id: aspider
robot-name: ASpider (Associative Spider)
robot-cover-url:
robot-details-url:
robot-owner-name: Fred Johansen
robot-owner-url: http://www.pvv.ntnu.no/~fredj/
robot-owner-email: fredj@pvv.ntnu.no
robot-status: retired
robot-purpose: indexing
robot-type:
robot-platform: unix
robot-availability:
robot-exclusion:
robot-exclusion-useragent:
robot-noindex: no
robot-host: nova.pvv.unit.no
robot-from: yes
robot-useragent: ASpider/0.09
robot-language: perl4
robot-description: ASpider is a CGI script that searches the web for keywords given by the user through a form.
robot-history:
robot-environment: hobby
modified-date:
modified-by:

robot-id: auresys
robot-name: AURESYS
robot-cover-url: http://crrm.univ-mrs.fr
robot-details-url: http://crrm.univ-mrs.fr
robot-owner-name: Mannina Bruno
robot-owner-url: ftp://crrm.univ-mrs.fr/pub/CVetud/Etudiants/Mannina/CVbruno.htm
robot-owner-email: mannina@crrm.univ-mrs.fr
robot-status: robot actively in use
robot-purpose: indexing, statistics
robot-type: standalone
robot-platform: Aix, Unix
robot-availability: protected by password
robot-exclusion: yes
robot-exclusion-useragent:
robot-noindex: no
robot-host: crrm.univ-mrs.fr, 192.134.99.192
robot-from: yes
robot-useragent: AURESYS/1.0
robot-language: Perl 5.001m
robot-description: AURESYS is used to build a personal database for a user searching for information. The database is structured so that it can be analysed. AURESYS can find new servers by incrementing IP addresses, and it also generates statistics.
robot-history: This robot finds its roots in a research project at the University of Marseille in 1995-1996
robot-environment: used for research
modified-date: Mon, 1 Jul 1996 14:30:00 GMT
modified-by: Mannina Bruno

robot-id: backrub
robot-name: BackRub
robot-cover-url:
robot-details-url:
robot-owner-name: Larry Page
robot-owner-url: http://backrub.stanford.edu/
robot-owner-email: page@leland.stanford.edu
robot-status:
robot-purpose: indexing, statistics
robot-type: standalone
robot-platform:
robot-availability:
robot-exclusion: yes
robot-exclusion-useragent:
robot-noindex:
robot-host: *.stanford.edu
robot-from: yes
robot-useragent: BackRub/*.*
robot-language: Java
robot-description:
robot-history:
robot-environment:
modified-date: Wed Feb 21 02:57:42 1996
modified-by:

robot-id: bigbrother
robot-name: Big Brother
robot-cover-url: http://pauillac.inria.fr/~fpottier/mac-soft.html.en
robot-details-url:
robot-owner-name: Francois Pottier
robot-owner-url: http://pauillac.inria.fr/~fpottier/
robot-owner-email: Francois.Pottier@inria.fr
robot-status: active
robot-purpose: maintenance
robot-type: standalone
robot-platform: mac
robot-availability: binary
robot-exclusion: no
robot-exclusion-useragent:
robot-noindex: no
robot-host: *
robot-from: not as of 1.0
robot-useragent: Big Brother
robot-language: c++
robot-description: Macintosh-hosted link validation tool.
robot-history:
robot-environment: shareware
modified-date: Thu Sep 19 18:01:46 MET DST 1996
modified-by: Francois Pottier

robot-id: blackwidow
robot-name: BlackWidow
robot-cover-url: http://140.190.65.12/~khooghee/index.html
robot-details-url:
robot-owner-name: Kevin Hoogheem
robot-owner-url:
robot-owner-email: khooghee@marys.smumn.edu
robot-status:
robot-purpose: indexing, statistics
robot-type: standalone
robot-platform:
robot-availability:
robot-exclusion: no
robot-exclusion-useragent:
robot-noindex:
robot-host: 140.190.65.*
robot-from: yes
robot-useragent: BlackWidow
robot-language: C, C++
robot-description: Started as a research project and is now used to find links for a random link generator. Also used to research the growth of specific sites.
robot-history:
robot-environment:
modified-date: Fri Feb 9 00:11:22 1996
modified-by:

robot-id: blindekuh
robot-name: Die Blinde Kuh
robot-cover-url: http://www.blinde-kuh.de/
robot-details-url: http://www.blinde-kuh.de/robot.html (German language)
robot-owner-name: Stefan R. Mueller
robot-owner-url: http://www.rrz.uni-hamburg.de/philsem/stefan_mueller/
robot-owner-email: maschinist@blinde-kuh.de
robot-status: development
robot-purpose: indexing
robot-type: browser
robot-platform: unix
robot-availability: none
robot-exclusion: no
robot-exclusion-useragent:
robot-noindex: no
robot-host: minerva.sozialwiss.uni-hamburg.de
robot-from: yes
robot-useragent: Die Blinde Kuh
robot-language: perl5
robot-description: The robot is used for indexing and checking the URLs registered with the German-language search engine for kids. It is a non-commercial one-woman project of Birgit Bachmann, living in Hamburg, Germany.
robot-history: The robot was developed by Stefan R. Mueller to help with the manual checking of registered links.
robot-environment: hobby
modified-date: Mon Jul 22 1998
modified-by: Stefan R. Mueller

robot-id: brightnet
robot-name: bright.net caching robot
robot-cover-url:
robot-details-url:
robot-owner-name:
robot-owner-url:
robot-owner-email:
robot-status: active
robot-purpose: caching
robot-type:
robot-platform:
robot-availability: none
robot-exclusion: no
robot-noindex:
robot-host: 209.143.1.46
robot-from: no
robot-useragent: Mozilla/3.01 (compatible;)
robot-language:
robot-description:
robot-history:
robot-environment:
modified-date: Fri Nov 13 14:08:01 EST 1998
modified-by: brian d foy

robot-id: bspider
robot-name: BSpider
robot-cover-url: not yet
robot-details-url: not yet
robot-owner-name: Yo Okumura
robot-owner-url: not yet
robot-owner-email: okumura@rsl.crl.fujixerox.co.jp
robot-status: active
robot-purpose: indexing
robot-type: standalone
robot-platform: Unix
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: bspider
robot-noindex: yes
robot-host: 210.159.73.34, 210.159.73.35
robot-from: yes
robot-useragent: BSpider/1.0 libwww-perl/0.40
robot-language: perl
robot-description: BSpider crawls inside the Japanese domain for indexing.
robot-history: Started Apr 1997 in a research project at Fuji Xerox Corp. Research Lab.
robot-environment: research
modified-date: Mon, 21 Apr 1997 18:00:00 JST
modified-by: Yo Okumura

robot-id: cactvschemistryspider
robot-name: CACTVS Chemistry Spider
robot-cover-url: http://schiele.organik.uni-erlangen.de/cactvs/spider.html
robot-details-url:
robot-owner-name: W. D. Ihlenfeldt
robot-owner-url: http://schiele.organik.uni-erlangen.de/cactvs/
robot-owner-email: wdi@eros.ccc.uni-erlangen.de
robot-status:
robot-purpose: indexing
robot-type: standalone
robot-platform:
robot-availability:
robot-exclusion: yes
robot-exclusion-useragent:
robot-noindex:
robot-host: utamaro.organik.uni-erlangen.de
robot-from: no
robot-useragent: CACTVS Chemistry Spider
robot-language: TCL, C
robot-description: Locates chemical structures in chemical MIME formats on WWW and FTP servers and downloads them into a database searchable with structure queries (substructure, full structure, formula, properties etc.)
robot-history:
robot-environment:
modified-date: Sat Mar 30 00:55:40 1996
modified-by:

robot-id: cassandra
robot-name: Cassandra
robot-cover-url: http://post.mipt.rssi.ru/~billy/search/
robot-details-url: http://post.mipt.rssi.ru/~billy/search/
robot-owner-name: Mr. Oleg Bilibin
robot-owner-url: http://post.mipt.rssi.ru/~billy/
robot-owner-email: billy168@aha.ru
robot-status: development
robot-purpose: indexing
robot-type: standalone
robot-platform: crossplatform
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent:
robot-noindex: no
robot-host: www.aha.ru
robot-from: no
robot-useragent:
robot-language: java
robot-description: The Cassandra search robot is used to create and maintain an indexed database for a widespread information retrieval system
robot-history: Master of Science degree project at the Moscow Institute of Physics and Technology
robot-environment: research
modified-date: Wed, 3 Jun 1998 12:00:00 GMT

robot-id: cgireader
robot-name: Digimarc Marcspider/CGI
robot-cover-url: http://www.digimarc.com/prod_fam.html
robot-details-url: http://www.digimarc.com/prod_fam.html
robot-owner-name: Digimarc Corporation
robot-owner-url: http://www.digimarc.com
robot-owner-email: wmreader@digimarc.com
robot-status: active
robot-purpose: maintenance
robot-type: standalone
robot-platform: windowsNT
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent:
robot-noindex:
robot-host: 206.102.3.*
robot-from:
robot-useragent: Digimarc CGIReader/1.0
robot-language: c++
robot-description: Similar to Digimarc Marcspider, Marcspider/CGI examines image files for watermarks but is more focused on CGI URLs. In order not to waste internet bandwidth with yet another crawler, we have contracted with one of the major crawlers/search engines to provide us with a list of specific CGI URLs of interest to us. If a URL points to a page of interest (via CGI), then we access the page to get the image URLs from it, but we do not crawl to any other pages.
robot-history: First operation in December 1997
robot-environment: service
modified-date: Fri, 5 Dec 1997 12:00:00 GMT
modified-by: Dan Ramos

robot-id: checkbot
robot-name: Checkbot
robot-cover-url: http://www.xs4all.nl/~graaff/checkbot/
robot-details-url:
robot-owner-name: Hans de Graaff
robot-owner-url: http://www.xs4all.nl/~graaff/checkbot/
robot-owner-email: graaff@xs4all.nl
robot-status: active
robot-purpose: maintenance
robot-type: standalone
robot-platform: unix, windowsNT
robot-availability: source
robot-exclusion: no
robot-exclusion-useragent:
robot-noindex: no
robot-host: *
robot-from: no
robot-useragent: Checkbot/x.xx LWP/5.x
robot-language: perl 5
robot-description: Checkbot checks links in a given set of pages on one or more servers. It reports links which returned an error code.
robot-history:
robot-environment: hobby
modified-date: Tue Jun 25 07:44:00 1996
modified-by: Hans de Graaff

robot-id: churl
robot-name: churl
robot-cover-url: http://www-personal.engin.umich.edu/~yunke/scripts/churl/
robot-details-url:
robot-owner-name: Justin Yunke
robot-owner-url: http://www-personal.engin.umich.edu/~yunke/
robot-owner-email: yunke@umich.edu
robot-status:
robot-purpose: maintenance
robot-type:
robot-platform:
robot-availability:
robot-exclusion:
robot-exclusion-useragent:
robot-noindex: no
robot-host:
robot-from:
robot-useragent:
robot-language:
robot-description: A URL checking robot, which stays within one step of the local server
robot-history:
robot-environment:
modified-date:
modified-by:

robot-id: cmc
robot-name: CMC/0.01
robot-details-url: http://www2.next.ne.jp/cgi-bin/music/help.cgi?phase=robot
robot-cover-url: http://www2.next.ne.jp/music/
robot-owner-name: Shinobu Kubota
robot-owner-url: http://www2.next.ne.jp/cgi-bin/music/help.cgi?phase=profile
robot-owner-email: shinobu@po.next.ne.jp
robot-status: active
robot-purpose: maintenance
robot-type: standalone
robot-platform: unix
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: CMC/0.01
robot-noindex: no
robot-host: haruna.next.ne.jp, 203.183.218.4
robot-from: yes
robot-useragent: CMC/0.01
robot-language: perl5
robot-description: This CMC/0.01 robot collects information on the pages that are registered with the music specialty search service.
robot-history: This CMC/0.01 robot was made for the computer music center on November 4, 1997.
robot-environment: hobby
modified-date: Sat, 23 May 1998 17:22:00 GMT

robot-id: combine
robot-name: Combine System
robot-cover-url: http://www.ub2.lu.se/~tsao/combine.ps
robot-details-url: http://www.ub2.lu.se/~tsao/combine.ps
robot-owner-name: Yong Cao
robot-owner-url: http://www.ub2.lu.se/
robot-owner-email: tsao@munin.ub2.lu.se
robot-status: development
robot-purpose: indexing
robot-type: standalone
robot-platform: unix
robot-availability: source
robot-exclusion: yes
robot-exclusion-useragent: combine
robot-noindex: no
robot-host: *.ub2.lu.se
robot-from: yes
robot-useragent: combine/0.0
robot-language: c, perl5
robot-description: An open, distributed, and efficient harvester.
robot-history: A complete re-design of the NWI robot (w3index) for the DESIRE project.
robot-environment: research
modified-date: Tue, 04 Mar 1997 16:11:40 GMT
modified-by: Yong Cao

robot-id: conceptbot
robot-name: Conceptbot
robot-cover-url: http://www.aptltd.com/~sifry/conceptbot/tech.html
robot-details-url: http://www.aptltd.com/~sifry/conceptbot
robot-owner-name: David L. Sifry
robot-owner-url: http://www.aptltd.com/~sifry
robot-owner-email: david@sifry.com
robot-status: development
robot-purpose: indexing
robot-type: standalone
robot-platform: unix
robot-availability: data
robot-exclusion: yes
robot-exclusion-useragent: conceptbot
robot-noindex: yes
robot-host: router.sifry.com
robot-from: yes
robot-useragent: conceptbot/0.3
robot-language: perl5
robot-description: The Conceptbot spider is used to research concept-based search indexing techniques. It uses a breadth-first search to spread out the number of hits on a single site over time. The spider runs at irregular intervals and is still under construction.
robot-history: This spider began as a research project at Sifry Consulting in April 1996.
robot-environment: research
modified-date: Mon, 9 Sep 1996 15:31:07 GMT
modified-by: David L. Sifry

robot-id: core
robot-name: Web Core / Roots
robot-cover-url: http://www.di.uminho.pt/wc
robot-details-url:
robot-owner-name: Jorge Portugal Andrade
robot-owner-url: http://www.di.uminho.pt/~cbm
robot-owner-email: wc@di.uminho.pt
robot-status:
robot-purpose: indexing, maintenance
robot-type:
robot-platform:
robot-availability:
robot-exclusion: yes
robot-exclusion-useragent:
robot-noindex:
robot-host: shiva.di.uminho.pt, from www.di.uminho.pt
robot-from: no
robot-useragent: root/0.1
robot-language: perl
robot-description: Parallel robot developed at Minho University in Portugal to catalog relations among URLs and to support a special navigation aid.
robot-history: First versions since October 1995.
robot-environment:
modified-date: Wed Jan 10 23:19:08 1996
modified-by:

robot-id: cshkust
robot-name: CS-HKUST WISE: WWW Index and Search Engine
robot-cover-url: http://www.cs.ust.hk/IndexServer/
robot-details-url:
robot-owner-name: Budi Yuwono
robot-owner-url: http://www.cis.ohio-state.edu/~yuwono-b/
robot-owner-email: yuwono-b@cs.ust.hk
robot-status:
robot-purpose:
robot-type: standalone
robot-platform:
robot-availability:
robot-exclusion: yes
robot-exclusion-useragent:
robot-noindex: no
robot-host: dbx.cs.ust.hk
robot-from: yes
robot-useragent: CS-HKUST-IndexServer/1.0
robot-language: c
robot-description: Its purpose is to generate a Resource Discovery database and validate HTML. Part of an on-going research project on Internet Resource Discovery at the Department of Computer Science, Hong Kong University of Science and Technology (CS-HKUST)
robot-history:
robot-environment:
modified-date: Tue Jun 20 02:39:16 1995
modified-by:

robot-id: cyberspyder
robot-name: CyberSpyder Link Test
robot-cover-url: http://www.cyberspyder.com/cslnkts1.html
robot-details-url: http://www.cyberspyder.com/cslnkts1.html
robot-owner-name: Tom Aman
robot-owner-url: http://www.cyberspyder.com/
robot-owner-email: amant@cyberspyder.com
robot-status: active
robot-purpose: link validation, some html validation
robot-type: standalone
robot-platform: windows 3.1x, windows95, windowsNT
robot-availability: binary
robot-exclusion: user configurable
robot-exclusion-useragent: cyberspyder
robot-noindex: no
robot-host: *
robot-from: no
robot-useragent: CyberSpyder/2.1
robot-language: Microsoft Visual Basic 4.0
robot-description: CyberSpyder Link Test is intended to be used as a site management tool to validate that HTTP links on a page are functional and to produce various analysis reports to assist in managing a site.
robot-history: The original robot was created to fill a widely seen need for an easy-to-use link checking program.
robot-environment: commercial
modified-date: Tue, 31 Mar 1998 01:02:00 GMT
modified-by: Tom Aman

robot-id: deweb
robot-name: DeWeb(c) Katalog/Index
robot-cover-url: http://deweb.orbit.de/
robot-details-url:
robot-owner-name: Marc Mielke
robot-owner-url: http://www.orbit.de/
robot-owner-email: dewebmaster@orbit.de
robot-status:
robot-purpose: indexing, mirroring, statistics
robot-type: standalone
robot-platform:
robot-availability:
robot-exclusion: yes
robot-exclusion-useragent:
robot-noindex: no
robot-host: deweb.orbit.de
robot-from: yes
robot-useragent: Deweb/1.01
robot-language: perl 4
robot-description: Its purpose is to generate a Resource Discovery database, perform mirroring, and generate statistics. Uses a combination of an Informix(tm) database and WN 1.11 server software for indexing/resource discovery, fulltext search, and text excerpts.
robot-history:
robot-environment:
modified-date: Wed Jan 10 08:23:00 1996
modified-by:

robot-id: dienstspider
robot-name: DienstSpider
robot-cover-url: http://sappho.csi.forth.gr:22000/
robot-details-url:
robot-owner-name: Antonis Sidiropoulos
robot-owner-url: http://www.csi.forth.gr/~asidirop
robot-owner-email: asidirop@csi.forth.gr
robot-status: development
robot-purpose: indexing
robot-type: standalone
robot-platform: unix
robot-availability: none
robot-exclusion:
robot-exclusion-useragent:
robot-noindex:
robot-host: sappho.csi.forth.gr
robot-from:
robot-useragent: dienstspider/1.0
robot-language: C
robot-description: Indexing and searching the NCSTRL (Networked Computer Science Technical Report Library) and ERCIM collections
robot-history: Version 1.0 was the developer's master thesis project
robot-environment: research
modified-date: Fri, 4 Dec 1998 0:0:0 GMT
modified-by: asidirop@csi.forth.gr

robot-id: dnabot
robot-name: DNAbot
robot-cover-url: http://xx.dnainc.co.jp/dnabot/
robot-details-url: http://xx.dnainc.co.jp/dnabot/
robot-owner-name: Tom Tanaka
robot-owner-url: http://xx.dnainc.co.jp
robot-owner-email: tomatell@xx.dnainc.co.jp
robot-status: development
robot-purpose: indexing
robot-type: standalone
robot-platform: unix, windows, windows95, windowsNT, mac
robot-availability: data
robot-exclusion: yes
robot-exclusion-useragent:
robot-noindex: no
robot-host: xx.dnainc.co.jp
robot-from: yes
robot-useragent: DNAbot/1.0
robot-language: java
robot-description: A search robot written in 100% Java, with its own built-in database engine and web server. Currently in Japanese.
robot-history: Developed by DNA, Inc. (Niigata City, Japan) in 1998.
robot-environment: commercial
modified-date: Mon, 4 Jan 1999 14:30:00 GMT
modified-by: Tom Tanaka

robot-id: download_express
robot-name: DownLoad Express
robot-cover-url: http://www.jacksonville.net/~dlxpress
robot-details-url: http://www.jacksonville.net/~dlxpress
robot-owner-name: DownLoad Express Inc
robot-owner-url: http://www.jacksonville.net/~dlxpress
robot-owner-email: dlxpress@mediaone.net
robot-status: active
robot-purpose: graphic download
robot-type: standalone
robot-platform: win95/98/NT
robot-availability: binary
robot-exclusion: yes
robot-exclusion-useragent: downloadexpress
robot-noindex: no
robot-host: *
robot-from: no
robot-useragent:
robot-language: visual basic
robot-description: automatically downloads graphics from the web
robot-history:
robot-environment: commercial
modified-date: Wed, 05 May 1998
modified-by: DownLoad Express Inc

robot-id: dragonbot
robot-name: DragonBot
robot-cover-url: http://www.paczone.com/
robot-details-url:
robot-owner-name: Paul Law
robot-owner-url:
robot-owner-email: admin@paczone.com
robot-status: active
robot-purpose: indexing
robot-type: standalone
robot-platform: windowsNT
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: DragonBot
robot-noindex: no
robot-host: *.paczone.com
robot-from: no
robot-useragent: DragonBot/1.0 libwww/5.0
robot-language: C++
robot-description: Collects web pages related to East Asia
robot-history:
robot-environment: service
modified-date: Mon, 11 Aug 1997 00:00:00 GMT
modified-by:

robot-id: eit
robot-name: EIT Link Verifier Robot
robot-cover-url: http://wsk.eit.com/wsk/dist/doc/admin/webtest/verify_links.html
robot-details-url:
robot-owner-name: Jim McGuire
robot-owner-url: http://www.eit.com/people/mcguire.html
robot-owner-email: mcguire@eit.COM
robot-status:
robot-purpose: maintenance
robot-type:
robot-platform:
robot-availability:
robot-exclusion:
robot-exclusion-useragent:
robot-noindex: no
robot-host: *
robot-from:
robot-useragent: EIT-Link-Verifier-Robot/0.2
robot-language:
robot-description: Combination of an HTML form and a CGI script that verifies links from a given starting point (with some controls to prevent it from going off-site or running without limit)
robot-history: Announced on 12 July 1994
robot-environment:
modified-date:
modified-by:

robot-id: emacs
robot-name: Emacs-w3 Search Engine
robot-cover-url: http://www.cs.indiana.edu/elisp/w3/docs.html
robot-details-url:
robot-owner-name: William M. Perry
robot-owner-url: http://www.cs.indiana.edu/hyplan/wmperry.html
robot-owner-email: wmperry@spry.com
robot-status: retired
robot-purpose: indexing
robot-type: browser
robot-platform:
robot-availability:
robot-exclusion: no
robot-exclusion-useragent:
robot-noindex: no
robot-host: *
robot-from: yes
robot-useragent: Emacs-w3/v[0-9\.]+
robot-language: lisp
robot-description: Its purpose is to generate a Resource Discovery database. This code has not been looked at in a while, but will be spruced up for the Emacs-w3 2.2.0 release sometime this month. It will honor the /robots.txt file at that time.
robot-history:
robot-environment:
modified-date: Fri May 5 16:09:18 1995
modified-by:

robot-id: emcspider
robot-name: ananzi
robot-cover-url: http://www.empirical.com/
robot-details-url:
robot-owner-name: Hunter Payne
robot-owner-url: http://www.psc.edu/~hpayne/
robot-owner-email: hpayne@u-media.com
robot-status:
robot-purpose: indexing
robot-type: standalone
robot-platform:
robot-availability:
robot-exclusion: yes
robot-exclusion-useragent:
robot-noindex:
robot-host: bilbo.internal.empirical.com
robot-from: yes
robot-useragent: EMC Spider
robot-language: java
robot-description: This spider is still in the development stages, but it will be hitting sites while I finish debugging it.
robot-history:
robot-environment:
modified-date: Wed May 29 14:47:01 1996
modified-by:

robot-id: esther
robot-name: Esther
robot-details-url: http://search.falconsoft.com/
robot-cover-url: http://search.falconsoft.com/
robot-owner-name: Tim Gustafson
robot-owner-url: http://www.falconsoft.com/
robot-owner-email: tim@falconsoft.com
robot-status: active
robot-purpose: indexing
robot-type: standalone
robot-platform: unix (FreeBSD 2.2.8)
robot-availability: data
robot-exclusion: yes
robot-exclusion-useragent: esther
robot-noindex: no
robot-host: *.falconsoft.com
robot-from: yes
robot-useragent: esther
robot-language: perl5
robot-description: This crawler is used to build the search database at http://search.falconsoft.com/
robot-history: Developed by FalconSoft.
robot-environment: service
modified-date: Tue, 22 Dec 1998 00:22:00 PST

robot-id: nzexplorer
robot-name: nzexplorer
robot-cover-url: http://nzexplorer.co.nz/
robot-details-url:
robot-owner-name: Paul Bourke
robot-owner-url: http://bourke.gen.nz/paul.html
robot-owner-email: paul@bourke.gen.nz
robot-status: active
robot-purpose: indexing, statistics
robot-type: standalone
robot-platform: UNIX
robot-availability: source (commercial)
robot-exclusion: no
robot-exclusion-useragent:
robot-noindex: no
robot-host: bitz.co.nz
robot-from: no
robot-useragent: explorersearch
robot-language: c++
robot-history: Started in 1995 to provide a comprehensive index to WWW pages within New Zealand. Now also used in Malaysia and other countries.
robot-environment: service
modified-date: Tue, 25 Jun 1996
modified-by: Paul Bourke

robot-id: felix
robot-name: Felix IDE
robot-cover-url: http://www.pentone.com
robot-details-url: http://www.pentone.com
robot-owner-name: The Pentone Group, Inc.
robot-owner-url: http://www.pentone.com
robot-owner-email: felix@pentone.com
robot-status: active
robot-purpose: indexing, statistics
robot-type: standalone
robot-platform: windows95, windowsNT
robot-availability: binary
robot-exclusion: yes
robot-exclusion-useragent: FELIX IDE
robot-noindex: yes
robot-host: *
robot-from: yes
robot-useragent: FelixIDE/1.0
robot-language: visual basic
robot-description: Felix IDE is a retail personal search spider sold by The Pentone Group, Inc. It supports the proprietary exclusion "Frequency: ??????????" in the robots.txt file. The question marks represent an integer indicating the number of milliseconds to delay between document requests. This is called VDRF(tm) or Variable Document Retrieval Frequency. Note that users can re-define the useragent name.
robot-history: This robot began as an in-house tool for the lucrative Felix IDS (Information Discovery Service) and has gone retail.
robot-environment: service, commercial, research
modified-date: Fri, 11 Apr 1997 19:08:02 GMT
modified-by: Kerry B. Rogers

robot-id: ferret
robot-name: Wild Ferret Web Hopper #1, #2, #3
robot-cover-url: http://www.greenearth.com/
robot-details-url:
robot-owner-name: Greg Boswell
robot-owner-url: http://www.greenearth.com/
robot-owner-email: ghbos@postoffice.worldnet.att.net
robot-status:
robot-purpose: indexing maintenance statistics
robot-type: standalone
robot-platform:
robot-availability:
robot-exclusion: no
robot-exclusion-useragent:
robot-noindex:
robot-host:
robot-from: yes
robot-useragent: Hazel's Ferret Web hopper
robot-language: C++, Visual Basic, Java
robot-description: The Wild Ferret web hoppers are designed as specific agents to retrieve data from all available sources on the internet. They work in an onion format, hopping from spot to spot one level at a time over the internet. The information is gathered into different relational databases, known as "Hazel's Horde". The information is publicly available and will be free for the browsing at www.greenearth.com. The effective date of the data posting is to be announced.
robot-history:
robot-environment:
modified-date: Mon Feb 19 00:28:37 1996
modified-by:

robot-id: fetchrover
robot-name: FetchRover
robot-cover-url: http://www.engsoftware.com/fetch.htm
robot-details-url: http://www.engsoftware.com/spiders/
robot-owner-name: Dr. Kenneth R. Wadland
robot-owner-url: http://www.engsoftware.com/
robot-owner-email: ken@engsoftware.com
robot-status: active
robot-purpose: maintenance, statistics
robot-type: standalone
robot-platform: Windows/NT, Windows/95, Solaris SPARC
robot-availability: binary, source
robot-exclusion: yes
robot-exclusion-useragent: ESI
robot-noindex: N/A
robot-host: *
robot-from: yes
robot-useragent: ESIRover v1.0
robot-language: C++
robot-description: FetchRover fetches web pages. It is an automated page-fetching engine. FetchRover can be used stand-alone or as the front-end to a full-featured spider. Its database can use any ODBC-compliant database server, including Microsoft Access, Oracle, Sybase SQL Server, FoxPro, etc.
robot-history: Used as the front-end to SmartSpider (another spider product sold by Engineering Software, Inc.)
robot-environment: commercial, service
modified-date: Thu, 03 Apr 1997 21:49:50 EST
modified-by: Ken Wadland

robot-id: fido
robot-name: fido
robot-cover-url: http://www.planetsearch.com/
robot-details-url: http://www.planetsearch.com/info/fido.html
robot-owner-name: Steve DeJarnett
robot-owner-url: http://www.planetsearch.com/staff/steved.html
robot-owner-email: fido@planetsearch.com
robot-status: active
robot-purpose: indexing
robot-type: standalone
robot-platform: Unix
robot-availability: none
robot-exclusion: yes
robot-exclusion-useragent: fido
robot-noindex: no
robot-host: fido.planetsearch.com, *.planetsearch.com, 206.64.113.*
robot-from: yes
robot-useragent: fido/0.9 Harvest/1.4.pl2
robot-language: c, perl5
robot-description: fido is used to gather documents for the search engine provided in the PlanetSearch service, which is operated by the Philips Multimedia Center. The robot runs on an ongoing basis.
robot-history: fido was originally based on the Harvest Gatherer, but has since evolved into a new creature. It still uses some support code from Harvest.
robot-environment: service modified-date: Sat, 2 Nov 1996 00:08:18 GMT modified-by: Steve DeJarnett robot-id: finnish robot-name: Hämähäkki robot-cover-url: http://www.fi/search.html robot-details-url: http://www.fi/www/spider.html robot-owner-name: Timo Metsälä robot-owner-url: http://www.fi/~timo/ robot-owner-email: Timo.Metsala@www.fi robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: UNIX robot-availability: no robot-exclusion: yes robot-exclusion-useragent: Hämähäkki robot-noindex: no robot-host: *.www.fi robot-from: yes robot-useragent: Hämähäkki/0.2 robot-language: C robot-description: Its purpose is to generate a Resource Discovery database from the Finnish (top-level domain .fi) www servers. The resulting database is used by the search engine at http://www.fi/search.html. robot-history: (The name Hämähäkki is just Finnish for spider.) robot-environment: modified-date: 1996-06-25 modified-by: Jaakko.Hyvatti@www.fi robot-id: fireball robot-name: KIT-Fireball robot-cover-url: http://www.fireball.de robot-details-url: http://www.fireball.de/technik.html (in German) robot-owner-name: Gruner + Jahr Electronic Media Service GmbH robot-owner-url: http://www.ems.guj.de robot-owner-email: info@fireball.de robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: unix robot-availability: none robot-exclusion: yes robot-exclusion-useragent: KIT-Fireball robot-noindex: yes robot-host: *.fireball.de robot-from: yes robot-useragent: KIT-Fireball/2.0 libwww/5.0a robot-language: c robot-description: The Fireball robots gather web documents in the German language for the database of the Fireball search service. robot-history: The robot was developed by Benhui Chen in a research project at the Technical University of Berlin in 1996 and was re-implemented by its developer in 1997 for the present owner.
robot-environment: service modified-date: Mon Feb 23 11:26:08 1998 modified-by: Detlev Kalb robot-id: fish robot-name: Fish search robot-cover-url: http://www.win.tue.nl/bin/fish-search robot-details-url: robot-owner-name: Paul De Bra robot-owner-url: http://www.win.tue.nl/win/cs/is/debra/ robot-owner-email: debra@win.tue.nl robot-status: robot-purpose: indexing robot-type: standalone robot-platform: robot-availability: binary robot-exclusion: no robot-exclusion-useragent: robot-noindex: no robot-host: www.win.tue.nl robot-from: no robot-useragent: Fish-Search-Robot robot-language: c robot-description: Its purpose is to discover resources on the fly. A version exists that is integrated into the Tübingen Mosaic 2.4.2 browser (also written in C). robot-history: Originated as an addition to Mosaic for X robot-environment: modified-date: Mon May 8 09:31:19 1995 modified-by: robot-id: fouineur robot-name: Fouineur robot-cover-url: http://fouineur.9bit.qc.ca/ robot-details-url: http://fouineur.9bit.qc.ca/informations.html robot-owner-name: Joel Vandal robot-owner-url: http://www.9bit.qc.ca/~jvandal/ robot-owner-email: jvandal@9bit.qc.ca robot-status: development robot-purpose: indexing, statistics robot-type: standalone robot-platform: unix, windows robot-availability: none robot-exclusion: yes robot-exclusion-useragent: fouineur robot-noindex: no robot-host: * robot-from: yes robot-useragent: Mozilla/2.0 (compatible fouineur v2.0; fouineur.9bit.qc.ca) robot-language: perl5 robot-description: This robot automatically builds a database that is used by our own search engine. It auto-detects the language (French, English & Spanish) used in the HTML page. Each database record generated by this robot includes: date, URL, title, total words, size and de-htmlized text. Also supports server-side and client-side IMAGEMAP. robot-history: No robot does all the things that we need for our usage.
robot-environment: service modified-date: Thu, 9 Jan 1997 22:57:28 EST modified-by: jvandal@9bit.qc.ca robot-id: francoroute robot-name: Robot Francoroute robot-cover-url: robot-details-url: robot-owner-name: Marc-Antoine Parent robot-owner-url: http://www.crim.ca/~maparent robot-owner-email: maparent@crim.ca robot-status: robot-purpose: indexing, mirroring, statistics robot-type: browser robot-platform: robot-availability: robot-exclusion: yes robot-exclusion-useragent: robot-noindex: robot-host: zorro.crim.ca robot-from: yes robot-useragent: Robot du CRIM 1.0a robot-language: perl5, sqlplus robot-description: Part of the RISQ's Francoroute project for researching the francophone web. Uses the Accept-Language tag and reduces demand accordingly. robot-history: robot-environment: modified-date: Wed Jan 10 23:56:22 1996. modified-by: robot-id: freecrawl robot-name: Freecrawl robot-cover-url: http://euroseek.net/ robot-owner-name: Jesper Ekhall robot-owner-email: ekhall@freeside.net robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: unix robot-availability: none robot-exclusion: yes robot-exclusion-useragent: Freecrawl robot-noindex: no robot-host: *.freeside.net robot-from: yes robot-useragent: Freecrawl robot-language: c robot-description: The Freecrawl robot is used to build a database for the EuroSeek service. robot-environment: service robot-id: funnelweb robot-name: FunnelWeb robot-cover-url: http://funnelweb.net.au robot-details-url: robot-owner-name: David Eagles robot-owner-url: http://www.pc.com.au robot-owner-email: eaglesd@pc.com.au robot-status: robot-purpose: indexing, statistics robot-type: standalone robot-platform: robot-availability: robot-exclusion: yes robot-exclusion-useragent: robot-noindex: no robot-host: earth.planets.com.au robot-from: yes robot-useragent: FunnelWeb-1.0 robot-language: c and c++ robot-description: Its purpose is to generate a Resource Discovery database, and generate statistics.
Localised South Pacific Discovery and Search Engine, plus distributed operation under development. robot-history: robot-environment: modified-date: Mon Nov 27 21:30:11 1995 modified-by: robot-id: gcreep robot-name: GCreep robot-cover-url: http://www.instrumentpolen.se/gcreep/index.html robot-details-url: http://www.instrumentpolen.se/gcreep/index.html robot-owner-name: Instrumentpolen AB robot-owner-url: http://www.instrumentpolen.se/ip-kontor/eng/index.html robot-owner-email: anders@instrumentpolen.se robot-status: development robot-purpose: indexing robot-type: browser+standalone robot-platform: linux+mysql robot-availability: none robot-exclusion: yes robot-exclusion-useragent: gcreep robot-noindex: yes robot-host: mbx.instrumentpolen.se robot-from: yes robot-useragent: gcreep/1.0 robot-language: c robot-description: Indexing robot to learn SQL robot-history: Spare time project begun late '96, maybe early '97 robot-environment: hobby modified-date: Fri, 23 Jan 1998 16:09:00 MET modified-by: Anders Hedstrom robot-id: getbot robot-name: GetBot robot-cover-url: http://www.blacktop.com/zav/bots robot-details-url: robot-owner-name: Alex Zavatone robot-owner-url: http://www.blacktop.com/zav robot-owner-email: zav@macromedia.com robot-status: robot-purpose: maintenance robot-type: standalone robot-platform: robot-availability: robot-exclusion: no robot-exclusion-useragent: robot-noindex: robot-host: robot-from: no robot-useragent: ??? robot-language: Shockwave/Director. robot-description: GetBot's purpose is to index all the sites it can find that contain Shockwave movies. It is the first bot or spider written in Shockwave. The bot was originally written at Macromedia on a hungover Sunday as a proof of concept. - Alex Zavatone 3/29/96 robot-history: robot-environment: modified-date: Fri Mar 29 20:06:12 1996.
modified-by: robot-id: geturl robot-name: GetURL robot-cover-url: http://Snark.apana.org.au/James/GetURL/ robot-details-url: robot-owner-name: James Burton robot-owner-url: http://Snark.apana.org.au/James/ robot-owner-email: James@Snark.apana.org.au robot-status: robot-purpose: maintenance, mirroring robot-type: standalone robot-platform: robot-availability: robot-exclusion: no robot-exclusion-useragent: robot-noindex: no robot-host: * robot-from: no robot-useragent: GetURL.rexx v1.05 robot-language: ARexx (Amiga REXX) robot-description: Its purpose is to validate links, perform mirroring, and copy document trees. Designed as a tool for retrieving web pages in batch mode without the encumbrance of a browser. Can be used to describe a set of pages to fetch, and to maintain an archive or mirror. It is not run by a central site and accessed by clients; it is run by the end user or archive maintainer. robot-history: robot-environment: modified-date: Tue May 9 15:13:12 1995 modified-by: robot-id: golem robot-name: Golem robot-cover-url: http://www.quibble.com/golem/ robot-details-url: http://www.quibble.com/golem/ robot-owner-name: Geoff Duncan robot-owner-url: http://www.quibble.com/geoff/ robot-owner-email: geoff@quibble.com robot-status: active robot-purpose: maintenance robot-type: standalone robot-platform: mac robot-availability: none robot-exclusion: yes robot-exclusion-useragent: golem robot-noindex: no robot-host: *.quibble.com robot-from: yes robot-useragent: Golem/1.1 robot-language: HyperTalk/AppleScript/C++ robot-description: Golem generates status reports on collections of URLs supplied by clients. Designed to assist with editorial updates of Web-related sites or products. robot-history: Personal project turned into a contract service for private clients.
robot-environment: service, research modified-date: Wed, 16 Apr 1997 20:50:00 GMT modified-by: Geoff Duncan robot-id: googlebot robot-name: Googlebot robot-cover-url: http://googlebot.com/ robot-details-url: http://googlebot.com/ robot-owner-name: Google Inc. robot-owner-url: http://google.com/ robot-owner-email: googlebot@googlebot.com robot-status: active robot-purpose: indexing statistics robot-type: standalone robot-platform: robot-availability: robot-exclusion: yes robot-exclusion-useragent: Googlebot robot-noindex: no robot-host: *.googlebot.com robot-from: yes robot-useragent: Googlebot/1.0 robot-language: Python robot-description: robot-history: Used to be called backrub and run from stanford.edu robot-environment: service modified-date: Wed, 21 Oct 1998 21:58:03 -0700 modified-by: "L a r r y . P a g e" robot-id: grapnel robot-name: Grapnel/0.01 Experiment robot-cover-url: varies robot-details-url: mailto:v93_kat@ce.kth.se robot-owner-name: Philip Kallerman robot-owner-url: v93_kat@ce.kth.se robot-owner-email: v93_kat@ce.kth.se robot-status: Experimental robot-purpose: Indexing robot-type: robot-platform: WinNT robot-availability: None, yet robot-exclusion: Yes robot-exclusion-useragent: No robot-noindex: No robot-host: varies robot-from: Varies robot-useragent: robot-language: Perl robot-description: Resource Discovery Experimentation robot-history: None, hoping to make some robot-environment: modified-date: 7 Feb 1997 modified-by: robot-id: gromit robot-name: Gromit robot-cover-url: http://www.austlii.edu.au/ robot-details-url: http://www2.austlii.edu.au/~dan/gromit/ robot-owner-name: Daniel Austin robot-owner-url: http://www2.austlii.edu.au/~dan/ robot-owner-email: dan@austlii.edu.au robot-status: development robot-purpose: indexing robot-type: standalone robot-platform: unix robot-availability: none robot-exclusion: yes robot-exclusion-useragent: Gromit robot-noindex: no robot-host: *.austlii.edu.au robot-from: yes robot-useragent: Gromit/1.0
robot-language: perl5 robot-description: Gromit is a Targeted Web Spider that indexes legal sites contained in the AustLII legal links database. robot-history: This robot is based on the Perl5 LWP::RobotUA module. robot-environment: research modified-date: Wed, 11 Jun 1997 03:58:40 GMT modified-by: Daniel Austin robot-id: gulliver robot-name: Northern Light Gulliver robot-cover-url: robot-details-url: robot-owner-name: Mike Mulligan robot-owner-url: robot-owner-email: crawler@northernlight.com robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: unix robot-availability: none robot-exclusion: yes robot-exclusion-useragent: gulliver robot-noindex: yes robot-host: scooby.northernlight.com, taz.northernlight.com, gulliver.northernlight.com robot-from: yes robot-useragent: Gulliver/1.1 robot-language: c robot-description: Gulliver is a robot used to collect web pages for indexing and subsequent searching of the index. robot-history: Oct 1996: development; Dec 1996-Jan 1997: crawl & debug; Mar 1997: crawl again; robot-environment: service modified-date: Wed, 21 Apr 1999 16:00:00 GMT modified-by: Mike Mulligan robot-id: hambot robot-name: HamBot robot-cover-url: http://www.hamrad.com/search.html robot-details-url: http://www.hamrad.com/ robot-owner-name: John Dykstra robot-owner-url: robot-owner-email: john@futureone.com robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: unix, Windows95 robot-availability: none robot-exclusion: yes robot-exclusion-useragent: hambot robot-noindex: yes robot-host: *.hamrad.com robot-from: robot-useragent: robot-language: perl5, C++ robot-description: Two HamBot robots are used (stand-alone & browser based) to aid in building the database for HamRad Search - The Search Engine for Search Engines. The robots are run intermittently and perform nearly identical functions. robot-history: A non-commercial (hobby?)
project to aid in building and maintaining the database for the HamRad search engine. robot-environment: service modified-date: Fri, 17 Apr 1998 21:44:00 GMT modified-by: JD robot-id: harvest robot-name: Harvest robot-cover-url: http://harvest.cs.colorado.edu robot-details-url: robot-owner-name: robot-owner-url: robot-owner-email: robot-status: robot-purpose: indexing robot-type: robot-platform: robot-availability: robot-exclusion: robot-exclusion-useragent: robot-noindex: robot-host: bruno.cs.colorado.edu robot-from: yes robot-useragent: yes robot-language: robot-description: Harvest's motivation is to index community- or topic-specific collections, rather than to locate and index all HTML objects that can be found. Also, Harvest allows users to control the enumeration in several ways, including stop lists and depth and count limits. Therefore, Harvest provides a much more controlled way of indexing the Web than is typical of robots. Pauses 1 second between requests (by default). robot-history: robot-environment: modified-date: modified-by: robot-id: havindex robot-name: havIndex robot-cover-url: http://www.hav.com/ robot-details-url: http://www.hav.com/ robot-owner-name: hav.Software and Horace A. (Kicker) Vallas robot-owner-url: http://www.hav.com/ robot-owner-email: havIndex@hav.com robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: Java VM 1.1 robot-availability: binary robot-exclusion: yes robot-exclusion-useragent: havIndex robot-noindex: yes robot-host: * robot-from: no robot-useragent: havIndex/X.xx[bxx] robot-language: Java robot-description: havIndex allows individuals to build a searchable word index of (user-specified) lists of URLs. havIndex does not crawl - rather it requires one or more user-supplied lists of URLs to be indexed. havIndex does (optionally) save URLs parsed from indexed pages. robot-history: Developed to answer client requests for URL-specific index capabilities.
robot-environment: commercial, service modified-date: 6-27-98 modified-by: Horace A. (Kicker) Vallas robot-id: hi robot-name: HI (HTML Index) Search robot-cover-url: http://cs6.cs.ait.ac.th:21870/pa.html robot-details-url: robot-owner-name: Razzakul Haider Chowdhury robot-owner-url: http://cs6.cs.ait.ac.th:21870/index.html robot-owner-email: a94385@cs.ait.ac.th robot-status: robot-purpose: indexing robot-type: robot-platform: robot-availability: robot-exclusion: no robot-exclusion-useragent: robot-noindex: no robot-host: robot-from: yes robot-useragent: AITCSRobot/1.1 robot-language: perl 5 robot-description: Its purpose is to generate a Resource Discovery database. This robot traverses the net and creates a searchable database of Web pages. It stores the title string of the HTML document and the absolute URL. A search engine provides Boolean AND & OR query models, with or without filtering out stop-listed words. A feature is provided for Web page owners to add their URL to the searchable database.
robot-history: robot-environment: modified-date: Wed Oct 4 06:54:31 1995 modified-by: robot-id: wired-digital robot-name: Wired Digital robot-cover-url: robot-details-url: robot-owner-name: Bowen Dwelle robot-owner-url: robot-owner-email: bowen@hotwired.com robot-status: development robot-purpose: indexing robot-type: standalone robot-platform: unix robot-availability: none robot-exclusion: yes robot-exclusion-useragent: hotwired robot-noindex: no robot-host: gossip.hotwired.com robot-from: yes robot-useragent: wired-digital-newsbot/1.5 robot-language: perl-5.004 robot-description: this is a test robot-history: robot-environment: research modified-date: Thu, 30 Oct 1997 modified-by: bowen@hotwired.com robot-id: htdig robot-name: ht://Dig robot-cover-url: http://www.htdig.org/ robot-details-url: http://www.htdig.org/howitworks.html robot-owner-name: Andrew Scherpbier robot-owner-url: http://www.htdig.org/author.html robot-owner-email: andrew@contigo.com robot-owner-name2: Geoff Hutchison robot-owner-url2: http://wso.williams.edu/~ghutchis/ robot-owner-email2: ghutchis@wso.williams.edu robot-status: robot-purpose: indexing robot-type: standalone robot-platform: unix robot-availability: source robot-exclusion: yes robot-exclusion-useragent: htdig robot-noindex: yes robot-host: * robot-from: no robot-useragent: htdig/3.1.0b2 robot-language: C, C++. robot-history: This robot was originally developed for use at San Diego State University.
robot-environment: modified-date: Tue, 3 Nov 1998 10:09:02 EST modified-by: Geoff Hutchison robot-id: htmlgobble robot-name: HTMLgobble robot-cover-url: robot-details-url: robot-owner-name: Andreas Ley robot-owner-url: robot-owner-email: ley@rz.uni-karlsruhe.de robot-status: robot-purpose: mirror robot-type: robot-platform: robot-availability: robot-exclusion: robot-exclusion-useragent: robot-noindex: no robot-host: tp70.rz.uni-karlsruhe.de robot-from: yes robot-useragent: HTMLgobble v2.2 robot-language: robot-description: A mirroring robot. Configured to stay within a directory, sleeps between requests, and the next version will use HEAD to check if the entire document needs to be retrieved. robot-history: robot-environment: modified-date: modified-by: robot-id: hyperdecontextualizer robot-name: Hyper-Decontextualizer robot-cover-url: http://www.tricon.net/Comm/synapse/spider/ robot-details-url: robot-owner-name: Cliff Hall robot-owner-url: http://kpt1.tricon.net/cgi-bin/cliff.cgi robot-owner-email: cliff@tricon.net robot-status: robot-purpose: indexing robot-type: standalone robot-platform: robot-availability: robot-exclusion: no robot-exclusion-useragent: robot-noindex: robot-host: robot-from: no robot-useragent: no robot-language: Perl 5 robot-description: Takes an input sentence and marks up each word with an appropriate hyper-text link. robot-history: robot-environment: modified-date: Mon May 6 17:41:29 1996.
modified-by: robot-id: ibm robot-name: IBM_Planetwide robot-cover-url: http://www.ibm.com/%7ewebmaster/ robot-details-url: robot-owner-name: Ed Costello robot-owner-url: http://www.ibm.com/%7ewebmaster/ robot-owner-email: epc@www.ibm.com robot-status: robot-purpose: indexing, maintenance, mirroring robot-type: standalone robot-platform: robot-availability: robot-exclusion: yes robot-exclusion-useragent: robot-noindex: robot-host: www.ibm.com www2.ibm.com robot-from: yes robot-useragent: IBM_Planetwide, robot-language: Perl5 robot-description: Restricted to IBM-owned or related domains. robot-history: robot-environment: modified-date: Mon Jan 22 22:09:19 1996. modified-by: robot-id: iconoclast robot-name: Popular Iconoclast robot-cover-url: http://gestalt.sewanee.edu/ic/ robot-details-url: http://gestalt.sewanee.edu/ic/info.html robot-owner-name: Chris Cappuccio robot-owner-url: http://sefl.satelnet.org/~ccappuc/ robot-owner-email: chris@gestalt.sewanee.edu robot-status: development robot-purpose: statistics robot-type: standalone robot-platform: unix (OpenBSD) robot-availability: source robot-exclusion: no robot-exclusion-useragent: robot-noindex: no robot-host: gestalt.sewanee.edu robot-from: yes robot-useragent: gestaltIconoclast/1.0 libwww-FM/2.17 robot-language: c, perl5 robot-description: This guy likes statistics robot-history: This robot has a history in mathematics and English robot-environment: research modified-date: Wed, 5 Mar 1997 17:35:16 CST modified-by: chris@gestalt.sewanee.edu robot-id: Ilse robot-name: Ingrid robot-cover-url: robot-details-url: robot-owner-name: Ilse c.v.
robot-owner-url: http://www.ilse.nl/ robot-owner-email: ilse@ilse.nl robot-status: Running robot-purpose: Indexing robot-type: Web Indexer robot-platform: UNIX robot-availability: Commercial as part of search engine package robot-exclusion: Yes robot-exclusion-useragent: INGRID/0.1 robot-noindex: Yes robot-host: bart.ilse.nl robot-from: Yes robot-useragent: INGRID/0.1 robot-language: C robot-description: robot-history: robot-environment: modified-date: 06/13/1997 modified-by: Ilse robot-id: imagelock robot-name: Imagelock robot-cover-url: robot-details-url: robot-owner-name: Ken Belanger robot-owner-url: robot-owner-email: belanger@imagelock.com robot-status: development robot-purpose: maintenance robot-type: robot-platform: windows95 robot-availability: none robot-exclusion: no robot-exclusion-useragent: robot-noindex: no robot-host: 209.111.133.* robot-from: no robot-useragent: Mozilla 3.01 PBWF (Win95) robot-language: robot-description: searches for image links robot-history: robot-environment: service modified-date: Tue, 11 Aug 1998 17:28:52 GMT modified-by: brian@smithrenaud.com robot-id: incywincy robot-name: IncyWincy robot-cover-url: http://osiris.sunderland.ac.uk/sst-scripts/simon.html robot-details-url: robot-owner-name: Simon Stobart robot-owner-url: http://osiris.sunderland.ac.uk/sst-scripts/simon.html robot-owner-email: simon.stobart@sunderland.ac.uk robot-status: robot-purpose: robot-type: standalone robot-platform: robot-availability: robot-exclusion: yes robot-exclusion-useragent: robot-noindex: robot-host: osiris.sunderland.ac.uk robot-from: yes robot-useragent: IncyWincy/1.0b1 robot-language: C++ robot-description: Various Research projects at the University of Sunderland robot-history: robot-environment: modified-date: Fri Jan 19 21:50:32 1996. 
modified-by: robot-id: informant robot-name: Informant robot-cover-url: http://informant.dartmouth.edu/ robot-details-url: http://informant.dartmouth.edu/about.html robot-owner-name: Bob Gray robot-owner-name2: Aditya Bhasin robot-owner-name3: Katsuhiro Moizumi robot-owner-name4: Dr. George V. Cybenko robot-owner-url: http://informant.dartmouth.edu/ robot-owner-email: info_adm@cosmo.dartmouth.edu robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: unix robot-availability: none robot-exclusion: no robot-exclusion-useragent: Informant robot-noindex: no robot-host: informant.dartmouth.edu robot-from: yes robot-useragent: Informant robot-language: c, c++ robot-description: The Informant robot continually checks the Web pages that are relevant to user queries. Users are notified of any new or updated pages. The robot runs daily, but the number of hits per site per day should be quite small, and these hits should be randomly distributed over several hours. Since the robot does not actually follow links (aside from those returned from the major search engines such as Lycos), it does not fall victim to the common looping problems. The robot will support the Robot Exclusion Standard by early December, 1996. robot-history: The robot is part of a research project at Dartmouth College. The robot may become part of a commercial service (at which time it may be subsumed by some other, existing robot). 
robot-environment: research, service modified-date: Sun, 3 Nov 1996 11:55:00 GMT modified-by: Bob Gray robot-id: infoseek robot-name: InfoSeek Robot 1.0 robot-cover-url: http://www.infoseek.com robot-details-url: robot-owner-name: Steve Kirsch robot-owner-url: http://www.infoseek.com robot-owner-email: stk@infoseek.com robot-status: robot-purpose: indexing robot-type: standalone robot-platform: robot-availability: robot-exclusion: yes robot-exclusion-useragent: robot-noindex: no robot-host: corp-gw.infoseek.com robot-from: yes robot-useragent: InfoSeek Robot 1.0 robot-language: python robot-description: Its purpose is to generate a Resource Discovery database. Collects WWW pages for both InfoSeek's free WWW search and commercial search. Uses a unique proprietary algorithm to identify the most popular and interesting WWW pages. Very fast, but never has more than one request per site outstanding at any given time. Has been refined for more than a year. robot-history: robot-environment: modified-date: Sun May 28 01:35:48 1995 modified-by: robot-id: infoseeksidewinder robot-name: Infoseek Sidewinder robot-cover-url: http://www.infoseek.com/ robot-details-url: robot-owner-name: Mike Agostino robot-owner-url: http://www.infoseek.com/ robot-owner-email: mna@infoseek.com robot-status: robot-purpose: indexing robot-type: standalone robot-platform: robot-availability: robot-exclusion: yes robot-exclusion-useragent: robot-noindex: robot-host: robot-from: yes robot-useragent: Infoseek Sidewinder robot-language: C robot-description: Collects WWW pages for InfoSeek's free WWW search services. Uses a unique, incremental, very fast proprietary algorithm to find WWW pages. robot-history: robot-environment: modified-date: Sat Apr 27 01:20:15 1996.
modified-by: robot-id: infospider robot-name: InfoSpiders robot-cover-url: http://www-cse.ucsd.edu/users/fil/agents/agents.html robot-owner-name: Filippo Menczer robot-owner-url: http://www-cse.ucsd.edu/users/fil/ robot-owner-email: fil@cs.ucsd.edu robot-status: development robot-purpose: search robot-type: standalone robot-platform: unix, mac robot-availability: none robot-exclusion: yes robot-exclusion-useragent: InfoSpiders robot-noindex: no robot-host: *.ucsd.edu robot-from: yes robot-useragent: InfoSpiders/0.1 robot-language: c, perl5 robot-description: application of an artificial-life algorithm to adaptive distributed information retrieval robot-history: UC San Diego, Computer Science Dept. PhD research project (1995-97) under supervision of Prof. Rik Belew robot-environment: research modified-date: Mon, 16 Sep 1996 14:08:00 PDT robot-id: inspectorwww robot-name: Inspector Web robot-cover-url: http://www.greenpac.com/inspector/ robot-details-url: http://www.greenpac.com/inspector/ourrobot.html robot-owner-name: Doug Green robot-owner-url: http://www.greenpac.com robot-owner-email: doug@greenpac.com robot-status: active: robot significantly developed, but still undergoing fixes robot-purpose: maintenance: link validation, html validation, image size validation, etc robot-type: standalone robot-platform: unix robot-availability: free service and more extensive commercial service robot-exclusion: yes robot-exclusion-useragent: inspectorwww robot-noindex: no robot-host: www.corpsite.com, www.greenpac.com, 38.234.171.* robot-from: yes robot-useragent: inspectorwww/1.0 http://www.greenpac.com/inspectorwww.html robot-language: c robot-description: Provides inspection reports which give advice to WWW site owners on missing links, image resize problems, syntax errors, etc.
robot-history: development started in Mar 1997 robot-environment: commercial modified-date: Tue Jun 17 09:24:58 EST 1997 modified-by: Doug Green robot-id: intelliagent robot-name: IntelliAgent robot-cover-url: http://www.geocities.com/SiliconValley/3086/iagent.html robot-details-url: robot-owner-name: David Reilly robot-owner-url: http://www.geocities.com/SiliconValley/3086/index.html robot-owner-email: s1523@sand.it.bond.edu.au robot-status: development robot-purpose: indexing robot-type: standalone robot-platform: robot-availability: robot-exclusion: no robot-exclusion-useragent: robot-noindex: robot-host: sand.it.bond.edu.au robot-from: no robot-useragent: 'IAGENT/1.0' robot-language: C robot-description: IntelliAgent is still in development. Indeed, it is very far from completion. I'm planning to limit the depth at which it will probe, so hopefully IAgent won't cause anyone much of a problem. At the end of its completion, I hope to publish both the raw data and original source code. robot-history: robot-environment: modified-date: Fri May 31 02:10:39 1996. modified-by: robot-id: iron33 robot-name: Iron33 robot-cover-url: http://verno.ueda.info.waseda.ac.jp/iron33/ robot-details-url: http://verno.ueda.info.waseda.ac.jp/iron33/history.html robot-owner-name: Takashi Watanabe robot-owner-url: http://www.ueda.info.waseda.ac.jp/~watanabe/ robot-owner-email: watanabe@ueda.info.waseda.ac.jp robot-status: active robot-purpose: indexing, statistics robot-type: standalone robot-platform: unix robot-availability: source robot-exclusion: yes robot-exclusion-useragent: Iron33 robot-noindex: no robot-host: *.folon.ueda.info.waseda.ac.jp, 133.9.215.* robot-from: yes robot-useragent: Iron33/0.0 robot-language: c robot-description: The robot "Iron33" is used to build the database for the WWW search engine "Verno".
robot-history: robot-environment: research modified-date: Fri, 20 Mar 1998 18:34 JST modified-by: Watanabe Takashi robot-id: israelisearch robot-name: Israeli-search robot-cover-url: http://www.idc.ac.il/Sandbag/ robot-details-url: robot-owner-name: Etamar Laron robot-owner-url: http://www.xpert.com/~etamar/ robot-owner-email: etamar@xpert.co robot-status: robot-purpose: indexing robot-type: standalone robot-platform: robot-availability: robot-exclusion: yes robot-exclusion-useragent: robot-noindex: robot-host: dylan.ius.cs.cmu.edu robot-from: no robot-useragent: IsraeliSearch/1.0 robot-language: C robot-description: A complete software package designed to collect information with a distributed workload, with support for context queries. Intended to be a complete, updated resource for Israeli sites and information related to Israel or Israeli society. robot-history: robot-environment: modified-date: Tue Apr 23 19:23:55 1996. modified-by: robot-id: jcrawler robot-name: JCrawler robot-cover-url: http://www.nihongo.org/jcrawler/ robot-details-url: robot-owner-name: Benjamin Franz robot-owner-url: http://www.nihongo.org/snowhare/ robot-owner-email: snowhare@netimages.com robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: unix robot-availability: none robot-exclusion: yes robot-exclusion-useragent: jcrawler robot-noindex: yes robot-host: db.netimages.com robot-from: yes robot-useragent: JCrawler/0.2 robot-language: perl5 robot-description: JCrawler is currently used to build the Vietnam topic-specific WWW index for VietGATE. It schedules visits randomly, but will not visit a site more than once every two minutes. It uses a subject-matter relevance pruning algorithm to determine what pages to crawl and index and will not generally index pages with no Vietnam-related content. Uses Unicode internally, and detects and converts several different Vietnamese character encodings.
robot-history: robot-environment: service modified-date: Wed, 08 Oct 1997 00:09:52 GMT modified-by: Benjamin Franz robot-id: jeeves robot-name: Jeeves robot-cover-url: http://www-students.doc.ic.ac.uk/~lglb/Jeeves/ robot-details-url: robot-owner-name: Leon Brocard robot-owner-url: http://www-students.doc.ic.ac.uk/~lglb/ robot-owner-email: lglb@doc.ic.ac.uk robot-status: development robot-purpose: indexing maintenance statistics robot-type: standalone robot-platform: UNIX robot-availability: none robot-exclusion: no robot-exclusion-useragent: jeeves robot-noindex: no robot-host: *.doc.ic.ac.uk robot-from: yes robot-useragent: Jeeves v0.05alpha (PERL, LWP, lglb@doc.ic.ac.uk) robot-language: perl5 robot-description: Jeeves is basically a web-mirroring robot built as a final-year degree project. It will have many nice features and is already web-friendly. Still in development. robot-history: Still short (0.05alpha) robot-environment: research modified-date: Wed, 23 Apr 1997 17:26:50 GMT modified-by: Leon Brocard robot-id: jobot robot-name: Jobot robot-cover-url: http://www.micrognosis.com/~ajack/jobot/jobot.html robot-details-url: robot-owner-name: Adam Jack robot-owner-url: http://www.micrognosis.com/~ajack/index.html robot-owner-email: ajack@corp.micrognosis.com robot-status: inactive robot-purpose: indexing robot-type: standalone robot-platform: robot-availability: robot-exclusion: yes robot-exclusion-useragent: robot-noindex: no robot-host: supernova.micrognosis.com robot-from: yes robot-useragent: Jobot/0.1alpha libwww-perl/4.0 robot-language: perl 4 robot-description: Its purpose is to generate a Resource Discovery database. Intended to seek out sites of potential "career interest". Hence - Job Robot.
robot-history: robot-environment: modified-date: Tue Jan 9 18:55:55 1996 modified-by: robot-id: joebot robot-name: JoeBot robot-cover-url: robot-details-url: robot-owner-name: Ray Waldin robot-owner-url: http://www.primenet.com/~rwaldin robot-owner-email: rwaldin@primenet.com robot-status: robot-purpose: robot-type: standalone robot-platform: robot-availability: robot-exclusion: yes robot-exclusion-useragent: robot-noindex: robot-host: robot-from: yes robot-useragent: JoeBot/x.x, robot-language: java robot-description: JoeBot is a generic web crawler implemented as a collection of Java classes which can be used in a variety of applications, including resource discovery, link validation, mirroring, etc. It currently limits itself to one visit per host per minute. robot-history: robot-environment: modified-date: Sun May 19 08:13:06 1996. modified-by: robot-id: jubii robot-name: The Jubii Indexing Robot robot-cover-url: http://www.jubii.dk/robot/default.htm robot-details-url: robot-owner-name: Jakob Faarvang robot-owner-url: http://www.cybernet.dk/staff/jakob/ robot-owner-email: jakob@jubii.dk robot-status: robot-purpose: indexing, maintenance robot-type: standalone robot-platform: robot-availability: robot-exclusion: yes robot-exclusion-useragent: robot-noindex: no robot-host: any host in the cybernet.dk domain robot-from: yes robot-useragent: JubiiRobot/version# robot-language: visual basic 4.0 robot-description: Its purpose is to generate a Resource Discovery database and validate links. Used for indexing the .dk top-level domain and other Danish sites for a Danish web database, as well as for link validation.
robot-history: Will be in constant operation from Spring 1996 robot-environment: modified-date: Sat Jan 6 20:58:44 1996 modified-by: robot-id: jumpstation robot-name: JumpStation robot-cover-url: http://js.stir.ac.uk/jsbin/jsii robot-details-url: robot-owner-name: Jonathon Fletcher robot-owner-url: http://www.stir.ac.uk/~jf1 robot-owner-email: j.fletcher@stirling.ac.uk robot-status: retired robot-purpose: indexing robot-type: robot-platform: robot-availability: robot-exclusion: yes robot-exclusion-useragent: robot-noindex: robot-host: *.stir.ac.uk robot-from: yes robot-useragent: jumpstation robot-language: perl, C, c++ robot-description: robot-history: Originated as a weekend project in 1993. robot-environment: modified-date: Tue May 16 00:57:42 1995. modified-by: robot-id: katipo robot-name: Katipo robot-cover-url: http://www.vuw.ac.nz/~newbery/Katipo.html robot-details-url: http://www.vuw.ac.nz/~newbery/Katipo/Katipo-doc.html robot-owner-name: Michael Newbery robot-owner-url: http://www.vuw.ac.nz/~newbery robot-owner-email: Michael.Newbery@vuw.ac.nz robot-status: active robot-purpose: maintenance robot-type: standalone robot-platform: Macintosh robot-availability: binary robot-exclusion: no robot-exclusion-useragent: robot-noindex: no robot-host: * robot-from: yes robot-useragent: Katipo/1.0 robot-language: c robot-description: Watches all the pages you have previously visited and tells you when they have changed. 
robot-history: robot-environment: commercial (free) modified-date: Tue, 25 Jun 96 11:40:07 +1200 modified-by: Michael Newbery robot-id: kdd robot-name: KDD-Explorer robot-cover-url: http://mlc.kddvw.kcom.or.jp/CLINKS/html/clinks.html robot-details-url: not available robot-owner-name: Kazunori Matsumoto robot-owner-url: not available robot-owner-email: matsu@lab.kdd.co.jp robot-status: development (to be active in June 1997) robot-purpose: indexing robot-type: standalone robot-platform: unix robot-availability: none robot-exclusion: yes robot-exclusion-useragent: KDD-Explorer robot-noindex: no robot-host: mlc.kddvw.kcom.or.jp robot-from: yes robot-useragent: KDD-Explorer/0.1 robot-language: c robot-description: KDD-Explorer is used for indexing valuable documents which will be retrieved via an experimental cross-language search engine, CLINKS. robot-history: This robot was designed in the Knowledge-based Information Processing Laboratory, KDD R&D Laboratories, 1996-1997 robot-environment: research modified-date: Mon, 2 June 1997 18:00:00 JST modified-by: Kazunori Matsumoto robot-id: kilroy robot-name: Kilroy robot-cover-url: http://purl.org/kilroy robot-details-url: http://purl.org/kilroy robot-owner-name: OCLC robot-owner-url: http://www.oclc.org robot-owner-email: kilroy@oclc.org robot-status: active robot-purpose: indexing, statistics robot-type: standalone robot-platform: unix, windowsNT robot-availability: none robot-exclusion: yes robot-exclusion-useragent: * robot-noindex: no robot-host: *.oclc.org robot-from: no robot-useragent: yes robot-language: java robot-description: Used to collect data for several projects. Runs constantly and visits sites no faster than once every 90 seconds.
robot-history: none robot-environment: research, service modified-date: Thursday, 24 Apr 1997 20:00:00 GMT modified-by: tkac robot-id: ko_yappo_robot robot-name: KO_Yappo_Robot robot-cover-url: http://yappo.com/info/robot.html robot-details-url: http://yappo.com/ robot-owner-name: Kazuhiro Osawa robot-owner-url: http://yappo.com/ robot-owner-email: office_KO@yappo.com robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: unix robot-availability: none robot-exclusion: yes robot-exclusion-useragent: ko_yappo_robot robot-noindex: yes robot-host: yappo.com,209.25.40.1 robot-from: yes robot-useragent: KO_Yappo_Robot/1.0.4(http://yappo.com/info/robot.html) robot-language: perl robot-description: The KO_Yappo_Robot robot is used to build the database for the Yappo search service by k,osawa (part of AOL). The robot runs on random days and visits sites in a random order. robot-history: The robot is a hobby project of k,osawa, started in Tokyo in 1997 robot-environment: hobby modified-date: Fri, 18 Jul 1996 12:34:21 GMT modified-by: KO robot-id: labelgrabber.txt robot-name: LabelGrabber robot-cover-url: http://www.w3.org/PICS/refcode/LabelGrabber/index.htm robot-details-url: http://www.w3.org/PICS/refcode/LabelGrabber/index.htm robot-owner-name: Kyle Jamieson robot-owner-url: http://www.w3.org/PICS/refcode/LabelGrabber/index.htm robot-owner-email: jamieson@mit.edu robot-status: active robot-purpose: Grabs PICS labels from web pages and submits them to a label bureau robot-type: standalone robot-platform: windows, windows95, windowsNT, unix robot-availability: source robot-exclusion: yes robot-exclusion-useragent: label-grabber robot-noindex: no robot-host: head.w3.org robot-from: no robot-useragent: LabelGrab/1.1 robot-language: java robot-description: The label grabber searches for PICS labels and submits them to a label bureau robot-history: N/A robot-environment: research modified-date: Wed, 28 Jan 1998 17:32:52 GMT modified-by: jamieson@mit.edu robot-id: linkwalker
robot-name: LinkWalker robot-cover-url: http://www.seventwentyfour.com robot-details-url: http://www.seventwentyfour.com/tech.html robot-owner-name: Roy Bryant robot-owner-url: robot-owner-email: rbryant@seventwentyfour.com robot-status: active robot-purpose: maintenance, statistics robot-type: standalone robot-platform: windowsNT robot-availability: none robot-exclusion: yes robot-exclusion-useragent: linkwalker robot-noindex: yes robot-host: *.seventwentyfour.com robot-from: yes robot-useragent: LinkWalker robot-language: c++ robot-description: LinkWalker generates a database of links. We send reports of bad ones to webmasters. robot-history: Constructed late 1997 through April 1998. In full service April 1998. robot-environment: service modified-date: Wed, 22 Apr 1998 modified-by: Roy Bryant robot-id: lockon robot-name: Lockon robot-cover-url: robot-details-url: robot-owner-name: Seiji Sasazuka & Takahiro Ohmori robot-owner-url: robot-owner-email: search@rsch.tuis.ac.jp robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: UNIX robot-availability: none robot-exclusion: yes robot-exclusion-useragent: Lockon robot-noindex: yes robot-host: *.hitech.tuis.ac.jp robot-from: yes robot-useragent: Lockon/xxxxx robot-language: perl5 robot-description: This robot gathers only HTML documents. robot-history: This robot was developed at the Tokyo University of Information Sciences in 1998. robot-environment: research modified-date: Tue,
10 Nov 1998 20:00:00 GMT modified-by: Seiji Sasazuka & Takahiro Ohmori robot-id: logo_gif robot-name: logo.gif Crawler robot-cover-url: http://www.inm.de/projects/logogif.html robot-details-url: robot-owner-name: Sevo Stille robot-owner-url: http://www.inm.de/people/sevo robot-owner-email: sevo@inm.de robot-status: under development robot-purpose: indexing robot-type: standalone robot-platform: unix robot-availability: none robot-exclusion: yes robot-exclusion-useragent: logo_gif_crawler robot-noindex: no robot-host: *.inm.de robot-from: yes robot-useragent: logo.gif crawler robot-language: perl robot-description: Meta-indexing engine for corporate logo graphics. The robot runs at irregular intervals and will only pull a start page and its associated /.*logo\.gif/i (if any). It will be terminated once a statistically significant number of samples has been collected. robot-history: logo.gif is part of the design diploma of Markus Weisbeck, and tries to analyze the abundance of the logo metaphor in WWW corporate design. The crawler and image database were written by Sevo Stille and Peter Frank of the Institut für Neue Medien, respectively. robot-environment: research, statistics modified-date: 25.5.97 modified-by: Sevo Stille robot-id: lycos robot-name: Lycos robot-cover-url: http://lycos.cs.cmu.edu/ robot-details-url: robot-owner-name: Dr. Michael L.
Mauldin robot-owner-url: http://fuzine.mt.cs.cmu.edu/mlm/home.html robot-owner-email: fuzzy@cmu.edu robot-status: robot-purpose: indexing robot-type: robot-platform: robot-availability: robot-exclusion: yes robot-exclusion-useragent: robot-noindex: no robot-host: fuzine.mt.cs.cmu.edu, lycos.com robot-from: robot-useragent: Lycos/x.x robot-language: robot-description: This is a research program in providing information retrieval and discovery in the WWW, using a finite-memory model of the web to guide intelligent, directed searches for specific information needs. robot-history: robot-environment: modified-date: modified-by: robot-id: macworm robot-name: Mac WWWWorm robot-cover-url: robot-details-url: robot-owner-name: Sebastien Lemieux robot-owner-url: robot-owner-email: lemieuse@ERE.UMontreal.CA robot-status: robot-purpose: indexing robot-type: robot-platform: Macintosh robot-availability: none robot-exclusion: robot-exclusion-useragent: robot-noindex: no robot-host: robot-from: robot-useragent: robot-language: hypercard robot-description: A French keyword-searching robot for the Mac. The author has decided not to release this robot to the public. robot-history: robot-environment: modified-date: modified-by: robot-id: magpie robot-name: Magpie robot-cover-url: robot-details-url: robot-owner-name: Keith Jones robot-owner-url: robot-owner-email: Keith.Jones@blueberry.co.uk robot-status: development robot-purpose: indexing, statistics robot-type: standalone robot-platform: unix robot-availability: robot-exclusion: no robot-exclusion-useragent: robot-noindex: no robot-host: *.blueberry.co.uk, 194.70.52.*, 193.131.167.144 robot-from: no robot-useragent: Magpie/1.0 robot-language: perl5 robot-description: Used to obtain information from a specified list of web pages for local indexing. Runs every two hours, and visits only a small number of sites. robot-history: Part of a research project. Alpha testing from 10 July 1996, Beta testing from 10 September.
robot-environment: research modified-date: Wed, 10 Oct 1996 13:15:00 GMT modified-by: Keith Jones robot-id: mediafox robot-name: MediaFox robot-cover-url: none robot-details-url: none robot-owner-name: Lars Eilebrecht robot-owner-url: http://www.home.unix-ag.org/sfx/ robot-owner-email: sfx@uni-media.de robot-status: development robot-purpose: indexing and maintenance robot-type: standalone robot-platform: (Java) robot-availability: none robot-exclusion: yes robot-exclusion-useragent: mediafox robot-noindex: yes robot-host: 141.99.*.* robot-from: yes robot-useragent: MediaFox/x.y robot-language: Java robot-description: The robot is used to index meta information of a specified set of documents and update a database accordingly. robot-history: Project at the University of Siegen robot-environment: research modified-date: Fri Aug 14 03:37:56 CEST 1998 modified-by: Lars Eilebrecht robot-id: merzscope robot-name: MerzScope robot-cover-url: http://www.merzcom.com robot-details-url: http://www.merzcom.com robot-owner-name: (Client based robot) robot-owner-url: (Client based robot) robot-owner-email: robot-status: actively in use robot-purpose: WebMapping robot-type: standalone robot-platform: (Java based) unix, windows95, windowsNT, os2, mac, etc. robot-availability: binary robot-exclusion: yes robot-exclusion-useragent: MerzScope robot-noindex: no robot-host: (Client based) robot-from: robot-useragent: MerzScope robot-language: java robot-description: Robot is part of a Web-Mapping package called MerzScope, to be used mainly by consultants and webmasters to create and publish maps on and of the World Wide Web.
robot-history: robot-environment: modified-date: Fri, 13 March 1997 16:31:00 modified-by: Philip Lenir, MerzScope lead developer robot-id: meshexplorer robot-name: NEC-MeshExplorer robot-cover-url: http://netplaza.biglobe.or.jp/ robot-details-url: http://netplaza.biglobe.or.jp/keyword.html robot-owner-name: web search service maintenance group robot-owner-url: http://netplaza.biglobe.or.jp/keyword.html robot-owner-email: web-dir@mxa.meshnet.or.jp robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: unix robot-availability: none robot-exclusion: yes robot-exclusion-useragent: NEC-MeshExplorer robot-noindex: no robot-host: meshsv300.tk.mesh.ad.jp robot-from: yes robot-useragent: NEC-MeshExplorer robot-language: c robot-description: The NEC-MeshExplorer robot is used to build the database for the NETPLAZA search service operated by NEC Corporation. The robot searches URLs around sites in Japan (JP domain). The robot runs every day, and visits sites in a random order. robot-history: A prototype version of this robot was developed in C&C Research Laboratories, NEC Corporation. The current robot (Version 1.0) is based on the prototype and has more functions. robot-environment: research modified-date: Jan 1, 1997 modified-by: Nobuya Kubo, Hajime Takano robot-id: momspider robot-name: MOMspider robot-cover-url: http://www.ics.uci.edu/WebSoft/MOMspider/ robot-details-url: robot-owner-name: Roy T. Fielding robot-owner-url: http://www.ics.uci.edu/dir/grad/Software/fielding robot-owner-email: fielding@ics.uci.edu robot-status: active robot-purpose: maintenance, statistics robot-type: standalone robot-platform: UNIX robot-availability: source robot-exclusion: yes robot-exclusion-useragent: robot-noindex: no robot-host: * robot-from: yes robot-useragent: MOMspider/1.00 libwww-perl/0.40 robot-language: perl 4 robot-description: To validate links and generate statistics.
It's usually run from anywhere. robot-history: Originated as a research project at the University of California, Irvine, in 1993. Presented at the First International WWW Conference in Geneva, 1994. robot-environment: modified-date: Sat May 6 08:11:58 1995 modified-by: fielding@ics.uci.edu robot-id: monster robot-name: Monster robot-cover-url: http://www.neva.ru/monster.list/russian.www.html robot-details-url: robot-owner-name: Dmitry Dicky robot-owner-url: http://wild.stu.neva.ru/ robot-owner-email: diwil@wild.stu.neva.ru robot-status: active robot-purpose: maintenance, mirroring robot-type: standalone robot-platform: UNIX (Linux) robot-availability: binary robot-exclusion: yes robot-exclusion-useragent: robot-noindex: no robot-host: wild.stu.neva.ru robot-from: robot-useragent: Monster/vX.X.X -$TYPE ($OSTYPE) robot-language: C robot-description: The Monster has two parts - a Web searcher and a Web analyzer. The searcher is intended to produce a list of the WWW sites of a desired domain (for example, it can produce a list of all WWW sites in the mit.edu, com, org, etc. domains). In the User-agent field, $TYPE is set to 'Mapper' for the Web searcher and 'StAlone' for the Web analyzer. robot-history: Now the full (I suppose) list of ex-USSR sites is produced. robot-environment: modified-date: Tue Jun 25 10:03:36 1996 modified-by: robot-id: motor robot-name: Motor robot-cover-url: http://www.cybercon.de/Motor/index.html robot-details-url: robot-owner-name: Mr. Oliver Runge, Mr.
Michael Goeckel robot-owner-url: http://www.cybercon.de/index.html robot-owner-email: Motor@cybercon.technopark.gmd.de robot-status: development robot-purpose: indexing robot-type: standalone robot-platform: mac robot-availability: data robot-exclusion: yes robot-exclusion-useragent: Motor robot-noindex: no robot-host: Michael.cybercon.technopark.gmd.de robot-from: yes robot-useragent: Motor/0.2 robot-language: 4th dimension robot-description: The Motor robot is used to build the database for the www.webindex.de search service operated by CyberCon. The robot is under development; it runs at random intervals and visits sites in a priority-driven order (.de/.ch/.at first, root and robots.txt first) robot-history: robot-environment: service modified-date: Wed, 3 Jul 1996 15:30:00 +0100 modified-by: Michael Goeckel (Michael@cybercon.technopark.gmd.de) robot-id: muscatferret robot-name: Muscat Ferret robot-cover-url: http://www.muscat.co.uk/euroferret/ robot-details-url: robot-owner-name: Olly Betts robot-owner-url: http://www.muscat.co.uk/~olly/ robot-owner-email: olly@muscat.co.uk robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: unix robot-availability: none robot-exclusion: yes robot-exclusion-useragent: MuscatFerret robot-noindex: yes robot-host: 193.114.89.*, 194.168.54.11 robot-from: yes robot-useragent: MuscatFerret/ robot-language: c, perl5 robot-description: Used to build the database for the EuroFerret robot-history: robot-environment: service modified-date: Tue, 21 May 1997 17:11:00 GMT modified-by: olly@muscat.co.uk robot-id: mwdsearch robot-name: Mwd.Search robot-cover-url: (none) robot-details-url: (none) robot-owner-name: Antti Westerberg robot-owner-url: (none) robot-owner-email: Antti.Westerberg@mwd.sci.fi robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: unix (Linux) robot-availability: none robot-exclusion: yes robot-exclusion-useragent: MwdSearch robot-noindex: yes robot-host:
*.fifi.net robot-from: no robot-useragent: MwdSearch/0.1 robot-language: perl5, c robot-description: Robot for indexing Finnish (top-level domain .fi) web pages for a search engine called Fifi. Visits sites in random order. robot-history: (none) robot-environment: service (+ commercial) modified-date: Mon, 26 May 1997 15:55:02 EEST modified-by: Antti.Westerberg@mwd.sci.fi robot-id: netcarta robot-name: NetCarta WebMap Engine robot-cover-url: http://www.netcarta.com/ robot-details-url: robot-owner-name: NetCarta WebMap Engine robot-owner-url: http://www.netcarta.com/ robot-owner-email: info@netcarta.com robot-status: robot-purpose: indexing, maintenance, mirroring, statistics robot-type: standalone robot-platform: robot-availability: robot-exclusion: yes robot-exclusion-useragent: robot-noindex: robot-host: robot-from: yes robot-useragent: NetCarta CyberPilot Pro robot-language: C++ robot-description: The NetCarta WebMap Engine is a general-purpose, commercial spider. Packaged with a full GUI in the CyberPilot Pro product, it acts as a personal spider to work with a browser to facilitate context-based navigation. The WebMapper product uses the robot to manage a site (site copy, site diff, and extensive link management facilities). All versions can create publishable NetCarta WebMaps, which capture the crawled information. If the robot sees a published map, it will return the published map rather than continuing its crawl. Since this is a personal spider, it will be launched from multiple domains. This robot tends to focus on a particular site. No instance of the robot should have more than one outstanding request out to any given site at a time. The User-agent field contains a coded ID identifying the instance of the spider; specific users can be blocked via robots.txt using this ID. robot-history: robot-environment: modified-date: Sun Feb 18 02:02:49 1996.
modified-by: robot-id: netmechanic robot-name: NetMechanic robot-cover-url: http://www.netmechanic.com robot-details-url: http://www.netmechanic.com/faq.html robot-owner-name: Tom Dahm robot-owner-url: http://iquest.com/~tdahm robot-owner-email: tdahm@iquest.com robot-status: development robot-purpose: Link and HTML validation robot-type: standalone with web gateway robot-platform: UNIX robot-availability: via web page robot-exclusion: Yes robot-exclusion-useragent: WebMechanic robot-noindex: no robot-host: 206.26.168.18 robot-from: no robot-useragent: NetMechanic robot-language: C robot-description: NetMechanic is a link validation and HTML validation robot run using a web page interface. robot-history: robot-environment: modified-date: Sat, 17 Aug 1996 12:00:00 GMT modified-by: robot-id: netscoop robot-name: NetScoop robot-cover-url: http://www-a2k.is.tokushima-u.ac.jp/search/index.html robot-owner-name: Kenji Kita robot-owner-url: http://www-a2k.is.tokushima-u.ac.jp/member/kita/index.html robot-owner-email: kita@is.tokushima-u.ac.jp robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: UNIX robot-availability: none robot-exclusion: yes robot-exclusion-useragent: NetScoop robot-host: alpha.is.tokushima-u.ac.jp, beta.is.tokushima-u.ac.jp robot-useragent: NetScoop/1.0 libwww/5.0a robot-language: C robot-description: The NetScoop robot is used to build the database for the NetScoop search engine. robot-history: The robot has been used in a research project at the Faculty of Engineering, Tokushima University, Japan, since Dec. 1996. robot-environment: research modified-date: Fri, 10 Jan 1997.
modified-by: Kenji Kita robot-id: newscan-online robot-name: newscan-online robot-cover-url: http://www.newscan-online.de/ robot-details-url: http://www.newscan-online.de/info.html robot-owner-name: Axel Mueller robot-owner-url: robot-owner-email: mueller@newscan-online.de robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: Linux robot-availability: binary robot-exclusion: yes robot-exclusion-useragent: newscan-online robot-noindex: no robot-host: *newscan-online.de robot-from: yes robot-useragent: newscan-online/1.1 robot-language: perl robot-description: The newscan-online robot is used to build a database for the newscan-online news search service operated by smart information services. The robot runs daily and visits predefined sites in a random order. robot-history: This robot has its roots in pre-release news-filtering software for Lotus Notes from 1995. robot-environment: service modified-date: Fri, 9 Apr 1999 11:45:00 GMT modified-by: Axel Mueller robot-id: nhse robot-name: NHSE Web Forager robot-cover-url: http://nhse.mcs.anl.gov/ robot-details-url: robot-owner-name: Robert Olson robot-owner-url: http://www.mcs.anl.gov/people/olson/ robot-owner-email: olson@mcs.anl.gov robot-status: robot-purpose: indexing robot-type: standalone robot-platform: robot-availability: robot-exclusion: yes robot-exclusion-useragent: robot-noindex: no robot-host: *.mcs.anl.gov robot-from: yes robot-useragent: NHSEWalker/3.0 robot-language: perl 5 robot-description: to generate a Resource Discovery database robot-history: robot-environment: modified-date: Fri May 5 15:47:55 1995 modified-by: robot-id: nomad robot-name: Nomad robot-cover-url: http://www.cs.colostate.edu/~sonnen/projects/nomad.html robot-details-url: robot-owner-name: Richard Sonnen robot-owner-url: http://www.cs.colostate.edu/~sonnen/ robot-owner-email: sonnen@cs.colostat.edu robot-status: robot-purpose: indexing robot-type: standalone robot-platform: robot-availability:
robot-exclusion: no robot-exclusion-useragent: robot-noindex: robot-host: *.cs.colostate.edu robot-from: no robot-useragent: Nomad-V2.x robot-language: Perl 4 robot-description: robot-history: Developed in 1995 at Colorado State University. robot-environment: modified-date: Sat Jan 27 21:02:20 1996. modified-by: robot-id: northstar robot-name: The NorthStar Robot robot-cover-url: http://comics.scs.unr.edu:7000/top.html robot-details-url: robot-owner-name: Fred Barrie robot-owner-url: robot-owner-email: barrie@unr.edu robot-status: robot-purpose: indexing robot-type: robot-platform: robot-availability: robot-exclusion: robot-exclusion-useragent: robot-noindex: robot-host: frognot.utdallas.edu, utdallas.edu, cnidir.org robot-from: yes robot-useragent: NorthStar robot-language: robot-description: Recent runs (26 April 94) will concentrate on textual analysis of the Web versus GopherSpace (from the Veronica data) as well as indexing. robot-history: robot-environment: modified-date: modified-by: robot-id: occam robot-name: Occam robot-cover-url: http://www.cs.washington.edu/research/projects/ai/www/occam/ robot-details-url: robot-owner-name: Marc Friedman robot-owner-url: http://www.cs.washington.edu/homes/friedman/ robot-owner-email: friedman@cs.washington.edu robot-status: development robot-purpose: indexing robot-type: standalone robot-platform: unix robot-availability: none robot-exclusion: yes robot-exclusion-useragent: Occam robot-noindex: no robot-host: gentian.cs.washington.edu, sekiu.cs.washington.edu, saxifrage.cs.washington.edu robot-from: yes robot-useragent: Occam/1.0 robot-language: CommonLisp, perl4 robot-description: The robot takes high-level queries, breaks them down into multiple web requests, and answers them by combining disparate data gathered in one minute from numerous web sites, or from the robot's cache. Currently the only user is me. robot-history: The robot is a descendant of Rodney, an earlier project at the University of Washington.
robot-environment: research modified-date: Thu, 21 Nov 1996 20:30 GMT modified-by: friedman@cs.washington.edu (Marc Friedman) robot-id: octopus robot-name: HKU WWW Octopus robot-cover-url: http://phoenix.cs.hku.hk:1234/~jax/w3rui.shtml robot-details-url: robot-owner-name: Law Kwok Tung, Lee Tak Yeung, Lo Chun Wing robot-owner-url: http://phoenix.cs.hku.hk:1234/~jax robot-owner-email: jax@cs.hku.hk robot-status: robot-purpose: indexing robot-type: standalone robot-platform: robot-availability: robot-exclusion: no robot-exclusion-useragent: robot-noindex: robot-host: phoenix.cs.hku.hk robot-from: yes robot-useragent: HKU WWW Robot, robot-language: Perl 5, C, Java robot-description: HKU Octopus is an ongoing project for resource discovery in the Hong Kong and China WWW domain. It is a research project conducted by three undergraduates at the University of Hong Kong. robot-history: robot-environment: modified-date: Thu Mar 7 14:21:55 1996. modified-by: robot-id: orb_search robot-name: Orb Search robot-cover-url: http://orbsearch.home.ml.org robot-details-url: http://orbsearch.home.ml.org robot-owner-name: Matt Weber robot-owner-url: http://www.weberworld.com robot-owner-email: webernet@geocities.com robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: unix robot-availability: data robot-exclusion: yes robot-exclusion-useragent: Orbsearch/1.0 robot-noindex: yes robot-host: cow.dyn.ml.org, *.dyn.ml.org robot-from: yes robot-useragent: Orbsearch/1.0 robot-language: Perl5 robot-description: Orbsearch builds the database for the Orb Search Engine. It runs when requested. robot-history: This robot was started as a hobby.
robot-environment: hobby modified-date: Sun, 31 Aug 1997 02:28:52 GMT modified-by: Matt Weber robot-id: packrat robot-name: Pack Rat robot-cover-url: http://web.cps.msu.edu/~dexterte/isl/packrat.html robot-details-url: robot-owner-name: Terry Dexter robot-owner-url: http://web.cps.msu.edu/~dexterte robot-owner-email: dexterte@cps.msu.edu robot-status: development robot-purpose: both maintenance and mirroring robot-type: standalone robot-platform: unix robot-availability: at the moment, none; source when developed. robot-exclusion: yes robot-exclusion-useragent: packrat or * robot-noindex: no, not yet robot-host: cps.msu.edu robot-from: robot-useragent: PackRat/1.0 robot-language: perl with libwww-5.0 robot-description: Used for local maintenance and for gathering web pages so that local statistical info can be used in artificial intelligence programs. Funded by NEMOnline. robot-history: In the making... robot-environment: research modified-date: Tue, 20 Aug 1996 15:45:11 modified-by: Terry Dexter robot-id: patric robot-name: Patric robot-cover-url: http://www.nwnet.net/technical/ITR/index.html robot-details-url: http://www.nwnet.net/technical/ITR/index.html robot-owner-name: toney@nwnet.net robot-owner-url: http://www.nwnet.net/company/staff/toney robot-owner-email: webmaster@nwnet.net robot-status: development robot-purpose: statistics robot-type: standalone robot-platform: unix robot-availability: data robot-exclusion: yes robot-exclusion-useragent: patric robot-noindex: yes robot-host: *.nwnet.net robot-from: no robot-useragent: Patric/0.01a robot-language: perl robot-description: (contained at http://www.nwnet.net/technical/ITR/index.html ) robot-history: (contained at http://www.nwnet.net/technical/ITR/index.html ) robot-environment: service modified-date: Thurs, 15 Aug 1996 modified-by: toney@nwnet.net robot-id: perignator robot-name: The Peregrinator robot-cover-url: http://www.maths.usyd.edu.au:8000/jimr/pe/Peregrinator.html robot-details-url:
robot-owner-name: Jim Richardson robot-owner-url: http://www.maths.usyd.edu.au:8000/jimr.html robot-owner-email: jimr@maths.su.oz.au robot-status: robot-purpose: robot-type: robot-platform: robot-availability: robot-exclusion: yes robot-exclusion-useragent: robot-noindex: no robot-host: robot-from: yes robot-useragent: Peregrinator-Mathematics/0.7 robot-language: perl 4 robot-description: This robot is being used to generate an index of documents on Web sites connected with mathematics and statistics. It ignores off-site links, so does not stray from a list of servers specified initially. robot-history: commenced operation in August 1994 robot-environment: modified-date: modified-by: robot-id: perlcrawler robot-name: PerlCrawler 1.0 robot-cover-url: http://perlsearch.hypermart.net/ robot-details-url: http://www.xav.com/scripts/xavatoria/index.html robot-owner-name: Matt McKenzie robot-owner-url: http://perlsearch.hypermart.net/ robot-owner-email: webmaster@perlsearch.hypermart.net robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: unix robot-availability: source robot-exclusion: yes robot-exclusion-useragent: perlcrawler robot-noindex: yes robot-host: server5.hypermart.net robot-from: yes robot-useragent: PerlCrawler/1.0 Xavatoria/2.0 robot-language: perl5 robot-description: The PerlCrawler robot is designed to index and build a database of pages relating to the Perl programming language. 
robot-history: Originated in modified form on 25 June 1998 robot-environment: hobby modified-date: Fri, 18 Dec 1998 23:37:40 GMT modified-by: Matt McKenzie robot-id: phantom robot-name: Phantom robot-cover-url: http://www.maxum.com/phantom/ robot-details-url: robot-owner-name: Larry Burke robot-owner-url: http://www.aktiv.com/ robot-owner-email: lburke@aktiv.com robot-status: robot-purpose: indexing robot-type: standalone robot-platform: Macintosh robot-availability: robot-exclusion: yes robot-exclusion-useragent: robot-noindex: robot-host: robot-from: yes robot-useragent: Duppies robot-language: robot-description: Designed to allow webmasters to provide a searchable index of their own site as well as of other sites, perhaps with similar content. robot-history: robot-environment: modified-date: Fri Jan 19 05:08:15 1996. modified-by: robot-id: pioneer robot-name: Pioneer robot-cover-url: http://sequent.uncfsu.edu/~micah/pioneer.html robot-details-url: robot-owner-name: Micah A. Williams robot-owner-url: http://sequent.uncfsu.edu/~micah/ robot-owner-email: micah@sequent.uncfsu.edu robot-status: robot-purpose: indexing, statistics robot-type: standalone robot-platform: robot-availability: robot-exclusion: yes robot-exclusion-useragent: robot-noindex: robot-host: *.uncfsu.edu or flyer.ncsc.org robot-from: yes robot-useragent: Pioneer robot-language: C robot-description: Pioneer is part of an undergraduate research project. robot-history: robot-environment: modified-date: Mon Feb 5 02:49:32 1996. modified-by: robot-id: pitkow robot-name: html_analyzer robot-cover-url: robot-details-url: robot-owner-name: James E. Pitkow robot-owner-url: robot-owner-email: pitkow@aries.colorado.edu robot-status: robot-purpose: maintenance robot-type: robot-platform: robot-availability: robot-exclusion: robot-exclusion-useragent: robot-noindex: no robot-host: robot-from: robot-useragent: robot-language: robot-description: To check the validity of Web servers.
I'm not sure if it has ever been run remotely. robot-history: robot-environment: modified-date: modified-by: robot-id: pka robot-name: PGP Key Agent robot-cover-url: http://www.starnet.it/pgp robot-details-url: robot-owner-name: Massimiliano Pucciarelli robot-owner-url: http://www.starnet.it/puma robot-owner-email: puma@comm2000.it robot-status: Active robot-purpose: indexing robot-type: standalone robot-platform: UNIX, Windows NT robot-availability: none robot-exclusion: no robot-exclusion-useragent: robot-noindex: no robot-host: salerno.starnet.it robot-from: yes robot-useragent: PGP-KA/1.2 robot-language: Perl 5 robot-description: This program searches for the PGP public key of the specified user. robot-history: Originated as a research project at Salerno University in 1995. robot-environment: Research modified-date: June 27 1996. modified-by: Massimiliano Pucciarelli robot-id: plumtreewebaccessor robot-name: PlumtreeWebAccessor robot-cover-url: robot-details-url: http://www.plumtree.com/ robot-owner-name: Joseph A. Stanko robot-owner-url: robot-owner-email: josephs@plumtree.com robot-status: development robot-purpose: indexing for the Plumtree Server robot-type: standalone robot-platform: windowsNT robot-availability: none robot-exclusion: yes robot-exclusion-useragent: PlumtreeWebAccessor robot-noindex: yes robot-host: robot-from: yes robot-useragent: PlumtreeWebAccessor/0.9 robot-language: c++ robot-description: The Plumtree Web Accessor is a component that customers can add to the Plumtree Server to index documents on the World Wide Web. robot-history: robot-environment: commercial modified-date: Thu, 17 Dec 1998 modified-by: Joseph A.
Stanko robot-id: Puu robot-name: GetterroboPlus Puu robot-details-url: http://marunaka.homing.net/straight/getter/ robot-cover-url: http://marunaka.homing.net/straight/ robot-owner-name: marunaka robot-owner-url: http://marunaka.homing.net robot-owner-email: marunaka@homing.net robot-status: active: robot actively in use robot-purpose: - gathering: gathers data from the original standard TAG for Puu, which contains the information of the sites registered in my search engine - maintenance: link validation robot-type: standalone robot-platform: unix robot-availability: none robot-exclusion: yes (Puu patrols only registered URLs in my search engine) robot-exclusion-useragent: Getterrobo-Plus robot-noindex: no robot-host: straight FLASH!! Getterrobo-Plus, *.homing.net robot-from: yes robot-useragent: straight FLASH!! GetterroboPlus 1.5 robot-language: perl5 robot-description: The Puu robot is used to gather data from sites registered in the search engine "straight FLASH!!", to build a page announcing the renewal status of the registered sites. The robot runs every day. robot-history: This robot patrols the sites registered in the search engine "straight FLASH!!"
robot-environment: hobby modified-date: Fri, 26 Jun 1998 robot-id: python robot-name: The Python Robot robot-cover-url: http://www.python.org/ robot-details-url: robot-owner-name: Guido van Rossum robot-owner-url: http://www.python.org/~guido/ robot-owner-email: guido@python.org robot-status: retired robot-purpose: robot-type: robot-platform: robot-availability: none robot-exclusion: robot-exclusion-useragent: robot-noindex: no robot-host: robot-from: robot-useragent: robot-language: robot-description: robot-history: robot-environment: modified-date: modified-by: robot-id: rbse robot-name: RBSE Spider robot-cover-url: http://rbse.jsc.nasa.gov/eichmann/urlsearch.html robot-details-url: robot-owner-name: David Eichmann robot-owner-url: http://rbse.jsc.nasa.gov/eichmann/home.html robot-owner-email: eichmann@rbse.jsc.nasa.gov robot-status: active robot-purpose: indexing, statistics robot-type: robot-platform: robot-availability: robot-exclusion: yes robot-exclusion-useragent: robot-noindex: robot-host: rbse.jsc.nasa.gov (192.88.42.10) robot-from: robot-useragent: robot-language: C, oracle, wais robot-description: Developed and operated as part of the NASA-funded Repository Based Software Engineering Program at the Research Institute for Computing and Information Systems, University of Houston - Clear Lake. robot-history: robot-environment: modified-date: Thu May 18 04:47:02 1995 modified-by: robot-id: resumerobot robot-name: Resume Robot robot-cover-url: http://www.onramp.net/proquest/resume/robot/robot.html robot-details-url: robot-owner-name: James Stakelum robot-owner-url: http://www.onramp.net/proquest/resume/java/resume.html robot-owner-email: proquest@onramp.net robot-status: robot-purpose: indexing. robot-type: standalone robot-platform: robot-availability: robot-exclusion: yes robot-exclusion-useragent: robot-noindex: robot-host: robot-from: yes robot-useragent: Resume Robot robot-language: C++. 
robot-description: robot-history: robot-environment: modified-date: Tue Mar 12 15:52:25 1996. modified-by: robot-id: rhcs robot-name: RoadHouse Crawling System robot-cover-url: http://stage.perceval.be (under development) robot-details-url: robot-owner-name1: Gregoire Welraeds robot-owner-name2: Emmanuel Bergmans robot-owner-url: http://www.perceval.be robot-owner-email1: stage@perceval.be robot-owner-email2: helpdesk@perceval.be robot-status: development robot-purpose1: indexing robot-purpose2: maintenance robot-purpose3: statistics robot-type: standalone robot-platform1: unix (FreeBSD & Linux) robot-availability: none robot-exclusion: no (under development) robot-exclusion-useragent: RHCS robot-noindex: no (under development) robot-host: stage.perceval.be robot-from: no robot-useragent: RHCS/1.0a robot-language: c robot-description: robot used to build the database for the RoadHouse search service project operated by Perceval robot-history: The need for this robot finds its roots in the existing RoadHouse directory, which has not been maintained since 1997 robot-environment: service modified-date: Fri, 26 Feb 1999 12:00:00 GMT modified-by: Gregoire Welraeds robot-id: roadrunner robot-name: Road Runner: The ImageScape Robot robot-owner-name: LIM Group robot-owner-email: lim@cs.leidenuniv.nl robot-status: development/active robot-purpose: indexing robot-type: standalone robot-platform: UNIX robot-exclusion: yes robot-exclusion-useragent: roadrunner robot-useragent: Road Runner: ImageScape Robot (lim@cs.leidenuniv.nl) robot-language: C, perl5 robot-description: Create Image/Text index for WWW robot-history: ImageScape Project robot-environment: commercial service modified-date: Dec. 1st, 1996 robot-id: robbie robot-name: Robbie the Robot robot-cover-url: robot-details-url: robot-owner-name: Robert H.
Pollack robot-owner-url: robot-owner-email: robert.h.pollack@lmco.com robot-status: development robot-purpose: indexing robot-type: standalone robot-platform: unix, windows95, windowsNT robot-availability: none robot-exclusion: yes robot-exclusion-useragent: Robbie robot-noindex: no robot-host: *.lmco.com robot-from: yes robot-useragent: Robbie/0.1 robot-language: java robot-description: Used to define document collections for the DISCO system. Robbie is still under development and runs several times a day, but usually only for ten minutes or so. Sites are visited in the order in which references are found, but no host is visited more than once in any two-minute period. robot-history: The DISCO system is a resource-discovery component in the OLLA system, which is a prototype system, developed under DARPA funding, to support computer-based education and training. robot-environment: research modified-date: Wed, 5 Feb 1997 19:00:00 GMT modified-by: robot-id: robi robot-name: ComputingSite Robi/1.0 robot-cover-url: http://www.computingsite.com/robi/ robot-details-url: http://www.computingsite.com/robi/ robot-owner-name: Tecor Communications S.L. robot-owner-url: http://www.tecor.com/ robot-owner-email: robi@computingsite.com robot-status: Active robot-purpose: indexing,maintenance robot-type: standalone robot-platform: UNIX robot-availability: robot-exclusion: yes robot-exclusion-useragent: robi robot-noindex: no robot-host: robi.computingsite.com robot-from: robot-useragent: ComputingSite Robi/1.0 (robi@computingsite.com) robot-language: python robot-description: Intelligent agent used to build the ComputingSite Search Directory. robot-history: It was born in August 1997.
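Robbie's scheduling rule above (visit references in discovery order, but no host more than once in any two-minute period) amounts to a small per-host politeness gate. The sketch below shows one way to implement it; the class and method names are hypothetical, not taken from Robbie itself, and the clock is injectable so the behavior can be tested without waiting.

```python
import time

class PoliteScheduler:
    """Per-host politeness gate: a host may be fetched again only after
    min_interval seconds (Robbie's rule uses two minutes).
    Hypothetical sketch, not Robbie's actual code."""

    def __init__(self, min_interval=120.0, clock=time.monotonic):
        self.min_interval = min_interval
        self.clock = clock          # injectable for testing
        self.last_visit = {}        # host -> time of last fetch

    def ready(self, host):
        """True if the host has never been visited or has cooled down."""
        last = self.last_visit.get(host)
        return last is None or self.clock() - last >= self.min_interval

    def take(self, hosts):
        """Return the first host (in discovery order) that may be visited
        now, recording the visit; None if every host is still cooling down."""
        for host in hosts:
            if self.ready(host):
                self.last_visit[host] = self.clock()
                return host
        return None
```

Keeping the queue in discovery order while skipping hosts that are still cooling down reproduces both halves of the record's description.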
robot-environment: service modified-date: Wed, 13 May 1998 17:28:52 GMT modified-by: Jorge Alegre robot-id: roverbot robot-name: Roverbot robot-cover-url: http://www.roverbot.com/ robot-details-url: robot-owner-name: GlobalMedia Design (Andrew Cowan & Brian Clark) robot-owner-url: http://www.radzone.org/gmd/ robot-owner-email: gmd@spyder.net robot-status: robot-purpose: indexing robot-type: standalone robot-platform: robot-availability: robot-exclusion: yes robot-exclusion-useragent: robot-noindex: robot-host: roverbot.com robot-from: yes robot-useragent: Roverbot robot-language: perl5 robot-description: Targeted email gatherer utilizing user-defined seed points and interacting with both the webserver and MX servers of remote sites. robot-history: robot-environment: modified-date: Tue Jun 18 19:16:31 1996. modified-by: robot-id: safetynetrobot robot-name: SafetyNet Robot robot-cover-url: http://www.urlabs.com/ robot-details-url: robot-owner-name: Michael L. Nelson robot-owner-url: http://www.urlabs.com/ robot-owner-email: m.l.nelson@urlabs.com robot-status: robot-purpose: indexing. robot-type: standalone robot-platform: robot-availability: robot-exclusion: no. robot-exclusion-useragent: robot-noindex: robot-host: *.urlabs.com robot-from: yes robot-useragent: SafetyNet Robot 0.1, robot-language: Perl 5 robot-description: Finds URLs for K-12 content management. robot-history: robot-environment: modified-date: Sat Mar 23 20:12:39 1996. modified-by: robot-id: scooter robot-name: Scooter robot-cover-url: http://www.altavista.com/ robot-details-url: http://www.altavista.com/av/content/addurl.htm robot-owner-name: AltaVista robot-owner-url: http://www.altavista.com/ robot-owner-email: scooter@pa.dec.com robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: unix robot-availability: none robot-exclusion: yes robot-exclusion-useragent: Scooter robot-noindex: yes robot-host: *.av.pa-x.dec.com robot-from: yes robot-useragent: Scooter/2.0 G.R.A.B. 
V1.1.0 robot-language: c robot-description: Scooter is AltaVista's prime index agent. robot-history: Version 2 of Scooter/1.0 developed by Louis Monier of WRL. robot-environment: service modified-date: Wed, 13 Jan 1999 17:18:59 GMT modified-by: steves@avs.dec.com robot-id: search_au robot-name: Search.Aus-AU.COM robot-details-url: http://Search.Aus-AU.COM/ robot-cover-url: http://Search.Aus-AU.COM/ robot-owner-name: Dez Blanchfield robot-owner-url: not currently available robot-owner-email: dez@geko.com robot-status: - development: robot under development robot-purpose: - indexing: gather content for an indexing service robot-type: - standalone: a separate program robot-platform: - mac - unix - windows95 - windowsNT robot-availability: - none robot-exclusion: yes robot-exclusion-useragent: Search-AU robot-noindex: yes robot-host: Search.Aus-AU.COM, 203.55.124.29, 203.2.239.29 robot-from: no robot-useragent: not available robot-language: c, perl, sql robot-description: Search-AU is a development tool I have built to investigate the power of a search engine and web crawler to give me access to a database of web content (HTML / URLs) and addresses, etc., from which I hope to build more accurate stats about the .au zone's web content. The robot started crawling from http://www.geko.net.au/ on March 1st, 1998, and after nine days had 70 MB of compressed ASCII in a database to work with. I hope to run a refresh of the crawl every month initially, and soon every week, bandwidth and CPU allowing. If the project warrants further development, I will turn it into an Australian (.au) zone search engine and make it commercially available for advertising to cover the costs, which are starting to mount up. --dez (980313 - black friday!)
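Most records here advertise a robot-exclusion-useragent token (Search-AU in the entry above) that webmasters can target in robots.txt. Python's standard urllib.robotparser shows how a compliant crawler evaluates such rules; the robots.txt lines below are a hypothetical example, not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that shuts the Search-AU robot out of /private/
# while leaving all other robots unrestricted.
rp = RobotFileParser()
rp.modified()  # record that the rules have been loaded
rp.parse([
    "User-agent: Search-AU",
    "Disallow: /private/",
    "",
    "User-agent: *",
    "Disallow:",
])
```

A compliant crawler calls can_fetch(token, url) before every request and skips any URL for which it returns False.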
robot-environment: - hobby: written as a hobby modified-date: Fri Mar 13 10:03:32 EST 1998 robot-id: senrigan robot-name: Senrigan robot-cover-url: http://www.info.waseda.ac.jp/search-e.html robot-details-url: robot-owner-name: TAMURA Kent robot-owner-url: http://www.info.waseda.ac.jp/muraoka/members/kent/ robot-owner-email: kent@muraoka.info.waseda.ac.jp robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: Java robot-availability: none robot-exclusion: yes robot-exclusion-useragent: Senrigan robot-noindex: yes robot-host: aniki.olu.info.waseda.ac.jp robot-from: yes robot-useragent: Senrigan/xxxxxx robot-language: Java robot-description: This robot currently fetches HTML only from the .jp domain. robot-history: It has been running since Dec 1994 robot-environment: research modified-date: Mon Jul 1 07:30:00 GMT 1996 modified-by: TAMURA Kent robot-id: sgscout robot-name: SG-Scout robot-cover-url: http://www-swiss.ai.mit.edu/~ptbb/SG-Scout/SG-Scout.html robot-details-url: robot-owner-name: Peter Beebee robot-owner-url: http://www-swiss.ai.mit.edu/~ptbb/personal/index.html robot-owner-email: ptbb@ai.mit.edu, beebee@parc.xerox.com robot-status: active robot-purpose: indexing robot-type: robot-platform: robot-availability: robot-exclusion: yes robot-exclusion-useragent: robot-noindex: no robot-host: beta.xerox.com robot-from: yes robot-useragent: SG-Scout robot-language: robot-description: Does a "server-oriented" breadth-first search in a round-robin fashion, with multiple processes.
robot-history: Run since 27 June 1994, for an internal XEROX research project robot-environment: modified-date: modified-by: robot-id: shaihulud robot-name: Shai'Hulud robot-cover-url: robot-details-url: robot-owner-name: Dimitri Khaoustov robot-owner-url: robot-owner-email: shawdow@usa.net robot-status: active robot-purpose: mirroring robot-type: standalone robot-platform: unix robot-availability: source robot-exclusion: no robot-exclusion-useragent: robot-noindex: no robot-host: *.rdtex.ru robot-from: robot-useragent: Shai'Hulud robot-language: C robot-description: Used to build mirrors for internal use robot-history: This robot finds its roots in a research project at RDTeX Perspective Projects Group in 1996 robot-environment: research modified-date: Mon, 5 Aug 1996 14:35:08 GMT modified-by: Dimitri Khaoustov robot-id: simbot robot-name: Simmany Robot Ver1.0 robot-cover-url: http://simmany.hnc.net/ robot-details-url: http://simmany.hnc.net/irman1.html robot-owner-name: Youngsik, Lee robot-owner-url: robot-owner-email: ailove@hnc.co.kr robot-status: development & active robot-purpose: indexing, maintenance, statistics robot-type: standalone robot-platform: unix robot-availability: none robot-exclusion: yes robot-exclusion-useragent: SimBot robot-noindex: no robot-host: sansam.hnc.net robot-from: no robot-useragent: SimBot/1.0 robot-language: C robot-description: The Simmany Robot is used to build the Map(DB) for the Simmany service operated by HNC(Hangul & Computer Co., Ltd.). The robot runs weekly, and visits sites that have useful Korean information in a defined order. robot-history: This robot is part of the Simmany service and the simmini products. The simmini is a Web product that makes use of the indexing and retrieval modules of Simmany.
robot-environment: service, commercial modified-date: Thu, 19 Sep 1996 07:02:26 GMT modified-by: Youngsik, Lee robot-id: sitegrabber robot-name: Open Text Index Robot robot-cover-url: http://index.opentext.net/main/faq.html robot-details-url: http://index.opentext.net/OTI_Robot.html robot-owner-name: John Faichney robot-owner-url: robot-owner-email: faichney@opentext.com robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: UNIX robot-availability: inquire to markk@opentext.com (Mark Kraatz) robot-exclusion: yes robot-exclusion-useragent: Open Text Site Crawler robot-noindex: no robot-host: *.opentext.com robot-from: yes robot-useragent: Open Text Site Crawler V1.0 robot-language: perl/C robot-description: This robot is run by Open Text Corporation to produce the data for the Open Text Index robot-history: Started in May/95 to replace existing Open Text robot which was based on libwww robot-environment: commercial modified-date: Fri Jul 25 11:46:56 EDT 1997 modified-by: John Faichney robot-id: sitetech robot-name: SiteTech-Rover robot-cover-url: http://www.sitetech.com/ robot-details-url: robot-owner-name: Anil Peres-da-Silva robot-owner-url: http://www.sitetech.com robot-owner-email: adasilva@sitetech.com robot-status: robot-purpose: indexing robot-type: standalone robot-platform: robot-availability: robot-exclusion: yes robot-exclusion-useragent: robot-noindex: robot-host: robot-from: yes robot-useragent: SiteTech-Rover robot-language: C++. robot-description: Originated as part of a suite of Internet Products to organize, search & navigate Intranet sites and to validate links in HTML documents. robot-history: This robot originally went by the name of LiberTech-Rover robot-environment: modified-date: Fri Aug 9 17:06:56 1996. 
modified-by: Anil Peres-da-Silva robot-id: slurp robot-name: Inktomi Slurp robot-cover-url: http://www.inktomi.com/ robot-details-url: http://www.inktomi.com/slurp.html robot-owner-name: Inktomi Corporation robot-owner-url: http://www.inktomi.com/ robot-owner-email: slurp@inktomi.com robot-status: active robot-purpose: indexing, statistics robot-type: standalone robot-platform: unix robot-availability: none robot-exclusion: yes robot-exclusion-useragent: slurp robot-noindex: yes robot-host: *.inktomi.com robot-from: yes robot-useragent: Slurp/2.0 robot-language: C/C++ robot-description: Indexing documents for the HotBot search engine (www.hotbot.com), collecting Web statistics robot-history: Switch from Slurp/1.0 to Slurp/2.0 November 1996 robot-environment: service modified-date: Fri Feb 28 13:57:43 PST 1997 modified-by: slurp@inktomi.com robot-id: smartspider robot-name: Smart Spider robot-cover-url: http://www.travel-finder.com robot-details-url: http://www.engsoftware.com/robots.htm robot-owner-name: Ken Wadland robot-owner-url: http://www.engsoftware.com robot-owner-email: ken@engsoftware.com robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: windows95, windowsNT robot-availability: data, binary, source robot-exclusion: Yes robot-exclusion-useragent: ESI robot-noindex: Yes robot-host: 207.16.241.* robot-from: Yes robot-useragent: ESISmartSpider/2.0 robot-language: C++ robot-description: Classifies sites using a Knowledge Base. The robot collects web pages, which are then parsed and fed to the Knowledge Base. The Knowledge Base classifies the sites into any of hundreds of categories based on the vocabulary used. Currently used by: //www.travel-finder.com (Travel and Tourist Info) and //www.golightway.com (Christian Sites). Several options exist to control whether sites are discovered and/or classified fully automatically, fully manually, or somewhere in between. robot-history: Feb '96 -- Product design begun.
May '96 -- First data results published by Travel-Finder. Oct '96 -- Generalized and announced as a product for other sites. Jan '97 -- First data results published by GoLightWay. robot-environment: service, commercial modified-date: Mon, 13 Jan 1997 10:41:00 EST modified-by: Ken Wadland robot-id: snooper robot-name: Snooper robot-cover-url: http://darsun.sit.qc.ca robot-details-url: robot-owner-name: Isabelle A. Melnick robot-owner-url: robot-owner-email: melnicki@sit.ca robot-status: part under development and part active robot-purpose: robot-type: robot-platform: robot-availability: none robot-exclusion: yes robot-exclusion-useragent: snooper robot-noindex: robot-host: robot-from: robot-useragent: Snooper/b97_01 robot-language: robot-description: robot-history: robot-environment: modified-date: modified-by: robot-id: solbot robot-name: Solbot robot-cover-url: http://kvasir.sol.no/ robot-details-url: robot-owner-name: Frank Tore Johansen robot-owner-url: robot-owner-email: ftj@sys.sol.no robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: unix robot-availability: none robot-exclusion: yes robot-exclusion-useragent: solbot robot-noindex: yes robot-host: robot*.sol.no robot-from: robot-useragent: Solbot/1.0 LWP/5.07 robot-language: perl, c robot-description: Builds data for the Kvasir search service. Only searches sites which end with one of the following domains: "no", "se", "dk", "is", "fi" robot-history: This robot is the result of a late-night hack three years ago, when the Verity robot (of that time) was unable to index sites with iso8859 characters (in URLs and other places), and we just _had_ to have something up and going the next day...
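Solbot's scoping rule above (crawl only hosts ending in the no/se/dk/is/fi domains) is a hostname-suffix test. A minimal sketch, assuming scope is decided purely from the URL; the in_scope helper name is hypothetical:

```python
from urllib.parse import urlparse

# Country-code TLDs listed in the Solbot record above.
NORDIC_TLDS = ("no", "se", "dk", "is", "fi")

def in_scope(url, tlds=NORDIC_TLDS):
    """True if the URL's host has one of the allowed top-level domains."""
    host = urlparse(url).hostname or ""
    # Compare only the last label of the hostname against the allowlist.
    return host.rsplit(".", 1)[-1].lower() in tlds
```

A crawler applies such a filter to every extracted link before queueing it, so out-of-scope hosts are never contacted at all.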
robot-environment: service modified-date: Tue Apr 7 16:25:05 MET DST 1998 modified-by: Frank Tore Johansen robot-id: spanner robot-name: Spanner robot-cover-url: http://www.kluge.net/NES/spanner/ robot-details-url: http://www.kluge.net/NES/spanner/ robot-owner-name: Theo Van Dinter robot-owner-url: http://www.kluge.net/~felicity/ robot-owner-email: felicity@kluge.net robot-status: development robot-purpose: indexing,maintenance robot-type: standalone robot-platform: unix robot-availability: source robot-exclusion: yes robot-exclusion-useragent: Spanner robot-noindex: yes robot-host: *.kluge.net robot-from: yes robot-useragent: Spanner/1.0 (Linux 2.0.27 i586) robot-language: perl robot-description: Used to index/check links on an intranet. robot-history: Pet project of the author since beginning of 1996. robot-environment: hobby modified-date: Mon, 06 Jan 1997 00:00:00 GMT modified-by: felicity@kluge.net robot-id: spiderbot robot-name: SpiderBot 1.0 - P.F.C. "Recuperador páginas Web" de Ignacio Cruzado Nuño (U.B.U.) robot-cover-url: http://pisuerga.inf.ubu.es/lsi/Docencia/TFC/ITIG/icruzadn/cover.htm robot-details-url: http://pisuerga.inf.ubu.es/lsi/Docencia/TFC/ITIG/icruzadn/details.htm robot-owner-name: Ignacio Cruzado Nuño: student of "Computer Engineering" at Burgos University (Spain) robot-owner-url: http://pisuerga.inf.ubu.es/lsi/Docencia/TFC/ITIG/icruzadn/icruzadn.htm robot-owner-email: spidrbot@solaria.emp.ubu.es robot-status: development robot-purpose: indexing robot-type: standalone, browser robot-platform: unix, windows, windows95 robot-availability: source, binary, data robot-exclusion: yes robot-exclusion-useragent: yes robot-noindex: yes robot-host: * robot-from: yes robot-useragent: yes robot-language: C++ robot-description: Recovers Web pages and saves them on your hard disk. Then it reindexes them.
robot-history: This robot belongs to Ignacio Cruzado Nuño's university degree project to obtain the qualification of "Management Informatics Engineering" at Burgos University (Spain) robot-environment: research modified-date: Mon, 29 Dec 1998 10:00:00 GMT modified-by: Ignacio Cruzado Nuño robot-id: spry robot-name: Spry Wizard Robot robot-cover-url: http://www.spry.com/wizard/index.html robot-details-url: robot-owner-name: spry robot-owner-url: http://www.spry.com/index.html robot-owner-email: info@spry.com robot-status: robot-purpose: indexing robot-type: robot-platform: robot-availability: robot-exclusion: robot-exclusion-useragent: robot-noindex: robot-host: wizard.spry.com or tiger.spry.com robot-from: no robot-useragent: no robot-language: robot-description: Its purpose is to generate a Resource Discovery database. Spry is refusing to give any comments about this robot robot-history: robot-environment: modified-date: Tue Jul 11 09:29:45 GMT 1995 modified-by: robot-id: ssearcher robot-name: Site Searcher robot-cover-url: www.satacoy.com robot-details-url: www.satacoy.com robot-owner-name: Zackware robot-owner-url: www.satacoy.com robot-owner-email: zackware@hotmail.com robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: windows95, windows98, windowsNT robot-availability: binary robot-exclusion: no robot-exclusion-useragent: robot-noindex: no robot-host: * robot-from: no robot-useragent: ssearcher100 robot-language: C++ robot-description: Site Searcher scans web sites for specific file types.
(JPG, MP3, MPG, etc) robot-history: Released 4/4/1999 robot-environment: hobby modified-date: 04/26/1999 robot-id: suke robot-name: Suke robot-cover-url: http://www.kuro.net/robot/index.ja.html robot-details-url: http://www.kuro.net/robot/index.ja.html robot-owner-name: Yosuke Kuroda robot-owner-url: http://www.kuro.net/~yosuke/ robot-owner-email: robot@kuro.net robot-status: development robot-purpose: indexing robot-type: standalone robot-platform: FreeBSD3.* robot-availability: source robot-exclusion: yes robot-exclusion-useragent: suke robot-noindex: no robot-host: * robot-from: yes robot-useragent: suke/*.* robot-language: c robot-description: This robot visits mainly sites in Japan. robot-history: since 1999 robot-environment: service modified-date: Thu Dec 31 20:06:00 JST 1998 modified-by: Yosuke Kuroda robot-id: sven robot-name: Sven robot-cover-url: robot-details-url: http://marty.weathercity.com/sven/ robot-owner-name: Marty Anstey robot-owner-url: http://marty.weathercity.com/ robot-owner-email: rhondle@home.com robot-status: Active robot-purpose: indexing robot-type: standalone robot-platform: Windows robot-availability: none robot-exclusion: no robot-exclusion-useragent: robot-noindex: no robot-host: 24.113.12.29 robot-from: no robot-useragent: robot-language: VB5 robot-description: Used to gather sites for netbreach.com. Runs constantly. robot-history: Developed as an experiment in web indexing.
robot-environment: hobby, service modified-date: Tue, 3 Mar 1999 08:15:00 PST modified-by: Marty Anstey robot-id: tach_bw robot-name: TACH Black Widow robot-cover-url: http://theautochannel.com/~mjenn/bw.html robot-details-url: http://theautochannel.com/~mjenn/bw-syntax.html robot-owner-name: Michael Jennings robot-owner-url: http://www.spd.louisville.edu/~mejenn01/ robot-owner-email: mjenn@theautochannel.com robot-status: development robot-purpose: maintenance: link validation robot-type: standalone robot-platform: UNIX, Linux robot-availability: none robot-exclusion: yes robot-exclusion-useragent: tach_bw robot-noindex: no robot-host: *.theautochannel.com robot-from: yes robot-useragent: Mozilla/3.0 (Black Widow v1.1.0; Linux 2.0.27; Dec 31 1997 12:25:00) robot-language: C/C++ robot-description: Exhaustively recurses a single site to check for broken links robot-history: Corporate application begun in 1996 for The Auto Channel robot-environment: commercial modified-date: Thu, Jan 23 1997 23:09:00 GMT modified-by: Michael Jennings robot-id: tarantula robot-name: Tarantula robot-cover-url: http://www.nathan.de/nathan/software.html#TARANTULA robot-details-url: http://www.nathan.de/ robot-owner-name: Markus Hoevener robot-owner-url: robot-owner-email: Markus.Hoevener@evision.de robot-status: development robot-purpose: indexing robot-type: standalone robot-platform: unix robot-availability: none robot-exclusion: yes robot-exclusion-useragent: yes robot-noindex: yes robot-host: yes robot-from: no robot-useragent: Tarantula/1.0 robot-language: C robot-description: Tarantula gathers information for the German search engine Nathan robot-history: Started February 1997 robot-environment: service modified-date: Mon, 29 Dec 1997 15:30:00 GMT modified-by: Markus Hoevener robot-id: tarspider robot-name: tarspider robot-cover-url: robot-details-url: robot-owner-name: Olaf Schreck robot-owner-url: http://www.chemie.fu-berlin.de/user/chakl/ChaklHome.html robot-owner-email:
chakl@fu-berlin.de robot-status: robot-purpose: mirroring robot-type: robot-platform: robot-availability: robot-exclusion: robot-exclusion-useragent: robot-noindex: no robot-host: robot-from: chakl@fu-berlin.de robot-useragent: tarspider robot-language: robot-description: robot-history: robot-environment: modified-date: modified-by: robot-id: tcl robot-name: Tcl W3 Robot robot-cover-url: http://hplyot.obspm.fr/~dl/robo.html robot-details-url: robot-owner-name: Laurent Demailly robot-owner-url: http://hplyot.obspm.fr/~dl/ robot-owner-email: dl@hplyot.obspm.fr robot-status: robot-purpose: maintenance, statistics robot-type: standalone robot-platform: robot-availability: robot-exclusion: yes robot-exclusion-useragent: robot-noindex: no robot-host: hplyot.obspm.fr robot-from: yes robot-useragent: dlw3robot/x.y (in TclX by http://hplyot.obspm.fr/~dl/) robot-language: tcl robot-description: Its purpose is to validate links, and generate statistics. robot-history: robot-environment: modified-date: Tue May 23 17:51:39 1995 modified-by: robot-id: techbot robot-name: TechBOT robot-cover-url: http://www.techaid.net/ robot-details-url: http://www.techaid.net/TechBOT/ robot-owner-name: TechAID Internet Services robot-owner-url: http://www.techaid.net/ robot-owner-email: techbot@techaid.net robot-status: active robot-purpose: statistics, maintenance robot-type: standalone robot-platform: Unix robot-availability: none robot-exclusion: yes robot-exclusion-useragent: TechBOT robot-noindex: yes robot-host: techaid.net robot-from: yes robot-useragent: TechBOT robot-language: perl5 robot-description: TechBOT is constantly upgraded. Currently it is used for Link Validation, Load Time, HTML Validation and much, much more. robot-history: TechBOT started its life as a Page Change Detection robot, but has taken on many new and exciting roles.
robot-environment: service modified-date: Sat, 18 Dec 1998 14:26:00 EST modified-by: techbot@techaid.net robot-id: templeton robot-name: Templeton robot-cover-url: http://www.bmtmicro.com/catalog/tton/ robot-details-url: http://www.bmtmicro.com/catalog/tton/ robot-owner-name: Neal Krawetz robot-owner-url: http://www.cs.tamu.edu/people/nealk/ robot-owner-email: nealk@net66.com robot-status: active robot-purpose: mirroring, mapping, automating web applications robot-type: standalone robot-platform: OS/2, Linux, SunOS, Solaris robot-availability: binary robot-exclusion: yes robot-exclusion-useragent: templeton robot-noindex: no robot-host: * robot-from: yes robot-useragent: Templeton/{version} for {platform} robot-language: C robot-description: Templeton is a very configurable robot for mirroring, mapping, and automating applications on retrieved documents. robot-history: This robot was originally created as a test-of-concept. robot-environment: service, commercial, research, hobby modified-date: Sun, 6 Apr 1997 10:00:00 GMT modified-by: Neal Krawetz robot-id: titin robot-name: TitIn robot-cover-url: http://www.foi.hr/~dpavlin/titin/ robot-details-url: http://www.foi.hr/~dpavlin/titin/tehnical.htm robot-owner-name: Dobrica Pavlinusic robot-owner-url: http://www.foi.hr/~dpavlin/ robot-owner-email: dpavlin@foi.hr robot-status: development robot-purpose: indexing, statistics robot-type: standalone robot-platform: unix robot-availability: data, source on request robot-exclusion: yes robot-exclusion-useragent: titin robot-noindex: no robot-host: barok.foi.hr robot-from: no robot-useragent: TitIn/0.2 robot-language: perl5, c robot-description: TitIn is used to index all titles of Web servers in the .hr domain. robot-history: It was done as a result of a desperate need for a central index of Croatian web servers in December 1996.
robot-environment: research modified-date: Thu, 12 Dec 1996 16:06:42 MET modified-by: Dobrica Pavlinusic robot-id: titan robot-name: TITAN robot-cover-url: http://isserv.tas.ntt.jp/chisho/titan-e.html robot-details-url: http://isserv.tas.ntt.jp/chisho/titan-help/eng/titan-help-e.html robot-owner-name: Yoshihiko HAYASHI robot-owner-url: robot-owner-email: hayashi@nttnly.isl.ntt.jp robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: SunOS 4.1.4 robot-availability: no robot-exclusion: yes robot-exclusion-useragent: robot-noindex: no robot-host: nlptitan.isl.ntt.jp robot-from: yes robot-useragent: TITAN/0.1 robot-language: perl 4 robot-description: Its purpose is to generate a Resource Discovery database, and copy document trees. Our primary goal is to develop an advanced method for indexing the WWW documents. Uses libwww-perl robot-history: robot-environment: modified-date: Mon Jun 24 17:20:44 PDT 1996 modified-by: Yoshihiko HAYASHI robot-id: tkwww robot-name: The TkWWW Robot robot-cover-url: http://fang.cs.sunyit.edu/Robots/tkwww.html robot-details-url: robot-owner-name: Scott Spetka robot-owner-url: http://fang.cs.sunyit.edu/scott/scott.html robot-owner-email: scott@cs.sunyit.edu robot-status: robot-purpose: indexing robot-type: robot-platform: robot-availability: robot-exclusion: robot-exclusion-useragent: robot-noindex: no robot-host: robot-from: robot-useragent: robot-language: robot-description: It is designed to search Web neighborhoods to find pages that may be logically related. The Robot returns a list of links that looks like a hot list. The search can be by keyword, or all links within a distance of one or two hops may be returned. The TkWWW Robot is described in a paper presented at the WWW94 Conference in Chicago.
robot-history: robot-environment: modified-date: modified-by: robot-id: ucsd robot-name: UCSD Crawl robot-cover-url: http://www.mib.org/~ucsdcrawl robot-details-url: robot-owner-name: Adam Tilghman robot-owner-url: http://www.mib.org/~atilghma robot-owner-email: atilghma@mib.org robot-status: robot-purpose: indexing, statistics robot-type: standalone robot-platform: robot-availability: robot-exclusion: yes robot-exclusion-useragent: robot-noindex: robot-host: nuthaus.mib.org scilib.ucsd.edu robot-from: yes robot-useragent: UCSD-Crawler robot-language: Perl 4 robot-description: Should hit ONLY within UC San Diego - trying to count servers here. robot-history: robot-environment: modified-date: Sat Jan 27 09:21:40 1996. modified-by: robot-id: urlck robot-name: URL Check robot-cover-url: http://www.cutternet.com/products/webcheck.html robot-details-url: http://www.cutternet.com/products/urlck.html robot-owner-name: Dave Finnegan robot-owner-url: http://www.cutternet.com robot-owner-email: dave@cutternet.com robot-status: active robot-purpose: maintenance robot-type: standalone robot-platform: unix robot-availability: binary robot-exclusion: yes robot-exclusion-useragent: urlck robot-noindex: no robot-host: * robot-from: yes robot-useragent: urlck/1.2.3 robot-language: c robot-description: The robot is used to manage, maintain, and modify web sites. It builds a database detailing the site, builds HTML reports describing the site, and can be used to upload pages to the site or to modify existing pages and URLs within the site. It can also be used to mirror whole or partial sites. It supports HTTP, File, FTP, and Mailto schemes. robot-history: Originally designed to validate URLs. 
robot-environment: commercial modified-date: July 9, 1997 modified-by: Dave Finnegan robot-id: US robot-name: URL Spider Pro robot-cover-url: http://www.infostreak.com/us.htm robot-details-url: http://www.infostreak.com/us.htm robot-owner-name: Infostreak Software robot-owner-url: http://www.infostreak.com robot-owner-email: greg@infostreak.com robot-status: active robot-purpose: indexing: gather content for an indexing service robot-type: standalone: a separate program robot-platform: windows95, windowsNT robot-availability: binary: binary form available robot-exclusion: no robot-exclusion-useragent: robot-noindex: yes robot-host: * robot-from: no robot-useragent: URL Spider Pro/1.5 robot-language: delphi robot-description: URL Spider Pro builds Targeted Search Engines robot-history: Project started in July 1998 robot-environment: commercial: is a commercial product modified-date: Tue, 02 Mar 1999 17:28:52 GMT modified-by: Infostreak Software robot-id: valkyrie robot-name: Valkyrie robot-cover-url: http://kichijiro.c.u-tokyo.ac.jp/odin/ robot-details-url: http://kichijiro.c.u-tokyo.ac.jp/odin/robot.html robot-owner-name: Masanori Harada robot-owner-url: http://www.graco.c.u-tokyo.ac.jp/~harada/ robot-owner-email: harada@graco.c.u-tokyo.ac.jp robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: unix robot-availability: none robot-exclusion: yes robot-exclusion-useragent: Valkyrie libwww-perl robot-noindex: no robot-host: *.c.u-tokyo.ac.jp robot-from: yes robot-useragent: Valkyrie/1.0 libwww-perl/0.40 robot-language: perl4 robot-description: used to collect resources from Japanese Web sites for the ODIN search engine. robot-history: This robot has been used since Oct. 1995 for the author's research. 
robot-environment: service research modified-date: Thu Mar 20 19:09:56 JST 1997 modified-by: harada@graco.c.u-tokyo.ac.jp robot-id: victoria robot-name: Victoria robot-cover-url: robot-details-url: robot-owner-name: Adrian Howard robot-owner-url: robot-owner-email: adrianh@oneworld.co.uk robot-status: development robot-purpose: maintenance robot-type: standalone robot-platform: unix robot-availability: none robot-exclusion: yes robot-exclusion-useragent: Victoria robot-noindex: yes robot-host: robot-from: robot-useragent: Victoria/1.0 robot-language: perl,c robot-description: Victoria is part of a groupware package produced by Victoria Real Ltd. (voice: +44 [0]1273 774469, fax: +44 [0]1273 779960 email: victoria@pavilion.co.uk). Victoria is used to monitor changes in W3 documents, both intranet- and internet-based. Contact Victoria Real for more information. robot-history: robot-environment: commercial modified-date: Fri, 22 Nov 1996 16:45 GMT modified-by: victoria@pavilion.co.uk robot-id: visionsearch robot-name: vision-search robot-cover-url: http://www.ius.cs.cmu.edu/cgi-bin/vision-search robot-details-url: robot-owner-name: Henry A. Rowley robot-owner-url: http://www.cs.cmu.edu/~har robot-owner-email: har@cs.cmu.edu robot-status: robot-purpose: indexing. 
robot-type: standalone robot-platform: robot-availability: robot-exclusion: yes robot-exclusion-useragent: robot-noindex: robot-host: dylan.ius.cs.cmu.edu robot-from: no robot-useragent: vision-search/3.0 robot-language: Perl 5 robot-description: Intended to be an index of computer vision pages, containing all pages within n links (for some small n) of the Vision Home Page robot-history: robot-environment: modified-date: Fri Mar 8 16:03:04 1996 modified-by: robot-id: voyager robot-name: Voyager robot-cover-url: http://www.lisa.co.jp/voyager/ robot-details-url: robot-owner-name: Voyager Staff robot-owner-url: http://www.lisa.co.jp/voyager/ robot-owner-email: voyager@lisa.co.jp robot-status: development robot-purpose: indexing, maintenance robot-type: standalone robot-platform: unix robot-availability: none robot-exclusion: yes robot-exclusion-useragent: Voyager robot-noindex: no robot-host: *.lisa.co.jp robot-from: yes robot-useragent: Voyager/0.0 robot-language: perl5 robot-description: This robot is used to build the database for the Lisa Search service. The robot is launched manually and visits sites in random order. robot-history: robot-environment: service modified-date: Mon, 30 Nov 1998 08:00:00 GMT modified-by: Hideyuki Ezaki robot-id: vwbot robot-name: VWbot robot-cover-url: http://vancouver-webpages.com/VWbot/ robot-details-url: http://vancouver-webpages.com/VWbot/aboutK.shtml robot-owner-name: Andrew Daviel robot-owner-url: http://vancouver-webpages.com/~admin/ robot-owner-email: andrew@vancouver-webpages.com robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: unix robot-availability: source robot-exclusion: yes robot-exclusion-useragent: VWbot_K robot-noindex: yes robot-host: vancouver-webpages.com robot-from: yes robot-useragent: VWbot_K/4.2 robot-language: perl4 robot-description: Used to index BC sites for the searchBC database. Runs daily. robot-history: Originally written fall 1995. Actively maintained. 
robot-environment: service commercial research modified-date: Tue, 4 Mar 1997 20:00:00 GMT modified-by: Andrew Daviel robot-id: w3index robot-name: The NWI Robot robot-cover-url: http://www.ub2.lu.se/NNC/projects/NWI/the_nwi_robot.html robot-owner-name: Sigfrid Lundberg, Lund university, Sweden robot-owner-url: http://nwi.ub2.lu.se/~siglun robot-owner-email: siglun@munin.ub2.lu.se robot-status: active robot-purpose: discovery,statistics robot-type: standalone robot-platform: UNIX robot-availability: none (at the moment) robot-exclusion: yes robot-noindex: No robot-host: nwi.ub2.lu.se, mars.dtv.dk and a few others robot-from: yes robot-useragent: w3index robot-language: perl5 robot-description: A resource discovery robot, used primarily for the indexing of the Scandinavian Web robot-history: It is about a year or so old. Written by Anders Ardö, Mattias Borrell, Håkan Ardö and myself. robot-environment: service,research modified-date: Wed Jun 26 13:58:04 MET DST 1996 modified-by: Sigfrid Lundberg robot-id: w3m2 robot-name: W3M2 robot-cover-url: http://tronche.com/W3M2 robot-details-url: robot-owner-name: Christophe Tronche robot-owner-url: http://tronche.com/ robot-owner-email: tronche@lri.fr robot-status: robot-purpose: indexing, maintenance, statistics robot-type: standalone robot-platform: robot-availability: robot-exclusion: yes robot-exclusion-useragent: robot-noindex: no robot-host: * robot-from: yes robot-useragent: W3M2/x.xxx robot-language: Perl 4, Perl 5, and C++ robot-description: to generate a Resource Discovery database, validate links, validate HTML, and generate statistics robot-history: robot-environment: modified-date: Fri May 5 17:48:48 1995 modified-by: robot-id: wanderer robot-name: the World Wide Web Wanderer robot-cover-url: http://www.mit.edu/people/mkgray/net/ robot-details-url: robot-owner-name: Matthew Gray robot-owner-url: http://www.mit.edu:8001/people/mkgray/mkgray.html robot-owner-email: mkgray@mit.edu robot-status: active robot-purpose: 
statistics robot-type: standalone robot-platform: unix robot-availability: data robot-exclusion: no robot-exclusion-useragent: robot-noindex: no robot-host: *.mit.edu robot-from: robot-useragent: WWWWanderer v3.0 robot-language: perl4 robot-description: Run initially in June 1993, its aim is to measure the growth of the web. robot-history: robot-environment: research modified-date: modified-by: robot-id: webbandit robot-name: WebBandit Web Spider robot-cover-url: http://pw2.netcom.com/~wooger/ robot-details-url: http://pw2.netcom.com/~wooger/ robot-owner-name: Jerry Walsh robot-owner-url: http://pw2.netcom.com/~wooger/ robot-owner-email: wooger@ix.netcom.com robot-status: active robot-purpose: Resource Gathering / Server Benchmarking robot-type: standalone application robot-platform: Intel - windows95 robot-availability: source, binary robot-exclusion: no robot-exclusion-useragent: WebBandit/1.0 robot-noindex: no robot-host: ix.netcom.com robot-from: no robot-useragent: WebBandit/1.0 robot-language: C++ robot-description: multithreaded, hyperlink-following, resource-finding webspider robot-history: Inspired by reading the Internet Programming book by Jamsa/Cope robot-environment: commercial modified-date: 11/21/96 modified-by: Jerry Walsh robot-id: webcatcher robot-name: WebCatcher robot-cover-url: http://oscar.lang.nagoya-u.ac.jp robot-details-url: robot-owner-name: Reiji SUZUKI robot-owner-url: http://oscar.lang.nagoya-u.ac.jp/~reiji/index.html robot-owner-email: reiji@infonia.ne.jp robot-owner-name2: Masatoshi SUGIURA robot-owner-url2: http://oscar.lang.nagoya-u.ac.jp/~sugiura/index.html robot-owner-email2: sugiura@lang.nagoya-u.ac.jp robot-status: development robot-purpose: indexing robot-type: standalone robot-platform: unix, windows, mac robot-availability: none robot-exclusion: yes robot-exclusion-useragent: webcatcher robot-noindex: no robot-host: oscar.lang.nagoya-u.ac.jp robot-from: no robot-useragent: WebCatcher/1.0 robot-language: perl5 robot-description: WebCatcher gathers web 
pages that Japanese college students want to visit. robot-history: This robot finds its roots in a research project at Nagoya University in 1998. robot-environment: research modified-date: Fri, 16 Oct 1998 17:28:52 JST modified-by: "Reiji SUZUKI" robot-id: webcopy robot-name: WebCopy robot-cover-url: http://www.inf.utfsm.cl/~vparada/webcopy.html robot-details-url: robot-owner-name: Victor Parada robot-owner-url: http://www.inf.utfsm.cl/~vparada/ robot-owner-email: vparada@inf.utfsm.cl robot-status: robot-purpose: mirroring robot-type: standalone robot-platform: robot-availability: robot-exclusion: no robot-exclusion-useragent: robot-noindex: no robot-host: * robot-from: no robot-useragent: WebCopy/(version) robot-language: perl 4 or perl 5 robot-description: Its purpose is to perform mirroring. WebCopy can retrieve files recursively using the HTTP protocol. It can be used as a delayed browser or as a mirroring tool. It cannot jump from one site to another. robot-history: robot-environment: modified-date: Sun Jul 2 15:27:04 1995 modified-by: robot-id: webfetcher robot-name: webfetcher robot-cover-url: http://www.ontv.com/ robot-details-url: robot-owner-name: robot-owner-url: http://www.ontv.com/ robot-owner-email: webfetch@ontv.com robot-status: robot-purpose: mirroring robot-type: standalone robot-platform: robot-availability: robot-exclusion: no robot-exclusion-useragent: robot-noindex: robot-host: * robot-from: yes robot-useragent: WebFetcher/0.8, robot-language: C++ robot-description: Don't wait! OnTV's WebFetcher mirrors whole sites down to your hard disk on a TV-like schedule. Catch w3 documentation. Catch discovery.com without waiting! A fully operational web robot for NT/95 today, most UNIX soon, MAC tomorrow. robot-history: robot-environment: modified-date: Sat Jan 27 10:31:43 1996. 
modified-by: robot-id: webfoot robot-name: The Webfoot Robot robot-cover-url: robot-details-url: robot-owner-name: Lee McLoughlin robot-owner-url: http://web.doc.ic.ac.uk/f?/lmjm robot-owner-email: L.McLoughlin@doc.ic.ac.uk robot-status: robot-purpose: robot-type: robot-platform: robot-availability: robot-exclusion: robot-exclusion-useragent: robot-noindex: robot-host: phoenix.doc.ic.ac.uk robot-from: robot-useragent: robot-language: robot-description: robot-history: First spotted in Mid February 1994 robot-environment: modified-date: modified-by: robot-id: weblayers robot-name: weblayers robot-cover-url: http://www.univ-paris8.fr/~loic/weblayers/ robot-details-url: robot-owner-name: Loic Dachary robot-owner-url: http://www.univ-paris8.fr/~loic/ robot-owner-email: loic@afp.com robot-status: robot-purpose: maintenance robot-type: standalone robot-platform: robot-availability: robot-exclusion: yes robot-exclusion-useragent: robot-noindex: no robot-host: robot-from: robot-useragent: weblayers/0.0 robot-language: perl 5 robot-description: Its purpose is to validate, cache and maintain links. It is designed to maintain the cache generated by the emacs w3 mode (N*tscape replacement) and to support annotated documents (keep them in sync with the original document via diff/patch). robot-history: robot-environment: modified-date: Fri Jun 23 16:30:42 FRE 1995 modified-by: robot-id: weblinker robot-name: WebLinker robot-cover-url: http://www.cern.ch/WebLinker/ robot-details-url: robot-owner-name: James Casey robot-owner-url: http://www.maths.tcd.ie/hyplan/jcasey/jcasey.html robot-owner-email: jcasey@maths.tcd.ie robot-status: robot-purpose: maintenance robot-type: robot-platform: robot-availability: robot-exclusion: robot-exclusion-useragent: robot-noindex: robot-host: robot-from: robot-useragent: WebLinker/0.0 libwww-perl/0.1 robot-language: robot-description: It traverses a section of the web, doing URN->URL conversion. 
It will be used as a post-processing tool on documents created by automatic converters such as LaTeX2HTML or WebMaker. At the moment it works at full speed, but is restricted to local sites. External GETs will be added, but these will be running slowly. WebLinker is meant to be run locally, so if you see it elsewhere let the author know! robot-history: robot-environment: modified-date: modified-by: robot-id: webmirror robot-name: WebMirror robot-cover-url: http://www.winsite.com/pc/win95/netutil/wbmiror1.zip robot-details-url: robot-owner-name: Sui Fung Chan robot-owner-url: http://www.geocities.com/NapaVally/1208 robot-owner-email: sfchan@mailhost.net robot-status: robot-purpose: mirroring robot-type: standalone robot-platform: Windows95 robot-availability: robot-exclusion: no robot-exclusion-useragent: robot-noindex: robot-host: robot-from: no robot-useragent: no robot-language: C++ robot-description: It downloads web pages to the hard drive for offline browsing. robot-history: robot-environment: modified-date: Mon Apr 29 08:52:25 1996. modified-by: robot-id: webmoose robot-name: The Web Moose robot-cover-url: robot-details-url: http://www.nwlink.com/~mikeblas/webmoose/ robot-owner-name: Mike Blaszczak robot-owner-url: http://www.nwlink.com/~mikeblas/ robot-owner-email: mikeblas@nwlink.com robot-status: development robot-purpose: statistics, maintenance robot-type: standalone robot-platform: Windows NT robot-availability: data robot-exclusion: no robot-exclusion-useragent: WebMoose robot-noindex: no robot-host: msn.com robot-from: no robot-useragent: WebMoose/0.0.0000 robot-language: C++ robot-description: This robot collects statistics and verifies links. It builds a graph of its visit path. robot-history: This robot is under development. It will support ROBOTS.TXT soon. 
robot-environment: hobby modified-date: Fri, 30 Aug 1996 00:00:00 GMT modified-by: Mike Blaszczak robot-id: webquest robot-name: WebQuest robot-cover-url: robot-details-url: robot-owner-name: TaeYoung Choi robot-owner-url: http://www.cosmocyber.co.kr:8080/~cty/index.html robot-owner-email: cty@cosmonet.co.kr robot-status: development robot-purpose: indexing robot-type: standalone robot-platform: unix robot-availability: none robot-exclusion: yes robot-exclusion-useragent: webquest robot-noindex: no robot-host: 210.121.146.2, 210.113.104.1, 210.113.104.2 robot-from: yes robot-useragent: WebQuest/1.0 robot-language: perl5 robot-description: WebQuest will be used to build the databases for various web search service sites which will be in service by early 1998. Until the end of Jan. 1998, WebQuest will run from time to time. After that, it will run daily (for a few hours and very slowly). robot-history: The development of WebQuest was motivated by the need for a customized robot in various projects of COSMO Information & Communication Co., Ltd. in Korea. robot-environment: service modified-date: Tue, 30 Dec 1997 09:27:20 GMT modified-by: TaeYoung Choi robot-id: webreader robot-name: Digimarc MarcSpider robot-cover-url: http://www.digimarc.com/prod_fam.html robot-details-url: http://www.digimarc.com/prod_fam.html robot-owner-name: Digimarc Corporation robot-owner-url: http://www.digimarc.com robot-owner-email: wmreader@digimarc.com robot-status: active robot-purpose: maintenance robot-type: standalone robot-platform: windowsNT robot-availability: none robot-exclusion: yes robot-exclusion-useragent: robot-noindex: robot-host: 206.102.3.* robot-from: yes robot-useragent: Digimarc WebReader/1.2 robot-language: c++ robot-description: Examines image files for watermarks. In order not to waste internet bandwidth with yet another crawler, we have contracted with one of the major crawlers/search engines to provide us with a list of specific URLs of interest to us. 
If a URL points to an image, we may read the image, but we do not crawl to any other URLs. If a URL points to a page of interest (usually due to CGI), then we access the page to get the image URLs from it, but we do not crawl to any other pages. robot-history: First operation in August 1997. robot-environment: service modified-date: Mon, 20 Oct 1997 16:44:29 GMT modified-by: Brian MacIntosh robot-id: webreaper robot-name: WebReaper robot-cover-url: http://www.otway.com/webreaper robot-details-url: robot-owner-name: Mark Otway robot-owner-url: http://www.otway.com robot-owner-email: webreaper@otway.com robot-status: active robot-purpose: indexing/offline browsing robot-type: standalone robot-platform: windows95, windowsNT robot-availability: binary robot-exclusion: yes robot-exclusion-useragent: webreaper robot-noindex: no robot-host: * robot-from: no robot-useragent: WebReaper [webreaper@otway.com] robot-language: c++ robot-description: Freeware app which downloads and saves sites locally for offline browsing. robot-history: Written for personal use, and then distributed to the public as freeware. robot-environment: hobby modified-date: Thu, 25 Mar 1999 15:00:00 GMT modified-by: Mark Otway robot-id: webs robot-name: webs robot-cover-url: http://webdew.rnet.or.jp/ robot-details-url: http://webdew.rnet.or.jp/service/shank/NAVI/SEARCH/info2.html#robot robot-owner-name: Recruit Co.Ltd, robot-owner-url: robot-owner-email: dew@wwwadmin.rnet.or.jp robot-status: active robot-purpose: statistics robot-type: standalone robot-platform: unix robot-availability: none robot-exclusion: yes robot-exclusion-useragent: webs robot-noindex: no robot-host: lemon.recruit.co.jp robot-from: yes robot-useragent: webs@recruit.co.jp robot-language: perl5 robot-description: The webs robot is used to gather the last-modified dates of WWW servers' top pages. The collected statistics reflect the priority of WWW server data collection for the webdew indexing service. Indexing in webdew is done manually. 
robot-history: robot-environment: service modified-date: Fri, 6 Sep 1996 10:00:00 GMT modified-by: robot-id: websnarf robot-name: Websnarf robot-cover-url: robot-details-url: robot-owner-name: Charlie Stross robot-owner-url: robot-owner-email: charles@fma.com robot-status: retired robot-purpose: robot-type: robot-platform: robot-availability: robot-exclusion: robot-exclusion-useragent: robot-noindex: no robot-host: robot-from: robot-useragent: robot-language: robot-description: robot-history: robot-environment: modified-date: modified-by: robot-id: webspider robot-name: WebSpider robot-details-url: http://www.csi.uottawa.ca/~u610468 robot-cover-url: robot-owner-name: Nicolas Fraiji robot-owner-email: u610468@csi.uottawa.ca robot-status: active, under further enhancement. robot-purpose: maintenance, link diagnostics robot-type: standalone robot-exclusion: yes robot-noindex: no robot-exclusion-useragent: webspider robot-host: several robot-from: Yes robot-language: Perl4 robot-history: developed as a course project at the University of Ottawa, Canada in 1996. robot-environment: Educational use and Research robot-id: webvac robot-name: WebVac robot-cover-url: http://www.federated.com/~tim/webvac.html robot-details-url: robot-owner-name: Tim Jensen robot-owner-url: http://www.federated.com/~tim robot-owner-email: tim@federated.com robot-status: robot-purpose: mirroring robot-type: standalone robot-platform: robot-availability: robot-exclusion: no robot-exclusion-useragent: robot-noindex: robot-host: robot-from: no robot-useragent: webvac/1.0 robot-language: C++ robot-description: robot-history: robot-environment: modified-date: Mon May 13 03:19:17 1996. 
modified-by: robot-id: webwalk robot-name: webwalk robot-cover-url: robot-details-url: robot-owner-name: Rich Testardi robot-owner-url: robot-owner-email: robot-status: retired robot-purpose: indexing, maintenance, mirroring, statistics robot-type: standalone robot-platform: robot-availability: robot-exclusion: yes robot-exclusion-useragent: robot-noindex: no robot-host: robot-from: yes robot-useragent: webwalk robot-language: c robot-description: Its purpose is to generate a Resource Discovery database, validate links, validate HTML, perform mirroring, copy document trees, and generate statistics. Webwalk is easily extensible to perform virtually any maintenance function which involves web traversal, in a way much like the '-exec' option of the find(1) command. Webwalk is usually used behind the HP firewall. robot-history: robot-environment: modified-date: Wed Nov 15 09:51:59 PST 1995 modified-by: robot-id: webwalker robot-name: WebWalker robot-cover-url: robot-details-url: robot-owner-name: Fah-Chun Cheong robot-owner-url: http://www.cs.berkeley.edu/~fccheong/ robot-owner-email: fccheong@cs.berkeley.edu robot-status: active robot-purpose: maintenance robot-type: standalone robot-platform: unix robot-availability: source robot-exclusion: yes robot-exclusion-useragent: WebWalker robot-noindex: no robot-host: * robot-from: yes robot-useragent: WebWalker/1.10 robot-language: perl4 robot-description: WebWalker performs WWW traversal for individual sites and tests for the integrity of all hyperlinks to external sites. robot-history: A Web maintenance robot for expository purposes, first published in the book "Internet Agents: Spiders, Wanderers, Brokers, and Bots" by the robot's author. 
robot-environment: hobby modified-date: Thu, 25 Jul 1996 16:00:52 PDT modified-by: Fah-Chun Cheong robot-id: webwatch robot-name: WebWatch robot-cover-url: http://www.specter.com/users/janos/specter robot-details-url: robot-owner-name: Joseph Janos robot-owner-url: http://www.specter.com/users/janos/specter robot-owner-email: janos@specter.com robot-status: robot-purpose: maintenance, statistics robot-type: standalone robot-platform: robot-availability: robot-exclusion: no robot-exclusion-useragent: robot-noindex: no robot-host: robot-from: no robot-useragent: WebWatch robot-language: c++ robot-description: Its purpose is to validate HTML, generate statistics, and check URLs modified since a given date. robot-history: robot-environment: modified-date: Wed Jul 26 13:36:32 1995 modified-by: robot-id: wget robot-name: Wget robot-cover-url: ftp://gnjilux.cc.fer.hr/pub/unix/util/wget/ robot-details-url: robot-owner-name: Hrvoje Niksic robot-owner-url: robot-owner-email: hniksic@srce.hr robot-status: development robot-purpose: mirroring, maintenance robot-type: standalone robot-platform: unix robot-availability: source robot-exclusion: yes robot-exclusion-useragent: wget robot-noindex: no robot-host: * robot-from: yes robot-useragent: Wget/1.4.0 robot-language: C robot-description: Wget is a utility for retrieving files using the HTTP and FTP protocols. It works non-interactively, and can retrieve HTML pages and FTP trees recursively. It can be used for mirroring Web pages and FTP sites, or for traversing the Web gathering data. It is run by the end user or archive maintainer. 
robot-history: robot-environment: hobby, research modified-date: Mon, 11 Nov 1996 06:00:44 MET modified-by: Hrvoje Niksic robot-id: whowhere robot-name: WhoWhere Robot robot-cover-url: http://www.whowhere.com robot-details-url: robot-owner-name: Rupesh Kapoor robot-owner-url: robot-owner-email: rupesh@whowhere.com robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: Sun Unix robot-availability: none robot-exclusion: yes robot-exclusion-useragent: whowhere robot-noindex: no robot-host: spica.whowhere.com robot-from: no robot-useragent: robot-language: C/Perl robot-description: Gathers data for an email directory from web pages robot-history: robot-environment: commercial modified-date: modified-by: robot-id: wmir robot-name: w3mir robot-cover-url: http://www.ifi.uio.no/~janl/w3mir.html robot-details-url: robot-owner-name: Nicolai Langfeldt robot-owner-url: http://www.ifi.uio.no/~janl/w3mir.html robot-owner-email: w3mir-core@usit.uio.no robot-status: robot-purpose: mirroring. robot-type: standalone robot-platform: UNIX, WindowsNT robot-availability: robot-exclusion: no. robot-exclusion-useragent: robot-noindex: robot-host: robot-from: yes robot-useragent: w3mir robot-language: Perl robot-description: W3mir uses the If-Modified-Since HTTP header and recurses only the directory and subdirectories of its start document. Known to work on U*ixes and Windows NT. robot-history: robot-environment: modified-date: Wed Apr 24 13:23:42 1996. 
modified-by: robot-id: wolp robot-name: WebStolperer robot-cover-url: http://www.suchfibel.de/maschinisten robot-details-url: http://www.suchfibel.de/maschinisten/text/werkzeuge.htm (in German) robot-owner-name: Marius Dahler robot-owner-url: http://www.suchfibel.de/maschinisten robot-owner-email: mda@suchfibel.de robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: unix, NT robot-availability: none robot-exclusion: yes robot-exclusion-useragent: WOLP robot-noindex: yes robot-host: www.suchfibel.de robot-from: yes robot-useragent: WOLP/1.0 mda/1.0 robot-language: perl5 robot-description: The robot gathers information about specified web projects and generates knowledge bases in JavaScript or its own format robot-environment: hobby modified-date: 22 Jul 1998 modified-by: Marius Dahler robot-id: wombat robot-name: The Web Wombat robot-cover-url: http://www.intercom.com.au/wombat/ robot-details-url: robot-owner-name: Internet Communications robot-owner-url: http://www.intercom.com.au/ robot-owner-email: phill@intercom.com.au robot-status: robot-purpose: indexing, statistics. robot-type: robot-platform: robot-availability: robot-exclusion: no. robot-exclusion-useragent: robot-noindex: robot-host: qwerty.intercom.com.au robot-from: no robot-useragent: no robot-language: IBM Rexx/VisualAge C++ under OS/2. robot-description: The robot is the basis of the Web Wombat search engine (Australian/New Zealand content ONLY). robot-history: robot-environment: modified-date: Thu Feb 29 00:39:49 1996. 
modified-by: robot-id: worm robot-name: The World Wide Web Worm robot-cover-url: http://www.cs.colorado.edu/home/mcbryan/WWWW.html robot-details-url: robot-owner-name: Oliver McBryan robot-owner-url: http://www.cs.colorado.edu/home/mcbryan/Home.html robot-owner-email: mcbryan@piper.cs.colorado.edu robot-status: robot-purpose: indexing robot-type: robot-platform: robot-availability: robot-exclusion: robot-exclusion-useragent: robot-noindex: no robot-host: piper.cs.colorado.edu robot-from: robot-useragent: robot-language: robot-description: indexing robot, actually has quite flexible search options robot-history: robot-environment: modified-date: modified-by: robot-id: wwwc robot-name: WWWC Ver 0.2.5 robot-cover-url: http://www.kinet.or.jp/naka/tomo/wwwc.html robot-details-url: robot-owner-name: Tomoaki Nakashima. robot-owner-url: http://www.kinet.or.jp/naka/tomo/ robot-owner-email: naka@kinet.or.jp robot-status: active robot-purpose: maintenance robot-type: standalone robot-platform: windows, windows95, windowsNT robot-availability: binary robot-exclusion: yes robot-exclusion-useragent: WWWC robot-noindex: no robot-host: robot-from: yes robot-useragent: WWWC/0.25 (Win95) robot-language: c robot-description: robot-history: 1997 robot-environment: hobby modified-date: Tuesday, 18 Feb 1997 06:02:47 GMT modified-by: Tomoaki Nakashima (naka@kinet.or.jp) robot-id: wz101 robot-name: WebZinger robot-details-url: http://www.imaginon.com/wzindex.html robot-cover-url: http://www.imaginon.com robot-owner-name: ImaginOn, Inc robot-owner-url: http://www.imaginon.com robot-owner-email: info@imaginon.com robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: windows95, windowsNT 4, mac, solaris, unix robot-availability: binary robot-exclusion: no robot-exclusion-useragent: none robot-noindex: no robot-host: http://www.imaginon.com/wzindex.html * robot-from: no robot-useragent: none robot-language: java robot-description: commercial Web Bot that accepts 
plain text queries, uses webcrawler, lycos or excite to get URLs, then visits sites. If the user's filter parameters are met, it downloads one picture and a paragraph of text, then plays back a slide show of one text paragraph plus an image from each site. robot-history: developed by ImaginOn in 1996 and 1997 robot-environment: commercial modified-date: Wed, 11 Sep 1997 02:00:00 GMT modified-by: schwartz@imaginon.com robot-id: xget robot-name: XGET robot-cover-url: http://www2.117.ne.jp/~moremore/x68000/soft/soft.html robot-details-url: http://www2.117.ne.jp/~moremore/x68000/soft/soft.html robot-owner-name: Hiroyuki Shigenaga robot-owner-url: http://www2.117.ne.jp/~moremore/ robot-owner-email: shige@mh1.117.ne.jp robot-status: active robot-purpose: mirroring robot-type: standalone robot-platform: X68000, X68030 robot-availability: binary robot-exclusion: yes robot-exclusion-useragent: XGET robot-noindex: no robot-host: * robot-from: yes robot-useragent: XGET/0.7 robot-language: c robot-description: Its purpose is to retrieve updated files. It is run by the end user. robot-history: 1997 robot-environment: hobby modified-date: Fri, 07 May 1998 17:00:00 GMT modified-by: Hiroyuki Shigenaga robot-id: Nederland.zoek robot-name: Nederland.zoek robot-cover-url: http://www.nederland.net/ robot-details-url: robot-owner-name: System Operator Nederland.net robot-owner-url: robot-owner-email: zoek@nederland.net robot-status: active robot-purpose: indexing robot-type: standalone robot-platform: unix (Linux) robot-availability: none robot-exclusion: yes robot-exclusion-useragent: Nederland.zoek robot-noindex: no robot-host: 193.67.110.* robot-from: yes robot-useragent: Nederland.zoek robot-language: c robot-description: This robot indexes all .nl sites for the search engine of Nederland.net robot-history: Developed at Computel Standby in Apeldoorn, The Netherlands robot-environment: service modified-date: Sat, 8 Feb 1997 01:10:00 CET modified-by: Sander Steffann
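All records above follow the same flat "field: value" convention, with every field name prefixed "robot-" and each new record starting at a "robot-id" field. A minimal sketch of a parser for this convention (the function name parse_records and the two-record sample string are illustrative, not part of the database; a field name quoted verbatim inside a description would also split here, an inherent ambiguity of the flat format):

```python
import re

# Field names look like "robot-xyz:"; a new record begins at "robot-id".
FIELD = re.compile(r"(robot-[a-z0-9-]+):")

def parse_records(text):
    """Split flat 'field: value' text into one dict per robot record."""
    parts = FIELD.split(text)
    # parts = [prefix, field1, value1, field2, value2, ...]
    records = []
    current = None
    for field, value in zip(parts[1::2], parts[2::2]):
        if field == "robot-id":   # start of a new record
            current = {}
            records.append(current)
        if current is not None:
            current[field] = value.strip()
    return records

# Hypothetical two-record sample in the database's own style.
sample = ("robot-id: wget robot-name: Wget robot-exclusion: yes "
          "robot-exclusion-useragent: wget "
          "robot-id: webvac robot-name: WebVac robot-exclusion: no")
recs = parse_records(sample)
```

The robot-exclusion and robot-exclusion-useragent fields recovered this way are what a site administrator would consult when writing a robots.txt User-agent rule for a given robot.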