Altavista indexing password files

John Messerly (johnms@MICROSOFT.com)
Wed, 28 Feb 1996 19:54:52 -0800


#1 may be interesting to the readership of this list.

>----------
>The Risks Digest Volume 17: Issue 70
>
> Thursday 8 February 1996
>
>
>Risks of web robots
>
>Joe A. Dellinger <jdellinger@amoco.com>
>Sun, 4 Feb 96 20:50:03 CST
>
>Here are three risks of "web robots" I've run across recently that I
>think Risks readers might find interesting.
>1 The first is probably already well known to Risks readers: password
> files accidentally being exported to the world. Web servers are just
> yet another way of making that mistake.
>
> Here is a post that has already had wide circulation (and may have
> already appeared in Risks... I'm unable to scan back issues to check
> right now because of heavy network load):
>
> >Subject: BoS: Misconfigured Web Servers
> >
> > A friend of mine showed me a nasty little "trick" over the weekend.
> > He went to a Web Search server (http://www.altavista.digital.com/)
> > and did a search on the following keywords -
> >
> > root: 0:0 sync: bin: daemon:
> >
> > You get the idea. He copied out several encrypted root passwords
> > from passwd files, launched CrackerJack and a 1/2 MB word file and
> > had a root password in under 30 minutes. All without accessing the
> > site's server, just the index on a web search server!
> >
> .... >
> > The guy that showed me this found it funny, but I find it
> > disturbing. Are there that many sites that are that poorly
> > configured?
> >
> > Mark_W_Loveless@smtp.bnr.com
>
> I just verified that indeed this search does work, although to my
> relief the majority of the "hits" found are legitimate documents
> discussing UNIX security. The risks are fairly obvious.
>
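
A site administrator could look for this kind of exposure locally, before
a robot indexes it. What follows is only a minimal sketch in Python,
assuming the classic seven-field, colon-separated passwd layout. The name
looks_like_passwd and the min_lines threshold are hypothetical, chosen for
illustration, and the sample entries are made up.

```python
import re

# One classic passwd entry: name:password:uid:gid:gecos:home:shell
# (seven colon-separated fields, with numeric uid and gid).
PASSWD_LINE = re.compile(r"^[^:\s]+:[^:]*:\d+:\d+:[^:]*:[^:]*:[^:]*$")


def looks_like_passwd(text, min_lines=3):
    """Return True if at least min_lines lines look like passwd entries.

    min_lines is an arbitrary illustrative threshold, meant to skip
    documents that merely quote a single example entry.
    """
    hits = sum(
        1 for line in text.splitlines() if PASSWD_LINE.match(line.strip())
    )
    return hits >= min_lines


# Made-up sample entries, not taken from any real system.
sample = """root:*:0:0:Super User:/:/bin/sh
daemon:*:1:1::/:
bin:*:2:2::/bin:
sync:*:3:3::/:/bin/sync
"""

print(looks_like_passwd(sample))
# The search keywords alone are not a passwd file.
print(looks_like_passwd("root: 0:0 sync: bin: daemon:"))
```

Running such a check over every file under a web server's document root
would flag candidates for a human to review. It says nothing about the
other kinds of sensitive documents mentioned in the quoted text.
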
> Here is a variation on the above risk that I HAVEN'T seen discussed
> before, however. See what happens if you search AltaVista for THESE
> keywords:
>
> "unpublished proprietary source code actual intended reserved
> copyright notice"
>
> The results of this search are even more frightening, at least to me.
>
>
> The general risk is not just that you can conveniently find password
> files, but ANY kind of document that shouldn't be widely distributed:
>
> material useful for breaking into your system, copyrighted material,
> illegal material, libelous material, incriminating or embarrassing
> material, etc...
>
>2 The second risk works the other way: fooling stupid web robots so
> as to lure people to your web site.
>
> A month ago I tried searching for "eisner reciprocity paradox" on
> WebCrawler, hoping to find that it had indexed a paper of mine that I
> had reprinted electronically under my home page. Nope, it hadn't (or
> at least I was unable to find it using any of the likely keywords I
> could think of!). Instead the single match was on a URL intriguingly
> entitled "The information source".
>
> Gee, this "information source" must have an article in it about
> Eisner's Reciprocity Paradox, one that I hadn't known of before! So I
> followed the link, and ended up at something unexpected:
> "http://www.graviton.com/red/", "The Red Herring Home Page"! (It comes
> complete with gifs of red fish!)
>
> A little experimentation revealed that almost ANY obscure search would
> match "The information source", often as the only matching document
> found. As near as I could figure out, his site recognized probes by
> web robots and then threw a dictionary at them! (His point made, he
> has since stopped, although the Red Herring page is still there for
> your perusal.)
>
> I contacted the author, Tom White, and asked for more details. He
> didn't want to give his secrets away, but did reply:
>
> > I will say that I spent no more than an hour on the whole thing,
> > including writing the page, and it was effective far beyond what I
> > thought a silly trick like that would muster. I think that by virtue
> > of not hiding what I am trying to do, people who write web indexers
> > may see the page and think of ways to subvert feeble attempts like
> > mine - which is a good thing since the page could have as easily been
> > any propaganda I wanted to push on people.
>
> The risk? It can be frustratingly difficult (or impossible) to get a
> web robot's attention for a legitimate page you WANT indexed, or to
> find a page you know is there amidst all the distractions of "false
> hits". Part of the clutter may be wildly off-topic pages engineered to
> fool web robots into thinking that almost anything matches them. (Or
> simply long rambling pages containing lots of poems and such...
> documents that "fool" the robots more by accident than design.)
>
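
One crude way an indexer might notice a page that "threw a dictionary" at
it is to measure how rarely the page repeats a word. The sketch below is
an illustration only, not a description of how WebCrawler, AltaVista, or
any other robot works. The function names and the min_words and threshold
values are hypothetical and untuned.

```python
import re

WORD = re.compile(r"[a-z0-9']+")


def vocabulary_ratio(text):
    """Fraction of the words in text that are distinct (0.0 if empty)."""
    words = WORD.findall(text.lower())
    if not words:
        return 0.0
    return len(set(words)) / len(words)


def looks_like_word_dump(text, min_words=500, threshold=0.9):
    """Flag long pages in which almost every word is used only once.

    Ordinary prose repeats common words, so its ratio drops as the page
    grows; a dumped word list stays close to 1.0. Both parameters are
    illustrative guesses.
    """
    word_count = len(WORD.findall(text.lower()))
    return word_count >= min_words and vocabulary_ratio(text) >= threshold


# Synthetic inputs: 600 distinct tokens versus a repetitive sentence.
dump = " ".join(f"word{i}" for i in range(600))
prose = "the quick brown fox jumps over the lazy dog " * 100

print(looks_like_word_dump(dump))
print(looks_like_word_dump(prose))
```

A heuristic like this would not catch the accidental case mentioned in the
quoted text, such as long rambling pages of poems, whose vocabulary is
varied but not a word list.
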
>3 Finally, the act of being searched can cause problems for certain
> kinds of sites: ones that carry hundreds of thousands of distinct
> URLs, often generated only on demand, and that don't expect any one
> site to ever have reason to download ALL of them, whether all at once
> or a few at a time.
>
> See for example "http://xxx.lanl.gov/RobotsBeware.html". The authors
> state there: "This www server has been under all-too-frequent attack
> from `intelligent agents' (a.k.a. `robots') that mindlessly download
> every link encountered, ultimately trying to access the entire
> database through the listings links. In most cases, these processes
> are run by well-intentioned but thoughtless neophytes, ignorant of
> common sense guidelines."
>
> They have been forced to take a "proactive" stance to protect
> themselves: "We are not willing to play sitting duck to this
> nonsensical method of `indexing' information." The rather UNIQUE hot
> link that follows, "(Click here to initiate automated
> `seek-and-destroy' against your site.)", doesn't actually do anything
> but pause for 30 seconds, I'm told...
>
> I'll let readers examine the page and draw their own Risks!
>
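
For robot authors, one way to avoid the behaviour described in item 3 is
to consult a site's robots.txt before fetching anything. The quoted page
does not say which guidelines it means, so treating the robots exclusion
convention as one of them is an assumption here. The sketch below uses
Python's standard urllib.robotparser. The rules, the host example.org, the
paths, and the robot name are all hypothetical, and the text is parsed
from a string so that nothing is fetched over the network.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules, written the way a site might exclude its
# on-demand listing pages from all robots.
ROBOTS_TXT = """\
User-agent: *
Disallow: /list/
"""


def allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A listing page under the excluded prefix, then an ordinary page.
print(allowed(ROBOTS_TXT, "example-robot", "http://example.org/list/9602"))
print(allowed(ROBOTS_TXT, "example-robot", "http://example.org/about.html"))
```

A polite robot would also limit how fast it requests pages from any one
site. That part is left out of the sketch.
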