Re: VB. page grabber...

Ed Carp (erc@dal1820.computek.net)
Mon, 6 May 1996 11:43:28 -0500 (CDT)


> I second you on this. With Web-enabled tools coming out, a lot of people
> will start writing their own little badbots and I bet the traffic on the
> net is yet again going to have a boost.
>
> Marc
>
> PS: Could I have a reference to that VB. freeware you are talking about?
> Sounds like something I'd like to get my hands on...

Here's a little something I wrote in C called "gethtml" that gets HTML
documents from a server. It's quite simple, actually - just connect to
port 80 on your victim's machine, then issue the "GET ..." command. For
example, to get "/iwin/us/allwarnings.html" from iwin.nws.noaa.gov (this
is a page listing all the current watches and warnings issued by NWS),
you'd connect to port 80 on iwin.nws.noaa.gov and send "GET
/iwin/us/allwarnings.html" followed by a CR/LF. The server sends back the
document and then closes the connection; capture the resulting HTML.

The implementation in VB (VERY easy!) is left as an exercise for the
student ;)

This could be used to, say, fetch pages from a server, parse each page for
<A HREF> references to other pages, then fetch those pages, and so on: a
simple robot. Of course, you have to make sure you haven't fetched a
page before (or you'll get yourself into a fetch loop), you have to save away
and/or index the text, etc.

This code is for Linux, but I've used it on Solaris and other systems before.

----------------------------------- cut here -------------------------------
/*
* gethtml - get HTML document (specified in argv[2]) from port 80 at site argv[1]
* Copyright 1996, Ed Carp (ecarp@netcom.com). Commercial use prohibited
* without prior arrangement. Non-commercial use permitted.
*/

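#include <stdio.h>
#include <stdlib.h>      /* exit */
#include <string.h>      /* strlen, memset, memcpy, strerror */
#include <unistd.h>      /* read, write, close */
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <netinet/in.h>
#include <errno.h>

int
main (int argc, char **argv)
{
  int sockfd;
  ssize_t readcnt;
  char cmd[1024];
  struct sockaddr_in address;
  struct hostent *hi;

  if (argc < 3)
    {
      fprintf (stderr, "usage: %s site document\n", argv[0]);
      exit (1);
    }

  /* Look up the site's IPv4 address. */
  hi = gethostbyname (argv[1]);
  if (hi == NULL || hi->h_addrtype != AF_INET)
    {
      fprintf (stderr, "can't get host by name '%s' - skipped\n", argv[1]);
      exit (1);
    }

  sockfd = socket (AF_INET, SOCK_STREAM, 0);
  if (sockfd < 0)
    {
      fprintf (stderr, "can't do socket for '%s' - skipped: %s\n",
               argv[1], strerror (errno));
      exit (1);
    }

  /* Connect to port 80 (HTTP) on the first address returned. */
  memset (&address, 0, sizeof (address));
  address.sin_family = AF_INET;
  address.sin_port = htons (80);
  memcpy (&address.sin_addr, hi->h_addr_list[0], sizeof (address.sin_addr));
  if (connect (sockfd, (struct sockaddr *) &address, sizeof (address)) < 0)
    {
      fprintf (stderr, "can't do connect for '%s' - skipped: %s\n",
               argv[1], strerror (errno));
      close (sockfd);
      exit (1);
    }

  /* Send the bare "GET path" line (an HTTP/0.9-style simple request).
     snprintf keeps a long argv[2] from overflowing cmd. */
  if (snprintf (cmd, sizeof (cmd), "GET %s\r\n", argv[2]) >= (int) sizeof (cmd))
    {
      fprintf (stderr, "document name too long: '%s'\n", argv[2]);
      close (sockfd);
      exit (1);
    }
  if (write (sockfd, cmd, strlen (cmd)) < 0)
    {
      perror ("write");
      close (sockfd);
      exit (1);
    }

  /* Copy whatever the server sends to stdout until it closes the connection. */
  while ((readcnt = read (sockfd, cmd, sizeof (cmd))) > 0)
    if (write (STDOUT_FILENO, cmd, (size_t) readcnt) < 0)
      break;
  close (sockfd);
  exit (0);
}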

--
Ed Carp, N7EKG    			Ed.Carp@linux.org, ecarp@netcom.com
					214/993-3935 voicemail/digital pager
Finger ecarp@netcom.com for PGP 2.5 public key		an88744@anon.penet.fi

"Past the wounds of childhood, past the fallen dreams and the broken families, through the hurt and the loss and the agony only the night ever hears, is a waiting soul. Patient, permanent, abundant, it opens its infinite heart and asks only one thing of you ... 'Remember who it is you really are.'"

-- "Losing Your Mind", Karen Alexander and Rick Boyes

The mark of a good conspiracy theory is its untestability. -- Andrew Spring