RE: How to get the document info?

Howard, Dan: CIO (howard.dan@ic.gc.ca)
Fri, 14 Feb 1997 12:02:49 -0500


>I would like to get the "document info" of an HTML page with a Perl program.

"Document info" is associated less with an HTML page than with a *URL*,
since most of it comes from the HTTP headers that the server sends along
with the page. Some of it may even be generated by your browser.

The following small program may be the sort of thing you're looking for.
It sends an HTTP HEAD request, which asks the server for the headers only,
not the page itself. You'll need to download and install libwww-perl (LWP)
from a CPAN mirror before it will work.

> #!/usr/local/bin/perl5
> use strict;
> use warnings;
> use LWP::UserAgent;
> use HTTP::Request;
>
> my $ua = LWP::UserAgent->new;
> my $request = HTTP::Request->new(HEAD => 'http://crrm.univ-mrs.fr/');
> my $response = $ua->request($request);
> print $response->as_string;

Here is the program's output:

--- HTTP::Response=HASH(0x2509b8) ---
RC: 200 (OK)
Message: Document follows

Date: Friday, 14-Feb-97 16:16:05 GMT
Server: CERN/3.0
Content-Length: 3552
Content-Type: text/html
Last-Modified: Thursday, 30-Jan-97 13:06:54 GMT
Client-Date: Fri, 14 Feb 1997 16:30:48 GMT
MIME-Version: 1.0

-----------------------------------
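If you want particular fields rather than the whole dump, HTTP::Response
passes header accessors such as content_type, content_length and
last_modified through to HTTP::Headers. Here is a minimal sketch that wraps a
few of them in a helper. The name document_info and the choice of fields are
illustrative only, not part of LWP. Any field the server did not send comes
back as undef.

```perl
use strict;
use warnings;
use HTTP::Response;
use POSIX qw(strftime);

# Hypothetical helper: collect a few "document info" fields from a response.
# Every accessor is called in scalar context so that a missing header yields
# undef instead of an empty list (which would shift the hash pairs).
sub document_info {
    my ($response) = @_;
    my $mtime = $response->last_modified;    # epoch seconds, or undef
    return {
        status         => scalar $response->code,
        content_type   => scalar $response->content_type,   # parameters dropped
        content_length => scalar $response->content_length,
        server         => scalar $response->header('Server'),
        last_modified  => defined $mtime
            ? strftime('%Y-%m-%d %H:%M:%S GMT', gmtime $mtime)
            : undef,
    };
}
```

With a live request you would call it as
document_info(LWP::UserAgent->new->head($url)), since LWP::UserAgent's head
method sends the same kind of HEAD request as the program above.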

Note that different servers don't always return the same set of header fields.
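So it is safer to check for a field before relying on it. A minimal sketch,
assuming a response object like the one above: the hypothetical helper
check_fields sorts a list of wanted field names into those the response
carries and those it lacks, and dump_fields prints whatever the server
actually sent. HTTP::Headers matches field names case-insensitively, so
"last-modified" and "Last-Modified" refer to the same field.

```perl
use strict;
use warnings;
use HTTP::Response;

# Hypothetical helper: split the wanted field names into present and missing.
sub check_fields {
    my ($response, @wanted) = @_;
    my (@present, @missing);
    for my $name (@wanted) {
        if (defined $response->header($name)) {
            push @present, $name;
        } else {
            push @missing, $name;
        }
    }
    return (\@present, \@missing);
}

# Hypothetical helper: print every header field the server returned.
sub dump_fields {
    my ($response) = @_;
    for my $name ($response->headers->header_field_names) {
        printf "%s: %s\n", $name, scalar $response->header($name);
    }
}
```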

Good luck!

Dan Howard
Ottawa, Ontario

_________________________________________________
This message was sent by the robots mailing list. To unsubscribe, send mail
to robots-request@webcrawler.com with the word "unsubscribe" in the body.
For more info see http://info.webcrawler.com/mak/projects/robots/robots.html