Re: Java Robot

Larry Page (page@CS.Stanford.EDU)
Tue, 25 Jun 1996 14:32:07 -0700


>> FYI Java will not allow you to set the User-Agent

I think it is possible to set a system property that holds the User-Agent
value at runtime. However, it's not documented. Besides, you'll probably
want to write your own HTTP parser anyway, because the Java one is not very
capable. The problem is that the HTTP class ships only as a binary, so you
can't rewrite parts of it without the source.
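
As a rough illustration of the hand-rolled approach, here is a minimal
sketch that talks HTTP/1.0 over a plain socket and sends its own
User-Agent header. The class name RawHttp and all of its methods are
hypothetical, it uses library features from later Java releases for
brevity, and it reads only the status line, so treat it as a starting
point rather than a complete client.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; not part of any Java library.
class RawHttp {
    // Build a minimal HTTP/1.0 GET request with a caller-chosen User-Agent.
    static String buildRequest(String host, String path, String userAgent) {
        return "GET " + path + " HTTP/1.0\r\n"
             + "Host: " + host + "\r\n"
             + "User-Agent: " + userAgent + "\r\n"
             + "\r\n";
    }

    // Extract the status code from a line such as "HTTP/1.0 200 OK".
    // Returns -1 if the line is missing or malformed.
    static int parseStatusCode(String statusLine) {
        if (statusLine == null || !statusLine.startsWith("HTTP/")) {
            return -1;
        }
        String[] parts = statusLine.trim().split(" +", 3);
        if (parts.length < 2) {
            return -1;
        }
        try {
            return Integer.parseInt(parts[1]);
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    // Open a socket, send the request, and return the response status code.
    static int fetchStatus(String host, int port, String path, String userAgent)
            throws IOException {
        try (Socket socket = new Socket(host, port)) {
            OutputStream out = socket.getOutputStream();
            out.write(buildRequest(host, path, userAgent)
                    .getBytes(StandardCharsets.US_ASCII));
            out.flush();
            BufferedReader in = new BufferedReader(new InputStreamReader(
                    socket.getInputStream(), StandardCharsets.ISO_8859_1));
            return parseStatusCode(in.readLine());
        }
    }
}
```

A call such as RawHttp.fetchStatus("example.com", 80, "/robots.txt",
"MyRobot/0.1") would send the header unchanged; the host and agent
string here are placeholders.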

Major parts of my Java robot are now reimplemented in Python. This is
because Java doesn't have multithreaded IO, so it is impossible to write a
fast robot: a slow socket read or open blocks all your other IO threads.
There is no way around this problem except running many processes, which
use lots of memory.
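
One way to picture the many-processes workaround is to split the URL
list into shards and hand each shard to a separate robot process. The
sketch below covers only the splitting step. The class UrlSharder and
the choice to shard by host name are assumptions made for illustration,
not a description of the robot discussed here, and the code uses
collection classes from later Java releases.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; names and sharding policy are assumptions.
class UrlSharder {
    // Split urls into `workers` lists, keeping each host in a single shard
    // so that one process handles all requests to a given server.
    static List<List<String>> shard(List<String> urls, int workers) {
        if (workers <= 0) {
            throw new IllegalArgumentException("workers must be positive");
        }
        List<List<String>> shards = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            shards.add(new ArrayList<>());
        }
        for (String url : urls) {
            // floorMod keeps the index non-negative for negative hash codes.
            int index = Math.floorMod(hostOf(url).hashCode(), workers);
            shards.get(index).add(url);
        }
        return shards;
    }

    // Crude host extraction: the text between "://" and the next '/'.
    static String hostOf(String url) {
        int start = url.indexOf("://");
        start = (start < 0) ? 0 : start + 3;
        int end = url.indexOf('/', start);
        String host = (end < 0) ? url.substring(start) : url.substring(start, end);
        return host.toLowerCase();
    }
}
```

Each shard could then be written to a file and passed to its own
process, so a slow server stalls only the process that owns that host.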

So, it is possible to write a real robot completely in Java -- I've
collected several million pages with it. But there are some real
roadblocks, and you should strongly consider another language.

-Larry