Perl's libwww: https downloads hang from time to time

Is it just me, or do (Perl) https downloads hang from time to time?

I can reproduce the problem, but can't get any more information on it than this. I'm crawling a whole load of web pages, which is working quite successfully. However, every once in a while, the "fetcher" (which is a Perl LWP::UserAgent script) hangs indefinitely. "strace" shows it's blocking in a read(), and lsof reveals that the file descriptor is a socket connected to port 443 on the remote machine.

There seem to be two problems here - the remote machine seems happy to leave the connection open for days, but more importantly, the client doesn't seem to have a timeout on the read().

I've had a look in the Perl module code, but can't see a point where this could happen. I'm about to resort to semi-unnecessary select() calls before any sysread() calls to try and track down the problem.
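For anyone wondering what a select() guard in front of sysread() looks like, here is a minimal sketch. The helper name read_with_timeout is made up for illustration and is not part of libwww, and the demo uses a local socket pair rather than a real https connection. One caveat: an SSL layer can buffer data internally, so readiness of the raw socket may not tell the whole story for https.

```perl
use strict;
use warnings;
use IO::Select;
use Socket;

# Hypothetical helper: read up to $len bytes from $fh, giving up after
# $timeout seconds. Returns the data read ('' on EOF), or undef on
# timeout or read error.
sub read_with_timeout {
    my ($fh, $len, $timeout) = @_;
    my $sel = IO::Select->new($fh);
    # can_read() returns an empty list if nothing becomes readable in time
    return unless $sel->can_read($timeout);
    my $n = sysread($fh, my $buf, $len);
    return unless defined $n;
    return $buf;
}

# Demo on a local socket pair, so no network is needed.
socketpair(my $reader, my $writer, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";

# Nothing has been written yet, so this read should give up.
my $idle = read_with_timeout($reader, 1024, 0.2);
print defined $idle ? "got data\n" : "timed out\n";

# Once the peer writes something, the same call returns it.
syswrite($writer, "hello");
my $data = read_with_timeout($reader, 1024, 0.2);
print "read: $data\n";
```

The point of the guard is that a silent peer turns into an undef return instead of a read() that blocks for days.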

Stuff in use: Fedora 1 (up2date to about June 2004) and libwww-5.79.


Perl's libwww: https downloads hang from time to time


You can see that sort of behaviour in a normal browser (IE, Netscape, etc.) when the server does not provide a document size properly. We've had the problem when generating PDF files for upload, for example. So it might not be a Perl-specific problem. One thing you could do is get the HTTP headers from the offending pages and see if all headers are supplied correctly.
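As a rough sketch of that header check: given the raw header block of a response (captured however you like), the function below reports how the server signals the body length. The name body_framing and the three return labels are invented for this example, and the sample headers are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: classify how a response says where its body ends.
# Returns 'chunked', 'content-length', or 'until-close'.
sub body_framing {
    my ($raw) = @_;
    my %h;
    for my $line (split /\r?\n/, $raw) {
        # Header lines look like "Name: value"; the status line is skipped
        next unless $line =~ /^([^:\s]+):\s*(.*?)\s*$/;
        $h{lc $1} = $2;
    }
    return 'chunked'
        if ($h{'transfer-encoding'} // '') =~ /\bchunked\b/i;
    return 'content-length'
        if ($h{'content-length'} // '') =~ /^\d+$/;
    # Neither header present: the client can only read until the
    # server closes the connection.
    return 'until-close';
}

# Made-up sample response headers with no size information at all.
my $sample = "HTTP/1.1 200 OK\r\nContent-Type: application/pdf\r\n";
print body_framing($sample), "\n";
```

A page that comes back as 'until-close' from a server that then never closes the connection would be a candidate for the kind of hang described above.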

Perl's libwww: https downloads hang from time to time


Ahh, the old "don't trust what you're told by a remote machine" thing, eh? Usually good advice(!).
My problem looks to be pretty low-level. I'm now not so certain a select() wrapper will solve this, so I may have to resort to alarm() timeouts or something.
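The alarm() approach usually takes the shape below. This is only a sketch: with_deadline is a made-up name, and the "slow" operation is simulated with a four-argument select() sleep rather than a real download.

```perl
use strict;
use warnings;

# Hypothetical helper: run $code, aborting it if it takes longer than
# $seconds (whole seconds). Returns (1, result) on success or
# (0, error message) on failure or timeout.
sub with_deadline {
    my ($seconds, $code) = @_;
    my $result;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "deadline exceeded\n" };
        alarm($seconds);
        $result = $code->();
        alarm(0);
        1;
    };
    my $err = $@;
    alarm(0);    # make sure no alarm is left pending if $code died
    return $ok ? (1, $result) : (0, $err || "unknown error\n");
}

# Simulate a fetch that blocks for 3 seconds under a 1 second deadline.
my ($ok, $value) = with_deadline(1, sub {
    select(undef, undef, undef, 3);
    return "finished";
});
print $ok ? "ok: $value\n" : "failed: $value";
```

In a fetcher, the code reference would wrap the LWP::UserAgent request call. Whether the signal actually interrupts a read() stuck inside the SSL library depends on how Perl delivers signals and on whether the library retries the read, so treat this as something to test rather than a guaranteed fix.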

Perl's libwww: https downloads hang from time to time


I think I've semi-solved this problem. There's actually a bug filed for it on CPAN. I've added a reply there with a kludge fix.
