Perl's libwww: https downloads hang from time to time
Is it just me, or do (Perl) https downloads hang from time to time?
I can reproduce the problem, but can't get any more information on it than this. I'm crawling a whole load of web pages, which is working quite successfully. However, every once in a while, the "fetcher" (which is a Perl LWP::UserAgent script) hangs indefinitely. "strace" shows it's blocking in a read(), and lsof reveals that the descriptor is a socket connected to port 443 on the remote machine.
There seem to be two problems here - the remote machine seems happy to leave the connection open for days, but more importantly, the client doesn't seem to have a timeout on the read().
I've had a look in the Perl module code, but can't see a point where this could happen. I'm about to resort to semi-unnecessary select() calls before any sysread() calls to try and track down the problem.
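For reference, the select()-before-sysread() idea could look roughly like the sketch below. The helper name read_with_timeout is made up for illustration and is not part of libwww; the demo uses a local socket pair rather than a real https connection. One caveat I'm assuming matters here: on an SSL socket, data may already be buffered inside the SSL layer, so select() on the raw descriptor may not tell the whole story.

```perl
use strict;
use warnings;
use IO::Select;
use Socket;

# Hypothetical helper (not part of libwww): wait up to $timeout seconds for
# $fh to become readable, then do one sysread(). Returns the bytes read
# (an empty string at EOF), or undef if nothing arrived in time.
sub read_with_timeout {
    my ($fh, $len, $timeout) = @_;
    my @ready = IO::Select->new($fh)->can_read($timeout);
    return undef unless @ready;
    my $buf = '';
    my $n = sysread($fh, $buf, $len);
    die "sysread failed: $!" unless defined $n;
    return $buf;
}

# Demonstration on a local socket pair, so no network is needed.
socketpair(my $reader, my $writer, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";

my $first = read_with_timeout($reader, 1024, 0.5);   # nothing written yet
print defined $first ? "got data\n" : "timed out\n";

syswrite($writer, "hello");
my $second = read_with_timeout($reader, 1024, 0.5);
print "read: $second\n";
```

The first call gives up after half a second instead of blocking forever, which is the behaviour the fetcher is missing.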
Stuff in use: Fedora 1 (up2date to about June 2004) and libwww-5.79.
Perl's libwww: https downloads hang from time to time
You can see that sort of behaviour in a normal browser (IE, Netscape, etc.) when the server does not provide the document size properly. We've had the problem when generating PDF files for upload, for example. So it might not be a Perl-specific problem. One thing you could do is get the HTTP headers from the offending pages and check whether all the headers are supplied correctly.
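A rough sketch of that header check, in case it helps. The helper name body_length_unknown is invented for illustration, and the URL in the comment is a placeholder. It only tests whether the response says where the body ends (a numeric Content-Length or chunked Transfer-Encoding); without either, the client has to wait for the server to close the connection. Bear in mind that a HEAD response may not carry exactly the same headers as the matching GET.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the response gives the client no way to know
# where the body ends other than the server closing the connection.
# $res is assumed to behave like an HTTP::Response (it needs a header() method).
sub body_length_unknown {
    my ($res) = @_;
    my $length   = $res->header('Content-Length');
    my $encoding = $res->header('Transfer-Encoding') || '';
    return 0 if defined $length && $length =~ /^\d+$/;
    return 0 if $encoding =~ /\bchunked\b/i;
    return 1;
}

# Possible use with LWP (placeholder URL; left commented out so that
# the snippet does not touch the network):
#
#   use LWP::UserAgent;
#   my $ua  = LWP::UserAgent->new(timeout => 30);
#   my $res = $ua->head('https://www.example.com/');
#   print $res->headers->as_string;
#   print "no usable length information\n" if body_length_unknown($res);
```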
Perl's libwww: https downloads hang from time to time
Ahh, the old "don't trust what you're told by a remote machine" thing, eh? Usually good advice(!).
My problem looks to be pretty low-level. I'm now not so certain a select() wrapper will solve this, so may have to resort to alarm() timeouts or something.
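The alarm() approach might be sketched like this. with_deadline is a made-up name, and the blocking select() call merely stands in for a hung fetch. I'm assuming a Unix-like system where alarm() delivers SIGALRM; whether the signal actually interrupts a read that is blocked inside the SSL library depends on how Perl defers signal handlers, so that part needs testing against the real fetcher.

```perl
use strict;
use warnings;

# Hypothetical helper: run $code, giving up after $seconds of wall-clock time.
# Returns (1, @results) on success, or (0, $error) on timeout or other failure.
sub with_deadline {
    my ($seconds, $code) = @_;
    my @result;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "deadline exceeded\n" };
        alarm $seconds;
        @result = $code->();
        alarm 0;
        1;
    };
    my $err = $@;
    alarm 0;    # make sure no alarm is left pending if $code died
    return $ok ? (1, @result) : (0, $err);
}

# Stand-in for a hung download: block for 5 seconds, but allow only 1.
my ($ok, $info) = with_deadline(1, sub { select(undef, undef, undef, 5); 'done' });
print $ok ? "finished: $info\n" : "gave up: $info";
```

In the fetcher, the code reference would wrap the LWP call instead, along the lines of `with_deadline(60, sub { $ua->get($url) })`, where $ua and $url are whatever the script already uses.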
Perl's libwww: https downloads hang from time to time
I think I've semi-solved this problem. There's actually a bug filed for it on CPAN. I've added a reply there with a kludge fix.