JIComTransport fix for file descriptor leak
I'm attaching a modified version of the JIComTransport class (as well as a patch of the same) that fixes the file descriptor leak we're seeing when blocking reads are done without explicitly creating a Selector. As far as I can determine, the external behavior of the class is unchanged.
I've tested this by running a new WMI request every 5 seconds in a new thread for 100 iterations. The file descriptor count has stayed constant, aside from a temporary increase while the read is actually occurring.
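For anyone who wants to reproduce this kind of check, here is a rough sketch of how the descriptor count could be sampled. It is not my actual test code: the class name, `openFdCount` and `sample` are made up for illustration, the `request` Runnable stands in for whatever issues the WMI query, and counting entries in `/proc/self/fd` only works on Linux (the method returns -1 elsewhere).

```java
import java.io.File;

// Hypothetical helper, not part of j-Interop: samples the process's open
// file descriptor count after each request has run in a fresh thread.
final class FdCountSketch {

    // Counts entries in /proc/self/fd (Linux only). Returns -1 if the
    // directory cannot be listed, e.g. on Windows. The listing itself
    // briefly holds one descriptor, so compare samples against each other
    // rather than reading the value as an exact figure.
    static int openFdCount() {
        String[] entries = new File("/proc/self/fd").list();
        return entries == null ? -1 : entries.length;
    }

    // Runs `request` in a new thread `iterations` times, waiting for each
    // thread to finish and recording the descriptor count afterwards.
    static int[] sample(Runnable request, int iterations, long pauseMillis)
            throws InterruptedException {
        int[] counts = new int[iterations];
        for (int i = 0; i < iterations; i++) {
            Thread worker = new Thread(request);
            worker.start();
            worker.join();
            counts[i] = openFdCount();
            Thread.sleep(pauseMillis);
        }
        return counts;
    }
}
```

With something like `sample(request, 100, 5000L)`, a leak would show up as a count that keeps climbing from one sample to the next.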
Re-written version of the class
Unified diff of the changes
Hi,
How is your patch holding up? Have you done any long-running tests on it? How does it hold up under heavy usage?
thanks,
best regards,
Vikram
I ran the test code over the weekend (about 250,000 queries) and did not see any file descriptor leaks. In our application there does not seem to be any performance impact with managing the selector ourselves rather than letting the Sun code cache it until it is GC'd.
However, the evaluation in this issue:
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=5083450
makes me think that some more detailed performance testing on Windows is required. I've got that running now.
Hi,
Okay, good. A few more questions:
1. What version of Java are you testing it with?
2. Will you also be testing this on other platforms?
3. We have a build ready, but I think we should include this patch in it too (provided it holds for other FA customers as well). We can send you the source for the next release; could you merge this change at your end and run the long-running tests? I don't think that particular file has any changes from our end.
thanks,
best regards,
Vikram
1. I'm using JDK 1.6.0_21 on Linux, JDK 1.6.0_04 on Windows.
2. I've not tested this on other platforms yet but I can do that fairly easily. I'll throw Solaris, HP-UX and AIX into the mix and see what happens.
3. Sure, I can merge in my changes quite easily and toss the new code into my tests. Would it be easier if I just grabbed trunk out of the SourceForge SVN server?
Added:
1. Solaris 10 (sparc), JDK 1.6.0_21
2. AIX 5.2, JDK 1.6.0.3260.1
3. HP-UX 11.11 (PA-RISC), JDK 1.6.0.6.1
to the mix and ran my test code for about 2 hours (about 1400 requests each) without encountering any problems. The processes also did not consume extra file descriptors except when actually making a query.
Hi,
Sent a mail to your SF address with the updated sources. Please run the tests in all environments (including the long-running ones).
thanks,
best regards,
Vikram
I've been running the updated code for about 6 hours on Windows XP, Solaris 10, and Linux and all looks good. I'll leave the tests running overnight and see what happens.
Great! Let me know if it holds up.
best regards,
Vikram
The overnight test with the new code + patch revealed no problems. Tested for 24 hours on:
Windows XP - JDK 1.6.0_21
Solaris 10 - JDK 1.6.0_21
Linux - JDK 1.6.0_21
AIX 5.2 - JDK 1.6.0.3260.1
Hi,
One of the users has reported a PDU Assembly exception while testing a jar with this patch. If I may ask, why are you not applying the timeout directly on the socket and keeping the read function as it was before? As I understand it, we lose two descriptors during a "connect", and the rest of the operations are blocking in nature.
thanks,
best regards,
Vikram
Putting the timeout on the socket causes the JDK to cache 2 (pipe) file descriptors in a thread local. This means that the descriptors remain open and cached until the thread is GC'd. This is what was causing the file descriptor leak that was killing our app.
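To illustrate the alternative, here is a minimal sketch of a timed read that opens its own Selector and closes it as soon as the read is done, so nothing is left behind in the thread. This is not the code from the patch: the class and method names are hypothetical, and it assumes the channel is otherwise used in blocking mode.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;

// Hypothetical sketch, not the JIComTransport patch itself.
final class SelectorReadSketch {

    // Reads into dst, waiting at most timeoutMillis for data to arrive.
    // Note that Selector.select(0) blocks indefinitely, so pass a positive
    // timeout. Assumes the channel is in blocking mode on entry.
    static int readWithTimeout(SocketChannel channel, ByteBuffer dst, long timeoutMillis)
            throws IOException {
        Selector selector = Selector.open();
        try {
            // A channel must be non-blocking while registered with a selector.
            channel.configureBlocking(false);
            channel.register(selector, SelectionKey.OP_READ);
            if (selector.select(timeoutMillis) == 0) {
                throw new SocketTimeoutException("read timed out");
            }
            return channel.read(dst);
        } finally {
            try {
                // Closing the selector deregisters the channel and releases
                // the selector's descriptors immediately.
                selector.close();
            } finally {
                if (channel.isOpen()) {
                    channel.configureBlocking(true);
                }
            }
        }
    }
}
```

The cost is opening and closing a selector on every read, which is why I was interested in the performance numbers.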
Do you have a link to the actual exception stack trace?