JIComTransport fix for file descriptor leak

Brought to you by: vikramrc

#4 JIComTransport fix for file descriptor leak

Status: open

Owner: nobody

Labels: None

Priority: 5

Updated: 2012-11-29

Created: 2010-08-06

Creator: Robert Clark

Private: No

I'm attaching a modified version of the JIComTransport class (as well as a patch of the same) that fixes the file descriptor leak we're seeing when blocking reads are done without specifically creating a Selector. As far as I can determine, the external behavior of the class is unchanged.
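
For readers without the attachment, here is a minimal sketch of the general technique: a timed read on a SocketChannel that opens its own Selector and closes it when the read finishes. This is not the attached JIComTransport code. The class name TimedChannelRead, the method readWithTimeout, and the loopback demo in main are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.SocketTimeoutException;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Hypothetical helper, not part of j-Interop.
final class TimedChannelRead {

    /**
     * Reads from the channel, waiting at most timeoutMillis for data.
     * Returns the result of channel.read (which is -1 at end of stream).
     */
    static int readWithTimeout(SocketChannel channel, ByteBuffer buf, long timeoutMillis)
            throws IOException {
        if (timeoutMillis <= 0) {
            // select(0) would block indefinitely, so reject it here.
            throw new IllegalArgumentException("timeoutMillis must be positive");
        }
        boolean wasBlocking = channel.isBlocking();
        channel.configureBlocking(false); // required before registering with a Selector
        Selector selector = Selector.open();
        try {
            channel.register(selector, SelectionKey.OP_READ);
            // select() can also return 0 after a wakeup or interrupt;
            // this sketch treats every 0 as a timeout.
            if (selector.select(timeoutMillis) == 0) {
                throw new SocketTimeoutException("No data within " + timeoutMillis + " ms");
            }
            return channel.read(buf);
        } finally {
            // Closing the selector deregisters the channel and releases
            // the descriptors the selector was holding.
            selector.close();
            if (channel.isOpen()) {
                channel.configureBlocking(wasBlocking);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Loopback demo: a local server sends three bytes to the client.
        ServerSocketChannel server = ServerSocketChannel.open();
        server.socket().bind(new InetSocketAddress("127.0.0.1", 0));
        SocketChannel client = SocketChannel.open(
                new InetSocketAddress("127.0.0.1", server.socket().getLocalPort()));
        SocketChannel peer = server.accept();
        try {
            peer.write(ByteBuffer.wrap(new byte[] {1, 2, 3}));
            ByteBuffer buf = ByteBuffer.allocate(16);
            System.out.println("bytes read: " + readWithTimeout(client, buf, 1000));
        } finally {
            peer.close();
            client.close();
            server.close();
        }
    }
}
```

Opening a Selector per read costs something on each call; the trade-off is that its descriptors are released as soon as the read returns instead of staying open for an unknown time.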

I've tested this by running a new WMI request every 5 seconds in a new thread for 100 iterations. The file descriptor count has been constant, aside from a temporary increase when the read is actually occurring.
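
The test harness itself is not attached, so the following is only a rough sketch of how such a check could be written. It runs a caller-supplied request in a fresh thread on each iteration and records the process's open descriptor count afterwards. FdLeakCheck, openFileDescriptors and sample are hypothetical names, the Runnable stands in for the WMI request, and the count is read through com.sun.management.UnixOperatingSystemMXBean, which is only available on Unix-like JVMs (the sketch returns -1 elsewhere).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical harness, not the test code used in this ticket.
final class FdLeakCheck {

    /** Returns the open descriptor count, or -1 if this JVM does not expose it. */
    static long openFileDescriptors() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            return ((com.sun.management.UnixOperatingSystemMXBean) os)
                    .getOpenFileDescriptorCount();
        }
        return -1;
    }

    /**
     * Runs the request once per iteration, each time in a new thread,
     * and samples the descriptor count after the thread has finished.
     */
    static long[] sample(Runnable request, int iterations, long pauseMillis)
            throws InterruptedException {
        long[] counts = new long[iterations];
        for (int i = 0; i < iterations; i++) {
            Thread worker = new Thread(request, "request-" + i);
            worker.start();
            worker.join();
            counts[i] = openFileDescriptors();
            Thread.sleep(pauseMillis);
        }
        return counts;
    }
}
```

With iterations set to 100 and pauseMillis set to 5000 this mirrors the schedule described above; a count that keeps climbing across samples would point to a leak.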

Discussion

Robert Clark - 2010-08-06

Re-written version of the class

JIComTransport.java

Robert Clark - 2010-08-06

Unified diff of the changes

JIComTransport.java.patch

Vikram Roopchand - 2010-08-09

Hi,
How is your patch holding up? Have you done any long-running tests on it? How does it hold up under heavy usage?

thanks,
best regards,
Vikram

Robert Clark - 2010-08-09

I ran the test code over the weekend (about 250,000 queries) and did not see any file descriptor leaks. In our application there does not seem to be any performance impact from managing the selector ourselves rather than letting the Sun code cache it until it is GC'd.

However, the evaluation in this issue:

http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=5083450

makes me think that some more detailed performance testing on Windows is required. I've got that running now.

Vikram Roopchand - 2010-08-10

Hi,
Okay, good. A few more questions:

1. What version of Java are you testing it with?
2. Will you also be testing this on other platforms?
3. We have a build ready, but I think we should include this patch in it too (provided it holds for other FA customers as well). We can send you the source for the next release; could you merge this change at your end and run the long-running tests? I don't think that particular file has any changes from our end.

thanks,
best regards,
Vikram

Robert Clark - 2010-08-10

1. I'm using JDK 1.6.0_21 on Linux and JDK 1.6.0_04 on Windows.
2. I've not tested this on other platforms yet, but I can do that fairly easily. I'll throw Solaris, HP-UX and AIX into the mix and see what happens.
3. Sure, I can merge in my changes quite easily and toss the new code into my tests. Would it be easier if I just grabbed trunk out of the SourceForge SVN server?

Robert Clark - 2010-08-10

Added:

1. Solaris 10 (sparc), JDK 1.6.0_21
2. AIX 5.2, JDK 1.6.0.3260.1
3. HP-UX 11.11 (PA-RISC), JDK 1.6.0.6.1

to the mix and ran my test code for about 2 hours (at about 1400 requests each) and did not encounter any problems. The processes also did not consume extra file descriptors except when actually making a query.

Vikram Roopchand - 2010-08-11

Hi,
I've sent a mail to your SF address with the updated sources. Please run the tests on all environments (including the long-running ones).

thanks,
best regards,
Vikram

Robert Clark - 2010-08-11

I've been running the updated code for about 6 hours on Windows XP, Solaris 10, and Linux and all looks good. I'll leave the tests running overnight and see what happens.

Vikram Roopchand - 2010-08-12

Great! Let me know if it held up.

best regards,
Vikram

Robert Clark - 2010-08-12

The overnight test with the new code + patch revealed no problems. Tested for 24 hours on:

Windows XP - JDK 1.6.0_21
Solaris 10 - JDK 1.6.0_21
Linux - JDK 1.6.0_21
AIX 5.2 - JDK 1.6.0.3260.1

Vikram Roopchand - 2010-09-01

Hi,
One of the users has reported a PDU Assembly exception while testing a jar with this patch. If I may ask, why are you not applying the timeout directly on the socket and keeping the read function as it was before? As I understand it, we lose two descriptors during a "connect", and the rest of the operations are blocking in nature.

thanks,
best regards,
Vikram

Robert Clark - 2010-09-01

Putting the timeout on the socket causes the JDK to cache 2 (pipe) file descriptors in a thread local. This means that the descriptors remain open and cached until the thread is GC'd. This is what was causing the file descriptor leak that was killing our app.
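
For reference, the pattern in question looks roughly like the sketch below: the timeout is set on the Socket obtained from a SocketChannel, and the read goes through that socket's input stream. This is an illustration only, under the assumption that the transport reads through the channel's socket adaptor; it is not copied from JIComTransport, and AdaptorTimeoutRead and readViaSocketAdaptor are hypothetical names. The descriptor caching described above is what we observed on the JDK 1.6 builds tested in this ticket; other JDK versions may behave differently.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.channels.SocketChannel;

// Hypothetical illustration of a timed read through the socket adaptor.
final class AdaptorTimeoutRead {

    /**
     * Reads via channel.socket() with SO_TIMEOUT set. The channel must be in
     * blocking mode. Throws java.net.SocketTimeoutException if no data
     * arrives within timeoutMillis.
     */
    static int readViaSocketAdaptor(SocketChannel channel, byte[] buf, int timeoutMillis)
            throws IOException {
        channel.socket().setSoTimeout(timeoutMillis);
        InputStream in = channel.socket().getInputStream();
        // The timed wait happens inside the JDK here; the caller has no
        // handle on whatever resources the JDK uses to implement it.
        return in.read(buf);
    }
}
```

The caller-visible behavior (a SocketTimeoutException on timeout) is the same as with an explicitly managed Selector; the difference is who owns the selector's descriptors and when they are released.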

Do you have a link to the actual exception stack trace?
