Replies: 4 comments 5 replies
-
@masseyke, thanks so much for the report and the reproduction. As you very well know, concurrency is hard, and troubleshooting concurrency issues is even harder. @ppkarwasz and I have glanced over the issue. We will try to see if we can spare some time for this.
-
@masseyke, for the record, we have a dedicated page for ELK in the Log4j documentation and a …
-
Do you have some platform information for this? I'm not seeing a deadlock myself, but I do see a lot of CPU time spent in …
-
I have investigated this deadlock issue and would like to share a summary of my analysis and results.

Summary

The issue is resolved in version 2.25.1.

1. Root Cause Analysis (on v2.19.0)

The deadlock occurs from a conflict between two locks: a monitor lock and a ReentrantLock. It happens when two threads attempt to acquire these locks in the exact opposite order.

Path A: Monitor Lock → ReentrantLock

This path is triggered when a rollover occurs and throws an exception.
Click to see relevant stack trace for Path A
Path B: ReentrantLock → Monitor Lock

This path is triggered by a standard …
Click to see relevant stack trace for Path B
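To make the ordering conflict concrete, here is a minimal, self-contained sketch of the same pattern in plain Java. This is not Log4j code; the names `monitor` and `rolloverLock` are placeholders I chose, standing in for the ByteBufferDestination monitor and the appender's ReentrantLock.

```java
import java.util.concurrent.locks.ReentrantLock;

/**
 * Minimal illustration of the opposite-order lock acquisition described above.
 * All names are placeholders, not Log4j internals.
 */
public class LockOrderDeadlockSketch {

    private static final Object monitor = new Object();                    // stands in for the destination monitor
    private static final ReentrantLock rolloverLock = new ReentrantLock(); // stands in for the appender's ReentrantLock

    public static void main(String[] args) {
        // Path A: monitor lock first, then the ReentrantLock.
        Thread pathA = new Thread(() -> {
            synchronized (monitor) {
                sleep(100); // widen the race window
                rolloverLock.lock(); // blocks forever once Path B holds the ReentrantLock
                try {
                    System.out.println("Path A acquired both locks");
                } finally {
                    rolloverLock.unlock();
                }
            }
        }, "path-A");

        // Path B: ReentrantLock first, then the monitor lock.
        Thread pathB = new Thread(() -> {
            rolloverLock.lock();
            try {
                sleep(100);
                synchronized (monitor) { // blocks forever once Path A holds the monitor
                    System.out.println("Path B acquired both locks");
                }
            } finally {
                rolloverLock.unlock();
            }
        }, "path-B");

        pathA.start();
        pathB.start();
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

With the sleeps widening the race window, the two threads almost always hang, and a kill -3 thread dump on this toy program reports the same kind of mixed monitor/ReentrantLock deadlock.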
In TextEncoderHelper.java:
```java
private static void writeEncodedText(..., final ByteBufferDestination destination, ...) {
    ...
    result = charsetEncoder.flush(byteBuf);
    if (!result.isUnderflow()) {
        synchronized (destination) { // <-- DEADLOCK POINT: tries to acquire the Monitor Lock held by Path A
            flushRemainingBytes(charsetEncoder, destination, byteBuf);
        }
    }
    ...
}
```

When Path A and Path B execute concurrently, a deadlock is guaranteed.

2. Verification on v2.25.1

I re-ran the exact same test code against version 2.25.1 and can confirm that the deadlock no longer occurs. The application runs to completion successfully. It seems that the removal of …

I'd appreciate any feedback on this analysis. Thanks!
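P.S. One generic mitigation for this class of lock-ordering conflict, independent of whatever 2.25.1 actually changed (I have not traced the fix itself), is to take the second lock with a bounded tryLock and back off instead of blocking forever. A rough sketch, with names of my own choosing:

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

/**
 * Generic back-off pattern for lock-ordering conflicts.
 * This is NOT the change made in Log4j 2.25.1; it only illustrates
 * refusing to block indefinitely on the second lock.
 */
public final class BackoffLocking {

    private final Object monitor = new Object();
    private final ReentrantLock lock = new ReentrantLock();

    /** Runs the action while holding both locks, backing off on contention. */
    public boolean runWithBothLocks(Runnable action) throws InterruptedException {
        synchronized (monitor) {
            // Bound the wait on the second lock instead of blocking forever.
            if (lock.tryLock(500, TimeUnit.MILLISECONDS)) {
                try {
                    action.run();
                    return true;
                } finally {
                    lock.unlock();
                }
            }
        }
        return false; // caller decides whether to retry or surface an error
    }
}
```

Backing off does not remove the ordering conflict itself, but it turns a hard deadlock into a recoverable failure that the caller can retry or report.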
-
An Elasticsearch user reported a deadlock in Elasticsearch that happens occasionally (elastic/elasticsearch#131404). When this happens, they have to restart their Elasticsearch node. It looks like log4j is the source of the deadlock, and I have managed to reproduce it (or something very close to it) using only log4j. The conditions required are a little strange, but they all happen to occur inside Elasticsearch. They are:
Below is a test that does all of the above and causes the deadlock every time. If you run `kill -3` on the process, you'll see that it reports `Found 1 deadlock.`
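If sending `kill -3` by hand is inconvenient, the same check can be made from test or monitoring code through the standard java.lang.management API; a small sketch (the class and method names are mine):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

/**
 * Prints any deadlocked threads in the current JVM, giving the same
 * information as the "Found 1 deadlock." section of a kill -3 thread dump.
 */
public final class DeadlockProbe {

    public static void dumpDeadlocks() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        long[] ids = bean.findDeadlockedThreads(); // null when no deadlock exists
        if (ids == null) {
            System.out.println("No deadlock detected.");
            return;
        }
        System.out.println("Found " + ids.length + " deadlocked thread(s):");
        for (ThreadInfo info : bean.getThreadInfo(ids, true, true)) {
            System.out.println(info);
        }
    }
}
```

`findDeadlockedThreads()` covers both object monitors and ownable synchronizers such as ReentrantLock, so it catches the mixed deadlock described above.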