Replies: 5 comments 6 replies
-
Probably the issue is that each call to …
-
OK, I'm back... The answer is: yes, it works. But, fun fact: the problem only exists with Java 17. We had switched to 17 for the project a few weeks ago, but after becoming suspicious that some production instability was potentially due to its use, we switched back to 11. I initially couldn't reproduce the original finding when I re-tested with the patched agent jar today; eventually I realized that I was using Java 11.
Some results from the test program, running under JRE 17:
- No agent:
- Stock 1.19:
- Locally built main branch code:
- Same code but with this patch:
-
Here are the same tests run under JRE 11:
- No agent:
- Stock 1.19:
- Locally built main branch code:
- Patched:
-
I created a PR with the fix tested above: #7116
-
So, for JDK 8 the fix is not required. This is because the class loader (jdk.nashorn.internal.runtime.ScriptLoader.installClass) is already ignored thanks to this config: https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/main/javaagent-tooling/src/main/java/io/opentelemetry/javaagent/tooling/ignore/GlobalIgnoredTypesConfigurer.java#L29 I conclude that the small overhead I measured above with the agent in place is due to the longer code path through the ignore matcher and so on, but the large overhead from examining jar files etc. seen with JDK 11/17 does not occur.
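To illustrate the distinction (purely a conceptual sketch, not the agent's actual implementation — the list contents and function name below are made up): an ignore entry amounts to a cheap prefix check, which explains a small residual cost on JDK 8 without ever reaching the expensive jar-scanning path:

```kotlin
// Purely illustrative: the rough idea behind an ignored-class-loader entry.
val ignoredClassLoaderPrefixes = listOf(
    "jdk.nashorn.internal.runtime.ScriptLoader"
    // ... other ignored loaders ...
)

fun shouldExamine(classLoaderName: String): Boolean {
    // Cheap string check: the small overhead still visible on JDK 8.
    if (ignoredClassLoaderPrefixes.any { classLoaderName.startsWith(it) }) return false
    // Only non-ignored loaders fall through to the expensive work
    // (matching, jar/classpath examination, etc.) seen on JDK 11/17.
    return true
}
```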
-
Hello. I've been using OT to analyze performance in some fairly complex Kotlin code the past few weeks. In general this has been a positive experience. However, I am beginning to wonder if the OT agent itself is responsible for the last remaining performance mystery.
What I see is the following: application code executes JS scripts using Nashorn. Each call to Nashorn's `eval()` method seems to waste roughly 10ms before it does anything useful (typically it calls back through a bound object into an application function, which in turn emits a trace span, and it's the gap between the parent span beginning and this child span that is the area of interest). Initially I assumed Nashorn is just a thing that wastes 10ms whenever you ask it to do anything. However, experiments with a simple stand-alone Kotlin program that only executes JS scripts in Nashorn suggest that isn't true: each script invocation takes less than 1ms. Some time-sampling profiling suggested that the missing time is spent in the OT agent's instrumentation code, which appears to be triggered by Nashorn calling the class loader. Obviously calling the class loader frequently isn't the best of plans, but the OT agent seems to turn an inadvisable venture into a disaster.
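For context, the call pattern looks roughly like this (a simplified sketch, not the real application code; `AppCallbacks` and `doWork` are made-up names):

```kotlin
import javax.script.ScriptEngineManager

// Stand-in for the real application object that the script calls back into;
// in the real code this is where the child span is emitted.
class AppCallbacks {
    fun doWork(): String = "done"
}

fun main() {
    // Nashorn was removed from the JDK in 15, so on JDK 17 this needs the
    // org.openjdk.nashorn:nashorn-core dependency; on 8/11 it is bundled.
    val engine = ScriptEngineManager().getEngineByName("nashorn")
        ?: error("Nashorn script engine not available")

    engine.put("app", AppCallbacks())

    // The ~10ms gap shows up between entering eval() and the point where
    // the script actually calls back into app.doWork().
    println(engine.eval("app.doWork();"))
}
```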
I reproduced the problem by adding the OT agent to the aforementioned small test program, which does nothing but repeatedly execute a simple JS script. Just loading the OT agent into the JVM makes it run 3x slower (no actual tracing is being done): roughly 3ms per script execution vs. < 1ms. Code is on this branch: https://github.com/dboreham/kotlin-test/tree/nashorn-perf
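The measurement loop in the test program is essentially of this shape (again a simplified sketch, not the exact code on that branch):

```kotlin
import javax.script.ScriptEngineManager

fun main() {
    val engine = ScriptEngineManager().getEngineByName("nashorn")
        ?: error("Nashorn script engine not available")

    // Warm up so JIT and script-compilation effects don't dominate.
    repeat(100) { engine.eval("1 + 1") }

    val iterations = 1_000
    val start = System.nanoTime()
    repeat(iterations) { engine.eval("var x = 1 + 1; x;") }
    val perCallMs = (System.nanoTime() - start) / 1_000_000.0 / iterations

    // With no agent this is well under 1ms per eval(); with the OT agent
    // attached (-javaagent:opentelemetry-javaagent.jar) it is roughly 3ms.
    println("average per eval(): %.3f ms".format(perCallMs))
}
```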
Profiling, I see it spending most of its time in this area:
Profiling my main application shows a similar pattern, although the problem appears to be worse there, possibly because there are many more jar files present than the test program has.
Any thoughts on this before I dive deeper are welcome.
I should mention that I have tested with 1.19.1 and JDK 17.