jaydeluca
Member

@jaydeluca jaydeluca commented Oct 16, 2025

There have been a lot of flaky opensearch tests today (example)

I figured we could try disabling container support (-XX:-UseContainerSupport) and Log4j JMX to fix some NullPointerExceptions in CgroupV2Subsystem.getInstance() during container startup.

Not sure if this will fix it, but it didn't seem like it could hurt. It seems to have helped someone else.

Hopefully resolves #15024

Some stack traces:

2025-10-16 20:04:05,784 main ERROR Could not reconfigure JMX java.lang.NullPointerException
    at java.base/jdk.internal.platform.cgroupv2.CgroupV2Subsystem.getInstance(CgroupV2Subsystem.java:81)
    at java.base/jdk.internal.platform.CgroupSubsystemFactory.create(CgroupSubsystemFactory.java:113)
    at java.base/jdk.internal.platform.CgroupMetrics.getInstance(CgroupMetrics.java:167)
    at java.base/jdk.internal.platform.SystemMetrics.instance(SystemMetrics.java:29)
    at java.base/jdk.internal.platform.Metrics.systemMetrics(Metrics.java:58)
    at java.base/jdk.internal.platform.Container.metrics(Container.java:43)
    at jdk.management/com.sun.management.internal.OperatingSystemImpl.(OperatingSystemImpl.java:182)
    at jdk.management/com.sun.management.internal.PlatformMBeanProviderImpl.getOperatingSystemMXBean(PlatformMBeanProviderImpl.java:281)
    at jdk.management/com.sun.management.internal.PlatformMBeanProviderImpl$3.nameToMBeanMap(PlatformMBeanProviderImpl.java:198)
    at java.management/java.lang.management.ManagementFactory.lambda$getPlatformMBeanServer$0(ManagementFactory.java:487)
    at java.base/java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:271)
    at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:177)
    at java.base/java.util.HashMap$ValueSpliterator.forEachRemaining(HashMap.java:1693)
    at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
    at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
    at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:150)
    at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:173)
    at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:497)
    at java.management/java.lang.management.ManagementFactory.getPlatformMBeanServer(ManagementFactory.java:488)
    at org.apache.logging.log4j.core.jmx.Server.reregisterMBeansAfterReconfigure(Server.java:140)

fatal error in thread [main], exiting
java.lang.ExceptionInInitializerError
    at org.opensearch.bootstrap.Bootstrap.initializeProbes(Bootstrap.java:177)
    at org.opensearch.bootstrap.Bootstrap.setup(Bootstrap.java:199)
    at org.opensearch.bootstrap.Bootstrap.init(Bootstrap.java:412)
    at org.opensearch.bootstrap.OpenSearch.init(OpenSearch.java:178)
    at org.opensearch.bootstrap.OpenSearch.execute(OpenSearch.java:169)
    at org.opensearch.cli.EnvironmentAwareCommand.execute(EnvironmentAwareCommand.java:100)
    at org.opensearch.cli.Command.mainWithoutErrorHandling(Command.java:138)
    at org.opensearch.cli.Command.main(Command.java:101)
    at org.opensearch.bootstrap.OpenSearch.main(OpenSearch.java:135)
    at org.opensearch.bootstrap.OpenSearch.main(OpenSearch.java:101)
Caused by: java.lang.NullPointerException
    at java.base/jdk.internal.platform.cgroupv2.CgroupV2Subsystem.getInstance(CgroupV2Subsystem.java:81)
    at java.base/jdk.internal.platform.CgroupSubsystemFactory.create(CgroupSubsystemFactory.java:113)
    at java.base/jdk.internal.platform.CgroupMetrics.getInstance(CgroupMetrics.java:167)
    at java.base/jdk.internal.platform.SystemMetrics.instance(SystemMetrics.java:29)
    at java.base/jdk.internal.platform.Metrics.systemMetrics(Metrics.java:58)
    at java.base/jdk.internal.platform.Container.metrics(Container.java:43)
    at jdk.management/com.sun.management.internal.OperatingSystemImpl.(OperatingSystemImpl.java:182)
    at jdk.management/com.sun.management.internal.PlatformMBeanProviderImpl.getOperatingSystemMXBean(PlatformMBeanProviderImpl.java:281)
    at jdk.management/com.sun.management.internal.PlatformMBeanProviderImpl$3.nameToMBeanMap(PlatformMBeanProviderImpl.java:198)
    at java.management/sun.management.spi.PlatformMBeanProvider$PlatformComponent.getMBeans(PlatformMBeanProvider.java:195)
    at java.management/java.lang.management.ManagementFactory.getPlatformMXBean(ManagementFactory.java:686)
    at java.management/java.lang.management.ManagementFactory.getOperatingSystemMXBean(ManagementFactory.java:388)
    at org.opensearch.monitor.process.ProcessProbe.(ProcessProbe.java:46)
    ... 10 more
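
Both traces go through the platform MXBean setup, and the second one bottoms out in `ManagementFactory.getOperatingSystemMXBean()`. A small standalone probe along these lines might help check whether a given JVM and kernel combination hits the same failure outside of OpenSearch. This is only a sketch: the class name `CgroupProbe` and its `probe()` helper are made up for illustration, and it assumes the failure reproduces on a bare JVM, which has not been verified here.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical diagnostic, not part of this PR.
public class CgroupProbe {

  /** Returns null if the OS MXBean initialised, otherwise the Throwable that was raised. */
  static Throwable probe() {
    try {
      // Same entry point as ProcessProbe in the second stack trace.
      OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
      os.getAvailableProcessors();
      return null;
    } catch (Throwable t) {
      // Catch Throwable because ExceptionInInitializerError is an Error, not an Exception.
      return t;
    }
  }

  public static void main(String[] args) {
    System.out.println("os.version=" + System.getProperty("os.version"));
    Throwable failure = probe();
    if (failure == null) {
      System.out.println("OperatingSystemMXBean initialised OK");
    } else {
      failure.printStackTrace();
    }
  }
}
```

Running it once with and once without `-XX:-UseContainerSupport` inside the affected container image should show whether the flag makes a difference there.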

@jaydeluca jaydeluca requested a review from a team as a code owner October 16, 2025 22:05
// limit memory usage and disable Log4j JMX to avoid cgroup detection issues in containers
opensearch.withEnv(
    "OPENSEARCH_JAVA_OPTS",
    "-Xmx256m -Xms256m -Dlog4j2.disableJmx=true -Dlog4j2.disable.jmx=true -XX:-UseContainerSupport");
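
For readers skimming the diff, here is the same option string assembled flag by flag, with a note on what each one appears to be for. This is a sketch only: `OpenSearchJavaOpts` is a hypothetical helper that does not exist in the repository, and the per-flag comments reflect my reading of the PR description rather than anything confirmed in the change itself.

```java
import java.util.List;

// Hypothetical helper for illustration; the PR passes the string literal directly.
public class OpenSearchJavaOpts {

  static final List<String> FLAGS =
      List.of(
          // keep the test container's heap small
          "-Xmx256m",
          "-Xms256m",
          // turn off Log4j JMX (both spellings of the property, as in the diff)
          "-Dlog4j2.disableJmx=true",
          "-Dlog4j2.disable.jmx=true",
          // skip the JVM's container (cgroup) detection
          "-XX:-UseContainerSupport");

  /** Joins the flags into the single space-separated value used for OPENSEARCH_JAVA_OPTS. */
  static String build() {
    return String.join(" ", FLAGS);
  }

  public static void main(String[] args) {
    System.out.println(build());
  }
}
```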

@jaydeluca jaydeluca changed the title Disable jmx in opensearch test containers Disable container support flag in opensearch test containers Oct 16, 2025
@jaydeluca jaydeluca changed the title Disable container support flag in opensearch test containers Disable container support in opensearch test containers Oct 16, 2025
@trask
Member

trask commented Oct 17, 2025

@jaydeluca heads up, it looks like this is also impacting elasticsearch

@jaydeluca
Member Author

Looking at some of the build failures and researching this more, it might have to do with the Linux kernel version in use. It looks like tests are passing on 6.11.x (example success - Linux 6.11.0-1018-azure) but failing on newer kernels (example failure - Linux 6.14.0-1012-azure).
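
If it helps when triaging runs, the kernel comparison above could be scripted roughly like this. It is a sketch under assumptions: `KernelVersion` and its methods are hypothetical names, and the 6.11 cutoff is only what the two example runs suggest, not a confirmed boundary.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of this PR.
public class KernelVersion {

  // Matches the leading "major.minor" of a release string such as "6.11.0-1018-azure".
  private static final Pattern MAJOR_MINOR = Pattern.compile("^(\\d+)\\.(\\d+)");

  /** Extracts {major, minor} from a kernel release string. */
  static int[] majorMinor(String release) {
    Matcher m = MAJOR_MINOR.matcher(release);
    if (!m.find()) {
      throw new IllegalArgumentException("unrecognised kernel release: " + release);
    }
    return new int[] {Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2))};
  }

  /** True if the release is strictly newer than the given major.minor. */
  static boolean newerThan(String release, int major, int minor) {
    int[] v = majorMinor(release);
    return v[0] > major || (v[0] == major && v[1] > minor);
  }

  public static void main(String[] args) {
    // On Linux, os.version reports the kernel release.
    String release = System.getProperty("os.version");
    System.out.println(release + " newer than 6.11: " + newerThan(release, 6, 11));
  }
}
```

With the two runners mentioned above, `newerThan("6.11.0-1018-azure", 6, 11)` is false and `newerThan("6.14.0-1012-azure", 6, 11)` is true.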

More resources:

@laurit laurit merged commit 7536867 into open-telemetry:main Oct 17, 2025
81 checks passed

Development

Successfully merging this pull request may close these issues.

Workflow failed: Build (daily) (#1312)

3 participants