Open
Description
We received a report of a repeatable segfault from a GATK user running HaplotypeCaller. They are using GATK 4.6.0.0, which uses the most recent GKL release, 0.8.11.
I've reproduced their report below (from broadinstitute/gatk#8988).
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007f06ed243291, pid=1058615, tid=1058616
#
# JRE version: OpenJDK Runtime Environment (17.0.2+8) (build 17.0.2+8-86)
# Java VM: OpenJDK 64-Bit Server VM (17.0.2+8-86, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# C [libc.so.6+0xcf291] __memset_avx2_erms+0x11
#
# Core dump will be written. Default location: Core dumps may be processed with "/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h %e" (or dumping to /bigdata/ramadugulab/luy/SNPcallingBreeding/core.1058615)
#
# If you would like to submit a bug report, please visit:
# https://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
--------------- S U M M A R Y ------------
Command Line: -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 /bigdata/operations/pkgadmin/opt/linux/centos/8.x/x86_64/pkgs/gatk/4.6.0.0/gatk-package-4.6.0.0-local.jar HaplotypeCaller -R /rhome/luy/bigdata/genomes/Cclementina_182_v1_2.fa -I AlignedCalToCcl_Scaffolds_MarkDupOut.bam -O AlignedCalToCcl_Scaffolds.vcf.gz -ERC GVCF
Host: Intel(R) Xeon(R) CPU E5-2683 v4 @ 2.10GHz, 64 cores, 20G, Rocky Linux release 8.8 (Green Obsidian)
Time: Sat Sep 28 04:11:19 2024 PDT elapsed time: 58592.788414 seconds (0d 16h 16m 32s)
--------------- T H R E A D ---------------
Current thread (0x00007f06e4025b70): JavaThread "main" [_thread_in_native, id=1058616, stack(0x00007f06edc7a000,0x00007f06edd7b000)]
Stack: [0x00007f06edc7a000,0x00007f06edd7b000], sp=0x00007f06edbe6458, free space=18014398509481393k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C [libc.so.6+0xcf291] __memset_avx2_erms+0x11
C [libgkl_pairhmm_omp5311772482084658743.so+0x1500f] Java_com_intel_gkl_pairhmm_IntelPairHmm_computeLikelihoodsNative._omp_fn.0+0xcf
Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
J 8942 com.intel.gkl.pairhmm.IntelPairHmm.computeLikelihoodsNative([Ljava/lang/Object;[Ljava/lang/Object;[D)V (0 bytes) @ 0x00007f06d563401c [0x00007f06d5633fa0+0x000000000000007c]
J 10003 c2 com.intel.gkl.pairhmm.IntelPairHmm.computeLikelihoods([Lorg/broadinstitute/gatk/nativebindings/pairhmm/ReadDataHolder;[Lorg/broadinstitute/gatk/nativebindings/pairhmm/HaplotypeDataHolder;[D)V (119 bytes) @ 0x00007f06d5bff3e0 [0x00007f06d5bff3a0+0x0000000000000040]
J 6781 c2 org.broadinstitute.hellbender.utils.pairhmm.VectorLoglessPairHMM.computeLog10Likelihoods(Lorg/broadinstitute/hellbender/utils/genotyper/LikelihoodMatrix;Ljava/util/List;Lorg/broadinstitute/hellbender/utils/pairhmm/PairHMMInputScoreImputator;)V (450 bytes) @ 0x00007f06d54f8cc8 [0x00007f06d54f8a00+0x00000000000002c8]
J 10022 c2 org.broadinstitute.hellbender.tools.walkers.haplotypecaller.PairHMMLikelihoodCalculationEngine.computeReadLikelihoods(Lorg/broadinstitute/hellbender/tools/walkers/haplotypecaller/AssemblyResultSet;Lorg/broadinstitute/hellbender/utils/genotyper/SampleList;Ljava/util/Map;Z)Lorg/broadinstitute/hellbender/utils/genotyper/AlleleLikelihoods; (25 bytes) @ 0x00007f06d5c0cb30 [0x00007f06d5c0b540+0x00000000000015f0]
J 9971 c2 org.broadinstitute.hellbender.tools.walkers.haplotypecaller.HaplotypeCallerEngine.callRegion(Lorg/broadinstitute/hellbender/engine/AssemblyRegion;Lorg/broadinstitute/hellbender/engine/FeatureContext;Lorg/broadinstitute/hellbender/engine/ReferenceContext;)Ljava/util/List; (2286 bytes) @ 0x00007f06d5bdef08 [0x00007f06d5bdcd60+0x00000000000021a8]
J 10571% c2 org.broadinstitute.hellbender.engine.AssemblyRegionWalker.processReadShard(Lorg/broadinstitute/hellbender/engine/MultiIntervalLocalReadShard;Lorg/broadinstitute/hellbender/engine/ReferenceDataSource;Lorg/broadinstitute/hellbender/engine/FeatureManager;)V (154 bytes) @ 0x00007f06d5c8e5c0 [0x00007f06d5c8dd20+0x00000000000008a0]
j org.broadinstitute.hellbender.engine.AssemblyRegionWalker.traverse()V+83
j org.broadinstitute.hellbender.engine.GATKTool.doWork()Ljava/lang/Object;+19
j org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool()Ljava/lang/Object;+34
j org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs()Ljava/lang/Object;+225
j org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain([Ljava/lang/String;)Ljava/lang/Object;+14
j org.broadinstitute.hellbender.Main.runCommandLineProgram(Lorg/broadinstitute/hellbender/cmdline/CommandLineProgram;[Ljava/lang/String;)Ljava/lang/Object;+20
j org.broadinstitute.hellbender.Main.mainEntry([Ljava/lang/String;)V+22
j org.broadinstitute.hellbender.Main.main([Ljava/lang/String;)V+8
v ~StubRoutines::call_stub
siginfo: si_signo: 11 (SIGSEGV), si_code: 2 (SEGV_ACCERR), si_addr: 0x00007f06edc39d00
Register to memory mapping:
RAX=0x0 is NULL
RBX=0x00007f06edc39d00: <offset 0x0000000000006d00> in /bigdata/operations/pkgadmin/opt/linux/centos/8.x/x86_64/pkgs/java/17.0.2/lib/libjava.so at 0x00007f06edc33000
RCX=0x0000000000028318 is an unknown value
RDX=0x00007f06edc39d00: <offset 0x0000000000006d00> in /bigdata/operations/pkgadmin/opt/linux/centos/8.x/x86_64/pkgs/java/17.0.2/lib/libjava.so at 0x00007f06edc33000
RSP=0x00007f06edbe6458 points into unknown readable memory: 0x00007f0673c89bc4 | c4 9b c8 73 06 7f 00 00
RBP=0x00007f06edd78f50 is pointing into the stack for thread: 0x00007f06e4025b70
RSI=0x0 is NULL
RDI=0x00007f06edc39d00: <offset 0x0000000000006d00> in /bigdata/operations/pkgadmin/opt/linux/centos/8.x/x86_64/pkgs/java/17.0.2/lib/libjava.so at 0x00007f06edc33000
R8 =0x0000000000004f9a is an unknown value
R9 =0x0000000000000001 is an unknown value
R10=0x00000000000000c3 is an unknown value
R11=0x00007f06e47c9840 points into unknown readable memory: 0x4141474141414143 | 43 41 41 41 41 47 41 41
R12=0x00007f06edc119e0 points into unknown readable memory: 0x0000000000000000 | 00 00 00 00 00 00 00 00
R13=0x00007f06edbe96c0 points into unknown readable memory: 0x00007f06e4f65c50 | 50 5c f6 e4 06 7f 00 00
R14=0x0000000000028318 is an unknown value
R15=0x0000000000005063 is an unknown value
Registers:
RAX=0x0000000000000000, RBX=0x00007f06edc39d00, RCX=0x0000000000028318, RDX=0x00007f06edc39d00
RSP=0x00007f06edbe6458, RBP=0x00007f06edd78f50, RSI=0x0000000000000000, RDI=0x00007f06edc39d00
R8 =0x0000000000004f9a, R9 =0x0000000000000001, R10=0x00000000000000c3, R11=0x00007f06e47c9840
R12=0x00007f06edc119e0, R13=0x00007f06edbe96c0, R14=0x0000000000028318, R15=0x0000000000005063
RIP=0x00007f06ed243291, EFLAGS=0x0000000000010206, CSGSFS=0x002b000000000033, ERR=0x0000000000000007
TRAPNO=0x000000000000000e
Top of Stack: (sp=0x00007f06edbe6458)
0x00007f06edbe6458: 00007f0673c89bc4 7b8f04462509c62f
0x00007f06edbe6468: 8010180048120140 0000c12912a02890
0x00007f06edbe6478: 0460229080441000 ffffffffffffffff
0x00007f06edbe6488: 4a03ed807b023001 3040120080800100
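One possible reading of the THREAD section, offered as a sketch and not as a confirmed root cause: the reported stack pointer and the faulting address both lie below the lower bound of the main thread's stack, which would be consistent with the native frame using more stack than the thread has. The snippet below only redoes the arithmetic on addresses copied from the log. The assumption that the JVM prints "free space" as an unsigned 64-bit difference divided by 1024 is mine, although it does reproduce the printed value.

```python
# Addresses copied verbatim from the hs_err log above.
STACK_LOW = 0x00007F06EDC7A000    # lower bound of the "main" thread stack
STACK_HIGH = 0x00007F06EDD7B000   # upper bound of the "main" thread stack
SP = 0x00007F06EDBE6458           # sp reported at the time of the crash
FAULT_ADDR = 0x00007F06EDC39D00   # si_addr (same value as RDI)
REPORTED_FREE_K = 18014398509481393  # "free space" value printed in the log

stack_size = STACK_HIGH - STACK_LOW
sp_below_low = STACK_LOW - SP
fault_below_low = STACK_LOW - FAULT_ADDR

# Assumption (not taken from the log): free space is computed as an unsigned
# 64-bit (sp - stack_low) and then divided by 1024, so a negative difference
# wraps around to a very large number.
wrapped_free_k = ((SP - STACK_LOW) % 2**64) // 1024

print(f"stack size:              {stack_size} bytes")
print(f"sp below stack bound:    {sp_below_low} bytes")
print(f"fault below stack bound: {fault_below_low} bytes")
print(f"wrapped free space:      {wrapped_free_k}k")
print(f"matches reported value:  {wrapped_free_k == REPORTED_FREE_K}")
print(f"fault between sp and stack bound: {SP < FAULT_ADDR < STACK_LOW}")
```

If this reading is right, the huge "free space" figure is a wrapped negative number and not real headroom, and the memset target sits between the reported sp and the bottom of the stack. That alone does not show which allocation in the native PairHMM code is responsible.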
Steps to reproduce
The command that was run:
gatk HaplotypeCaller -R /rhome/luy/bigdata/genomes/Cclementina_182_v1_2.fa -I AlignedCalToCcl_Scaffolds_MarkDupOut.bam \
-O AlignedCalToCcl_Scaffolds.vcf.gz \
-ERC GVCF
The job was submitted to an HPC cluster using Slurm. It was tested on multiple machines: an Intel machine with a Xeon E5-2683 v4 CPU and an AMD machine with an EPYC 7713 CPU.
It has also been run multiple times, crashing each time at the same __memset_avx2_erms+0x11 instruction.
Other package versions that might be relevant:
java/17.0.2
glibc-common-2.28-225