Description
While integrating the main ticket for the 2025 PbPb menu (CMSHLT-3658), I ran into non-reproducible results and crashes in the pixel tracking package, for example:
Thread 18 (Thread 0x7f5e5f083640 (LWP 2474523) "cmsRun"):
#0 0x00007f5eae30200f in poll () from /lib64/libc.so.6
#1 0x00007f5ea9d92297 in edm::service::InitRootHandlers::stacktraceFromThread() () from /cvmfs/cms.cern.ch/el9_amd64_gcc12/cms/cmssw/CMSSW_15_1_0/lib/el9_amd64_gcc12/pluginFWCoreServicesPlugins.so
#2 0x00007f5ea9d92494 in sig_dostack_then_abort () from /cvmfs/cms.cern.ch/el9_amd64_gcc12/cms/cmssw/CMSSW_15_1_0/lib/el9_amd64_gcc12/pluginFWCoreServicesPlugins.so
#3 <signal handler called>
#4 0x00007f5e47f1b10d in alpaka::TaskKernelCpuSerial<std::integral_constant<unsigned long, 1ul>, unsigned int, alpaka_serial_sync::caHitNtupletGeneratorKernels::Kernel_fillGenericPair, caStructures::CAP\
airLayout<128ul, false>::ViewTemplateFreeParams<128ul, false, true, true>&, unsigned int*, cms::alpakatools::OneToManyAssocRandomAccess<unsigned int, -1, -1>*>::operator()() const () from /cvmfs/cms.cern\
.ch/el9_amd64_gcc12/cms/cmssw/CMSSW_15_1_0/lib/el9_amd64_gcc12/pluginRecoTrackerPixelSeedingPortableSerialSync.so
#5 0x00007f5e47fba716 in void alpaka::exec<alpaka::AccCpuSerial<std::integral_constant<unsigned long, 1ul>, unsigned int>, alpaka::QueueGenericThreadsBlocking<alpaka::DevCpu>, alpaka::WorkDivMembers<std\
::integral_constant<unsigned long, 1ul>, unsigned int>, alpaka_serial_sync::caHitNtupletGeneratorKernels::Kernel_fillGenericPair, caStructures::CAPairLayout<128ul, false>::ViewTemplateFreeParams<128ul, f\
alse, true, true>&, unsigned int*, cms::alpakatools::OneToManyAssocRandomAccess<unsigned int, -1, -1>*>(alpaka::QueueGenericThreadsBlocking<alpaka::DevCpu>&, alpaka::WorkDivMembers<std::integral_constant\
<unsigned long, 1ul>, unsigned int> const&, alpaka_serial_sync::caHitNtupletGeneratorKernels::Kernel_fillGenericPair const&, caStructures::CAPairLayout<128ul, false>::ViewTemplateFreeParams<128ul, false,\
true, true>&, unsigned int*&&, cms::alpakatools::OneToManyAssocRandomAccess<unsigned int, -1, -1>*&&) [clone .constprop.0] [clone .isra.0] () from /cvmfs/cms.cern.ch/el9_amd64_gcc12/cms/cmssw/CMSSW_15_1\
_0/lib/el9_amd64_gcc12/pluginRecoTrackerPixelSeedingPortableSerialSync.so
#6 0x00007f5e47f1d87f in alpaka_serial_sync::CAHitNtupletGeneratorKernels<pixelTopology::HIonPhase1>::launchKernels(reco::TrackingHitsLayout<128ul, false>::ConstViewTemplateFreeParams<128ul, false, true\
, true> const&, unsigned int, unsigned short, reco::TrackLayout<128ul, false>::ViewTemplateFreeParams<128ul, false, true, true>&, reco::TrackHitsLayout<128ul, false>::ViewTemplateFreeParams<128ul, false,\
true, true>&, reco::CALayersLayout<128ul, false>::ConstViewTemplateFreeParams<128ul, false, true, true> const&, reco::CAGraphLayout<128ul, false>::ConstViewTemplateFreeParams<128ul, false, true, true> c\
onst&, alpaka::QueueGenericThreadsBlocking<alpaka::DevCpu>&) () from /cvmfs/cms.cern.ch/el9_amd64_gcc12/cms/cmssw/CMSSW_15_1_0/lib/el9_amd64_gcc12/pluginRecoTrackerPixelSeedingPortableSerialSync.so
#7 0x00007f5e47f12a5e in alpaka_serial_sync::CAHitNtupletGenerator<pixelTopology::HIonPhase1>::makeTuplesAsync(reco::TrackingRecHitHost const&, PortableHostMultiCollection<reco::CALayersLayout<128ul, fa\
lse>, reco::CAGraphLayout<128ul, false>, reco::CAModulesLayout<128ul, false> > const&, float, unsigned int, unsigned int, alpaka::QueueGenericThreadsBlocking<alpaka::DevCpu>&) const () from /cvmfs/cms.ce\
rn.ch/el9_amd64_gcc12/cms/cmssw/CMSSW_15_1_0/lib/el9_amd64_gcc12/pluginRecoTrackerPixelSeedingPortableSerialSync.so
#8 0x00007f5e47f12dc8 in alpaka_serial_sync::CAHitNtupletAlpaka<pixelTopology::HIonPhase1>::produce(alpaka_serial_sync::device::Event&, alpaka_serial_sync::device::EventSetup const&) () from /cvmfs/cms.\
cern.ch/el9_amd64_gcc12/cms/cmssw/CMSSW_15_1_0/lib/el9_amd64_gcc12/pluginRecoTrackerPixelSeedingPortableSerialSync.so
#9 0x00007f5e47f0aed9 in alpaka_serial_sync::stream::EDProducer<edm::GlobalCache<reco::CAGeometryParams>, edm::RunCache<cms::alpakatools::MoveToDeviceCache<alpaka::DevCpu, PortableHostMultiCollection<re\
co::CALayersLayout<128ul, false>, reco::CAGraphLayout<128ul, false>, reco::CAModulesLayout<128ul, false> > > > >::produce(edm::Event&, edm::EventSetup const&) () from /cvmfs/cms.cern.ch/el9_amd64_gcc12/c\
ms/cmssw/CMSSW_15_1_0/lib/el9_amd64_gcc12/pluginRecoTrackerPixelSeedingPortableSerialSync.so
#10 0x00007f5eaf655d75 in edm::stream::EDProducerAdaptorBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) () from /cvmfs/cms.cern.ch/el9_amd64_gcc12\
/cms/cmssw/CMSSW_15_1_0/lib/el9_amd64_gcc12/libFWCoreFramework.so
#11 0x00007f5eaf63a59c in edm::WorkerT<edm::stream::EDProducerAdaptorBase>::implDo(edm::EventTransitionInfo const&, edm::ModuleCallingContext const*) () from /cvmfs/cms.cern.ch/el9_amd64_gcc12/cms/cmssw/\
CMSSW_15_1_0/lib/el9_amd64_gcc12/libFWCoreFramework.so
#12 0x00007f5eaf5c0d39 in std::__exception_ptr::exception_ptr edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(std::__exception_ptr::excepti\
on_ptr, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchA\
ctionType)1>::Context const*) () from /cvmfs/cms.cern.ch/el9_amd64_gcc12/cms/cmssw/CMSSW_15_1_0/lib/el9_amd64_gcc12/libFWCoreFramework.so
#13 0x00007f5eaf5c1234 in edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >::execute() () from /cvmfs/cms.cern.ch/el9_amd64_gcc12/cms/cmssw/CMSSW_15_1_0/li\
b/el9_amd64_gcc12/libFWCoreFramework.so
#14 0x00007f5eaf802388 in tbb::detail::d2::function_task<edm::WaitingTaskList::announce()::{lambda()#1}>::execute(tbb::detail::d1::execution_data&) () from /cvmfs/cms.cern.ch/el9_amd64_gcc12/cms/cmssw/CM\
SSW_15_1_0/lib/el9_amd64_gcc12/libFWCoreConcurrency.so
#15 0x00007f5eaf75b5da in tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::outermost_worker_waiter> (t=<optimized out>, waiter=..., this=0x7f5ead5d1f00) at /data/cmsbld/jenkin\
s/workspace/build-any-ib/w/BUILD/el9_amd64_gcc12/external/tbb/v2022.0.0-1feaa53e42a55cbacf84e40dc1fc78f6/tbb-v2022.0.0/src/tbb/task_dispatcher.h:334
#16 tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::outermost_worker_waiter> (t=0x0, waiter=..., this=0x7f5ead5d1f00) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd6\
4_gcc12/external/tbb/v2022.0.0-1feaa53e42a55cbacf84e40dc1fc78f6/tbb-v2022.0.0/src/tbb/task_dispatcher.h:470
#17 tbb::detail::r1::arena::process (tls=..., this=<optimized out>) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc12/external/tbb/v2022.0.0-1feaa53e42a55cbacf84e40dc1fc78f6/tbb-v202\
2.0.0/src/tbb/arena.cpp:215
#18 tbb::detail::r1::thread_dispatcher_client::process (td=..., this=<optimized out>) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc12/external/tbb/v2022.0.0-1feaa53e42a55cbacf84e40\
dc1fc78f6/tbb-v2022.0.0/src/tbb/thread_dispatcher_client.h:41
#19 tbb::detail::r1::thread_dispatcher::process (this=<optimized out>, j=...) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc12/external/tbb/v2022.0.0-1feaa53e42a55cbacf84e40dc1fc78f\
6/tbb-v2022.0.0/src/tbb/thread_dispatcher.cpp:195
#20 0x00007f5eaf753688 in tbb::detail::r1::rml::private_worker::run (this=0x7f5eaabc7080) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc12/external/tbb/v2022.0.0-1feaa53e42a55cbacf8\
4e40dc1fc78f6/tbb-v2022.0.0/src/tbb/private_server.cpp:271
#21 tbb::detail::r1::rml::private_worker::thread_routine (arg=0x7f5eaabc7080) at /data/cmsbld/jenkins/workspace/build-any-ib/w/BUILD/el9_amd64_gcc12/external/tbb/v2022.0.0-1feaa53e42a55cbacf84e40dc1fc78f\
6/tbb-v2022.0.0/src/tbb/private_server.cpp:221
[...]
Current Modules:
Module: alpaka_serial_sync::CAHitNtupletAlpakaHIonPhase1:hltPixelTracksPPOnAASoA (crashed)
Module: RawDataCollectorByLabel:rawDataCollector
Module: RawDataCollectorByLabel:rawDataCollector
Module: L1TDigiToRaw:packGtStage2
A fatal system signal has occurred: segmentation violation
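When many runs are collected, the crashing module can be pulled out of each crash report by matching the "(crashed)" marker in the "Current Modules:" section shown above. This is a minimal sketch. `crashed_modules` is a hypothetical helper, not part of CMSSW, and it assumes the cmsRun output for a run has been saved to a plain-text log file.

```shell
#!/bin/bash
# Hypothetical helper (not part of CMSSW): print the module(s) marked
# "(crashed)" in the "Current Modules:" section of a saved cmsRun log.
# Assumes lines of the form "Module: <type>:<label> (crashed)".
# Usage: crashed_modules <logfile>
crashed_modules() {
  local log="$1"
  # -n suppresses default output; the trailing p prints only matching lines,
  # with the "Module: " prefix and " (crashed)" suffix stripped.
  sed -n 's/^[[:space:]]*Module: \(.*\) (crashed)$/\1/p' "$log"
}
```

For the report above, this prints `alpaka_serial_sync::CAHitNtupletAlpakaHIonPhase1:hltPixelTracksPPOnAASoA`.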
A simple reproducer is available here (to be run in CMSSW_15_1_0_patch1):
#!/bin/bash -ex
for i in {1..20}; do
  dirname="run_${i}" # or replace with your actual directory naming pattern
  echo ">>> Running test in directory: ${dirname}"
  hltIntegrationTests /dev/CMSSW_15_1_0/HIon/V10 \
    -n 1000 \
    --input /store/hidata/HIRun2024B/HIEphemeralZeroBias0/RAW/v1/000/388/769/00000/1c181bd8-e9cf-4621-b68c-768ec5d49ff3.root \
    -x "--globaltag 150X_dataRun3_HLT_v1" \
    -x "--no-output" \
    -x "--eras Run3_2025 --l1-emulator uGT --l1 L1Menu_CollisionsHeavyIons2024_v1_0_6_xml" \
    -x "--open" \
    --paths "DQM_HIPixelReconstruction_v*" \
    --dir "${dirname}"
  echo ">>> Done with ${dirname}"
  echo "--------------------------------------"
done
Over enough trials, it will eventually crash.
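Because `-e` makes the script above stop at the first failing command, it does not show how often the crash occurs. A variant that keeps going and reports a failure count might look like the sketch below. `count_failures` and `run_one` are hypothetical helpers, not part of CMSSW, and the sketch assumes `hltIntegrationTests` exits with a nonzero status when a job crashes.

```shell
#!/bin/bash
# Hypothetical helper (not part of CMSSW): run a command N times, passing the
# trial number as the last argument, and report how many runs exited nonzero.
# Usage: count_failures <N> <command> [args...]
count_failures() {
  local n="$1"; shift
  local failures=0 i
  for ((i = 1; i <= n; i++)); do
    # A failing trial is counted instead of aborting the whole loop.
    if ! "$@" "$i"; then
      failures=$((failures + 1))
      echo ">>> trial ${i} failed" >&2
    fi
  done
  echo "${failures}/${n} trials failed"
}
```

A possible usage, reusing the options from the reproducer: define `run_one() { hltIntegrationTests /dev/CMSSW_15_1_0/HIon/V10 -n 1000 ... --dir "run_$1"; }` (with the remaining options copied verbatim), then call `count_failures 20 run_one`. Passing the trial number as an argument keeps a separate output directory per run, as in the original loop.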