Skip to content

Feature/cerati ng2caf #532

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
May 8, 2025
Merged

Conversation

cerati
Copy link
Contributor

@cerati cerati commented Apr 11, 2025

Description

This PR adds a few NuGraph-related variables to CAFs, along the lines of what presented on docdb 40585. Once icaruscode PR815 is merged the needed data products can be produced, but it won't break CAF making if they are not there. Requires sbnanaobj PR137.

@cerati cerati added the enhancement New feature or request label Apr 11, 2025
@kjplows kjplows moved this to Open pull requests in SBN software development Apr 17, 2025
@@ -95,6 +95,7 @@
#include "lardataobj/RecoBase/Vertex.h"
#include "lardataobj/RecoBase/Shower.h"
#include "lardataobj/RecoBase/MCSFitResult.h"
#include "lardataobj/RecoBase/Cluster.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the inclusion of FillReco prevents this from being an actual problem, but I am trying to recall if our style guidelines suggest that

#include "lardataobj/AnalysisBase/MVAOutput.h"

should still be included here too since FeatureVector is used directly in CAFMaker_module?

@@ -95,6 +95,7 @@
#include "lardataobj/RecoBase/Vertex.h"
#include "lardataobj/RecoBase/Shower.h"
#include "lardataobj/RecoBase/MCSFitResult.h"
#include "lardataobj/RecoBase/Cluster.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm only leaving this as comments for now as I have more pieces to look at and so don't want to leave an actual "review" result yet, but I'll put a couple thoughts here.

I guess the inclusion of FillReco prevents this from being an actual problem, but I'm trying to recall if our style guidelines suggest that you should actually put

#include "lardataobj/AnalysisBase/MVAOutput.h"

here as an include too since FeatureVector is directly used in the CAFMaker module.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, I'll add this include!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding!

//need to double check that the slice processed by NuGraph is the same under consideration
//std::cout << "sizes=" << inputHits.size() << " " << sliceHitsMap.size() << " " << ngFilterResult.size() << " " << ngSemanticResult.size() << std::endl;
unsigned int nHits = inputHits.size();
if (nHits==0 || nHits!=sliceHitsMap.size() || inputHits[0].key()!=sliceHitsMap[0]) return;//not the same slice!
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be a silent error or would it be a sign of a bigger problem?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not an error. We just need to skip the slices that were not considered by NuGraph

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right, since NuGraph only runs on one slice!

Comment on lines 981 to 983
auto arg_max = [](std::vector<float> const& vec) {
return std::distance(vec.begin(), std::max_element(vec.begin(), vec.end()));
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another question on the SBN coding conventions: are lambdas okay within other methods? (If @cerati himself isn't sure then maybe @PetrilloAtWork is the best person to answer this (especially as he is also tagged as the other reviewer))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is resolved by Giuseppe's recent commit, which just uses this one-liner in place of the 2 lines this function would have been used.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is nothing about this case in our guidelines.
Personally I do recommend the use of lambdas in methods where it makes sense (typically, to avoid the repetition of simple code). However, if the same lambda appears in different methods, then it should likely be a member function if possible.

Comment on lines 1001 to 1005
for (size_t i=0;i<scores->size();i++) ng2semscores.push_back(scores->at(i));
size_t sem_label = arg_max(ng2semscores);
ng2sempfpcounts[sem_label]++;
auto bkgscore = ngFilterResult.at(pos);
if (bkgscore->at(0)<0.5) ng2bkgpfpcount++;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems fine, but I was thinking: is there an argument to be made that one should only fill the semantic label fractions for hits that are not tagged as background?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's a good point. Ideally this should be studied, to find what works best. In general, the semantic labels can be meaningful also when the filter does not pass (e.g. for cosmics). But since noise is a bigger issues than cosmics (unlike MicroBooNE), your suggestion makes sense and I will implement it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking more about this. I think I agree with your point if the cosmic hits were considered background. Isn't the current training considering them signal though? Apologies if I'm misremembering?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, the current training is considering them signal but this is going to change soon!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah then I guess I'm okay with either way, though I suppose in this updated schema the background fraction would still be meaningful to distinguish a cosmic PFP from others? And the semantic label fractions would help distinguish if there is some overlapping particle what it was, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right. My point is that for ICARUS noise is likely more of an issue than cosmics, so the semantic labels for the background hits are more likely to be meaningless (as they are more likely to be noise and not real cosmic particles). So the new version makes more sense as we don't want to pollute the semantic counts with noise. But of course the background fraction is still useful to reject cosmics!

Comment on lines +1826 to +1844
art::FindManyP<recob::Cluster> fmPFPClusters =
FindManyPStrict<recob::Cluster>(fmPFPart, evt, fParams.PFParticleLabel() + slice_tag_suff);

std::vector<std::vector<art::Ptr<recob::Hit>>> fmPFPartHits;
// make Ptr's to clusters for cluster -> other object associations
if (fmPFPClusters.isValid()) {
for (size_t ipf=0; ipf<fmPFPart.size();++ipf) {
std::vector<art::Ptr<recob::Hit>> pfphits;
std::vector<art::Ptr<recob::Cluster>> pfclusters = fmPFPClusters.at(ipf);
art::FindManyP<recob::Hit> fmCluHits = FindManyPStrict<recob::Hit>(pfclusters, evt, fParams.PFParticleLabel() + slice_tag_suff);
for (size_t icl=0; icl<fmCluHits.size();icl++) {
for (auto hit : fmCluHits.at(icl)) {
pfphits.push_back(hit);
}
}
fmPFPartHits.push_back(pfphits);
}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we had a quick chat once about using index directly vs using the art Ptr key() and that both of us are more used to key, but you found that index was working okay (and seems to be used elsewhere here). Please remind me if I'm missing details? I wanted to ask just to double check that this is guaranteed to work as desired!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when looping plainly over a collection, the way one would use key is to first create an art::Ptr based on the handle and the index, which guarantees that the key is the same as the index

@@ -950,6 +970,58 @@ namespace caf
srpfp.cnnscore.nclusters = cnnscore->nClusters;
}

void FillPPFNuGraph(const std::vector<unsigned int> &sliceHitsMap,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just noticed this is PPF not PFP

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice catch!

Comment on lines +982 to +989
std::vector<size_t> mappedhits;
for (auto& hit : pfpHits) {
auto it = std::find(sliceHitsMap.begin(), sliceHitsMap.end(), hit.key());
if (it != sliceHitsMap.end()) {
size_t index = std::distance(sliceHitsMap.begin(), it);
mappedhits.push_back(index);
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just double checking my understanding, I assume that the NuGraph producer is filling up the map that you grab above with the hit keys for the slice and so then this is why you grab things in this way from the map?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. See icaruscode/NuGraphIcarus/IcarusNuSliceHitsProducer_module.cc being added in the icaruscode PR. NuGraph runs over a hit collection. This hit collection is created based on the hits in the best neutrino slice. This means that NuGraph labels are applied to a hit collection which is a subset of the one used by Pandora. So in order to get the NuGraph label of hits in Pandora PFPs, a mapping is needed

Copy link
Member

@PetrilloAtWork PetrilloAtWork left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No mistakes spotted, asking for some documentation and a bit of code cleanup (in particular, documentation about the hard-coded values and what they do).


unsigned int npass = 0;
for ( unsigned int i = 0; i < nHits; i++ ) {
if (ngFilterResult.at(i)->at(0)>0.5) npass++;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hard-coded 0.5: can you document it and maybe turn its value into a C++ constant?


unsigned int npass = 0;
for ( unsigned int i = 0; i < nHits; i++ ) {
if (ngFilterResult.at(i)->at(0)>0.5) npass++;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually the cut value is included:

Suggested change
if (ngFilterResult.at(i)->at(0)>0.5) npass++;
if (ngFilterResult.at(i)->at(0)>=0.5) npass++;

Of course practically it seldom matters, but it would be good to pick a convention and stick with it (below there is a < 0.5).

size_t ng2bkgpfpcount = 0;
for (size_t pos : mappedhits) {
auto bkgscore = ngFilterResult.at(pos);
if (bkgscore->at(0)<0.5) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Is this the same 0.5 as above?"
Please use a constant to describe this one too (I suppose it is different than the one before).

srpfp.ng_dif_frac = (pfpHits.size()>ng2bkgpfpcount ? float(ng2sempfpcounts[4])/(pfpHits.size()-ng2bkgpfpcount) : -1.);
srpfp.ng_bkg_frac = float(ng2bkgpfpcount)/pfpHits.size();
} else {
srpfp.ng_sem_cat = -1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the enum suggestion on the other PR.

std::vector<float> ng2sempfpcounts(5,0);
size_t ng2bkgpfpcount = 0;
for (size_t pos : mappedhits) {
auto bkgscore = ngFilterResult.at(pos);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid unnecessary copies:

Suggested change
auto bkgscore = ngFilterResult.at(pos);
auto const& bkgscore = ngFilterResult.at(pos);

(this is probably still fine, since it's implemented as a static array).

@brucehoward-physics
Copy link
Contributor

I think I am happy to consider this approved, based on Giuseppe's responses and changes e.g., once Gianluca is satisfied. I don't know if me hitting "approve" and being otherwise silent will somehow supersede Gianluca's comments in the eyes of @kjplows or other manager, so remarking this here :)

Copy link
Contributor

@brucehoward-physics brucehoward-physics left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved with the comment given about pending Gianluca's satisfaction as well.

@kjplows
Copy link
Contributor

kjplows commented May 7, 2025

trigger build SBNSoftware/sbnanaobj#137

@FNALbuild
Copy link

✔️ CI build for LArSoft Succeeded on slf7 for e26:prof -- details available through the CI dashboard

@FNALbuild
Copy link

✔️ CI build for LArSoft Succeeded on slf7 for c14:prof -- details available through the CI dashboard

@FNALbuild
Copy link

❌ CI build for ICARUS Failed at phase build ICARUS on slf7 for c14:prof -- details available through the CI dashboard

🚨 For more details about the failed phase, check the build ICARUS phase logs

parent CI build details are available through the CI dashboard

@FNALbuild
Copy link

❌ CI build for SBND Failed at phase build SBND on slf7 for c14:prof -- details available through the CI dashboard

🚨 For more details about the failed phase, check the build SBND phase logs

parent CI build details are available through the CI dashboard

@FNALbuild
Copy link

⚠️ CI build for SBND Warning at phase ci_tests SBND on slf7 for e26:prof -- details available through the CI dashboard

🚨 For more details about the warning phase, check the ci_tests SBND phase logs

parent CI build details are available through the CI dashboard

@FNALbuild
Copy link

⚠️ CI build for ICARUS Warning at phase ci_tests ICARUS on slf7 for e26:prof - ignored failure for unit_test -- details available through the CI dashboard

🚨 For more details about the warning phase, check the ci_tests ICARUS phase logs

parent CI build details are available through the CI dashboard

@kjplows kjplows moved this from Open pull requests to Testing in SBN software development May 7, 2025
Comment on lines 1061 to 1065
srpfp.ngscore.mip_frac = (pfpHits.size()>ng2bkgpfpcount ? float(ng2sempfpcounts[0])/(pfpHits.size()-ng2bkgpfpcount) : -1.);
srpfp.ngscore.hip_frac = (pfpHits.size()>ng2bkgpfpcount ? float(ng2sempfpcounts[1])/(pfpHits.size()-ng2bkgpfpcount) : -1.);
srpfp.ngscore.shr_frac = (pfpHits.size()>ng2bkgpfpcount ? float(ng2sempfpcounts[2])/(pfpHits.size()-ng2bkgpfpcount) : -1.);
srpfp.ngscore.mhl_frac = (pfpHits.size()>ng2bkgpfpcount ? float(ng2sempfpcounts[3])/(pfpHits.size()-ng2bkgpfpcount) : -1.);
srpfp.ngscore.dif_frac = (pfpHits.size()>ng2bkgpfpcount ? float(ng2sempfpcounts[4])/(pfpHits.size()-ng2bkgpfpcount) : -1.);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More maintenance- and reader-friendly:

Suggested change
srpfp.ngscore.mip_frac = (pfpHits.size()>ng2bkgpfpcount ? float(ng2sempfpcounts[0])/(pfpHits.size()-ng2bkgpfpcount) : -1.);
srpfp.ngscore.hip_frac = (pfpHits.size()>ng2bkgpfpcount ? float(ng2sempfpcounts[1])/(pfpHits.size()-ng2bkgpfpcount) : -1.);
srpfp.ngscore.shr_frac = (pfpHits.size()>ng2bkgpfpcount ? float(ng2sempfpcounts[2])/(pfpHits.size()-ng2bkgpfpcount) : -1.);
srpfp.ngscore.mhl_frac = (pfpHits.size()>ng2bkgpfpcount ? float(ng2sempfpcounts[3])/(pfpHits.size()-ng2bkgpfpcount) : -1.);
srpfp.ngscore.dif_frac = (pfpHits.size()>ng2bkgpfpcount ? float(ng2sempfpcounts[4])/(pfpHits.size()-ng2bkgpfpcount) : -1.);
size_t hitExcess = pfpHits.size()>ng2bkgpfpcount ? pfpHits.size()-ng2bkgpfpcount: 0;
srpfp.ngscore.mip_frac = hitExcess > 0 ? float(ng2sempfpcounts[0])/hitExcess : -1.;
srpfp.ngscore.hip_frac = hitExcess > 0 ? float(ng2sempfpcounts[1])/hitExcess : -1.;
srpfp.ngscore.shr_frac = hitExcess > 0 ? float(ng2sempfpcounts[2])/hitExcess : -1.;
srpfp.ngscore.mhl_frac = hitExcess > 0 ? float(ng2sempfpcounts[3])/hitExcess : -1.;
srpfp.ngscore.dif_frac = hitExcess > 0 ? float(ng2sempfpcounts[4])/hitExcess : -1.;

@kjplows
Copy link
Contributor

kjplows commented May 8, 2025

trigger build SBNSoftware/sbnanaobj#137

@FNALbuild
Copy link

✔️ CI build for LArSoft Succeeded on slf7 for c14:prof -- details available through the CI dashboard

@FNALbuild
Copy link

✔️ CI build for LArSoft Succeeded on slf7 for e26:prof -- details available through the CI dashboard

@FNALbuild
Copy link

❌ CI build for SBND Failed at phase build SBND on slf7 for c14:prof -- details available through the CI dashboard

🚨 For more details about the failed phase, check the build SBND phase logs

parent CI build details are available through the CI dashboard

@FNALbuild
Copy link

❌ CI build for ICARUS Failed at phase build ICARUS on slf7 for c14:prof -- details available through the CI dashboard

🚨 For more details about the failed phase, check the build ICARUS phase logs

parent CI build details are available through the CI dashboard

@FNALbuild
Copy link

⚠️ CI build for ICARUS Warning at phase ci_tests ICARUS on slf7 for e26:prof - ignored failure for unit_test -- details available through the CI dashboard

🚨 For more details about the warning phase, check the ci_tests ICARUS phase logs

parent CI build details are available through the CI dashboard

@FNALbuild
Copy link

⚠️ CI build for SBND Warning at phase ci_tests SBND on slf7 for e26:prof -- details available through the CI dashboard

🚨 For more details about the warning phase, check the ci_tests SBND phase logs

parent CI build details are available through the CI dashboard

@kjplows kjplows moved this from Testing to To merge in SBN software development May 8, 2025
@kjplows kjplows merged commit fb52788 into SBNSoftware:develop May 8, 2025
@github-project-automation github-project-automation bot moved this from To merge to Done in SBN software development May 8, 2025
@kjplows kjplows moved this from Done to Recently done in SBN software development May 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
Status: Recently done
Development

Successfully merging this pull request may close these issues.

5 participants