Storage of job execution environment in output files [2/4] #497

Open
wants to merge 3 commits into
base: develop

@PetrilloAtWork PetrilloAtWork commented Jan 22, 2025

This PR implements a system for storing information about the execution environment of each job ("SBN metadata") in some of the output files.

From the user's perspective, adding a "plugin" to the configuration of the job output module (RootOutput) will have this metadata saved into all the art/ROOT output files and, if available, into the TFileService file.

Users can read back the metadata in art/ROOT files with a special output module (configuration dump_sbnjobmetadata.fcl provided in sbncode), and can unpack the one in the TFileService file with EvtInfo->dump(std::cout) provided that the dictionary of the class is available.

What code is where

The system includes, in sbnobj, the data class holding the metadata (sbn::JobEnvironmentInfo) and the small CMake library that extracts the GIT branch version; and, in sbncode, the art-based modules and plugins, the job configuration file for dumping the metadata from an art/ROOT file, and extensive documentation of the whole system in Doxygen format.
The system requires a modification in each repository that wants its GIT branch version extracted.
This PR provides those modifications for icarusalg and icaruscode. In addition, since ICARUS uses a template RootOutput configuration, that configuration has been changed to include the plugin that will save the information.

PR summary

What is included in the metadata

This PR includes in the metadata:

  • the full set of environment variables of the job execution environment, with the exception of shell functions;
  • art version;
  • art process name;
  • GIT branch description for all instrumented GIT repositories.

I attach an example of output from dump_sbnmetadata.fcl on a test file created by two "empty" jobs run in a chain.

If art/ROOT input files contain SBN metadata, that will also be replicated in the art/ROOT output file(s), but not in the TFileService one.

Some details of how this works and its limits

The system writes its metadata in a Results-level data product, which is unlike the ones we are used to in that it is accessible basically only by output modules. While this makes some sense, it also makes programmatic use of the metadata much, much harder.

The extraction of the execution environment information is pretty straightforward.
Conversely, the extraction of the GIT branch information is not.
The pattern of the system is that the CMake build files of a repository need to include instructions to extract the branch information from GIT (git describe) and to put that into a C++ shared library (it might have been something else, with different pros and cons). This is a fairly simple set of instructions, but long enough that the PR provides it in an SBNutil.cmake library in sbnobj, which the repository CMakeLists.txt needs to include.
Then, the art-aware repositories (so, for example, sbncode but not sbnobj) need to define an art tool that links to the metadata and returns it.
An algorithm class, sbn::JobEnvironmentInfoExtractor, is provided in sbncode which calls all the tools it knows (from its configuration) and fills the list of metadata.
Finally, the RootOutput plugin SaveJobEnvironment (sbncode) is the front-end executing that algorithm (and passing it the list of known repositories/tools) and storing the result into the output.
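
The chain described above (version library, tool, algorithm) can be sketched in plain C++ as follows. This is only an illustration under stated assumptions: it uses ordinary virtual classes instead of art tools loaded from configuration, and all names (`examplerepo::gitDescription`, `RepositoryVersionReporter`, `ExampleRepoReporter`, `VersionCollector`) are hypothetical and do not appear in the PR. The string returned by the stand-in version library is a placeholder for what CMake would generate from git describe at build time.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Stand-in for the CMake-generated version library of a repository.
// Assumption: the real string is produced from `git describe` at build time.
namespace examplerepo {
  inline std::string gitDescription() { return "unknown"; }
}

// Interface of a "tool" reporting the versions of the repositories it knows.
struct RepositoryVersionReporter {
  virtual ~RepositoryVersionReporter() = default;
  virtual std::map<std::string, std::string> versions() const = 0;
};

// A tool linked against the version library above, returning its content.
struct ExampleRepoReporter : RepositoryVersionReporter {
  std::map<std::string, std::string> versions() const override {
    return {{"examplerepo", examplerepo::gitDescription()}};
  }
};

// Algorithm calling every tool it was given and merging the results.
class VersionCollector {
 public:
  void addReporter(std::unique_ptr<RepositoryVersionReporter> reporter) {
    fReporters.push_back(std::move(reporter));
  }

  std::map<std::string, std::string> collect() const {
    std::map<std::string, std::string> all;
    for (auto const& reporter : fReporters)
      for (auto const& entry : reporter->versions())
        all.insert(entry);  // on duplicate repositories the first entry is kept
    return all;
  }

 private:
  std::vector<std::unique_ptr<RepositoryVersionReporter>> fReporters;
};
```

In this sketch a front-end, playing the role of the output plugin, would create a `VersionCollector`, register one reporter per known repository, and store the map returned by `collect()` into the output.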

Another important limit is that the CMake macros used here are stored in sbnobj, which is the lowest-level repository we have in SBN. For one, it is questionable that they belong here; and, more fatally, there are repositories which do not depend on sbnobj (e.g. sbnanaobj) and as a consequence can't use them. sbnana would have been a natural candidate for inclusion in the system, but that seems unlikely given that it does not depend on sbnobj either (it does depend on sbndata, which is a questionable workaround but still one; sbnanaobj, on the other hand, depends almost only on ROOT, and by design).

This system is extensively described in the SBNsourceMetadataSystem.dox file.

Testing

The system, in the final incarnation in this PR, has been tested with eight combinations of builds including or not including sbnobj, sbncode, icarusalg and icaruscode.
In the process, a few defects have been found and corrected in the build scripts (typically, missing items that were overlooked because they were being fortuitously provided by some other package in the build).

Review

I am calling for the review:

  • @SFBayLaser, who five days ago was lamenting the lack of information like this;
  • @absolution1, especially for his considerations on including this system in SBND too.

This system is complicated, and honestly should have been introduced at art level.
A lot of design was involved, and there were a lot of choices made in the process.
The sooner these choices are pondered, tested, discussed and challenged, the better: it is unlikely that design reconsideration will result in a backward-compatible change. Backward compatibility here is not a strong requirement, but it does not hurt.

I don't know how badly this interacts with the Spack-based build system. Provided that CMake is still there, this system should still work, although not necessarily satisfactorily.
For example, the dump of the environment implies that all UPS products are trackable; this might not be the case with Spack any more. However, when the time comes, the system can be tuned to the new build system.

  • need to include the GIT extraction macros from sbnobj

  • added a new job configuration and new directory

  • added a new job configuration and new directory

  • a new job configuration for dumping the SBN metadata

@PetrilloAtWork PetrilloAtWork Jan 22, 2025

  • metadata "framework":
    • added algorithm to extract SBN metadata
    • added output plugin to write the metadata into art output
    • added the output module to dump that metadata from files to screen
  • package metadata:
    • added macros for extraction of the version of this repository
    • added plugin for reporting versions of sbncode and sbnobj

  • output module dumping SBN metadata from an input file to screen

  • template source for sbncode repository version library

  • template interface for sbncode repository version library

  • implementation of the SBN metadata extraction algorithm

  • interface of the algorithm for extracting SBN metadata

  • helpers to add information to the data product class (which we keep simple...r)

  • helpers to add information to the data product class (which we keep simple...r)

  • interface of the art tool to collect version information from the various repositories

  • complete documentation of the system in Doxygen format

  • output module plugin to save SBN metadata into art output

  • sbncode-specific tool reporting the version of sbncode and sbnobj repositories

These were written when I thought that libraries would register the repository versions as soon as they were loaded.
Eventually I moved to dynamically loading plugin objects when the information is needed.
@SFBayLaser

I love this! But I need to digest... so give me a bit of time...

@kjplows

kjplows commented Mar 26, 2025

Hi @PetrilloAtWork, @SFBayLaser, and @absolution1 - just pinging to check in on this PR. I've merged develop into the branch - does this still look good to you? Would be great if we could merge this. Thanks!

Labels: enhancement (New feature or request), primary (An issue or PR that has depending ones under)
Projects
Status: Open pull requests
3 participants