Get DS file structure with serviceX tool #4

Merged
merged 29 commits into from
Apr 23, 2025

Conversation

ArturU043
Collaborator

New function: get_structure

  1. Creates an SX Python function query that reads one file of the requested DS and returns a str encoding the file structure inside an ak.array
  2. Builds a deliver spec with support for multiple samples and user-defined names
  3. Gets the encoded result str from the servicex.deliver call
  4. (opt) Prints the encoded str in a user-friendly format
  5. (opt) Returns the re-formatted str
  6. (opt) Saves the re-formatted str to samples-structure.txt
  7. (opt) Reconstructs a dummy ak.array from the encoded str and returns the type constructor
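
Step 2 could be sketched roughly as below. This is only an illustration, not the `build_deliver_spec` from this PR: the helper name `sketch_deliver_spec`, the default `SampleN` names, the string `QUERY_PLACEHOLDER`, and the use of plain strings for datasets are all assumptions. In a real spec the `Dataset` and `Query` entries would be ServiceX objects rather than strings.

```python
from typing import Dict, List, Union

# Accepted inputs (assumed): one DID, a list of DIDs, or a {name: DID} mapping.
DatasetInput = Union[str, List[str], Dict[str, str]]


def sketch_deliver_spec(datasets: DatasetInput, query: object = "QUERY_PLACEHOLDER") -> dict:
    """Build a deliver-style spec dict with one sample per dataset (illustrative only)."""
    if isinstance(datasets, str):
        # Single dataset: give it a default name.
        named = {"Sample1": datasets}
    elif isinstance(datasets, dict):
        # User-defined names are kept as given.
        named = dict(datasets)
    else:
        # List of datasets: generate Sample1, Sample2, ...
        named = {f"Sample{i}": did for i, did in enumerate(datasets, start=1)}

    return {
        "Sample": [
            # NFiles=1 because only one file is needed to read the structure.
            {"Name": name, "Dataset": did, "NFiles": 1, "Query": query}
            for name, did in named.items()
        ]
    }
```

Passing a dict keeps the user's own sample names, while a bare string or list falls back to generated names.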

The function can also be called from the terminal with servicex-get-structure.
Options were added to save the output to a .txt file, to load one or several DS, and to list all DS in a .json file that the command then loads.

Many helpers were added for this feature: run_query, build_deliver_spec, print_structure_from_str, parse_jagged_depth_and_dtype, str_to_array, and run_from_command.

@ArturU043 ArturU043 self-assigned this Mar 25, 2025
@ArturU043
Collaborator Author

This is my first attempt at building this feature. Initially I didn't expect to reconstruct ak.arrays from the encoded string, and after stepping back I wonder whether it has become over-complex.

For example, should I use regex matching instead of positional methods to extract information from the encoded str?

Should I write a simpler encoded str using the awkward type constructor directly?
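
To make the regex option concrete, here is a minimal sketch. The entry format `"<branch> ; <type>"` with an awkward-style type string such as `var * float32` is an assumption for illustration, not the encoding used in this PR, and `parse_entry` is a hypothetical name.

```python
import re

# Hypothetical encoded entry: "<branch> ; <awkward-style type>",
# e.g. "jet_pt ; var * float32". The real encoding may differ.
ENTRY_RE = re.compile(r"^\s*(?P<name>[^;]+?)\s*;\s*(?P<type>.+?)\s*$")


def parse_entry(entry: str) -> tuple:
    """Return (branch name, jagged depth, dtype) for one encoded entry."""
    match = ENTRY_RE.match(entry)
    if match is None:
        # A regex fails loudly on malformed input instead of slicing at a wrong offset.
        raise ValueError(f"unrecognised entry: {entry!r}")
    parts = [p.strip() for p in match.group("type").split("*")]
    depth = parts.count("var")  # each "var" is one variable-length (jagged) level
    dtype = parts[-1]           # the innermost element type comes last
    return match.group("name"), depth, dtype
```

Compared with positional slicing, the pattern tolerates extra whitespace and rejects malformed entries explicitly, at the cost of one more thing to maintain.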

@gordonwatts gordonwatts left a comment

Ok - nice! I like this and this is going to be very useful. I agree with your comment about simplifying things. Here is what I think should be done:

  1. Use json (with the built-in json module) to generate the output on ServiceX
  2. Use the json module to parse it on the client.

This should significantly simplify the code: the built-in json parser is basically bulletproof. Once that is done, the downstream steps can probably be simplified significantly as well.
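
A minimal sketch of that round trip, using only the standard library. The nested `{tree: {branch: type string}}` layout and the function names are assumptions for illustration; on ServiceX the dict would be filled from the file being read rather than hard-coded.

```python
import json


def encode_structure(structure: dict) -> str:
    """Server side: serialise {tree: {branch: type string}} to one JSON string."""
    return json.dumps(structure, sort_keys=True)


def decode_structure(encoded: str) -> dict:
    """Client side: recover the nested dict from the delivered string."""
    return json.loads(encoded)


def format_structure(structure: dict) -> str:
    """Render the decoded structure as indented, human-readable text."""
    lines = []
    for tree, branches in structure.items():
        lines.append(f"Tree: {tree}")
        for branch, type_str in branches.items():
            lines.append(f"  {branch}: {type_str}")
    return "\n".join(lines)


# Stand-in for what the ServiceX-side function would collect from one file.
example = {"reco": {"jet_pt": "var * float32", "met": "float64"}}
round_tripped = decode_structure(encode_structure(example))
```

With this layout the client never has to parse the string by hand; printing and saving to a file both work from the decoded dict.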

@ArturU043
Collaborator Author

Ready to be merged. Please add further comments if you have any.

@ArturU043 ArturU043 requested a review from gordonwatts April 14, 2025 15:16

@gordonwatts gordonwatts left a comment

Looks great!

@ArturU043 ArturU043 merged commit c87f48c into main Apr 23, 2025
4 checks passed
2 participants