Commit 6bf056b

Merge pull request #11 from ssl-hep/docs-dev
Add Docs to main
2 parents 142e4b4 + e6c90e5 commit 6bf056b

53 files changed, 6311 additions and 0 deletions

.readthedocs.yml

Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
# Read the Docs configuration file for Sphinx projects
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details

# Required
version: 2

# Set the OS, Python version and other tools you might need
build:
  os: ubuntu-22.04
  tools:
    python: "3.12"

# Build documentation in the "docs/" directory with Sphinx
sphinx:
  configuration: docs/conf.py
  # You can configure Sphinx to use a different builder, for instance use the dirhtml builder for simpler URLs
  builder: html
  # Fail on all warnings to avoid broken references
  # fail_on_warning: true

# Optionally build your docs in additional formats such as PDF and ePub
formats:
  - pdf

# Optional but recommended, declare the Python requirements required
# to build your documentation
# See https://docs.readthedocs.io/en/stable/guides/reproducible-builds.html
python:
  install:
    - method: pip
      path: .
      extra_requirements:
        - docs

docs/Makefile

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = .
BUILDDIR = _build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
33.3 KB
Binary file not shown.
24.7 KB
Binary file not shown.

docs/_build/doctrees/index.doctree

8.3 KB
Binary file not shown.
8.11 KB
Binary file not shown.
21.5 KB
Binary file not shown.

docs/_build/html/.buildinfo

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
# Sphinx build info version 1
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
config: ca3626de684797951ef03443c45ac66f
tags: 645f666f9bcd5a90fca523b33c5a78b7
Lines changed: 167 additions & 0 deletions
@@ -0,0 +1,167 @@
# Remote File Introspecting

The `get_structure()` function lets users query and inspect the internal structure of datasets available through ServiceX. This is useful for determining which branches exist in a given dataset before running a full transformation, so that the transformation can use the correct branch labels and types.

It also suits any lightweight exploration where only metadata or structure information is needed, without fetching event-level data.

---

## Overview

The function internally issues a ServiceX request, using the Python function backend, for the specified dataset(s) and returns a simplified summary of the file structure, such as branches and types, as a string formatted for readability.

It supports both programmatic and command-line use, and its return type depends on the arguments passed.

---

## Function

```python
get_structure(datasets, array_out=False, **kwargs)
```

**Parameters:**

- `datasets` (`dict`, `str`, or `list[str]`): One or more datasets to inspect, intended for Rucio DIDs. If a dictionary is passed, its keys are used as labels for each dataset in the output string.
- `array_out` (`bool`): If `True`, empty Awkward Arrays are reconstructed from the structure information and the function returns a dictionary of `ak.Array.type` objects. This gives programmatic access to the dataset structure, which can then be manipulated further.
- `**kwargs`: Additional arguments forwarded to the helper function `print_structure_from_str`, such as `filter_branch` to filter the displayed branches, `do_print` to print the output during the function call, or `save_to_txt` to save the output to `samples_structure.txt`.

**Returns:**
- `str`: The formatted file structure string (the default).
- `None`: If `do_print` or `save_to_txt` is `True`; the output is printed or saved to a file instead of being returned.
- `dict`: If `array_out` is `True`; keys are sample names and values are `ak.Array.type` objects with the same structure as the dataset.
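
Because the return type depends on the arguments, calling code may want to branch on it. The following is a minimal sketch and not part of the package: `summarize_result` is a hypothetical helper, and the values passed to it stand in for real `get_structure()` results.

```python
def summarize_result(result):
    """Describe a value returned by get_structure() (hypothetical helper)."""
    if result is None:
        # do_print=True or save_to_txt=True: output was printed or saved
        return "nothing returned"
    if isinstance(result, dict):
        # array_out=True: one type object per sample label
        return "types for samples: " + ", ".join(sorted(result))
    # Default: the formatted structure string
    return f"structure string with {len(result.splitlines())} lines"


# Stand-in values; a real call would need access to a ServiceX endpoint.
print(summarize_result("Tree: reco\nBranches:"))
print(summarize_result(None))
print(summarize_result({"sample1": object()}))
```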

---

## Command-Line Usage

The function is also available as a CLI tool:

```bash
$ servicex-get-structure "scope:dataset-rucio-id" --filter_branch "el_"
```

This prints a summary of the dataset structure to the shell, filtered to branches whose names contain `"el_"`.

```bash
$ servicex-get-structure "scope:dataset-rucio-id1" "scope:dataset-rucio-id2" --filter_branch "el_"
```

This prints a combined summary of both datasets with the same filter.
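
The filter keeps branches whose names contain the given text. A minimal sketch of that substring matching, assuming plain `in` semantics; the package's own implementation is not shown in this document, and `filter_branches` is a hypothetical name:

```python
def filter_branches(branch_names, pattern):
    """Keep the names that contain `pattern` as a substring."""
    return [name for name in branch_names if pattern in name]


# Illustrative branch names only
branches = ["el_pt_NOSYS", "el_eta", "mu_pt_NOSYS", "runNumber"]
print(filter_branches(branches, "el_"))
```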

---

### Practical Output Example

Command:

```bash
$ servicex-get-structure user.mtost:user.mtost.all.Mar11 --filter-branch el_pt
```

Output on shell:

```bash
File structure of all samples with branch filter 'el_pt':

---------------------------
📁 Sample: user.mtost:user.mtost.all.Mar11
---------------------------

🌳 Tree: EventLoop_FileExecuted
├── Branches:

🌳 Tree: EventLoop_JobStats
├── Branches:

🌳 Tree: reco
├── Branches:
│ ├── el_pt_NOSYS ; dtype: AsJagged(AsDtype('>f4'), header_bytes=10)
│ ├── el_pt_EG_RESOLUTION_ALL__1down ; dtype: AsJagged(AsDtype('>f4'), header_bytes=10)
│ ├── el_pt_EG_RESOLUTION_ALL__1up ; dtype: AsJagged(AsDtype('>f4'), header_bytes=10)
│ ├── el_pt_EG_SCALE_ALL__1down ; dtype: AsJagged(AsDtype('>f4'), header_bytes=10)
│ ├── el_pt_EG_SCALE_ALL__1up ; dtype: AsJagged(AsDtype('>f4'), header_bytes=10)
```

The output lists all trees and the branch names matching the specified filter pattern for each requested dataset.
It also shows each branch's data type as interpreted by `uproot`, including the vector nesting level (jagged arrays) and the base type (e.g., `f4` for 32-bit floats).
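
For reference, `'>f4'` is NumPy-style notation for a big-endian 4-byte float. The sketch below uses only the standard library to show what that means for a single value; it illustrates the byte layout and is not how `uproot` reads files.

```python
import struct

# ">f" in struct notation corresponds to '>f4': big-endian, 4-byte IEEE 754 float
raw = struct.pack(">f", 1.5)

print(len(raw))                      # bytes per value
print(raw.hex())                     # byte layout, most significant byte first
print(struct.unpack(">f", raw)[0])   # value recovered from the bytes
```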

#### JSON input

A JSON file can be used as input to simplify the command for multiple samples.

```bash
$ servicex-get-structure "path/to/datasets.json"
```

With `datasets.json` containing:

```json
{
  "Signal": "mc23_13TeV:signal-dataset-rucio-id",
  "Background W+jets": "mc23_13TeV:background-dataset-rucio-id1",
  "Background Z+jets": "mc23_13TeV:background-dataset-rucio-id2",
  "Background Drell-Yan": "mc23_13TeV:background-dataset-rucio-id3"
}
```
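
Such a file can also be written from Python, which avoids hand-editing mistakes such as trailing commas. A minimal sketch using the standard `json` module; the dataset names are placeholders, and the file is written to a temporary directory here only so the snippet leaves nothing behind (any path works in practice):

```python
import json
import tempfile
from pathlib import Path

datasets = {
    "Signal": "mc23_13TeV:signal-dataset-rucio-id",
    "Background W+jets": "mc23_13TeV:background-dataset-rucio-id1",
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "datasets.json"
    path.write_text(json.dumps(datasets, indent=2))
    # Reading the file back confirms it is valid JSON
    loaded = json.loads(path.read_text())

print(loaded == datasets)
```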

---

## Programmatic Example

As with the CLI, the string describing the dataset structure can be retrieved programmatically:

```python
from servicex_analysis_utils import get_structure

# Retrieve structure of a specific dataset
file_structure = get_structure("mc23_13TeV:some-dataset-rucio-id")
```

### Other options

With `do_print` and `save_to_txt`, the dataset-structure string is instead written to stdout or to a text file in the current working directory.

```python
from servicex_analysis_utils import get_structure

# Print the structure directly to stdout
get_structure("mc23_13TeV:some-dataset-rucio-id", do_print=True)
# Save to samples_structure.txt
get_structure("mc23_13TeV:some-dataset-rucio-id", save_to_txt=True)
```

#### Return awkward array type

If `array_out` is set to `True`, the function reconstructs dummy arrays with the correct structure and returns their `ak.Array.type` objects.

```python
from servicex_analysis_utils import get_structure

DS = {"sample1": "user.mtost:user.mtost.all.Mar11"}
ak_type = get_structure(DS, array_out=True)

rec = ak_type["sample1"].content  # get RecordType

# Find index of reco tree and runNumber branch
reco_idx = rec.fields.index("reco")
branch_idx = rec.contents[reco_idx].fields.index("runNumber")

print("Type for branch 'runNumber':", rec.contents[reco_idx].contents[branch_idx])
```

Output:

```bash
Type for branch 'runNumber': var * int64
```
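
The two index lookups can be wrapped in a small function. This is a sketch rather than part of the package: `branch_type` is a hypothetical helper that only assumes the type objects expose `fields` and `contents` as used in the example, and the stand-in objects replace a real `get_structure()` result so the snippet runs without ServiceX.

```python
from types import SimpleNamespace


def branch_type(record, tree, branch):
    """Return the type stored for `branch` inside `tree` of a record type."""
    tree_type = record.contents[record.fields.index(tree)]
    return tree_type.contents[tree_type.fields.index(branch)]


# Stand-in objects mimicking the fields/contents layout; not real Awkward types
reco = SimpleNamespace(fields=["runNumber"], contents=["var * int64"])
rec = SimpleNamespace(fields=["reco"], contents=[reco])

print("Type for branch 'runNumber':", branch_type(rec, "reco", "runNumber"))
```

A missing tree or branch name raises `ValueError` from `list.index`, which surfaces typos early.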

---

## Notes

- The function does not retrieve event data, only structure and metadata.
- CLI output is printed directly to stdout but can be redirected to a file with ` > structure_summary.txt`.
- Types show as `None` or unknown when `uproot` cannot interpret them or when they cannot be reconstructed as Awkward Arrays.
Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
# ServiceX Analysis Utilities

This package provides tools for interacting with [ServiceX](https://github.com/ssl-hep/ServiceX_frontend), a data extraction, transformation and delivery system built for ATLAS and CMS analyses on large datasets. The Analysis Utils package offers helper functions that streamline the use of ServiceX and simplify its integration into workflows. It also contains tools for specific use cases that benefit from the service.

---

## Installation

Install the package from PyPI:

```bash
pip install servicex-analysis-utils
```
More information can be found in [Installation and requirements](installation.md).

## Documentation Contents

```{toctree}
:maxdepth: 2
:caption: Documentation Contents:

installation
materialization
file_introspecting
```

---

## Utility Functions

### `to_awk()`
Load an Awkward Array from ServiceX output easily.

See detailed usage here: [Materialization documentation](materialization.md)

### `get_structure()`
Create and send ServiceX requests to retrieve file structures, with a CLI implementation.

See detailed usage here: [File introspection documentation](file_introspecting.md)
Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
# Installation

This section provides instructions for installing the ServiceX Analysis Utilities package.

## Prerequisites

Before installing, ensure the following requirements are satisfied:

- Python 3.9 or higher is installed.
- `pip` is updated to the latest version (`pip install --upgrade pip`).
- Access to a ServiceX endpoint is granted.
- A valid `servicex.yaml` configuration file is on your local machine.

For instructions on setting up ServiceX, refer to the [ServiceX Installation Guide](https://servicex-frontend.readthedocs.io/en/stable/connect_servicex.html).
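
The Python version requirement can be checked from the interpreter itself. A minimal sketch using only the standard library; `meets_minimum` is a hypothetical helper name, and the `(3, 9)` default mirrors the prerequisite listed above:

```python
import sys


def meets_minimum(version, minimum=(3, 9)):
    """True if a (major, minor, ...) version tuple is at least `minimum`."""
    return tuple(version[:2]) >= minimum


print("Python version OK:", meets_minimum(sys.version_info))
```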

## Installation from PyPI

The package is available on PyPI and can be installed via:

```bash
pip install servicex-analysis-utils
```

## Installation from Source

Alternatively, the package can be installed from the GitHub repository:

```bash
git clone https://github.com/ssl-hep/ServiceX_analysis_utils.git
cd ServiceX_analysis_utils
pip install .
```

## Verifying the Installation

After installation, you can verify that the package is accessible by running:

```bash
python -c "import servicex_analysis_utils"
```

No output indicates a successful installation.
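
The same check can be made from within Python without importing the package. A minimal sketch using the standard library's `importlib.util.find_spec`; `is_installed` is a hypothetical helper name:

```python
from importlib.util import find_spec


def is_installed(module_name):
    """True if a top-level module can be located without importing it."""
    return find_spec(module_name) is not None


print(is_installed("servicex_analysis_utils"))
```

This reports only whether the module can be found; it does not confirm that a ServiceX endpoint or `servicex.yaml` is configured.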
