Commit 3874cda

Quick Proofread (#472)
* Clarifications
* Fix up bad servicex.yaml file
* Add more robust funcadl query and clean up
* Enhance documentation for FuncADL sequences and Select call usage
* Clean up, filling things out a little bit.
* Updates
* Make it work for just the front end
1 parent cf19258 commit 3874cda

6 files changed

+93
-177
lines changed

docs/command_line.rst

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -2,10 +2,10 @@ Command Line Interface (Experimental)
22
======================================
33
*\*The command line interface is an under-development feature that is not supported in the 3.0.0 release*
44

5-
The command line interface (CLI) is a text-based interface used to interact with the system.
6-
When installed, the client provides a new command in your shell,
5+
The command line interface (CLI) is a text-based interface used to interact with the ServiceX backend.
6+
The client provides a new command in your shell,
77
``servicex``. This command uses a series of subcommands to work with
8-
various functions of serviceX.
8+
various functions of ServiceX. It is installed automatically when you install the ``servicex`` frontend package.
99

1010
Common command line arguments:
1111

docs/connect_servicex.rst

Lines changed: 18 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -5,8 +5,9 @@ You need a `ServiceX endpoint <select-endpoint_>`_ where transformation is happe
55
a `client library <client-installation_>`_ to submit a transformation request.
66

77
.. _select-endpoint:
8+
89
Selecting a ServiceX endpoint
9-
----------------------
10+
------------------------------
1011

1112
ServiceX is a hosted service. Each ServiceX instance is deployed at the server
1213
and dedicated to a specific experiment. Depending on which experiment you work in,
@@ -49,33 +50,41 @@ downloaded to your computer.
4950

5051

5152
ServiceX Access File
52-
~~~~~~~~~~~~~
53+
~~~~~~~~~~~~~~~~~~~~
5354

5455
The client relies on a ``servicex.yaml`` file to obtain the URLs of different
5556
servicex deployments, as well as tokens to authenticate with the
56-
service. The format of this file is as follows:
57+
service.
58+
59+
The client library will search for this file in the current working directory
60+
and then start looking in parent directories and your home directory until a file
61+
is found.
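
The lookup order described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the client's actual implementation: the function name ``find_config`` and its parameters are hypothetical, and the real library may differ in how it handles the home directory.

```python
from pathlib import Path
from typing import Optional


def find_config(start: Path, home: Path, name: str = "servicex.yaml") -> Optional[Path]:
    """Return the first config file found walking up from `start`, then in `home`.

    Hypothetical helper for illustration; not part of the ServiceX client API.
    """
    start = start.resolve()
    # Check the starting directory first, then each parent up to the filesystem root.
    for directory in [start, *start.parents]:
        candidate = directory / name
        if candidate.is_file():
            return candidate
    # Nothing found on the way up: fall back to the home directory.
    candidate = home.resolve() / name
    return candidate if candidate.is_file() else None
```

A call such as ``find_config(Path.cwd(), Path.home())`` mirrors the search described in the text: a file closer to the working directory wins over one in the home directory.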
62+
63+
The format of this file is as follows:
5764

5865
.. code:: yaml
59-
66+
67+
    api_endpoints:
6068
      - endpoint: https://servicex.af.uchicago.edu
6169
        name: servicex-uc-af
6270
        token: <YOUR TOKEN>
6371
6472
    cache_path: /tmp/ServiceX_Client/cache-dir
6573
    shortened_downloaded_filename: true
6674
75+
``cache_path`` and ``shortened_downloaded_filename`` are optional fields and default to
76+
reasonable values.
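
One way to picture optional fields is as a merge of the user's settings over a set of defaults. The sketch below assumes the YAML file has already been parsed into a dictionary. The default values shown are placeholders chosen for illustration, not the library's real defaults, and the check that ``api_endpoints`` is present is likewise an assumption.

```python
from typing import Any, Dict

# Placeholder defaults for illustration only; the client's real defaults may differ.
HYPOTHETICAL_DEFAULTS: Dict[str, Any] = {
    "cache_path": "/tmp/servicex-cache",
    "shortened_downloaded_filename": False,
}


def with_defaults(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `config` with any missing optional fields filled in."""
    # Assumption: at least one endpoint must be listed for the file to be usable.
    if not config.get("api_endpoints"):
        raise ValueError("servicex.yaml needs at least one entry under api_endpoints")
    # Values supplied by the user take precedence over the defaults.
    return {**HYPOTHETICAL_DEFAULTS, **config}
```

With this approach a file that lists only ``api_endpoints`` is still complete, and any field the user does set overrides the default.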
77+
6778
The cache database and downloaded files will be stored in the directory
6879
specified by ``cache_path``.
6980

7081
The ``shortened_downloaded_filename`` property controls whether
7182
downloaded files will have their names shortened for convenience.
72-
Setting to false preserves the full filename from the dataset. \`
73-
74-
The client library will search for this file in the current working directory
75-
and then start looking in parent directories until a file is found.
83+
Setting to false preserves the full filename from the dataset.
7684

7785

7886
.. _client-installation:
87+
7988
ServiceX Client Installation
8089
----------------------------
8190
The ServiceX client Python package is a Python library for users to communicate
@@ -87,7 +96,7 @@ Prerequisites
8796
~~~~~~~~~~~~~
8897

8998
- Python 3.8, or above
90-
- Access to ServiceX endpoint (Member of the ATLAS or CMS collaborations)
99+
- Access to ServiceX endpoint
91100

92101
Installation
93102
~~~~~~~~~~~~

docs/contribute.rst

Lines changed: 20 additions & 143 deletions
Original file line numberDiff line numberDiff line change
@@ -6,65 +6,52 @@ Welcome to the ServiceX contributor guide, and thank you for your interest in co
66
Overview
77
--------
88

9-
ServiceX uses a microservice architecture,
10-
and is designed to be hosted on a Kubernetes cluster.
11-
The ServiceX project uses a polyrepo strategy for source code management:
12-
the source code for each microservice is located in a dedicated repo.
9+
The ``servicex`` frontend code uses standard Python packaging and open-source development methodologies. The code is hosted on GitHub,
10+
and we use the GitHub issue tracker to manage bugs and feature requests. We also use GitHub pull requests for code review and merging.
1311

14-
Below is a partial list of these repositories:
15-
16-
- `ServiceX <https://github.com/ssl-hep/ServiceX>`_ - Main repository, contains Helm charts for deployment to Kubernetes
1712
- `ServiceX_frontend <https://github.com/ssl-hep/ServiceX_frontend>`_ - The ServiceX Python library, which enables users to send requests to ServiceX. Currently, this is the only ServiceX frontend client.
18-
- `ServiceX_App <https://github.com/ssl-hep/ServiceX_App>`_ - The ServiceX API Server, written in Flask.
1913

20-
Additional repositories related to the project can be found in the `ssl-hep GitHub organization <https://github.com/ssl-hep>`_.
14+
Additional repositories related to the ServiceX project can be found in the `ssl-hep GitHub organization <https://github.com/ssl-hep>`_.
2115

22-
Please read our `architecture document <https://servicex.readthedocs.io/en/latest/development/architecture/>`_ for more details.
16+
Join us on Slack
17+
-----------------
18+
19+
We coordinate our efforts on the `IRIS-HEP Slack <http://iris-hep.slack.com>`_.
20+
Come join this intellectual hub!
21+
22+
Issues
23+
------
24+
25+
All development work on the code should start with an issue. Please submit issues for bugs and feature
26+
requests to the `repository <https://github.com/ssl-hep/ServiceX_frontend>`_.
2327

2428
Branching Strategy
2529
-------------------
2630

27-
ServiceX uses a slightly modified GitLab flow. Each repository has a main branch, usually named `develop` (or `master` for the Python frontend). All changes should be made on feature branches and submitted as PRs to the main branch. Releases are frozen on dedicated release branches, e.g. `v1.0.0-RC.2`.
31+
ServiceX uses a slightly modified GitLab flow. The ``master`` branch is used for releases, and
32+
all development work occurs on feature branches.
2833

2934
Development Workflow
3035
---------------------
3136

3237
1. Set up a local development environment:
33-
- Decide which microservice (or Helm chart) you'd like to change, and locate the corresponding repository.
34-
- If you are a not a member of the ``ssl-hep`` GitHub organization, fork the repository.
38+
- Fork the ``ServiceX_frontend`` repository.
3539
- Clone the (forked) repository to your local machine:
3640

37-
.. code-block:: bash
38-
39-
git clone git@github.com:<GitHub username>/ServiceX_App.git
40-
41-
- If you created a fork, add the upstream repository as remote:
42-
43-
.. code-block:: bash
44-
45-
git remote add upstream git@github.com:ssl-hep/ServiceX_App.git
46-
4741
- Set up a new environment via ``conda`` or ``virtualenv``.
4842
- Install dependencies, including test dependencies:
4943

5044
.. code-block:: bash
5145
52-
python3 -m pip install -e .[test]
53-
54-
- If the root directory contains a file named ``.pre-commit-config.yaml``, you can install the `pre-commit <https://pre-commit.com/>`_ hooks with:
55-
56-
.. code-block:: bash
57-
58-
pip install pre-commit
59-
pre-commit install
46+
python3 -m pip install -e .[develop]
6047
6148
2. Develop your contribution:
6249
- Pull latest changes from upstream:
6350

6451
.. code-block:: bash
6552
66-
git checkout develop
67-
git pull upstream develop
53+
git checkout master
54+
git pull upstream master
6855
6956
- Create a branch for the feature you want to work on:
7057

@@ -77,115 +64,5 @@ Development Workflow
7764
3. Test your changes:
7865
- Run the full test suite with ``python -m pytest``, or target specific test files with ``python -m pytest tests/path/to/file.py``.
7966
- Please write new unit tests to cover any changes you make.
80-
- You can also manually test microservice changes against a full ServiceX deployment by building the Docker image, pushing it to DockerHub, and setting the `image` and `tag` values as follows:
81-
82-
.. code-block:: yaml
83-
84-
app:
85-
image: <organization>/<image repository>
86-
tag: my-feature-branch
87-
88-
- For more details, please read our full `deployment guide <https://servicex.readthedocs.io/en/latest/deployment/basic>`_.
8967

9068
4. Submit a pull request to the upstream repository
91-
92-
93-
Issues
94-
------
95-
96-
Please submit issues for bugs and feature requests to the `main ServiceX repository <https://github.com/ssl-hep/ServiceX>`_, unless the issue is specific to a single microservice.
97-
98-
We manage project priorities with a `ZenHub board <https://app.zenhub.com/workspaces/servicex-5caba4288d0ceb76ea94ae1f/board?repos=180217333,180236972,185614791,182823774,202592339>`_.
99-
100-
Join us on Slack
101-
-----------------
102-
103-
We coordinate our efforts on the `IRIS-HEP Slack <http://iris-hep.slack.com>`_.
104-
Come join this intellectual hub!
105-
106-
Running the Full ServiceX Chart Locally
107-
----------------------------------------
108-
109-
You can run ServiceX on your laptop using ``docker`` or another similar tool that supports kubernetes.
110-
111-
Prerequisites
112-
--------------
113-
114-
1. ``docker`` is installed and ``kubernetes`` is running (see configuration options).
115-
2. Make sure ``kubectl`` and ``helm`` are both installed in the shell you'll be doing your development work.
116-
3. Follow instructions in the deployment guide to install your x509 certificate if you are going to be using any `rucio` or GRID services for your testing.
117-
118-
Running the chart
119-
------------------
120-
121-
122-
1. In the ``Servicex/helm`` directory run ``helm dependency update servicex/``
123-
2. And install the chart with ``helm install -f values.yaml servicex-testing .\servicex\``
124-
3. As in the deployment guide, you can now port-forward your servicex ``app`` and ``minio``.
125-
126-
How you write your ``values.yaml`` will depend a lot on what you are testing. Here is an example of a minimal one that will load up the `develop` tag for all the container images, and expects an ATLAS GRID cert:
127-
128-
.. code-block:: yaml
129-
130-
postgres:
131-
enabled: true
132-
objectStore:
133-
publicURL: localhost:9000
134-
135-
gridAccount: <your-user>
136-
137-
x509Secrets:
138-
# For ATLAS
139-
vomsOrg: atlas
140-
141-
app:
142-
ingress:
143-
host: localhost:5000
144-
145-
transformer:
146-
cachePrefix: '""'
147-
148-
149-
Making Changes
150-
---------------
151-
152-
153-
The best way to work on ServiceX is using the unit tests. That isn't always possible, of course. When it isn't your development cycle will require you to build any changed containers. A possible workflow is:
154-
155-
1. Redeploy the ``helm`` chart (or perhaps use ``upgrade`` rather than ``install`` in the ``helm`` command) and add ``pullPolicy: Never`` to the appropriate app section. For example, add it under ``app:`` in the example file above if you are working on ``servicex_app``.
156-
2. Change your code (say, in ``servicex_app``).
157-
3. In the directory for the app should be a ``Dockerfile``. Do the build, and pay attention to the tag. For example, ``docker build -t sslhep/servicex_app:develop .``.
158-
4. Finally restart the pod, which should cause it to pick up the new build. This might kill a port-forward you have in place, so don't forget to restart that!
159-
160-
Debugging Tips
161-
---------------
162-
163-
Microservice architectures can be difficult to test and debug. Here are some
164-
helpful hints to make this easier.
165-
166-
1. Instead of relying on the DID Finder to locate some particular datafile, you
167-
can mount one of your local directories into the transformer pod and then
168-
instruct the DID Finder to always offer up the path to that file regardless of
169-
the submitted DID. You can use the ``hostMount`` value to have a local directory
170-
mounted into each transformer pod under ``/data``. You can use the
171-
``didFinder.staticFile`` value to instruct DID Finder to offer up a file from that
172-
directory.
173-
2. You can use port-forwarding to expose port 15672 from the RabbitMQ pod to
174-
your laptop and log into the Rabbit admin console using the username: ``user`` and
175-
password ``leftfoot1``. From here you can monitor the queues, purge old messages
176-
and inject your own messages
177-
178-
Notes for Maintainers
179-
---------------------
180-
181-
Hotfixes
182-
--------
183-
184-
If a critical bugfix or hotfix must be applied to a previous release, it should be merged to the main branch and then applied to each affected release branch using
185-
186-
.. code-block:: bash
187-
188-
git cherry-pick <merge commit hash> -m 1
189-
190-
Merge commits have 2 parents, so the ``-m 1`` flag is used to specify that the first parent (i.e. previous commit on the main branch) should be used.
191-

docs/index.rst

Lines changed: 11 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -11,7 +11,8 @@ The High Luminosity Large Hadron Collider (HL-LHC) faces enormous computational
1111
structure due to high pileup conditions. The ATLAS and CMS experiments will record ~ 10 times as
1212
much data from ~ 100 times as many collisions as were used to discover the Higgs boson.
1313

14-
ServiceX is a scalable data extraction, transformation and delivery system deployed in a Kubernetes cluster.
14+
ServiceX is a scalable data extraction, transformation and delivery system deployed in a Kubernetes cluster
15+
designed to efficiently extract columnar data from large datasets.
1516

1617
.. image:: img/organize2.png
1718
:alt: organize
@@ -24,39 +25,39 @@ This section describes the concepts that are important to understand when workin
2425
Datasets
2526
^^^^^^^^^
2627
Datasets are groups of experimental data from which columnar data can be extracted. ServiceX
27-
supports four sources of of datasets:
28+
supports four sources of data:
29+
2830
1. Rucio
2931
2. CERN Open Data Portal
30-
3. File List
32+
3. List of files accessible via HTTP or XRootD
3133
4. EOS Directory
3234

3335
Queries
3436
^^^^^^^
3537
Queries are used to extract data from a dataset. They specify the columns to extract and the events to
3638
include in the output. There are several types of queries supported by ServiceX:
39+
3740
1. func-adl
3841
2. Python Function
3942
3. Dictionary of uproot selections
4043

41-
4244
Sample
4345
^^^^^^
44-
A sample is a request to extract columnar data from a given dataset, using a specific
45-
query. It results in a set of output files that can be used in an analysis.
46+
A sample is a request to extract columnar data from a specified dataset, using a specific
47+
query. It results in a set of output files containing the requested data that can be used
48+
in an analysis via ``awkward``, ``RDF``, etc.
4649

4750
Transformation Request
4851
^^^^^^^^^^^^^^^^^^^^^^
4952
Multiple samples can be submitted to ServiceX at the same time. Each sample is processed
5053
independently, and the results can be retrieved as files downloaded to a local directory or
51-
a list of URLs.
54+
directly accessed via a URL from ServiceX's output cache.
5255

5356
Local Cache
5457
^^^^^^^^^^^
55-
ServiceX maintains a local cache of the results of queries. This cache can be used to avoid
58+
ServiceX maintains a local cache of the performed queries and their results. This cache can be used to avoid
5659
re-running queries that have already been executed.
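
The idea behind such a cache can be sketched as a lookup keyed on a hash of the request. This is a conceptual illustration only: the names ``request_key`` and ``QueryCache`` are hypothetical, the cache here lives in memory rather than on disk, and the real client may key and store its cache differently.

```python
import hashlib
import json
from typing import Callable, Dict, List


def request_key(dataset: str, query: str) -> str:
    """Build a stable key from the dataset identifier and the query text."""
    payload = json.dumps({"dataset": dataset, "query": query}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class QueryCache:
    """In-memory stand-in for a local cache of query results (illustrative only)."""

    def __init__(self) -> None:
        self._results: Dict[str, List[str]] = {}

    def fetch(self, dataset: str, query: str,
              run: Callable[[str, str], List[str]]) -> List[str]:
        key = request_key(dataset, query)
        # Only run the (expensive) transformation when this request is new.
        if key not in self._results:
            self._results[key] = run(dataset, query)
        return self._results[key]
```

Submitting the same dataset and query twice calls ``run`` once; changing either one produces a different key and triggers a new run.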
5760

58-
59-
6061
.. toctree::
6162
:maxdepth: 2
6263
:caption: Contents:

docs/query_types.rst

Lines changed: 28 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -53,7 +53,7 @@ This table summarizes the query types supported by ServiceX and the data formats
5353
A brief introduction to the query languages
5454
-------------------------------------------
5555

56-
* **FuncADL** is an Analysis Description Language inspired by functional languages. Sophisticated filtering and computation of new values can be expressed by chaining a series of simple functions. Because FuncADL is written independently of the underlying data libraries, it can run on many data formats.
56+
* **FuncADL** is an Analysis Description Language inspired by functional languages and C#'s LINQ. Sophisticated filtering and computation of new values can be expressed by chaining a series of simple functions. Because FuncADL is written independently of the underlying data libraries, it can run on many data formats.
5757

5858
* **Uproot-Raw** passes user requests to the ``.arrays()`` function in ``uproot``. In particular, the branches of the input ``TTrees`` can be filtered, cuts can be specified to select events, and additional expressions can be computed. Additional non-``TTree`` objects can be copied from the inputs to the outputs.
5959

@@ -107,15 +107,39 @@ Each dictionary either has a ``treename`` key (indicating that it is a query on
107107

108108
FuncADL Query Type
109109
------------------
110-
The FuncADL Query type is very powerful. It is based on functional programming concepts and allows
111-
the user to specify complex queries in a very compact form. The query is written in a functional
110+
FuncADL queries are based on functional programming concepts and allow
111+
the user to specify complex queries in a compact form. The query is written in a functional
112112
style, with a series of functions that are applied to the data in sequence. The query can be written
113113
as a string or as typed Python objects. Depending on the source file format, the query is translated
114114
into C++ `EventLoop <https://atlassoftwaredocs.web.cern.ch/analysis-software/AnalysisTools/el_intro/>`_
115115
code, or uproot python code.
116116

117-
Full documentation on the func-adl query language can be found at this `JupyterBook <https://gordonwatts.github.io/xaod_usage/intro.html>`_.
117+
An example that fetches the :math:`p_T`, :math:`\eta`, and EM fraction of jets from an ATLAS PHYSLITE file is as follows:
118+
119+
.. code-block:: python
120+
121+
    from func_adl_servicex_xaodr22 import FuncADLQueryPHYSLITE, cpp_float
122+
123+
    query = FuncADLQueryPHYSLITE()
124+
    jets_per_event = query.Select(lambda e: e.Jets('AnalysisJets'))
125+
    jet_info_per_event = jets_per_event.Select(
126+
        lambda jets: {
127+
            'pt': jets.Select(lambda j: j.pt()),
128+
            'eta': jets.Select(lambda j: j.eta()),
129+
            'emf': jets.Select(lambda j: j.getAttribute[cpp_float]('EMFrac'))  # type: ignore
130+
        }
131+
    )
132+
133+
FuncADL is based on the concept of sequences. The events in a dataset are a sequence of events. The jets in an event are a sequence of jets.
134+
The ``Select`` call applies a function that transforms the input sequence, element-by-element, into an output sequence. In the above example,
135+
the first ``Select`` call transforms the sequence of events into a sequence of sequences of jets (i.e. one sequence of jets for
136+
each event, much like a 2D array). The lambda function passed to the ``Select`` call
137+
is applied to each event in the input sequence, and the result is a sequence of jets for each event.
138+
139+
The dictionary defines the columns of the output file (e.g. the leaves in a ``TTree``). Each of the three inner ``lambda`` functions is applied
140+
to each jet, transforming the sequence of jets into a sequence of :math:`p_T` values, a sequence of :math:`\eta` values, and a sequence of EM fractions.
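
The sequence idea can be mimicked in plain Python, which may help when reading nested ``Select`` calls. The sketch below is an analogy only: ``select`` is a hypothetical helper built on list comprehensions, the events are made-up dictionaries, and none of it uses the func-adl library or reflects how ServiceX executes a query.

```python
from typing import Any, Callable, Dict, List


def select(sequence: List[Any], func: Callable[[Any], Any]) -> List[Any]:
    """Apply `func` to every element, producing a new sequence of the same length."""
    return [func(item) for item in sequence]


# Made-up events: each one holds a (possibly empty) sequence of jets.
events: List[Dict[str, Any]] = [
    {"jets": [{"pt": 50.0, "eta": 0.1}, {"pt": 30.0, "eta": -1.2}]},
    {"jets": []},
    {"jets": [{"pt": 75.0, "eta": 2.0}]},
]

# Sequence of events -> sequence of jet sequences (one inner list per event).
jets_per_event = select(events, lambda e: e["jets"])

# For each event, turn its jet sequence into one sequence per requested quantity.
jet_info_per_event = select(
    jets_per_event,
    lambda jets: {
        "pt": select(jets, lambda j: j["pt"]),
        "eta": select(jets, lambda j: j["eta"]),
    },
)
```

The outer ``select`` always returns one entry per event, even for events with no jets, while the inner calls return one entry per jet. That is why the result behaves like a jagged 2D array.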
118141

142+
Full documentation on the func-adl query language can be found at this `JupyterBook <https://gordonwatts.github.io/xaod_usage/intro.html>`_.
119143

120144
Python Function Query Type
121145
--------------------------
