@@ -11,51 +11,11 @@ The High Luminosity Large Hadron Collider (HL-LHC) faces enormous computational
structure due to high pileup conditions. The ATLAS and CMS experiments will record ~10 times as
much data from ~100 times as many collisions as were used to discover the Higgs boson.

-
- Columnar data delivery
- ----------------------
-
- ServiceX seeks to enable on-demand data delivery of columnar data in a variety of formats for
- physics analyses. It provides a uniform backend to data storage services, ensuring the user doesn't
- have to know how or where the data is stored, and is capable of on-the-fly data transformations
- into a variety of formats (ROOT files, Arrow arrays, Parquet files, ...). The service offers
- preprocessing functionality via an analysis description language called
- `func-adl <https://pypi.org/project/func-adl/>`_, which allows users to filter events, request columns,
- and even compute new variables. This enables the user to start from any format and extract only the
- data needed for an analysis.
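
As a rough sketch of what such a func-adl selection looks like (the ``events`` stream and the
branch names below are placeholders, not part of the ServiceX documentation):

.. code:: python

    # Sketch of the func-adl idiom: filter events, request columns, and
    # compute a new variable. ``events`` stands for whichever func-adl
    # event stream the ServiceX client provides; the branch names
    # (met, met_gev) are hypothetical.
    def build_query(events):
        return (
            events
            .Where(lambda e: e.met > 25.0)       # filter events
            .Select(lambda e: {
                "met": e.met,                    # request an existing column
                "met_gev": e.met / 1000.0,       # compute a new variable
            })
        )
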
+ ServiceX is a scalable data extraction, transformation and delivery system deployed in a Kubernetes cluster.

.. image:: img/organize2.png
-    :alt: Organization
-
- ServiceX is designed to feed columns to a user running an analysis (e.g. via
- `Awkward <https://github.com/scikit-hep/awkward-array>`_ or
- `Coffea <https://github.com/CoffeaTeam/coffea>`_ tools) based on the results of a query designed by
- the user.
-
- Connecting to ServiceX
- ----------------------
- ServiceX is a hosted service. Depending on which experiment you work in, there are different
- instances you can connect to. Some can be reached from the outside world, while others are
- accessible only from a Jupyter notebook running inside the analysis facility.
-
- .. list-table::
-    :widths: 20 40 40
-    :header-rows: 1
-
-    * - Collaboration
-      - Name
-      - URL
-    * - ATLAS
-      - Chicago Analysis Facility
-      - `<https://servicex.af.uchicago.edu/>`_
-    * - CMS
-      - Coffea-Casa Nebraska
-      - `<https://coffea.casa/hub>`_
-    * - CMS
-      - FNAL Elastic Analysis Facility
-      - `<https://servicex.apps.okddev.fnal.gov>`_
-
- Follow the links to learn how to enable an account and launch a Jupyter notebook.
+    :alt: organize
+

Concepts
--------
@@ -95,91 +55,18 @@ Local Cache
ServiceX maintains a local cache of the results of queries. This cache can be used to avoid
re-running queries that have already been executed.

- Specify a Request
- -----------------
- Transform requests are specified with a General section, one or more Sample specifications, and,
- optionally, one or more Definitions that are substituted into the Sample specifications.
-
- These requests can be defined as:
-
- 1. A YAML file
- 2. A Python dictionary
- 3. Typed Python objects
-
- Regardless of how the request is specified, it is submitted to ServiceX using the
- ``deliver`` function, which returns either a list of URLs or a list of local file paths.
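
As a sketch of the dictionary form (the import path of ``deliver``, the ``General``/``Sample``
key names, and the file URL are assumptions based on the field descriptions in the sections
that follow):

.. code:: python

    # Sketch of a dictionary-style transform request; key names mirror the
    # General and Sample sections documented below, and the file URL and
    # query value are placeholders.
    from servicex import deliver  # assumed import path for ``deliver``

    my_query = ...  # a func-adl query, a Python function, or uproot selections

    spec = {
        "General": {
            "OutputFormat": "root-ttree",
            "Delivery": "LocalCache",
        },
        "Sample": [
            {
                "Name": "my_sample",
                "XRootDFiles": ["root://eospublic.cern.ch//path/to/file.root"],
                "Query": my_query,
            }
        ],
    }

    # Per the description above, ``deliver`` returns a list of local file
    # paths (or URLs, depending on the Delivery setting).
    files = deliver(spec)
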
-
- The General Section
- ^^^^^^^^^^^^^^^^^^^
- The General section of the request includes the following fields:
-
- * OutputFormat: Can be ``root-ttree`` or ``parquet``
- * Delivery: Can be ``URLs`` or ``LocalCache``
-
- The Sample Sections
- ^^^^^^^^^^^^^^^^^^^
- Each Sample section represents a single query to be executed. It includes the following fields (a short sketch follows this list):
-
- * Name: A title for this sample.
- * RucioDID: A Rucio dataset identifier.
- * XRootDFiles: A list of files to be processed without using Rucio. You must use either ``RucioDID`` or ``XRootDFiles``, but not both.
- * NFiles: An optional limit on the number of files to process.
- * Query: The query to be executed. This can be a func-adl query, a Python function, or a dictionary of uproot selections.
- * IgnoreLocalCache: If set to true, don't use the local cache for this sample and always submit the request to ServiceX.
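
A single Sample entry drawing on these fields might look like the following sketch (the Rucio
DID is a placeholder, and ``my_query`` stands for any of the query styles listed above):

.. code:: python

    # Hypothetical Sample entry: read a Rucio dataset, limit the number of
    # files while testing, and bypass the local cache.
    my_query = ...  # as in the earlier sketch

    sample = {
        "Name": "ttbar_test",
        "RucioDID": "user.me:user.me.my_dataset",  # placeholder DID
        "NFiles": 5,                  # only process the first five files
        "Query": my_query,
        "IgnoreLocalCache": True,     # always submit, never reuse cached results
    }
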
-
- The Definitions Section
- ^^^^^^^^^^^^^^^^^^^^^^^
- The Definitions section is a dictionary of values that can be substituted into fields in the Sample
- sections. This is useful for defining common values that are used in multiple samples.
-
-
- Configuration
- -------------
-
- The client relies on a YAML file to obtain the URLs of the different
- ServiceX deployments, as well as the tokens used to authenticate with the
- service. The file should be named ``.servicex`` and has the following format:
-
- .. code:: yaml
-
-    api_endpoints:
-      - endpoint: http://localhost:5000
-        name: localhost
-
-      - endpoint: https://servicex-release-testing-4.servicex.ssl-hep.org
-        name: testing4
-        token: ...
-
-    default_endpoint: testing4
-
-    cache_path: /tmp/ServiceX_Client/cache-dir
-    shortened_downloaded_filename: true
-
- The ``default_endpoint`` will be used if no endpoint is otherwise specified. The
- cache database and downloaded files will be stored in the directory
- specified by ``cache_path``.
-
- The ``shortened_downloaded_filename`` property controls whether
- downloaded files will have their names shortened for convenience.
- Setting it to false preserves the full filename from the dataset.

- The library will search for this file in the current working directory
- and then start looking in parent directories until a file is found.
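
That lookup order amounts to the following sketch (illustrative only, not the library's actual
implementation):

.. code:: python

    # Illustration of the search order for the ``.servicex`` file: start in
    # the current working directory and walk up through parent directories.
    from pathlib import Path

    def find_servicex_file(start: Path = Path.cwd()):
        for directory in [start, *start.parents]:
            candidate = directory / ".servicex"
            if candidate.is_file():
                return candidate
        return None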

.. toctree::
   :maxdepth: 2
   :caption: Contents:

-    installation
+    connect_servicex
   query_types
+    transform_request
   examples
-    databinder
   command_line
-    getting_started
-    transformer_matrix
   contribute
-    troubleshoot
   about
   modules
   Github <https://github.com/ssl-hep/ServiceX_frontend>