Commit 208de13

p7nov, DifferentialOrange, and andreyaksenov authored
Add tt tdg2 import and export (#4171)
Resolves #4145 Co-authored-by: Georgy Moiseev <moiseev.georgii@gmail.com> Co-authored-by: Andrey Aksenov <38073144+andreyaksenov@users.noreply.github.com>
1 parent 33504e6 commit 208de13

File tree

4 files changed

+220
-15
lines changed

doc/reference/tooling/tt_cli/commands.rst

Lines changed: 3 additions & 0 deletions
@@ -70,6 +70,8 @@ help for the given command.
7070
- Get the current status of a Tarantool instance
7171
* - :doc:`stop <stop>`
7272
- Stop a Tarantool instance
73+
* - :doc:`tdg2 <tdg2>`
74+
- Interact with `Tarantool Data Grid 2 <https://www.tarantool.io/ru/tdg/latest/>`_ clusters
7375
* - :doc:`uninstall <uninstall>`
7476
- Uninstall Tarantool or ``tt``
7577
* - :doc:`version <version>`
@@ -107,5 +109,6 @@ help for the given command.
107109
start <start>
108110
status <status>
109111
stop <stop>
112+
tdg2 <tdg2>
110113
uninstall <uninstall>
111114
version <version>

doc/reference/tooling/tt_cli/export.rst

Lines changed: 91 additions & 8 deletions
@@ -11,12 +11,17 @@ Exporting data
1111

1212
.. code-block:: console
1313
14-
$ tt [crud] export URI SPACE:FILE ... [EXPORT_OPTION ...]
14+
$ tt [crud|tdg2] export URI SPACE:FILE ... [EXPORT_OPTION ...]
1515
16-
``tt [crud] export`` exports a space's data to a file.
17-
The ``crud`` command is optional and can be used to export a cluster's data by using the `CRUD <https://github.com/tarantool/crud>`_ module. Without ``crud``, data is exported using the :ref:`box.space <box_space>` API.
16+
``tt [crud|tdg2] export`` exports a space's data to a file. Three export commands
17+
cover the following cases:
1818

19-
``tt [crud] export`` takes the following arguments:
19+
* ``tt export`` exports data from a replica set using the :ref:`box.space <box_space>` API.
20+
* ``tt crud export`` exports data from a sharded cluster through a router using the `CRUD <https://github.com/tarantool/crud>`_ module.
21+
* ``tt tdg2 export`` exports data from a `Tarantool Data Grid 2 <https://www.tarantool.io/ru/tdg/latest/>`_ cluster
22+
through its `connector <https://www.tarantool.io/ru/tdg/latest/architecture/#connector>`_ using the `TDG2 Repository API <https://www.tarantool.io/en/tdg/latest/reference/sandbox/repository-api/#repository-api>`_.
23+
24+
``tt [crud|tdg2] export`` takes the following arguments:
2025

2126
* ``URI``: The URI of a router instance if ``crud`` is used. Otherwise, it should specify the URI of a storage.
2227
* ``FILE``: The name of a file for storing exported data.
@@ -26,6 +31,16 @@ The ``crud`` command is optional and can be used to export a cluster's data by u
2631

2732
:ref:`Read access <authentication-owners_privileges>` to the space is required to export its data.
2833

34+
.. _tt-export-output-format:
35+
36+
Output format
37+
-------------
38+
39+
``tt export`` exports data in the following formats:
40+
41+
* ``tt export`` and ``tt crud export``: CSV
42+
* ``tt tdg2 export``: JSON lines
43+
2944
.. _tt-export-limitations:
3045

3146
Limitations
@@ -66,7 +81,7 @@ If a tuple contains a ``null`` value, for example, ``[1, 477, 'Andrew', null, 38
6681
Exporting headers
6782
-----------------
6883

69-
To export data with a space's field names in the first row, use the ``--header`` option:
84+
To export data with a space's field names in the first row of the CSV file, use the ``--header`` option:
7085

7186
.. code-block:: console
7287
@@ -89,15 +104,69 @@ In this case, field values start from the second row, for example:
89104
Exporting compound data
90105
-----------------------
91106

92-
By default, ``tt`` exports empty values for fields containing compound data such as arrays or maps.
107+
In the CSV format, ``tt`` exports empty values by default for fields containing compound data such as arrays or maps.
93108
To export compound values in a specific format, use the ``--compound-value-format`` option.
94-
For example, the command below exports compound values serialized in JSON:
109+
For example, the command below exports compound values serialized in JSON to a CSV file:
95110

96111
.. code-block:: console
97112
98113
$ tt crud export localhost:3301 customers:customers.csv \
99114
--compound-value-format json
100115
116+
.. _tt-export-tdg2:
117+
118+
Exporting from Tarantool Data Grid 2
119+
------------------------------------
120+
121+
.. note::
122+
123+
In the TDG2 data model, a **type** represents a Tarantool space, and an **object**
124+
of a type represents a tuple in the type's underlying space.
125+
126+
The command below exports data of the ``customers`` type from a TDG2 cluster to
127+
the ``customers.jsonl`` file:
128+
129+
.. code-block:: console
130+
131+
$ tt tdg2 export localhost:3301 customers:customers.jsonl
132+
133+
If the ``customers`` type has four fields (``id``, ``first_name``, ``second_name``, and ``age``), the file with exported data might look like this:
134+
135+
.. code-block:: json
136+
137+
{"age":30,"first_name":"Samantha","id":1,"second_name":"Carter"}
138+
{"age":41,"first_name":"Fay","id":2,"second_name":"Rivers"}
139+
{"age":74,"first_name":"Milo","id":4,"second_name":"Walters"}
140+
141+
If an object contains a ``null`` value in a field, this field is skipped:
142+
143+
.. code-block:: json
144+
145+
{"age":13,"first_name":"Zachariah","id":3}
146+
147+
Object fields that contain maps with non-string keys are converted to maps with string keys.
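
Because ``null`` fields are omitted from the output, a consumer of the exported file has to treat a missing key as ``null``. The following is a minimal Python sketch of such a reader, assuming the field names from the JSON example above. The ``read_objects`` helper and the ``FIELDS`` tuple are hypothetical and are not part of ``tt``:

```python
import io
import json

# Field names taken from the JSON example above; adjust them to your TDG2 type.
FIELDS = ("id", "first_name", "second_name", "age")


def read_objects(lines):
    """Parse JSON Lines and restore fields that were skipped because they were null."""
    objects = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        obj = json.loads(line)
        # A key that is absent in the export is treated as null (None).
        objects.append({name: obj.get(name) for name in FIELDS})
    return objects


# In-memory stand-in for a customers.jsonl file.
sample = io.StringIO(
    '{"age":30,"first_name":"Samantha","id":1,"second_name":"Carter"}\n'
    '{"age":13,"first_name":"Zachariah","id":3}\n'
)
restored = read_objects(sample)
print(restored[1]["second_name"])  # prints None
```

Listing the expected fields explicitly keeps every restored object the same shape, which is usually easier to process than objects with varying key sets.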
148+
149+
TDG2 sets a limit on the number of objects transferred from each storage during a query execution
150+
(the `hard-limits.returned <https://www.tarantool.io/en/tdg/latest/reference/config/config_logic/#hard-limits>`_
151+
TDG2 configuration parameter). If the export batch size (the ``--batch-size`` option)
152+
is greater than this limit, it is possible that more than ``hard-limits.returned`` objects
153+
will be requested from one storage and the export will fail.
154+
To make sure that ``hard-limits.returned`` is never exceeded during an export operation,
155+
set the export batch size to a value less than or equal to this limit.
156+
157+
For example, if your TDG2 cluster has a ``hard-limits.returned`` limit of 1000 objects:
158+
159+
.. code-block:: yaml
160+
161+
# tdg2 config.yaml
162+
# ...
163+
hard-limits.returned: 1000
164+
165+
Set the ``tt tdg2 export`` batch size to a value less than or equal to 1000:
166+
167+
.. code-block:: console
168+
169+
$ tt tdg2 export localhost:3301 customers:customers.jsonl --batch-size=1000
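
When the export is launched from a script, the batch size can be capped before the command line is built. The following is a sketch only: the ``build_export_command`` helper is hypothetical, the limit value has to be supplied by the caller (it is not read from the cluster), and only the ``tt tdg2 export`` arguments and the ``--batch-size`` option shown above are used:

```python
def build_export_command(uri, type_name, out_file, batch_size, hard_limit):
    """Return the argv list for `tt tdg2 export` with the batch size capped at hard_limit."""
    if batch_size < 1 or hard_limit < 1:
        raise ValueError("batch_size and hard_limit must be positive")
    # Never request more objects per batch than hard-limits.returned allows.
    safe_batch = min(batch_size, hard_limit)
    return [
        "tt", "tdg2", "export", uri,
        f"{type_name}:{out_file}",
        f"--batch-size={safe_batch}",
    ]


# hard_limit mirrors the hard-limits.returned value from the YAML example above.
cmd = build_export_command("localhost:3301", "customers", "customers.jsonl", 5000, 1000)
print(" ".join(cmd))
```

The resulting list can be passed to ``subprocess.run`` as is; building it as a list avoids shell quoting issues.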
101170
102171
.. _tt-export-options:
103172

@@ -118,10 +187,20 @@ Options
118187

119188
.. option:: --batch-size INT
120189

121-
The number of tuples to transfer per request (the default is ``10000``).
190+
The number of tuples to transfer per request. The default is:
191+
192+
* ``10000`` for ``tt export`` and ``tt crud export``.
193+
* ``100`` for ``tt tdg2 export``.
194+
195+
.. important::
196+
197+
When using ``tt tdg2 export``, make sure that the batch size does not exceed
198+
the ``hard-limits.returned`` TDG2 parameter value set on the cluster.
122199

123200
.. option:: --compound-value-format STRING
124201

202+
**Applicable to:** ``tt export``, ``tt crud export``
203+
125204
A format used to export compound values like arrays or maps.
126205
By default, ``tt`` exports empty values for fields containing such values.
127206

@@ -131,6 +210,8 @@ Options
131210

132211
.. option:: --header
133212

213+
**Applicable to:** ``tt export``, ``tt crud export``
214+
134215
Add field names in the first row.
135216

136217
See also: :ref:`Exporting headers <tt-export-header>`.
@@ -141,6 +222,8 @@ Options
141222

142223
.. option:: --readview
143224

225+
**Applicable to:** ``tt export``, ``tt crud export``
226+
144227
Export data using a :ref:`read view <read_views>`.
145228

146229
.. option:: --username STRING

doc/reference/tooling/tt_cli/import.rst

Lines changed: 107 additions & 7 deletions
@@ -11,14 +11,19 @@ Importing data
1111

1212
.. code-block:: console
1313
14-
$ tt [crud] import URI FILE:SPACE [IMPORT_OPTION ...]
14+
$ tt [crud|tdg2] import URI FILE:SPACE [IMPORT_OPTION ...]
1515
# or
16-
$ tt [crud] import URI :SPACE < FILE [IMPORT_OPTION ...]
16+
$ tt [crud|tdg2] import URI :SPACE < FILE [IMPORT_OPTION ...]
1717
18-
``tt [crud] import`` imports data from a file to a space.
19-
The ``crud`` command is optional and can be used to import data to a cluster by using the `CRUD <https://github.com/tarantool/crud>`_ module. Without ``crud``, data is imported using the :ref:`box.space <box_space>` API.
18+
``tt [crud|tdg2] import`` imports data from a file to a space. Three import commands
19+
cover the following cases:
2020

21-
This command takes the following arguments:
21+
* ``tt import`` imports data into a replica set through its master instance using the :ref:`box.space <box_space>` API.
22+
* ``tt crud import`` imports data into a sharded cluster through a router using the `CRUD <https://github.com/tarantool/crud>`_ module.
23+
* ``tt tdg2 import`` imports data into a `Tarantool Data Grid 2 <https://www.tarantool.io/ru/tdg/latest/>`_ cluster
24+
through its router using the ``repository.put`` function of the `TDG2 Repository API <https://www.tarantool.io/en/tdg/latest/reference/sandbox/repository-api/#repository-api>`_.
25+
26+
``tt [crud|tdg2] import`` takes the following arguments:
2227

2328
* ``URI``: The URI of a router instance if ``crud`` is used. Otherwise, it should specify the URI of a storage.
2429
* ``FILE``: The name of a file containing data to be imported.
@@ -28,6 +33,15 @@ This command takes the following arguments:
2833

2934
:ref:`Write access <authentication-owners_privileges>` to the space and `execute` access to `universe` are required to import data.
3035

36+
.. _tt-import-format:
37+
38+
Input file format
39+
-----------------
40+
41+
``tt import`` imports data from the following formats:
42+
43+
* ``tt import`` and ``tt crud import``: CSV
44+
* ``tt tdg2 import``: JSON lines
3145

3246
.. _tt-import-limitations:
3347

@@ -134,14 +148,77 @@ To skip rows whose data cannot be parsed correctly, use the ``--on-error`` optio
134148
$ tt crud import localhost:3301 customers.csv:customers \
135149
--on-error skip
136150
151+
.. _tt-import-tdg2:
152+
153+
Importing into Tarantool Data Grid 2
154+
------------------------------------
155+
156+
.. note::
157+
158+
In the TDG2 data model, a **type** represents a Tarantool space, and an **object**
159+
of a type represents a tuple in the type's underlying space.
160+
161+
The command below imports objects of the ``customers`` type into a TDG2 cluster.
162+
The objects are described in the ``customers.jsonl`` file.
163+
164+
.. code-block:: console
165+
166+
$ tt tdg2 import localhost:3301 customers.jsonl:customers
167+
168+
The input file can look like this:
169+
170+
.. code-block:: json
171+
172+
{"age":30,"first_name":"Samantha","id":1,"second_name":"Carter"}
173+
{"age":41,"first_name":"Fay","id":2,"second_name":"Rivers"}
174+
{"age":74,"first_name":"Milo","id":4,"second_name":"Walters"}
175+
176+
.. note::
177+
178+
Since JSON describes objects in maps with string keys, there is no way to
179+
import a field value that is a map with a non-string key.
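
This limitation comes from JSON itself and can be observed with any JSON library. The short illustration below uses Python's standard ``json`` module, not ``tt``, and the ``limits`` field name is made up for the example:

```python
import json

# The inner map uses integer keys, which JSON cannot represent.
original = {"limits": {1: "low", 2: "high"}}

encoded = json.dumps(original)   # integer keys are written as strings
decoded = json.loads(encoded)    # and stay strings when read back

print(encoded)
print(decoded["limits"])
```

After the round trip, the keys are the strings ``"1"`` and ``"2"``, so the original integer-keyed map cannot be recovered from the file alone.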
180+
181+
In case of an error during TDG2 import, ``tt tdg2 import`` rolls back the changes made
182+
*within the current batch* on the *storage where the error has happened* (per-storage rollback)
183+
and reports an error. On other storages, objects from the same batch can be successfully
184+
imported. So, the rollback process of ``tt tdg2 import``
185+
is the same as the one of ``tt crud import`` with the ``--rollback-on-error`` option.
186+
187+
Since object batches can be imported partially (per-storage rollback), an error cannot always
188+
be matched to the object that caused it, which complicates debugging. To minimize this
189+
effect, the default batch size (``--batch-size``) for ``tt tdg2 import`` is 1.
190+
This makes the debugging straightforward: you always know which object caused the error.
191+
On the other hand, this decreases the performance in comparison to import in larger batches.
192+
193+
If you increase the batch size, ``tt`` informs you about the possible issues and
194+
asks for an explicit confirmation to proceed.
195+
To automatically confirm a batch import operation, add the ``--force`` option:
196+
197+
.. code-block:: console
198+
199+
$ tt tdg2 import localhost:3301 customers.jsonl:customers \
200+
--batch-size=100 \
201+
--force
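
With larger batches, it helps to rule out malformed input before starting the import. The following is a minimal pre-check sketch in Python; the ``find_invalid_lines`` helper is hypothetical and not part of ``tt``, and it only verifies that each line is a JSON object, not that the object matches the TDG2 data model:

```python
import json


def find_invalid_lines(lines):
    """Return (line_number, reason) pairs for lines that are not JSON objects."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        try:
            value = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, exc.msg))
            continue
        if not isinstance(value, dict):
            problems.append((number, "not a JSON object"))
    return problems


sample = [
    '{"age":30,"first_name":"Samantha","id":1,"second_name":"Carter"}',
    '{"age":41,"first_name":"Fay","id":2',  # truncated object
    '[1, 2, 3]',                             # valid JSON, but not an object
]
problems = find_invalid_lines(sample)
print([number for number, _ in problems])  # prints [2, 3]
```

To check a real file, pass an open file object, for example ``find_invalid_lines(open("customers.jsonl"))``.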
202+
137203
138204
.. _tt-import-options:
139205

140206
Options
141207
-------
142208

209+
.. option:: --batch-size INT
210+
211+
**Applicable to:** ``tt crud import``, ``tt tdg2 import``
212+
213+
The number of tuples to transfer per request. The default is:
214+
215+
* ``100`` for ``tt crud import``.
216+
* ``1`` for ``tt tdg2 import``. See :ref:`tt-import-tdg2` for details.
217+
143218
.. option:: --dec-sep STRING
144219

220+
**Applicable to:** ``tt import``, ``tt crud import``
221+
145222
The string of symbols that defines decimal separators for numeric data (the default is ``.,``).
146223

147224
.. NOTE::
@@ -150,6 +227,8 @@ Options
150227

151228
.. option:: --delimiter STRING
152229

230+
**Applicable to:** ``tt import``, ``tt crud import``
231+
153232
A symbol that defines a field value delimiter.
154233
For CSV, the default delimiter is a comma (``,``).
155234
To use a tab character as a delimiter, set this value as ``tab``:
@@ -169,6 +248,12 @@ Options
169248

170249
See also: :ref:`Handling parsing errors <tt-import-parsing-error>`.
171250

251+
.. option:: --force
252+
253+
**Applicable to:** ``tt tdg2 import``
254+
255+
Automatically confirm importing into TDG2 with ``--batch-size`` greater than one.
256+
172257
.. option:: --format STRING
173258

174259
A format of input data.
@@ -177,6 +262,8 @@ Options
177262

178263
.. option:: --header
179264

265+
**Applicable to:** ``tt import``, ``tt crud import``
266+
180267
Process the first line as a header containing field names.
181268
In this case, field values start from the second line.
182269

@@ -189,12 +276,16 @@ Options
189276

190277
.. option:: --match STRING
191278

279+
**Applicable to:** ``tt import``, ``tt crud import``
280+
192281
Configure matching between field names in the input file and the target space.
193282

194283
See also: :ref:`Matching of input and space fields <tt-import-match-fields>`.
195284

196285
.. option:: --null STRING
197286

287+
**Applicable to:** ``tt import``, ``tt crud import``
288+
198289
A value to be interpreted as ``null`` when importing data.
199290
By default, an empty value is interpreted as ``null``.
200291
For example, a tuple imported from the following row ...
@@ -253,6 +344,8 @@ Options
253344

254345
.. option:: --quote STRING
255346

347+
**Applicable to:** ``tt import``, ``tt crud import``
348+
256349
A symbol that defines a quote.
257350
For CSV, double quotes are used by default (``"``).
258351
The double symbol of this option acts as the escaping symbol within input data.
@@ -264,6 +357,8 @@ Options
264357

265358
.. option:: --th-sep STRING
266359

360+
**Applicable to:** ``tt import``, ``tt crud import``
361+
267362
The string of symbols that define thousand separators for numeric data.
268363
The default value includes a space and a backtick `````.
269364
This means that ``1 000 000`` and ``1`000`000`` are both imported as ``1000000``.
@@ -278,6 +373,11 @@ Options
278373

279374
.. option:: --rollback-on-error
280375

281-
Applicable only when ``crud`` is used.
376+
**Applicable to:** ``tt crud import``
377+
378+
Specify whether a failed operation on a storage leads to a rollback of the batch
379+
import on that storage.
380+
381+
.. note::
282382

283-
Specify whether any operation failed on a router leads to rollback on a storage where the operation is failed.
383+
``tt tdg2 import`` always works as if ``--rollback-on-error`` is ``true``.

doc/reference/tooling/tt_cli/tdg2.rst

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
1+
.. _tt-tdg2:
2+
3+
Interacting with the Tarantool Data Grid 2
4+
==========================================
5+
6+
.. admonition:: Enterprise Edition
7+
:class: fact
8+
9+
This command is supported by the `Enterprise Edition <https://www.tarantool.io/compare/>`_ only.
10+
11+
.. code-block:: console
12+
13+
$ tt tdg2 COMMAND [COMMAND_OPTION ...]
14+
15+
``tt tdg2`` enables the interaction with `Tarantool Data Grid 2 <https://www.tarantool.io/ru/tdg/latest/>`_ clusters.
16+
``COMMAND`` is one of the following:
17+
18+
* ``export``: export a TDG2 cluster's data to a file. Learn more at :ref:`Exporting data <tt-export>`.
19+
* ``import``: import data to a TDG2 cluster from a file. Learn more at :ref:`Importing data <tt-import>`.
