v3.5 Copy Mode

Andrey Kurilov edited this page Aug 29, 2017 · 8 revisions

Overview

It is often useful to copy multiple existing files or objects instead of writing new ones. Performance rates may differ significantly between copy and write operations. Some cloud storage APIs (S3 and Swift) also support copying objects, so the functionality can be treated as general. With S3 and Swift, no payload is sent while copying objects, so these requests may be significantly faster than writing new objects.

Limitations

  • Works only if the input path is different from the output one
  • Copying the containers (Swift) is not supported
  • Copying the buckets (S3) is not supported
  • Copying the objects (Atmos) while using /rest/object interface is not supported yet

Approach

General

Copy mode is enabled if:

  • the "--load-type" option is set to "create" (the default) and
  • one of the item inputs is configured:
    • "--item-input-path" is set to an existing bucket/container/directory or
    • "--item-input-file" is set to an existing items list file

Filesystem Storage Case

When filesystem directories are copied, the size of the copied data can be calculated, so the size and bandwidth (MB/sec) metrics are available.

HTTP Storage Case

Note: Cloud Storage object copy requests don't contain any payload so the byte count related metrics are not calculated (remain zero).

S3 Objects Copying

The source object path is specified with the "x-amz-copy-source" header.
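
As an illustration only, the skeleton of such a request can be printed with the sketch below. The helper name s3_copy_request, the bucket names and the object name are hypothetical. Path-style addressing is assumed, the object is assumed to keep its name, and the authentication and date headers are omitted.

```shell
# Hypothetical helper: prints the skeleton of an S3 object copy request.
# Assumes path-style addressing; authentication and date headers are omitted.
s3_copy_request() {
    src_bucket="$1"; dst_bucket="$2"; name="$3"
    # The request line addresses the destination object
    printf 'PUT /%s/%s HTTP/1.1\n' "$dst_bucket" "$name"
    # The header points at the source object; no payload follows
    printf 'x-amz-copy-source: /%s/%s\n' "$src_bucket" "$name"
    printf 'Content-Length: 0\n'
}

s3_copy_request bucket1 bucket2 item0001
```

The request line names the destination object, while the header names the source object.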

Existing Items Concatenation

The configuration option item-data-ranges-concat enables the existing items concatenation mode, which may be considered a Copy Mode extension. The option specifies the range for the number of source items to concatenate into a single destination item. If the option value is null, the concatenation mode is disabled. If the concatenation mode is enabled, Mongoose expects the CREATE operation type and a configured items input. The source items may be supplied using either the item-input-file or the item-input-path option.

The supplied source items are loaded into a buffer at the step's initialization stage, and the configured items input is replaced by a new items input that generates new destination item ids. For each new destination item, a subset of the source items is selected from the loaded buffer, and the destination item is concatenated from the selected items.

Both the item-data-ranges-fixed and item-data-ranges-random options are supported for the selected source items. If either of the two is configured, the destination item is concatenated from the corresponding source item data ranges (fixed or random). Otherwise, the entire data of each source item is used.

Example
  1. Prepare the set of 100 source items on the storage:

     java -jar mongoose-<VER>/mongoose.jar \
         --item-data-size=10MB \
         --item-output-file=srcItemsToConcat.csv \
         --item-output-path=/bucket1 \
         --storage-auth-uid=user1 \
         --storage-auth-secret=**************************** \
         --storage-driver-type=emcs3 \
         --storage-net-node-addrs=datanode1,datanode2,datanode3,datanode4 \
         --test-step-limit-count=100

  2. Reuse these items to concatenate 1000 new items from fixed byte ranges of the source items, with 10 source items used for each destination item:

     java -jar mongoose-<VER>/mongoose.jar \
         --item-data-ranges-concat=10-10 \
         --item-data-ranges-fixed=100-200,300-400,500- \
         --item-input-file=srcItemsToConcat.csv \
         --item-output-path=/bucket1 \
         --storage-auth-uid=user1 \
         --storage-auth-secret=**************************** \
         --storage-driver-type=emcs3 \
         --storage-net-node-addrs=datanode1,datanode2,datanode3,datanode4 \
         --test-step-limit-count=1000

Limitations

  1. Only the emcs3 storage driver type currently supports this feature.

  2. Only the CREATE load type is supported.

  3. A valid items input must be configured (a file or a bucket/container/directory listing).

  4. The number of items loaded from the configured items input must not exceed 1 million.

  5. The range of the source items count to select (the item-data-ranges-concat value) must fit the number of source items loaded from the configured items input.
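
The last two limitations can be checked before a run with a small pre-flight sketch like the one below. The function name check_concat_range is hypothetical and not part of Mongoose. The sketch assumes the option value has the MIN-MAX form used in the example above and interprets "fit" as the upper bound not exceeding the source items count.

```shell
# Hypothetical pre-flight check, not part of Mongoose.
# $1: item-data-ranges-concat value in the MIN-MAX form
# $2: number of source items available in the items input
check_concat_range() {
    range="$1"; src_count="$2"
    min="${range%%-*}"   # text before the first "-"
    max="${range##*-}"   # text after the last "-"
    if [ "$src_count" -gt 1000000 ]; then
        echo "too many source items: $src_count"
        return 1
    fi
    if [ "$max" -lt "$min" ] || [ "$max" -gt "$src_count" ]; then
        echo "range $range does not fit $src_count source items"
        return 1
    fi
    echo "ok"
}

check_concat_range 10-10 100
```

If the items list file holds one item per line (an assumption about its layout), the second argument could be taken from a line count of that file.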

Swift Objects Copying

There are two variants of object copy requests:

  • Using HTTP method "COPY" and "Destination" header.
  • Using HTTP method "PUT" and "X-Copy-From" header.

The second variant is preferred because the COPY HTTP method is not standard.

With this variant, the source object URI is specified with the "X-Copy-From" header.
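
For comparison, the two request variants can be sketched as follows. The helper name swift_copy_request, the account name and the container/object names are hypothetical, and the authentication token header is omitted.

```shell
# Hypothetical helper: prints the skeleton of a Swift object copy request.
# $1: variant ("put" or "copy"), $2: account,
# $3: source "container/object", $4: destination "container/object"
swift_copy_request() {
    variant="$1"; account="$2"; src="$3"; dst="$4"
    case "$variant" in
        put)
            # The request addresses the destination, the header names the source
            printf 'PUT /v1/%s/%s HTTP/1.1\n' "$account" "$dst"
            printf 'X-Copy-From: /%s\n' "$src"
            printf 'Content-Length: 0\n'
            ;;
        copy)
            # The request addresses the source, the header names the destination
            printf 'COPY /v1/%s/%s HTTP/1.1\n' "$account" "$src"
            printf 'Destination: %s\n' "$dst"
            ;;
        *)
            echo "unknown variant: $variant" >&2
            return 1
            ;;
    esac
}

swift_copy_request put AUTH_test container1/item0001 container2/item0001
swift_copy_request copy AUTH_test container1/item0001 container2/item0001
```

The two variants address opposite ends of the copy: PUT targets the destination object, while COPY targets the source object.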

Configuration

To perform a copy load step:

  • Use the "create" load type.
  • Set "--item-input-path" (the source container/bucket/directory containing the items to copy) or "--item-input-file" to a proper value.
  • Set "--item-output-path" (the target container/bucket/directory) to a proper value.

For details, see the example scenarios located at: scenarios/copy/*.json.
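
As a minimal sketch of the steps above, the wrapper below assembles a copy command line and prints it instead of running it. The function name print_copy_cmd and the MONGOOSE_JAR variable are hypothetical. Storage driver, authentication and node address options are left out and would be added as in the other examples on this page.

```shell
# Hypothetical wrapper: prints a Mongoose copy command instead of running it.
# MONGOOSE_JAR is an assumed variable name pointing at the Mongoose jar file.
print_copy_cmd() {
    src="$1"; dst="$2"
    # Copy mode works only if the input path differs from the output path
    if [ "$src" = "$dst" ]; then
        echo "input and output paths must differ" >&2
        return 1
    fi
    printf 'java -jar %s --load-type=create --item-input-path=%s --item-output-path=%s\n' \
        "${MONGOOSE_JAR:-mongoose.jar}" "$src" "$dst"
}

print_copy_cmd /bucket1 /bucket2
```

The path comparison mirrors the first limitation listed at the top of this page.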
