Set environment variables (if using the command prompt):

```batchfile
set ACCOUNT_NAME=<STORAGE_ACCOUNT_NAME>
set ACCOUNT_KEY=<STORAGE_ACCOUNT_KEY>
```

Set environment variables (if using PowerShell):

```PowerShell
$env:ACCOUNT_NAME="<STORAGE_ACCOUNT_NAME>"
$env:ACCOUNT_KEY="<STORAGE_ACCOUNT_KEY>"
```

For detailed, step-by-step examples, see:

* [Transfer data from S3 to Azure Storage](http://blobporter.readthedocs.io/en/latest/examples.html#transfer-data-from-s3-to-azure-storage)
* [Transfer data between Azure Storage accounts, containers and blob types](http://blobporter.readthedocs.io/en/latest/examples.html#transfer-data-between-azure-storage-accounts-containers-and-blob-types)
* [Transfer from an HTTP/HTTPS source to Azure Blob Storage](http://blobporter.readthedocs.io/en/latest/examples.html#transfer-from-an-http-https-source-to-azure-blob-storage)
* [Download from Azure Blob Storage](http://blobporter.readthedocs.io/en/latest/examples.html#download-from-azure-blob-storage)
* [Download a file from an HTTP source](http://blobporter.readthedocs.io/en/latest/examples.html#download-a-file-from-a-http-source)
>Note: For better performance, consider running this transfer from a VM running in the same region as the source or the target. Data is uploaded as it is downloaded from the source, so the transfer is bound by the bandwidth of the VM.
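With the account environment variables set, a minimal single-file upload needs only a source file, a container, and a blob name. The sketch below is illustrative: the local path, container, and blob name are placeholders, and the transfer type is stated explicitly as file-blockblob (one of the types listed under Command Options).

```bash
# Assumes ACCOUNT_NAME and ACCOUNT_KEY are set as shown above.
# Uploads a local file to the container "mycontainer" as the block blob "myfile.tar".
./blobporter -f /datadir/myfile.tar -c mycontainer -n myfile.tar -t file-blockblob
```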
### Synchronously Copy data between Azure Blob Storage targets and sources
You can synchronously transfer data between Azure Storage accounts, containers and blob types.
First, you must set the account key of the source storage account.
```bash
export SOURCE_ACCOUNT_KEY=<YOUR KEY>
```
Then you can specify the URI of the source. Prefixes are supported.
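For illustration, a copy driven by a source URI might look like the sketch below. The account, container, and prefix names are placeholders, and the target account is assumed to come from the ACCOUNT_NAME and ACCOUNT_KEY environment variables.

```bash
# SOURCE_ACCOUNT_KEY identifies the source account; ACCOUNT_NAME/ACCOUNT_KEY identify the target.
# Copies every blob under the "data/" prefix into "mytargetcontainer" as block blobs.
./blobporter -f "https://mysourceaccount.blob.core.windows.net/mysourcecontainer/data/" -c mytargetcontainer -t blob-blockblob
```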
>Note: For better performance, consider running this transfer from a VM running in the same region as the source or the target. Data is uploaded as it is downloaded from the source, so the transfer is bound by the bandwidth of the VM.
### Upload from an HTTP/HTTPS source to Azure Blob Storage
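A representative invocation, with a placeholder URL and names, streams the remote file into a block blob:

```bash
# Downloads from the HTTP source and uploads to blob storage in a single pass.
./blobporter -f "http://mysource/file.bam" -c mycontainer -n file.bam -t http-blockblob
```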
>Note: For better performance, consider running this transfer from a VM running in the same region as the source or the target. Data is uploaded as it is downloaded from the source, so the transfer is bound by the bandwidth of the VM.
### Download from Azure Blob Storage
From blob storage to a local file, the source can be a page or block blob:
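A sketch of a single-blob download (the container and blob names are placeholders):

```bash
# Downloads the blob "file.bam" from "mycontainer" to the local working directory.
./blobporter -c mycontainer -n file.bam -t blob-file
```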
You can use the -n option to specify a prefix. All blobs that match the prefix will be downloaded.
The following will download all blobs in the container that start with `f`:
`./blobporter -c mycontainer -n f -t blob-file`
Without the -n option, all blobs in the container will be downloaded.
`./blobporter -c mycontainer -t blob-file`
By default, files are downloaded to the directory where you are running blobporter. If you want to keep the same directory structure as the storage account, use the -p option.
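For example, assuming -p is a simple boolean flag as described above (the container name is a placeholder):

```bash
# Downloads all blobs in "mycontainer", recreating the blob paths as local directories.
./blobporter -p -c mycontainer -t blob-file
```

### Download a file from an HTTP source

A file can also be downloaded straight from an HTTP or HTTPS endpoint to a local file by setting the transfer type to http-file. The URL and local path below are placeholders:

```bash
# Streams the remote file directly to the local path /datadir/file.bam.
./blobporter -f "http://mysource/file.bam" -n /datadir/file.bam -t http-file
```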
>Note: The ACCOUNT_NAME and ACCOUNT_KEY environment variables are not required in this scenario.
## Command Options
- `-f`, `--source_file` *string* URL, Azure Blob or S3 Endpoint, file or files (e.g. /data/*.gz) to upload.

- `-c`, `--container_name` *string* container name (e.g. `mycontainer`).

- `-n`, `--blob_name` *string* blob name (e.g. myblob.txt) or prefix for download scenarios.

- `-g`, `--concurrent_workers` *int* number of routines for parallel upload.

- `-r`, `--concurrent_readers` *int* number of routines for parallel reading of the input.

- `-b`, `--block_size` *string* desired size of each blob block. Can be specified as an integer byte count or an integer suffixed with B, KB or MB (default "4MB", maximum "100MB").

  - The block size can have a significant memory impact. If you are using large blocks, reduce the number of readers and workers (`-r` and `-g` options) to limit memory pressure during the transfer; see the example after this list.

  - For files larger than 200GB, this parameter must be set to a value higher than 4MB. The minimum block size is defined by the following formula:

    `Minimum Block Size = File Size / 50000`

    For example, a 1TB file requires a block size of at least 1TB / 50,000 = 20MB.

  - The maximum block size is 100MB.

- `-a`, `--account_name` *string* storage account name (e.g. mystorage). Can also be specified via the ACCOUNT_NAME environment variable.

- `-k`, `--account_key` *string* storage account key string (e.g. `4Rr8CpUM9Y/3k/SqGSr/oZcLo3zNU6aIo32NVzda4EJj0hjS2Jp7NVLAD3sFp7C67z/i7Rfbrpu5VHgcmOShTg==`). Can also be specified via the ACCOUNT_KEY environment variable.

- `-s`, `--http_timeout` *int* HTTP client timeout in seconds. Default value is 600s.

- `-d`, `--dup_check_level` *string* desired level of effort to detect duplicate data blocks to minimize upload size. Must be one of None, ZeroOnly, Full (default "None").

- `-t`, `--transfer_type` *string* defines the source and target of the transfer. Must be one of file-blockblob, file-pageblob, http-blockblob, http-pageblob, blob-file, pageblock-file (alias of blob-file), blockblob-file (alias of blob-file), http-file, blob-pageblob, blob-blockblob, s3-pageblob and s3-blockblob.

- `-m`, `--compute_blockmd5` *bool* if present or true, a block-level MD5 hash will be computed and included as a header when the block is sent to blob storage. Default is false.

- `-q`, `--quiet_mode` *bool* if present or true, the progress indicator is not displayed. The files to transfer, errors, warnings and the transfer completion summary are still displayed.

- `-x`, `--files_per_transfer` *int* number of files in a batch transfer. Default is 500.

- `-h`, `--handles_per_file` *int* number of open handles for concurrent reads and writes per file. Default is 2.

- `-i`, `--remove_directories` *bool* if set, blobs are downloaded or uploaded without keeping the directory structure of the source. Not applicable when the source is an HTTP endpoint.

- `-o`, `--read_token_exp` *int* expiration in minutes of the read-only access token that will be generated to read from S3 or Azure Blob sources. Default value: 360.
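As a concrete illustration of the block-size guidance above, a large-file upload might raise the block size and lower the reader and worker counts; the file name, sizes, and counts below are placeholders only.

```bash
# Upload a ~300GB archive as a block blob: 300GB / 50,000 = 6MB, so 8MB blocks
# satisfy the minimum block size, while -r and -g are lowered to limit memory pressure.
./blobporter -f /datadir/backup.tar -c mycontainer -n backup.tar -t file-blockblob -b 8MB -r 5 -g 10
```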
## Performance Considerations
By default, BlobPorter creates 5 readers and 8 workers for each core on the computer. You can override these values with the -r (number of readers) and -g (number of workers) options. When overriding these options, keep the following considerations in mind (a combined example follows the list):
- If during the transfer the buffer level is constant at 000%, workers could be waiting for data; consider increasing the number of readers. If the level is constant at 100%, the opposite applies and increasing the number of workers could help.
- In BlobPorter, each reader or worker correlates to one goroutine. Goroutines are lightweight, and a Go program can create a high number of them; however, there is a point where the overhead of context switching hurts overall performance. Increase these values in small increments, e.g. by 5.
- For transfers from fast disks (SSD) or HTTP sources, reducing the number of readers or workers could provide better performance than the default values. Reduce these values if you want to minimize resource utilization; lowering them also reduces contention and the likelihood of throttling.
- Transfers can be batched. Each batch transfer will concurrently read and transfer up to 500 files (default value) from the source. The batch size can be modified using the -x option.
- Blobs smaller than the block size are transferred in a single operation. With relatively small files (<32MB) performance may be better if you set a block size equal to the size of the files. Setting the number of workers and readers to the number of files could yield performance gains.
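Tying these considerations together, a batch upload of many small files might pin the block size to the file size and scale the readers and workers up; every name and value below is illustrative only.

```bash
# Upload a directory of ~8MB CSV files in batches of 200, with the block size
# matched to the file size and extra readers/workers for concurrency.
./blobporter -f "/data/small/*.csv" -c mycontainer -t file-blockblob -b 8MB -r 20 -g 20 -x 200
```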
docs/index.rst:

BlobPorter is a data transfer tool for Azure Blob Storage that maximizes throughput through concurrent reads and writes that can scale up and down independently.

docs/resumabletrans.rst:

The following output from a transfer status file shows that three files were included in the transfer: **file10**, **file11** and **file15**.
However, only **file10** and **file11** were successfully transferred. For **file15** the output indicates that it was queued, but there's no second entry confirming that it was transferred successfully (status = 2). ::