Commit fe5c66e

documentation update

1 parent 6f21191 commit fe5c66e

File tree: 10 files changed, +271 −76 lines changed

.gitignore

Lines changed: 1 addition & 0 deletions

@@ -7,6 +7,7 @@ __*.*
 # Folders
 _obj
 _test
+_build/
 _build/linux_amd64
 _build/windows_amd64
 _wd

blobporter.go

Lines changed: 1 addition & 1 deletion

@@ -45,7 +45,7 @@ func init() {
 	numberOfHandlersPerFileMsg = "Number of open handles for concurrent reads and writes per file."
 	numberOfFilesInBatchMsg = "Maximum number of files in a transfer.\n\tIf the number is exceeded new transfers are created"
 	readTokenExpMsg = "Expiration in minutes of the read-only access token that will be generated to read from S3 or Azure Blob sources."
-	transferStatusFileMsg = "Transfer status file location. If set, blobporter will use this file to track the status of the transfer.\n\tIn case of failure and if the option is set the same status file, source files that were transferred will be skipped.\n\tIf the transfer is successful a summary will be created at then."
+	transferStatusFileMsg = "Transfer status file location. If set, blobporter will use this file to track the status of the transfer.\n\tIn case of failure and if the option is set the same status file, source files that were transferred will be skipped.\n\tIf the transfer is successful a summary will be appended."
 )

 flag.Usage = func() {

docs/bptransfer.png

46.3 KB

docs/conf.py

Lines changed: 2 additions & 2 deletions

@@ -20,7 +20,7 @@
 # -- Project information -----------------------------------------------------

 project = u'BlobPorter'
-#copyright = u'2018, Jesus Aguilar'
+copyright = u'2018, BlobPorter Contributors'
 author = u'BlobPorter Contributors'

 # The short X.Y version
@@ -74,7 +74,7 @@
 # The theme to use for HTML and HTML Help pages. See the documentation for
 # a list of builtin themes.
 #
-html_theme = 'alabaster'
+html_theme = 'sphinx_rtd_theme'

 # Theme options are theme-specific and customize the look and feel of a theme
 # further. For a list of options available for each theme, see the

docs/examples.rst

Lines changed: 118 additions & 0 deletions

========
Examples
========

Upload to Azure Block Blob Storage
----------------------------------

Single file upload:

``./blobporter -f /datadrive/myfile.tar -c mycontainer -n myfile.tar``

**Note:** If the container does not exist, it will be created.

Upload all files that match the pattern:

``./blobporter -f "/datadrive/*.tar" -c mycontainer``

You can also specify a list of files or patterns explicitly:

``./blobporter -f "/datadrive/*.tar" -f "/datadrive/readme.md" -f "/datadrive/log" -c mycontainer``

If you want to rename the target file, you can use the -n option:

``./blobporter -f /datadrive/f1.tar -n newname.tar -c mycontainer``

Upload to Azure Page Blob Storage
---------------------------------

Same as uploading to block blob storage, but with the transfer definition (-t option) set to ``file-pageblob``.

For example, a single file upload to page blob:

``./blobporter -f /datadrive/mydisk.vhd -c mycontainer -n mydisk.vhd -t file-pageblob``

**Note:** The file size and block size must be a multiple of 512 (bytes). The maximum block size is 4MB.

Transfer Data from an S3 endpoint to Azure Storage
--------------------------------------------------

You can upload data from an S3-compatible endpoint.

First, you must specify the access and secret keys via environment variables.

::

    export S3_ACCESS_KEY=<YOUR ACCESS KEY>
    export S3_SECRET_KEY=<YOUR_SECRET_KEY>

Then you can specify an S3 URI with the following format:

``[HOST]/[BUCKET]/[PREFIX]``

For example:

``./blobporter -f s3://mys3api.com/mybucket/mydata -c froms3 -t s3-blockblob``

**Note:** For better performance, consider running this transfer from a high-bandwidth VM running in the same region as the source or the target. Data is uploaded as it is downloaded from the source, so the transfer is bound to the bandwidth of the VM.

Transfer Data Between Azure Storage Accounts, Containers and Blob Types
------------------------------------------------------------------------

First, you must set the account key of the source storage account.

``export SOURCE_ACCOUNT_KEY=<YOUR KEY>``

Then you can specify the URI of the source. The source can be a page, block or append blob. Prefixes are supported.

``./blobporter -f "https://mysourceaccount.blob.core.windows.net/container/myblob" -c mycontainer -t blob-blockblob``

**Note:** For better performance, consider running this transfer from a high-bandwidth VM running in the same region as the source or the target. Data is uploaded as it is downloaded from the source, so the transfer is bound to the bandwidth of the VM.

Transfer from an HTTP/HTTPS source to Azure Blob Storage
--------------------------------------------------------

To block blob storage:

``./blobporter -f "http://mysource/file.bam" -c mycontainer -n file.bam -t http-blockblob``

To page blob storage:

``./blobporter -f "http://mysource/my.vhd" -c mycontainer -n my.vhd -t http-pageblob``

**Note:** For better performance, consider running this transfer from a high-bandwidth VM running in the same region as the source or the target. Data is uploaded as it is downloaded from the source, so the transfer is bound to the bandwidth of the VM.

Download from Azure Blob Storage
--------------------------------

For download scenarios, the source can be a page, append or block blob:

``./blobporter -c mycontainer -n file.bam -t blob-file``

You can use the -n option to specify a prefix. All blobs that match the prefix will be downloaded.

The following will download all blobs in the container that start with f:

``./blobporter -c mycontainer -n f -t blob-file``

Without the -n option, all blobs in the container will be downloaded.

``./blobporter -c mycontainer -t blob-file``

By default, files are downloaded keeping the same directory structure as the remote source.

If you want to download to the same directory where you are running blobporter, set the -i option.

``./blobporter -p -c mycontainer -t blob-file -i``

Download a file from an HTTP source
-----------------------------------

``./blobporter -f "http://mysource/file.bam" -n /datadrive/file.bam -t http-file``

**Note:** The ACCOUNT_NAME and ACCOUNT_KEY environment variables are not required in this scenario.

docs/gettingstarted.rst

Lines changed: 38 additions & 0 deletions

===============
Getting Started
===============

Linux
-----

Download, extract and set permissions:

::

    wget -O bp_linux.tar.gz https://github.com/Azure/blobporter/releases/download/v0.6.09/bp_linux.tar.gz
    tar -xvf bp_linux.tar.gz linux_amd64/blobporter
    chmod +x ~/linux_amd64/blobporter
    cd ~/linux_amd64

Set environment variables: ::

    export ACCOUNT_NAME=<STORAGE_ACCOUNT_NAME>
    export ACCOUNT_KEY=<STORAGE_ACCOUNT_KEY>

**Note:** You can also set these values via `options <options.html>`__

Windows
-------

Download `BlobPorter.exe <https://github.com/Azure/blobporter/releases/download/v0.6.10/bp_windows.zip>`_

Set environment variables (if using the command prompt): ::

    set ACCOUNT_NAME=<STORAGE_ACCOUNT_NAME>
    set ACCOUNT_KEY=<STORAGE_ACCOUNT_KEY>

Set environment variables (if using PowerShell): ::

    $env:ACCOUNT_NAME="<STORAGE_ACCOUNT_NAME>"
    $env:ACCOUNT_KEY="<STORAGE_ACCOUNT_KEY>"
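With the environment variables in place, a first upload could look like this (an illustrative command borrowing the single-file example from the Examples page; adjust the file path and container name to your environment):

``./blobporter -f /datadrive/myfile.tar -c mycontainer -n myfile.tar``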

docs/index.rst

Lines changed: 10 additions & 9 deletions

@@ -3,18 +3,19 @@
    You can adapt this file completely to your liking, but it should at least
    contain the root `toctree` directive.

-Welcome to BlobPorter's documentation!
+BlobPorter
 ======================================

+BlobPorter is a data transfer tool for Azure Blob Storage that maximizes throughput through concurrent reads and writes that can scale up and down independently.
+
+.. image :: bptransfer.png
+
 .. toctree::
    :maxdepth: 2
    :caption: Contents:

-
-
-Indices and tables
-==================
-
-* :ref:`genindex`
-* :ref:`modindex`
-* :ref:`search`
+   gettingstarted
+   examples
+   performance/perfmode
+   resumable_transfers
+   options

docs/options.rst

Lines changed: 52 additions & 0 deletions

===============
Command Options
===============

-f, --source_file (string) URL, Azure Blob or S3 Endpoint, file or files (e.g. /data/\*.gz) to upload.

-c, --container_name (string) Container name (e.g. mycontainer).

-n, --blob_name (string) Blob name (e.g. myblob.txt) or prefix for download scenarios.

-g, --concurrent_workers (int) Number of go-routines for parallel upload.

-r, --concurrent_readers (int) Number of go-routines for parallel reading of the input.

-b, --block_size (string) Desired size of each blob block.

    Can be specified as an integer byte count or an integer suffixed with B, KB or MB (default "4MB", maximum "100MB").

    The block size can have a significant memory impact. If you are using large blocks, reduce the number of readers and workers (-r and -g options) to reduce the memory pressure during the transfer.

    For files larger than 200GB, this parameter must be set to a value higher than 4MB. The minimum block size is defined by the following formula:

    Minimum Block Size = File Size / 50000

    The maximum block size is 100MB.
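    For illustration, a minimal Go sketch of this sizing rule (``minBlockSize`` is a hypothetical helper, not BlobPorter's actual implementation): ::

        package main

        import "fmt"

        // minBlockSize returns the smallest block size in bytes that keeps a
        // file within the 50,000-block limit implied by the formula above.
        // Illustrative sketch only; not BlobPorter's implementation.
        func minBlockSize(fileSize int64) int64 {
            const maxBlocks = 50000
            // Round up so that fileSize/blockSize never exceeds maxBlocks.
            return (fileSize + maxBlocks - 1) / maxBlocks
        }

        func main() {
            const gb = int64(1) << 30
            // A 200GB file already needs blocks slightly larger than the 4MB default.
            fmt.Println(minBlockSize(200 * gb)) // 4294968 bytes (~4.1MB)
        }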
-a, --account_name (string) Storage account name (e.g. mystorage).

    Can also be specified via the ACCOUNT_NAME environment variable.

-k, --account_key (string) Storage account key string.

    Can also be specified via the ACCOUNT_KEY environment variable.

-s, --http_timeout (int) HTTP client timeout in seconds. Default value is 600s.

-d, --dup_check_level (string) Desired level of effort to detect duplicate data blocks to minimize upload size.

    Must be one of None, ZeroOnly, Full (default "None").

-t, --transfer_type (string) Defines the source and target of the transfer.

    Must be one of file-blockblob, file-pageblob, http-blockblob, http-pageblob, blob-file, pageblock-file (alias of blob-file), blockblob-file (alias of blob-file), http-file, blob-pageblob, blob-blockblob, s3-pageblob and s3-blockblob.

-m, --compute_blockmd5 (bool) If set, a block-level MD5 hash will be computed and included as a header when the block is sent to blob storage.

    Default is false.

-q, --quiet_mode (bool) If set, the progress indicator is not displayed.

    The files to transfer, errors, warnings and the transfer completion summary are still displayed.

-x, --files_per_transfer (int) Number of files in a batch transfer. Default is 500.

-h, --handles_per_file (int) Number of open handles for concurrent reads and writes per file. Default is 2.

-i, --remove_directories (bool) If set, blobs are downloaded or uploaded without keeping the directory structure of the source.

    Not applicable when the source is an HTTP endpoint.

-o, --read_token_exp (int) Expiration in minutes of the read-only access token that will be generated to read from S3 or Azure Blob sources.

    Default value: 360.

-l, --transfer_status (string) Transfer status file location. If set, blobporter will use this file to track the status of the transfer.

    In case of failure, if the same status file is referenced, the source files that were already transferred will be skipped.

    If the transfer is successful, a summary will be appended.
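    For example, a batch upload tracked through a status file might look like this (an illustrative command; the status file path is hypothetical):

    ``./blobporter -f "/datadrive/*.tar" -c mycontainer -l /tmp/transfer.status``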
docs/performance/perfmode.rst

Lines changed: 11 additions & 16 deletions

@@ -1,33 +1,28 @@
+================
 Performance Mode
-======================================
+================
+
 BlobPorter has a performance mode that uploads random data generated in memory and measures the performance of the operation without the impact of disk i/o.
-The performance mode for uploads could help you identify the potential upper limit of throughput that the network and the target storage account can provide.
+The performance mode for uploads could help you identify the potential upper limit of the data-transfer throughput that your environment can provide.

 For example, the following command will upload 10 x 1GB files to a storage account.

-```
-blobporter -f "1GB:10" -c perft -t perf-blockblob
-```
+``blobporter -f "1GB:10" -c perft -t perf-blockblob``

 You can also use this mode to see if increasing (or decreasing) the number of workers/writers (-g option) will have a potential impact.

-```
-blobporter -f "1GB:10" -c perft -t perf-blockblob -g 20
-```
+``blobporter -f "1GB:10" -c perft -t perf-blockblob -g 20``

 Similarly, for downloads, you can simulate downloading data from a storage account without writing to disk. This mode could also help you fine-tune the number of readers (-r option) and get an idea of the maximum download throughput.

 The following command downloads the data previously uploaded.

-```
-export SRC_ACCOUNT_KEY=$ACCOUNT_KEY
-blobporter -f "https://myaccount.blob.core.windows.net/perft" -t blob-perf
-```
+``export SRC_ACCOUNT_KEY=$ACCOUNT_KEY``
+
+``blobporter -f "https://myaccount.blob.core.windows.net/perft" -t blob-perf``

-Then you can download to disk.
+Then you can download the file to disk.

-```
-blobporter -c perft -t blob-file
-```
+``blobporter -c perft -t blob-file``

 The performance difference will give you a measurement of the impact of disk i/o.
