Commit 088cf6d: Update README.md (1 parent ec9f51c)
README.md: 141 additions, 144 deletions

# AwsApps

Genomic data focused toolkit for working with AWS services (e.g. S3 and EC2). Includes exhaustive JUnit testing for each app.

<pre>
u0028003$ java -jar -Xmx1G ~/Code/AwsApps/target/S3Copy_0.3.jar
**************************************************************************************
**                                S3 Copy : Feb 2024                                **
**************************************************************************************
SC copies AWS S3 objects, unarchiving them as needed, within the same or different
accounts, or downloads them to your local computer. Run it as a daemon with -l, or
run it repeatedly until complete. To upload files to S3, use the AWS CLI.

To use the app:
Create a ~/.aws/credentials file with your access, secret, and region info; chmod
600 the file and keep it private. Use a text editor or the AWS CLI 'configure'
command; see https://aws.amazon.com/cli  Example ~/.aws/credentials file:
  [default]
  aws_access_key_id = AKIARHBDRGYUIBR33RCJK6A
  aws_secret_access_key = BgDV2UHZv/T5ENs395867ueESMPGV65HZMpUQ
  region = us-west-2
Repeat these entries for multiple accounts, replacing the word 'default' with a
single unique account name.

Required:
  -j Provide a comma delimited string of copy jobs or a txt file with one per line.
       A copy job consists of a full S3 URI as the source and a destination, separated
       by '>', e.g. 's3://source/tumor.cram > s3://destination/collabTumor.cram', or
       folders 's3://source/alignments/tumor > s3://destination/Collab/', or local
       's3://source/alignments/tumor > .'  Note: the trailing '/' is required in the
       S3 destination for a recursive copy or when the local folder doesn't exist.

Optional / Defaults:
  -d Perform a dry run to list the actions that would be taken
  -r Perform a recursive copy, defaults to an exact source key match
  -e Email address(es) to send status messages, comma delimited, no spaces. Note:
       the sendmail app must be configured on your system. Test it:
       echo 'Subject: Hello' | sendmail yourEmailAddress@yourProvider.com
  -x Expedite archive retrieval; increased cost ($0.03/GB vs $0.01/GB), 1-5 min vs
       3-12 hr; defaults to standard.
  -l Execute every hour (standard) or minute (expedited) until complete
  -t Maximum threads to utilize, defaults to 8
  -p AWS credentials profile, defaults to 'default'
  -n Number of days to keep restored files in S3, defaults to 1
  -a Print instructions for copying files between different accounts

Example: java -Xmx10G -jar pathTo/S3Copy_x.x.jar -e obama@real.gov -p obama -d -l
     -j 's3://source/Logs.zip>s3://destination/,s3://source/normal > ~/Downloads/' -r
**************************************************************************************
</pre>
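
The -j copy-job syntax in the help above ('source > destination', comma delimited) can be sketched as a tiny parser. This is an illustration of the syntax only, not S3Copy's actual code; the function name is hypothetical:

```python
def parse_copy_jobs(job_string):
    """Split a comma delimited -j job string into (source, destination) pairs.

    Whitespace around the '>' separator is optional, as in the help text.
    """
    jobs = []
    for job in job_string.split(","):
        source, sep, destination = job.partition(">")
        if not sep:
            raise ValueError(f"missing '>' in copy job: {job!r}")
        jobs.append((source.strip(), destination.strip()))
    return jobs

# The job string from the example above:
print(parse_copy_jobs("s3://source/Logs.zip>s3://destination/,s3://source/normal > ~/Downloads/"))
# → [('s3://source/Logs.zip', 's3://destination/'), ('s3://source/normal', '~/Downloads/')]
```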

<pre>
u0028003$ java -jar -Xmx1G ~/Code/AwsApps/target/VersionManager_0.2.jar
**************************************************************************************
**                       AWS S3 Version Manager : August 2023                       **
**************************************************************************************
Bucket versioning in S3 protects objects from being deleted or overwritten by hiding
the original when 'deleting' or overwriting an existing object. Use this tool to
delete these hidden S3 objects and any delete markers from your buckets. Use the
options to select particular redundant objects to delete in a dry run, review the
actions, then rerun with the -r option to actually delete them. This app will not
delete any isLatest=true object.

WARNING! This app has the potential to destroy precious data. TEST IT on a
pilot system before deploying in production. Although extensively unit tested, this
app is provided with no guarantee of proper function.

To use the app:
1) Enable S3 Object Versioning on your bucket.
2) Install and configure the AWS CLI with your region, access, and secret keys. See
     https://aws.amazon.com/cli
3) Use CLI commands like 'aws s3 rm s3://myBucket/myObj.txt' or the AWS web console to
     'delete' particular objects. Then run this app to actually delete them.

Required Parameters:
  -b Versioned S3 bucket name

Optional Parameters:
  -r Perform a real run, defaults to a dry run where no objects are deleted
  -c Credentials profile name, defaults to 'default'
  -a Minimum age, in days, of objects to delete, defaults to 30
  -s Object key suffixes to delete, comma delimited, no spaces
  -p Object key prefixes to delete, comma delimited, no spaces
  -v Verbose output
  -t Maximum threads to use, defaults to 8

Example: java -Xmx10G -jar pathTo/VersionManager_X.X.jar -b mybucket-vm-test
     -s .cram,.bam,.gz,.zip -a 7 -c MiloLab

**************************************************************************************
</pre>

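
The selection logic described above (age cutoff, suffix/prefix key filters, and the guarantee that no isLatest=true version is touched) can be sketched in a few lines. This is a hedged illustration only; the field names (key, isLatest, lastModified, versionId) mirror the S3 ListObjectVersions response, not VersionManager's internals:

```python
from datetime import datetime, timedelta, timezone

def select_versions_to_delete(versions, min_age_days=30, suffixes=(), prefixes=()):
    """Return (key, versionId) pairs eligible for deletion.

    Skips any isLatest=true version and anything newer than the age cutoff,
    then applies the optional suffix/prefix key filters.
    """
    cutoff = datetime.now(timezone.utc) - timedelta(days=min_age_days)
    eligible = []
    for v in versions:
        if v["isLatest"] or v["lastModified"] > cutoff:
            continue
        if suffixes and not v["key"].endswith(tuple(suffixes)):
            continue
        if prefixes and not v["key"].startswith(tuple(prefixes)):
            continue
        eligible.append((v["key"], v["versionId"]))
    return eligible
```

Running first with an empty -r (dry run) and reviewing the returned pairs before deleting mirrors the workflow the help text recommends.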
See [Misc/WorkingWithTheAWSJobRunner.pdf](https://github.com/HuntsmanCancerInstitute/AwsApps/blob/master/Misc/WorkingWithTheAWSJobRunner.pdf) for details.
<pre>
u0028003$ java -jar -Xmx1G ~/Code/AwsApps/target/JobRunner_0.3.jar

**************************************************************************************
...
Example: java -jar -Xmx1G JobRunner.jar -x -t
     -c 'https://my-jr.s3.us-west-2.amazonaws.com/aws.cred.txt?X-AmRun...'

**************************************************************************************
</pre>
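
The S3 Copy and GSync helps both describe the same multi-profile ~/.aws/credentials layout; it is plain INI and can be read with Python's stock configparser. An illustrative sketch with hypothetical profile names and truncated keys:

```python
import configparser

# Layout from the help text, with a second profile added for illustration.
CREDENTIALS = """\
[default]
aws_access_key_id = AKIA...
aws_secret_access_key = ...
region = us-west-2

[obama]
aws_access_key_id = AKIA...
aws_secret_access_key = ...
region = us-east-1
"""

config = configparser.ConfigParser()
config.read_string(CREDENTIALS)   # in practice: config.read on ~/.aws/credentials
print(config.sections())          # → ['default', 'obama']
print(config["obama"]["region"])  # → us-east-1
```

Each bracketed section name is what the apps' -p/-c profile options select, with 'default' used when no profile is given.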

<pre>
u0028003$ java -jar -Xmx1G ~/Code/AwsApps/target/GSync_0.6.jar
**************************************************************************************
**                                GSync : June 2020                                 **
**************************************************************************************
GSync pushes files with a particular extension that exceed a given size and age to
Amazon's S3 object store. Associated genomic index files are also moved. Once
correctly uploaded, GSync replaces the original file with a local txt placeholder file
containing information about the S3 object. Files are restored or deleted by modifying
the name of the placeholder file. Symbolic links are ignored.

WARNING! This app has the potential to destroy precious genomic data. TEST IT on a
pilot system before deploying in production. BACKUP your local files and ENABLE S3
Object Versioning before running. This app is provided with no guarantee of proper
function.

To use the app:
1) Create a new S3 bucket dedicated solely to this purpose. Use it for nothing else.
2) Enable S3 Object Locking and Versioning on the bucket to assist in preventing
     accidental object overwriting. Add lifecycle rules to
     AbortIncompleteMultipartUpload and move objects to Deep Glacier.
3) It is a good policy when working on AWS S3 to limit your ability to accidentally
     delete buckets and objects. To do so, create and assign yourself to an AWS Group
     called AllExceptS3Delete with a custom permission policy that denies s3:Delete*:
       {"Version": "2012-10-17", "Statement": [
         {"Effect": "Allow", "Action": "*", "Resource": "*"},
         {"Effect": "Deny", "Action": "s3:Delete*", "Resource": "*"} ]}
     For standard upload and download gsyncs, assign yourself to the AllExceptS3Delete
     group. When you need to delete or update objects, switch to the Admin group, then
     switch back. Accidental overwrites are OK since object versioning is enabled.
     To add another layer of protection, apply object legal holds via the AWS CLI.
4) Create a ~/.aws/credentials file with your access, secret, and region info; chmod
     600 the file and keep it private. Use a text editor or the AWS CLI 'configure'
     command; see https://aws.amazon.com/cli  Example ~/.aws/credentials file:
       [default]
       aws_access_key_id = AKIARHBDRGYUIBR33RCJK6A
       aws_secret_access_key = BgDV2UHZv/T5ENs395867ueESMPGV65HZMpUQ
       region = us-west-2
5) Execute GSync to upload large old files to S3 and replace them with a placeholder
     file named xxx.S3.txt
6) To download and restore an archived file, rename the placeholder
     xxx.S3.txt.restore and run GSync.
7) To delete an S3 archived file, its placeholder, and any local files, rename the
     placeholder xxx.S3.txt.delete and run GSync.
     Before executing, switch the GSync/AWS user to the Admin group.
8) Placeholder files may be moved, see -u

Required:
  -d One or more local directories with the same parent to sync. This parent dir
       becomes the base key in S3, e.g. BucketName/Parent/.... Comma delimited, no
       spaces, see the example.
  -b Dedicated S3 bucket name

Optional:
  -f File extensions to consider, comma delimited, no spaces, case sensitive. Defaults
       to '.bam,.cram,.gz,.zip'
  -a Minimum days old for archiving, defaults to 120
  -g Minimum gigabyte size for archiving, defaults to 5
  -r Perform a real run, defaults to just listing the actions that would be taken.
  -k Delete local files that were successfully uploaded.
  -u Update S3 Object keys to match current placeholder paths.
  -c Recreate deleted placeholder files using info from orphaned S3 Objects.
  -q Quiet verbose output.
  -e Email addresses to send gsync messages, comma delimited, no spaces.
  -s Smtp host, defaults to hci-mail.hci.utah.edu
  -x Execute every 6 hrs until complete, defaults to just once; good for downloading
       latent glacier objects.

Example: java -Xmx20G -jar pathTo/GSync_X.X.jar -r -u -k -b hcibioinfo_gsync_repo
     -q -a 90 -g 1 -d /Repo/DNA,/Repo/RNA,/Repo/Fastq -e obama@real.gov

**************************************************************************************
</pre>
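
GSync's placeholder naming convention above (xxx.S3.txt, plus the .restore and .delete renames) amounts to a small filename-to-action mapping. A minimal sketch with a hypothetical function name, not the app's actual code:

```python
def placeholder_action(filename):
    """Map a GSync placeholder filename to the action the help text describes."""
    if filename.endswith(".S3.txt.restore"):
        return "download and restore the archived file"
    if filename.endswith(".S3.txt.delete"):
        return "delete the S3 object, placeholder, and local files"
    if filename.endswith(".S3.txt"):
        return "leave archived; the file lives in S3"
    return None  # not a GSync placeholder

print(placeholder_action("sample1.cram.S3.txt.restore"))
# → download and restore the archived file
```

Note the order of the checks matters: the .restore and .delete suffixes must be tested before the plain .S3.txt suffix they contain.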
