# AwsApps

Genomic data focused toolkit for working with AWS services (e.g. S3 and EC2). Includes exhaustive JUnit testing for each app.

<pre>
u0028003$ java -jar -Xmx1G ~/Code/AwsApps/target/S3Copy_0.3.jar
**************************************************************************************
**                                S3 Copy : Feb 2024                                **
**************************************************************************************
SC copies AWS S3 objects, unarchiving them as needed, within the same or different
accounts or downloads them to your local computer. Run this as a daemon with -l or run
repeatedly until complete. To upload files to S3, use the AWS CLI.

To use the app:
Create a ~/.aws/credentials file with your access, secret, and region info, chmod
   600 the file and keep it private. Use a txt editor or the AWS CLI configure
   command, see https://aws.amazon.com/cli. Example ~/.aws/credentials file:
      [default]
      aws_access_key_id = AKIARHBDRGYUIBR33RCJK6A
      aws_secret_access_key = BgDV2UHZv/T5ENs395867ueESMPGV65HZMpUQ
      region = us-west-2
Repeat these entries for multiple accounts, replacing the word 'default' with a single
unique account name.

Required:
-j Provide a comma delimited string of copy jobs or a txt file with one per line.
      A copy job consists of a full S3 URI as the source and a destination separated
      by '>', e.g. 's3://source/tumor.cram > s3://destination/collabTumor.cram' or
      folders 's3://source/alignments/tumor > s3://destination/Collab/' or local
      's3://source/alignments/tumor > .'  Note, the trailing '/' is required in the
      S3 destination for a recursive copy or when the local folder doesn't exist.

Optional / Defaults:
-d Perform a dry run to list the actions that would be taken
-r Perform a recursive copy, defaults to an exact source key match
-e Email address(es) to send status messages, comma delimited, no spaces. Note,
      the sendmail app must be configured on your system. Test it:
      echo 'Subject: Hello' | sendmail yourEmailAddress@yourProvider.com
-x Expedite archive retrieval, increased cost $0.03/GB vs $0.01/GB, 1-5min vs 3-12hr,
      defaults to standard.
-l Execute every hour (standard) or minute (expedited) until complete
-t Maximum threads to utilize, defaults to 8
-p AWS credentials profile, defaults to 'default'
-n Number of days to keep restored files in S3, defaults to 1
-a Print instructions for copying files between different accounts

Example: java -Xmx10G -jar pathTo/S3Copy_x.x.jar -e obama@real.gov -p obama -d -l
      -j 's3://source/Logs.zip>s3://destination/,s3://source/normal > ~/Downloads/' -r
**************************************************************************************
</pre>
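A minimal first run might look like the sketch below; the 'mylab' profile, the bucket names, and the copyJobs.txt file are placeholders to replace with your own values:

<pre>
# create (or append to) the credentials file and lock it down; keys are placeholders
mkdir -p ~/.aws
cat >> ~/.aws/credentials << 'EOF'
[mylab]
aws_access_key_id = YOUR_ACCESS_KEY
aws_secret_access_key = YOUR_SECRET_KEY
region = us-west-2
EOF
chmod 600 ~/.aws/credentials

# list one copy job per line; the bucket names are examples
echo 's3://mylab-source/alignments/tumor > s3://mylab-share/Collab/' > copyJobs.txt

# dry run first (-d); after reviewing the plan, drop -d and add -l to run until complete
java -Xmx10G -jar pathTo/S3Copy_x.x.jar -p mylab -r -d -j copyJobs.txt
</pre>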

<pre>
u0028003$ java -jar -Xmx1G ~/Code/AwsApps/target/VersionManager_0.2.jar
**************************************************************************************
**                       AWS S3 Version Manager : August 2023                       **
**************************************************************************************
Bucket versioning in S3 protects objects from being deleted or overwritten by hiding
the original when 'deleting' or overwriting an existing object. Use this tool to
delete these hidden S3 objects and any deletion marks from your buckets. Use the
options to select particular redundant objects to delete in a dry run, review the
actions, and rerun it with the -r option to actually delete them. This app will not
delete any isLatest=true object.

WARNING! This app has the potential to destroy precious data. TEST IT on a
pilot system before deploying in production. Although extensively unit tested, this
app is provided with no guarantee of proper function.

To use the app:
1) Enable S3 Object versioning on your bucket.
2) Install and configure the aws cli with your region, access and secret keys. See
   https://aws.amazon.com/cli
3) Use cli commands like 'aws s3 rm s3://myBucket/myObj.txt' or the AWS web Console to
   'delete' particular objects. Then run this app to actually delete them.

Required Parameters:
-b Versioned S3 bucket name

Optional Parameters:
-r Perform a real run, defaults to a dry run where no objects are deleted
-c Credentials profile name, defaults to 'default'
-a Minimum age, in days, of object to delete, defaults to 30
-s Object key suffixes to delete, comma delimited, no spaces
-p Object key prefixes to delete, comma delimited, no spaces
-v Verbose output
-t Maximum threads to use, defaults to 8

Example: java -Xmx10G -jar pathTo/VersionManager_X.X.jar -b mybucket-vm-test
      -s .cram,.bam,.gz,.zip -a 7 -c MiloLab

**************************************************************************************
</pre>
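A typical cleanup, sketched here with a hypothetical bucket, profile, and object key, is a CLI 'delete', a dry run, then a real run:

<pre>
# 'delete' an object with the AWS CLI; versioning hides the version rather than removing it
aws s3 rm s3://mybucket-vm-test/alignments/old.bam --profile MiloLab

# dry run: list the hidden versions and delete markers that would be removed
java -Xmx10G -jar pathTo/VersionManager_X.X.jar -b mybucket-vm-test -c MiloLab -a 7 -v

# after reviewing the output, add -r to actually delete them
java -Xmx10G -jar pathTo/VersionManager_X.X.jar -b mybucket-vm-test -c MiloLab -a 7 -r
</pre>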

See [Misc/WorkingWithTheAWSJobRunner.pdf](https://github.com/HuntsmanCancerInstitute/AwsApps/blob/master/Misc/WorkingWithTheAWSJobRunner.pdf) for details.
<pre>
u0028003$ java -jar -Xmx1G ~/Code/AwsApps/target/JobRunner_0.3.jar

**************************************************************************************

Example: java -jar -Xmx1G JobRunner.jar -x -t
      -c 'https://my-jr.s3.us-west-2.amazonaws.com/aws.cred.txt?X-AmRun...'

**************************************************************************************
</pre>

<pre>
u0028003$ java -jar -Xmx1G ~/Code/AwsApps/target/GSync_0.6.jar
**************************************************************************************
**                                GSync : June 2020                                 **
**************************************************************************************
GSync pushes files with a particular extension that exceed a given size and age to
Amazon's S3 object store. Associated genomic index files are also moved. Once
correctly uploaded, GSync replaces the original file with a local txt placeholder file
containing information about the S3 object. Files are restored or deleted by modifying
the name of the placeholder file. Symbolic links are ignored.

WARNING! This app has the potential to destroy precious genomic data. TEST IT on a
pilot system before deploying in production. BACKUP your local files and ENABLE S3
Object Versioning before running. This app is provided with no guarantee of proper
function.

To use the app:
1) Create a new S3 bucket dedicated solely to this purpose. Use it for nothing else.
2) Enable S3 Object Locking and Versioning on the bucket to assist in preventing
   accidental object overwriting. Add lifecycle rules to
   AbortIncompleteMultipartUpload and move objects to Deep Glacier.
3) It is a good policy when working on AWS S3 to limit your ability to accidentally
   delete buckets and objects. To do so, create and assign yourself to an AWS Group
   called AllExceptS3Delete with a custom permission policy that denies s3:Delete*:
      {"Version": "2012-10-17", "Statement": [
         {"Effect": "Allow", "Action": "*", "Resource": "*"},
         {"Effect": "Deny", "Action": "s3:Delete*", "Resource": "*"} ]}
   For standard upload and download gsyncs, assign yourself to the AllExceptS3Delete
   group. When you need to delete or update objects, switch to the Admin group, then
   switch back. Accidental overwrites are OK since object versioning is enabled.
   To add another layer of protection, apply object legal locks via the aws cli.
4) Create a ~/.aws/credentials file with your access, secret, and region info, chmod
   600 the file and keep it private. Use a txt editor or the aws cli configure
   command, see https://aws.amazon.com/cli. Example ~/.aws/credentials file:
      [default]
      aws_access_key_id = AKIARHBDRGYUIBR33RCJK6A
      aws_secret_access_key = BgDV2UHZv/T5ENs395867ueESMPGV65HZMpUQ
      region = us-west-2
5) Execute GSync to upload large old files to S3 and replace them with a placeholder
   file named xxx.S3.txt
6) To download and restore an archived file, rename the placeholder
   xxx.S3.txt.restore and run GSync.
7) To delete an S3 archived file, its placeholder, and any local files, rename the
   placeholder xxx.S3.txt.delete and run GSync.
   Before executing, switch the GSync/AWS user to the Admin group.
8) Placeholder files may be moved, see -u

Required:
-d One or more local directories with the same parent to sync. This parent dir
      becomes the base key in S3, e.g. BucketName/Parent/.... Comma delimited, no
      spaces, see the example.
-b Dedicated S3 bucket name

Optional:
-f File extensions to consider, comma delimited, no spaces, case sensitive. Defaults
      to '.bam,.cram,.gz,.zip'
-a Minimum days old for archiving, defaults to 120
-g Minimum gigabyte size for archiving, defaults to 5
-r Perform a real run, defaults to just listing the actions that would be taken.
-k Delete local files that were successfully uploaded.
-u Update S3 Object keys to match current placeholder paths.
-c Recreate deleted placeholder files using info from orphaned S3 Objects.
-q Quiet verbose output.
-e Email addresses to send gsync messages, comma delimited, no spaces.
-s Smtp host, defaults to hci-mail.hci.utah.edu
-x Execute every 6 hrs until complete, defaults to just once, good for downloading
      latent glacier objects.

Example: java -Xmx20G -jar pathTo/GSync_X.X.jar -r -u -k -b hcibioinfo_gsync_repo
      -q -a 90 -g 1 -d /Repo/DNA,/Repo/RNA,/Repo/Fastq -e obama@real.gov

**************************************************************************************
</pre>
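A sketch of the archive-and-restore cycle, using the bucket and repository paths from the example above as stand-ins for your own:

<pre>
# dry run: list what GSync would archive, then add -r (and optionally -k) for a real run
java -Xmx20G -jar pathTo/GSync_X.X.jar -b hcibioinfo_gsync_repo -d /Repo/DNA,/Repo/RNA -a 90 -g 1
java -Xmx20G -jar pathTo/GSync_X.X.jar -b hcibioinfo_gsync_repo -d /Repo/DNA,/Repo/RNA -a 90 -g 1 -r -k

# to pull a file back, rename its placeholder and rerun; -x keeps checking until the
# Glacier restore finishes
mv /Repo/DNA/tumor.cram.S3.txt /Repo/DNA/tumor.cram.S3.txt.restore
java -Xmx20G -jar pathTo/GSync_X.X.jar -b hcibioinfo_gsync_repo -d /Repo/DNA,/Repo/RNA -r -x
</pre>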