# AwsApps
Genomic data focused toolkit for working with AWS services (e.g. S3 and EC2). Includes exhaustive JUnit testing for each app.
<pre>
- MacBook-Pro-89:~ u0028003$ java -jar -Xmx1G ~/Code/AwsApps/target/GSync_0.4.jar
+ u0028003$ java -jar -Xmx1G ~/Code/AwsApps/target/GSync_0.6.jar

**************************************************************************************
- ** GSync : Feb 2020 **
+ ** GSync : June 2020 **
**************************************************************************************
GSync pushes files with a particular extension that exceed a given size and age to
Amazon's S3 object store. Associated genomic index files are also moved. Once
@@ -29,7 +29,7 @@ To use the app:
{"Effect": "Allow", "Action": "*", "Resource": "*"},
{"Effect": "Deny", "Action": "s3:Delete*", "Resource": "*"} ]}
For standard upload and download gsyncs, assign yourself to the AllExceptS3Delete
- group. When you need to delete objects or buckets, switch to the Admin group, then
+ group. When you need to delete or update objects, switch to the Admin group, then
switch back. Accidental overwrites are OK since object versioning is enabled.
To add another layer of protection, apply object legal locks via the aws cli.
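
# A rough sketch of setting up the group, versioning, and legal locks described above with the
# aws cli; the group, user, policy file, and object key names are illustrative assumptions:
aws iam create-group --group-name AllExceptS3Delete
aws iam put-group-policy --group-name AllExceptS3Delete --policy-name AllExceptS3Delete --policy-document file://allExceptS3Delete.json
# allExceptS3Delete.json wraps the two statements shown above in {"Version": "2012-10-17", "Statement": [ ... ]}
aws iam add-user-to-group --group-name AllExceptS3Delete --user-name u0028003
# enable object versioning so accidental overwrites can be recovered
aws s3api put-bucket-versioning --bucket hcibioinfo_gsync_repo --versioning-configuration Status=Enabled
# apply a legal hold to a key; requires S3 Object Lock to have been enabled when the bucket was created
aws s3api put-object-legal-hold --bucket hcibioinfo_gsync_repo --key Repo/DNA/sample.cram.gz --legal-hold Status=ON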
3) Create a ~/.aws/credentials file with your access, secret, and region info, chmod
@@ -63,14 +63,69 @@ Optional:
-k Delete local files that were successfully uploaded.
-u Update S3 Object keys to match current placeholder paths.
-c Recreate deleted placeholder files using info from orphaned S3 Objects.
- -v Verbose output.
+ -q Quiet the verbose output.
-e Email addresses to send gsync messages, comma delimited, no spaces.
-s Smtp host, defaults to hci-mail.hci.utah.edu
-x Execute every 6 hrs until complete, defaults to just once, good for downloading
latent glacier objects.

Example: java -Xmx20G -jar pathTo/GSync_X.X.jar -r -u -k -b hcibioinfo_gsync_repo
-      -v -a 90 -g 1 -d /Repo/DNA,/Repo/RNA,/Repo/Fastq -e obama@real.gov
+      -q -a 90 -g 1 -d /Repo/DNA,/Repo/RNA,/Repo/Fastq -e obama@real.gov

**************************************************************************************
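
# A hypothetical cron entry for automating the example gsync above; the jar path, log path,
# 2 am schedule, and java being on cron's PATH are assumptions:
0 2 * * * java -Xmx20G -jar /home/u0028003/Code/AwsApps/target/GSync_0.6.jar -r -u -k -b hcibioinfo_gsync_repo -q -a 90 -g 1 -d /Repo/DNA,/Repo/RNA,/Repo/Fastq -e obama@real.gov >> /home/u0028003/gsync.log 2>&1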
+
+
+
+
+ u0028003$ java -jar -Xmx1G ~/Code/AwsApps/target/JobRunner_0.2.jar
+
+ ****************************************************************************************************************************
+ ** AWS Job Runner : December 2021 **
+ ****************************************************************************************************************************
+ JR is an app for running bash scripts on AWS EC2 nodes. It downloads and uncompresses your resource bundle and looks for
+ xxx.sh_JR_START files in your S3 Jobs directories. For each, it copies over the directory contents, executes the
+ associated xxx.sh script, and transfers back the results. This is repeated until no unrun jobs are found. Launch many
+ EC2 JR nodes, each running an instance of the JR, to process hundreds of jobs in parallel. Use spot requests and
+ hibernation to reduce costs.
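
# A hypothetical sketch of a main job script (e.g. dnaAlignQC.sh) that JR would execute after
# copying the job directory onto the node; the aligner, the reference path inside the unzipped
# resource bundle, and the fastq names are assumptions:
#!/bin/bash
set -e
bwa mem -t 16 /JRDir/Ref/hg38.fa sample_R1.fastq.gz sample_R2.fastq.gz | samtools sort -o sample.bam -
samtools index sample.bam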
+
+ To use:
+ 1) Install and configure the aws cli on your local workstation, see https://aws.amazon.com/cli/
+ 2) Upload your aws credentials file into a private bucket on aws, e.g.
+      aws s3 cp ~/.aws/credentials s3://my-jr/aws.cred.txt
+ 3) Generate a secure timed URL for the credentials file (here 259200 sec = 72 hr), e.g.
+      aws --region us-west-2 s3 presign s3://my-jr/aws.cred.txt --expires-in 259200
+ 4) Upload a zip archive containing resources needed to run your jobs into S3, e.g.
+      aws s3 cp ~/TNRunnerResourceBundle.zip s3://my-jr/TNRunnerResourceBundle.zip
+      This will be copied into the /JRDir/ directory and then unzipped.
+ 5) Upload script and job files into a 'Jobs' directory on S3, e.g.
+      aws s3 cp ~/JRJobs/A/ s3://my-jr/Jobs/A/ --recursive
+ 6) Optional: upload bash script files ending with JR_INIT.sh and/or JR_TERM.sh. These are executed by JR before and after
+    running the main bash script. Use these to copy in sample-specific resources, e.g. fastq/ cram/ bam files, and to run
+    post job clean up. See the JR_INIT.sh sketch after this list.
+ 7) Upload a file named XXX_JR_START to let the JobRunner know the bash script named XXX is ready to run, e.g.
+      aws s3 cp s3://my-jr/emptyFile s3://my-jr/Jobs/A/dnaAlignQC.sh_JR_START
+ 8) Launch the JobRunner.jar on one or more JR configured EC2 nodes. See https://ri-confluence.hci.utah.edu/x/gYCgBw
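
# A hypothetical JR_INIT.sh sketch for step 6: pull sample-specific fastq into the job's working
# copy before the main script runs; the bucket path and file layout are assumptions. A matching
# JR_TERM.sh could remove the bulky inputs after the run.
#!/bin/bash
set -e
aws s3 cp s3://my-jr/Fastq/A/ Fastq/ --recursive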
+
+ Job Runner Options:
+ -c URL to your secure timed config credentials file.
+ -r S3URI to your zipped resource bundle.
+ -j S3URI to your root Jobs directory containing folders with job scripts to execute.
+ -l S3URI to your Log folder for node logs.
+
+ Default Options:
+ -d Directory on the local worker node, full path, in which resources and job files will be processed, defaults to /JRDir/
+ -a Aws credentials directory, defaults to ~/.aws/
+ -t Terminate the EC2 node upon job completion. Defaults to looking for new jobs for the -w minutes to wait.
+ -w Minutes to wait when jobs are not found before termination, defaults to 10.
+ -x Replace S3 job directories with processed analysis, defaults to syncing local with S3. WARNING, if selected, don't place
+    any files in these S3 jobs directories that cannot be replaced. JR will delete them.
+ -v Verbose debugging output.
+
+ Example: java -jar -Xmx1G JobRunner.jar
+      -r s3://my-jr/TNRunnerResourceBundle.zip
+      -j s3://my-jr/Jobs/
+      -l s3://my-jr/NodeLogs/
+      -c 'https://my-jr.s3.us-west-2.amazonaws.com/aws.cred.txt?X-Amz-Algorithm=AWS4-HMXXX...'
+
+ ****************************************************************************************************************************
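
# Once nodes finish, each job's results are synced back to its S3 job directory and the node logs
# land in the -l folder; a sketch of checking and retrieving them with the aws cli, using the
# bucket and paths from the example above:
aws s3 ls s3://my-jr/NodeLogs/
aws s3 sync s3://my-jr/Jobs/A/ ~/JRJobs/A/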
</pre>