
Commit 6428886

barnharts4 authored and Patrick Duin committed

add an assume role provider (#150)

* add an assume role provider
* Add `assume-role` to copier options. Use the role in the AssumeRoleCredentialProvider
1 parent 6b71f83 commit 6428886

10 files changed: +205 additions, −7 deletions


CHANGELOG.md

Lines changed: 2 additions & 4 deletions
```diff
@@ -3,9 +3,7 @@
 ### Added
 * Table transformation to add custom properties to tables during a replication.
 * If a user doesn't specify `avro-serde-options`, Circus Train will still copy the external schema over to the target table. See [#131](https://github.com/HotelsDotCom/circus-train/issues/131).
-
-### Changed
-* Updated `jackson` version to 2.9.10 (was 2.9.9).
+* Added `copier-options.assume-role` to assume a role when using the S3MapReduceCp copier class. See [README.md](https://github.com/HotelsDotCom/circus-train) for details.
 
 ### Removed
 * Excluded `org.pentaho:pentaho-aggdesigner-algorithm` from build.
@@ -14,7 +12,7 @@
 * Bug in `AbstractAvroSerDeTransformation` where the config state wasn't refreshed on every replication.
 
 ### Changed
-* Updated `jackson` version to 2.9.9 (was 2.9.8).
+* Updated `jackson` version to 2.9.10 (was 2.9.8).
 * Updated `beeju` version to 2.0.0 (was 1.2.1).
 * Updated `circus-train-minimal.yml.template` to include the required `housekeeping` configuration for using the default schema with H2.
 
```

README.md

Lines changed: 9 additions & 1 deletion
```diff
@@ -376,6 +376,7 @@ If data is being replicated from HDFS to S3 then Circus Train will use a customi
 | `copier-options.upload-buffer-size`|No|Size of the buffer used to upload the stream of data. If the value is `0` the upload will use the value of the HDFS property `io.file.buffer.size` to configure the buffer. Defaults to `0`|
 | `copier-options.canned-acl`|No|AWS Canned ACL name. See [Access Control List (ACL) Overview](https://docs.aws.amazon.com/AmazonS3/latest/dev/acl-overview.html#canned-acl) for possible values. If not specified `S3MapReduceCp` will not specify any canned ACL.|
 | `copier-options.copier-factory-class`|No|Controls which copier is used for replication if provided.|
+| `copier-options.assume-role`|No|ARN of an IAM role to assume when writing S3 data to the target replica. Useful when the target is in a different AWS account from the one Circus Train is running in. Note that if JCEKS is also configured, the JCEKS credentials will be used instead of assuming a role. If `assume-role` is not specified, the copier will use instance credentials.|
 
 ##### S3 to S3 copier options
 If data is being replicated from S3 to S3 then Circus Train will use the AWS S3 API to copy data between S3 buckets. Using the AWS provided APIs no data needs to be downloaded or uploaded to the machine on which Circus Train is running but is copied by AWS internal infrastructure and stays in the AWS network boundaries. Assuming the correct bucket policies are in place cross region and cross account replication is supported. We are using the [TransferManager](https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/transfer/TransferManager.html) to do the copying and we expose its options via copier-options see the table below. Given the source and target buckets Circus-Train will try to infer the region from them.
@@ -391,7 +392,14 @@ If data is being replicated from S3 to S3 then Circus Train will use the AWS S3
 |`copier-options.s3s3-retry-max-copy-attempts`|No|Controls the maximum number of attempts if AWS throws an error during copy. Default value is 3.|
 
 ### S3 Secret Configuration
-When configuring a job for replication to or from S3, the AWS access key and secret key with read/write access to the configured S3 buckets must be supplied. To protect these from being exposed in the job's Hadoop configuration, Circus Train expects them to be stored using the Hadoop Credential Provider and the JCEKS URL provided in the Circus Train configuration `security.credential-provider` property. This property is only required if a specific set of credentials is needed or if Circus Train runs on a non-AWS environment. If it is not set then the credentials of the instance where Circus Train runs will be used - note this scenario is only valid when Circus Train is executed on an AWS environment, i.e. EC2/EMR instance.
+When configuring a job for replication to or from S3, the AWS access key and secret key with read/write access to the configured S3 buckets must be supplied. Circus Train offers a couple of options depending on where it runs.
+* Running on EMR:
+We provide a `com.hotels.bdp.circustrain.aws.HadoopAWSCredentialProviderChain` that uses JCEKS credentials (explained below), STS assume-role credentials (see `copier-options.assume-role`) and instance profile credentials. If the EMR cluster is running with a role that has read/write access to the necessary buckets, no configuration is needed and instance profile credentials will be used.
+* Running on-premises:
+It is possible to provide S3 secret configuration with JCEKS (explained below) or STS assume-role credentials (see `copier-options.assume-role`).
+
+#### S3 Secret Configuration with JCEKS
+To protect the AWS access key and secret key from being exposed in the job's Hadoop configuration, Circus Train expects them to be stored using the Hadoop Credential Provider, with the JCEKS URL provided in the Circus Train configuration `security.credential-provider` property. This property is only required if a specific set of credentials is needed or if Circus Train runs on a non-AWS environment. If it is not set then the credentials of the instance where Circus Train runs will be used - note this scenario is only valid when Circus Train is executed on an AWS environment, i.e. an EC2/EMR instance.
 
 To add your existing AWS keys for a new replication job run the following commands as the user that will be executing Circus Train and pass in your keys when prompted:
```
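
For orientation, here is a minimal sketch (not part of this commit) of how the resolution order described above can be exercised from code, assuming the chain exposes the standard `AWSCredentialsProvider` `getCredentials()` call. The role ARN is a placeholder; in practice it comes from `copier-options.assume-role`.

```java
import org.apache.hadoop.conf.Configuration;

import com.amazonaws.auth.AWSCredentials;

import com.hotels.bdp.circustrain.aws.AssumeRoleCredentialProvider;
import com.hotels.bdp.circustrain.aws.HadoopAWSCredentialProviderChain;

class AssumeRoleChainSketch {
  static AWSCredentials resolve() {
    Configuration conf = new Configuration();
    // Placeholder ARN for illustration; populated from copier-options.assume-role in a real job.
    conf.set(AssumeRoleCredentialProvider.ASSUME_ROLE_PROPERTY_NAME,
        "arn:aws:iam::123456789012:role/replica-writer");

    // The chain tries JCEKS credentials first, then the assume-role provider,
    // then instance profile credentials, so JCEKS wins when both are configured.
    return new HadoopAWSCredentialProviderChain(conf).getCredentials();
  }
}
```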

circus-train-aws/pom.xml

Lines changed: 5 additions & 0 deletions
```diff
@@ -29,6 +29,11 @@
       <groupId>org.springframework</groupId>
       <artifactId>spring-context</artifactId>
     </dependency>
+    <dependency>
+      <groupId>com.amazonaws</groupId>
+      <artifactId>aws-java-sdk-sts</artifactId>
+      <version>${aws-jdk.version}</version>
+    </dependency>
 
     <!-- Test -->
     <dependency>
```
circus-train-aws/src/main/java/com/hotels/bdp/circustrain/aws/AssumeRoleCredentialProvider.java

Lines changed: 60 additions & 0 deletions (new file)
```java
/**
 * Copyright (C) 2016-2019 Expedia Inc.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package com.hotels.bdp.circustrain.aws;

import static com.google.common.base.Preconditions.checkArgument;
import static com.google.common.base.Preconditions.checkNotNull;

import org.apache.commons.lang3.StringUtils;
import org.apache.hadoop.conf.Configuration;

import com.amazonaws.auth.AWSCredentials;
import com.amazonaws.auth.AWSCredentialsProvider;
import com.amazonaws.auth.STSAssumeRoleSessionCredentialsProvider;
import com.amazonaws.auth.STSAssumeRoleSessionCredentialsProvider.Builder;

public class AssumeRoleCredentialProvider implements AWSCredentialsProvider {

  public static final String ASSUME_ROLE_PROPERTY_NAME = "com.hotels.bdp.circustrain.aws.AssumeRoleCredentialProvider.assumeRole";
  private static final int CREDENTIALS_DURATION = 12 * 60 * 60; // max duration for assumed role credentials

  private AWSCredentials credentials;
  private final Configuration conf;

  public AssumeRoleCredentialProvider(Configuration conf) {
    this.conf = conf;
  }

  @Override
  public AWSCredentials getCredentials() {
    if (credentials == null) {
      refresh();
    }
    return credentials;
  }

  @Override
  public void refresh() {
    checkNotNull(conf, "conf is required");
    String roleArn = conf.get(ASSUME_ROLE_PROPERTY_NAME);
    checkArgument(StringUtils.isNotEmpty(roleArn),
        "Role ARN must not be empty, please set: " + ASSUME_ROLE_PROPERTY_NAME);

    Builder builder = new STSAssumeRoleSessionCredentialsProvider.Builder(roleArn, "ct-assume-role-session");
    credentials = builder.withRoleSessionDurationSeconds(CREDENTIALS_DURATION).build().getCredentials();
  }

}
```
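
A quick usage sketch (hypothetical, not part of the commit) of the provider on its own. It highlights two behaviours visible in the class above: `getCredentials()` is lazy, delegating to `refresh()` on first use, and `refresh()` assumes the role via STS for the maximum 12-hour session duration. The ARN is a placeholder.

```java
import org.apache.hadoop.conf.Configuration;

import com.amazonaws.auth.AWSCredentials;

import com.hotels.bdp.circustrain.aws.AssumeRoleCredentialProvider;

class AssumeRoleProviderUsageSketch {
  static AWSCredentials assumeReplicaRole() {
    Configuration conf = new Configuration();
    // Placeholder ARN for illustration only.
    conf.set(AssumeRoleCredentialProvider.ASSUME_ROLE_PROPERTY_NAME,
        "arn:aws:iam::123456789012:role/replica-writer");

    AssumeRoleCredentialProvider provider = new AssumeRoleCredentialProvider(conf);
    // The first call triggers refresh(), which assumes the role via STS and caches
    // session credentials for up to 12 hours (CREDENTIALS_DURATION).
    return provider.getCredentials();
  }
}
```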

circus-train-aws/src/main/java/com/hotels/bdp/circustrain/aws/HadoopAWSCredentialProviderChain.java

Lines changed: 4 additions & 2 deletions
```diff
@@ -25,6 +25,7 @@
  * <ul>
  * <li>Credentials from Hadoop configuration set via JCE KS - if a JCE KS path or Hadoop {@code Configuration} is
  * provided</li>
+ * <li>{@link AssumeRoleCredentialProvider} that provides credentials by assuming a role</li>
  * <li>{@link EC2ContainerCredentialsProviderWrapper} that loads credentials from an Amazon Container (e.g. EC2)</li>
  * </ul>
  *
@@ -42,7 +43,8 @@ public HadoopAWSCredentialProviderChain(String credentialProviderPath) {
   }
 
   public HadoopAWSCredentialProviderChain(Configuration conf) {
-    super(new JceksAWSCredentialProvider(conf), new EC2ContainerCredentialsProviderWrapper());
+    // note that the order of these providers is significant as they will be tried in the order passed in
+    super(new JceksAWSCredentialProvider(conf), new AssumeRoleCredentialProvider(conf),
+        new EC2ContainerCredentialsProviderWrapper());
   }
-
 }
```
circus-train-aws/src/test/java/com/hotels/bdp/circustrain/aws/AssumeRoleCredentialProviderTest.java

Lines changed: 47 additions & 0 deletions (new file)
```java
/**
 * Copyright (C) 2016-2019 Expedia Inc.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package com.hotels.bdp.circustrain.aws;

import org.apache.hadoop.conf.Configuration;
import org.junit.Test;

public class AssumeRoleCredentialProviderTest {

  @Test(expected = NullPointerException.class)
  public void getCredentialsThrowsNullPointerException() {
    AssumeRoleCredentialProvider provider = new AssumeRoleCredentialProvider(null);
    provider.getCredentials();
  }

  @Test(expected = NullPointerException.class)
  public void refreshThrowsNullPointerException() {
    AssumeRoleCredentialProvider provider = new AssumeRoleCredentialProvider(null);
    provider.refresh();
  }

  @Test(expected = IllegalArgumentException.class)
  public void getCredentialsThrowsIllegalArgumentException() {
    AssumeRoleCredentialProvider provider = new AssumeRoleCredentialProvider(new Configuration());
    provider.getCredentials();
  }

  @Test(expected = IllegalArgumentException.class)
  public void refreshThrowsIllegalArgumentException() {
    AssumeRoleCredentialProvider provider = new AssumeRoleCredentialProvider(new Configuration());
    provider.refresh();
  }

}
```

circus-train-s3-mapreduce-cp-copier/src/main/java/com/hotels/bdp/circustrain/s3mapreducecpcopier/S3MapReduceCpOptionsParser.java

Lines changed: 3 additions & 0 deletions
```diff
@@ -49,6 +49,7 @@ public class S3MapReduceCpOptionsParser {
   public static final String UPLOAD_RETRY_DELAY_MS = "upload-retry-delay-ms";
   public static final String UPLOAD_BUFFER_SIZE = "upload-buffer-size";
   public static final String CANNED_ACL = "canned-acl";
+  public static final String ASSUME_ROLE = "assume-role";
 
   private final S3MapReduceCpOptions.Builder optionsBuilder;
   private final URI defaultCredentialsProvider;
@@ -147,6 +148,8 @@ protected S3MapReduceCpOptions parse(Map<String, Object> copierOptions) {
 
     optionsBuilder.cannedAcl(MapUtils.getString(copierOptions, CANNED_ACL, ConfigurationVariable.CANNED_ACL.defaultValue()));
 
+    optionsBuilder.assumeRole(MapUtils.getString(copierOptions, ASSUME_ROLE, ConfigurationVariable.ASSUME_ROLE.defaultValue()));
+
     return optionsBuilder.build();
   }
```
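
To make the data flow concrete, a small hypothetical sketch (not from the commit): the `copier-options` block of the job YAML reaches the parser as a map, and the new `assume-role` key is copied onto the options builder, defaulting to `null` when absent. The map contents and ARN below are placeholders.

```java
import java.util.HashMap;
import java.util.Map;

import com.hotels.bdp.circustrain.s3mapreducecpcopier.S3MapReduceCpOptionsParser;

class AssumeRoleOptionFlowSketch {
  static void sketch() {
    // Hypothetical copier-options as they might arrive from the job YAML.
    Map<String, Object> copierOptions = new HashMap<>();
    copierOptions.put(S3MapReduceCpOptionsParser.ASSUME_ROLE,
        "arn:aws:iam::123456789012:role/replica-writer"); // placeholder ARN

    // Equivalent of what parse(copierOptions) does for this key via
    // optionsBuilder.assumeRole(MapUtils.getString(...)): take the configured value,
    // or fall back to ConfigurationVariable.ASSUME_ROLE.defaultValue(), which is null.
    String assumeRole = (String) copierOptions.get(S3MapReduceCpOptionsParser.ASSUME_ROLE);
    System.out.println("assume-role = " + assumeRole);
  }
}
```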

circus-train-s3-mapreduce-cp/src/main/java/com/hotels/bdp/circustrain/s3mapreducecp/ConfigurationVariable.java

Lines changed: 3 additions & 0 deletions
```diff
@@ -24,6 +24,8 @@
 import com.amazonaws.services.s3.model.StorageClass;
 import com.amazonaws.services.s3.transfer.TransferManagerConfiguration;
 
+import com.hotels.bdp.circustrain.aws.AssumeRoleCredentialProvider;
+
 final class Constants {
   static final TransferManagerConfiguration DEFAULT_TRANSFER_MANAGER_CONFIGURATION = new TransferManagerConfiguration();
 
@@ -32,6 +34,7 @@ private Constants() {}
 
 public enum ConfigurationVariable {
 
+  ASSUME_ROLE(AssumeRoleCredentialProvider.ASSUME_ROLE_PROPERTY_NAME, null),
   CANNED_ACL("com.hotels.bdp.circustrain.s3mapreducecp.cannedAcl", null),
   CREDENTIAL_PROVIDER("com.hotels.bdp.circustrain.s3mapreducecp.credentialsProvider", null),
   MINIMUM_UPLOAD_PART_SIZE("com.hotels.bdp.circustrain.s3mapreducecp.minimumUploadPartSize",
```

circus-train-s3-mapreduce-cp/src/main/java/com/hotels/bdp/circustrain/s3mapreducecp/S3MapReduceCpOptions.java

Lines changed: 19 additions & 0 deletions
```diff
@@ -141,6 +141,11 @@ public Builder cannedAcl(String cannedAcl) {
       return this;
     }
 
+    public Builder assumeRole(String assumeRole) {
+      options.setAssumeRole(assumeRole);
+      return this;
+    }
+
     public S3MapReduceCpOptions build() {
       return options;
     }
@@ -213,6 +218,9 @@ public static Builder builder(List<Path> sources, URI target) {
   @Parameter(names = "--cannedAcl", description = "AWS Canned ACL")
   private String cannedAcl = ConfigurationVariable.CANNED_ACL.defaultValue();
 
+  @Parameter(names = "--assumeRole", description = "AWS IAM role to assume for writing to replica S3 bucket")
+  private String assumeRole = ConfigurationVariable.ASSUME_ROLE.defaultValue();
+
   public S3MapReduceCpOptions() {}
 
   public S3MapReduceCpOptions(S3MapReduceCpOptions options) {
@@ -237,6 +245,7 @@ public S3MapReduceCpOptions(S3MapReduceCpOptions options) {
     uploadRetryDelayMs = options.uploadRetryDelayMs;
     uploadBufferSize = options.uploadBufferSize;
     cannedAcl = options.cannedAcl;
+    assumeRole = options.assumeRole;
   }
 
   public boolean isHelp() {
@@ -411,6 +420,12 @@ public void setCannedAcl(String cannedAcl) {
     this.cannedAcl = cannedAcl;
   }
 
+  public String getAssumeRole() { return assumeRole; }
+
+  public void setAssumeRole(String assumeRole) {
+    this.assumeRole = assumeRole;
+  }
+
   public Map<String, String> toMap() {
     ImmutableMap.Builder<String, String> builder = ImmutableMap
         .<String, String>builder()
@@ -438,6 +453,9 @@ public Map<String, String> toMap() {
     if (cannedAcl != null) {
       builder.put(ConfigurationVariable.CANNED_ACL.getName(), cannedAcl);
     }
+    if (assumeRole != null) {
+      builder.put(ConfigurationVariable.ASSUME_ROLE.getName(), assumeRole);
+    }
     return builder.build();
   }
 
@@ -465,6 +483,7 @@ public String toString() {
         ", uploadRetryDelayMs=" + uploadRetryDelayMs +
         ", uploadBufferSize=" + uploadBufferSize +
         ", cannedAcl='" + cannedAcl + '\'' +
+        ", assumeRole='" + assumeRole + '\'' +
         '}';
   }
 }
```
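
Finally, a hedged end-to-end sketch (not part of the commit) of the options object itself: the builder records the role, and `toMap()` publishes it under the `AssumeRoleCredentialProvider` property name, which is how the value reaches the Hadoop configuration that the credential provider later reads. The source path, target URI and ARN are placeholders, and `Path` is assumed to be `org.apache.hadoop.fs.Path`.

```java
import java.net.URI;
import java.util.Collections;
import java.util.Map;

import org.apache.hadoop.fs.Path;

import com.hotels.bdp.circustrain.aws.AssumeRoleCredentialProvider;
import com.hotels.bdp.circustrain.s3mapreducecp.S3MapReduceCpOptions;

class AssumeRoleOptionsSketch {
  static void sketch() {
    S3MapReduceCpOptions options = S3MapReduceCpOptions
        .builder(Collections.singletonList(new Path("hdfs:///tmp/source")), URI.create("s3://replica-bucket/path"))
        .assumeRole("arn:aws:iam::123456789012:role/replica-writer") // placeholder ARN
        .build();

    // toMap() keys the role by ConfigurationVariable.ASSUME_ROLE.getName(), i.e. the
    // AssumeRoleCredentialProvider property, so the provider can pick it up at copy time.
    Map<String, String> asConfiguration = options.toMap();
    String roleArn = asConfiguration.get(AssumeRoleCredentialProvider.ASSUME_ROLE_PROPERTY_NAME);
    System.out.println("configured role: " + roleArn);
  }
}
```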
