Skip to content

Synchronize Secret Format and Usage #1620

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 23 commits into from
Jul 11, 2025

Conversation

lewijacn
Copy link
Collaborator

@lewijacn lewijacn commented Jul 2, 2025

Description

This change makes our usage of AWS secrets for basic auth consistent across the solution (minus K8s which will have a separate task for its secret management). It also includes various improvement for the MA CDK library such as removing some invalid context options, making task names readable and predictable, and removing the OS container stack which was causing additional complexity and is not being used. This change also provides a way for RFS and Replayer services to load specified arguments from environment variables, as would be the case when a secret is provided to ECS or EKS as this is how it provides the values to the container

Note: This includes a breaking change to secret format
Previously:
We expected an AWS secret to contain only the password value as plaintext with no structure and would expect the username through a separate plaintext argument

myStrongPassword123!

Now:
We follow the standard AWS secret format of containing both the username and password as Key-Value pairs within the secret

{"username":"admin","password":"myStrongPassword123!"}

The console library will produce errors like below when using an improperly structured secret:

migration-console (~) -> console clusters cat-indices 

WARNING: Cluster information may be stale. Use --refresh to update.

SOURCE CLUSTER
health status index                           uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   test_e2e_0001_integ_full_jul9_2 U--CcY4FQVGQx30DY7q7LA   3   0          3            0     13.3kb         13.3kb

TARGET CLUSTER
Error: Unable to perform cat-indices command with message: Expected secret arn:aws:secretsmanager:us-east-1:123456789012:secret:test-jul2-secret-Y8JbPm to be a JSON object with username and password fields

Or a secret with missing keys:

migration-console (~) -> console clusters curl target_cluster /
Error: Unable to perform cluster command with message: Secret arn:aws:secretsmanager:us-east-1:123456789012:secret:test-jul2-secret-Y8JbPm is missing required key(s): username, password

Other Fixes

Task roles in AWS now follow a consistent pattern like below:

OSMigrations-full-es68-3-us-east-1-capture-proxy-task
OSMigrations-full-es68-3-us-east-1-reindex-from-snapshot-task
OSMigrations-full-es68-3-us-east-1-target-cluster-proxy-task
OSMigrations-full-es68-3-us-east-1-traffic-replayer-default-task
OSMigrations-full-es68-3-us-east-1-migration-console-task
  • Removed printing RFS args twice on every startup
  • Added basic logic for RFS and Replayer to redact password fields when printing args at startup
  • Removed printing stack traces for middleware layer for cat-indices and cluster curl commands

Bug Fixes to check

We were using using source_cluster insecure here, but have changed to target cluster: skipClusterCertCheck: servicesYaml.target_cluster?.allowInsecure,

        let targetClusterProxyStack
        if (targetClusterProxyServiceEnabled && networkStack && migrationStack) {
            targetClusterProxyStack = new CaptureProxyStack(scope, "target-cluster-proxy", {
                serviceName: "target-cluster-proxy",
                vpcDetails: networkStack.vpcDetails,
                destinationConfig: {
                    endpointMigrationSSMParameter: MigrationSSMParameter.OS_CLUSTER_ENDPOINT,
                    securityGroupMigrationSSMParameter: MigrationSSMParameter.OS_ACCESS_SECURITY_GROUP_ID,
                },
                otelCollectorEnabled: false,
                skipClusterCertCheck: servicesYaml.target_cluster?.allowInsecure,

Issues Resolved

https://opensearch.atlassian.net/browse/MIGRATIONS-1787
https://opensearch.atlassian.net/browse/MIGRATIONS-1915

Testing

Unit testing
Cloud testing in progress...

Check List

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Tanner Lewis <lewijacn@amazon.com>
Signed-off-by: Tanner Lewis <lewijacn@amazon.com>
Signed-off-by: Tanner Lewis <lewijacn@amazon.com>
# Conflicts:
#	deployment/cdk/opensearch-service-migration/lib/network-stack.ts
#	deployment/cdk/opensearch-service-migration/lib/service-stacks/migration-service-core.ts
#	deployment/cdk/opensearch-service-migration/lib/service-stacks/opensearch-container-stack.ts
Signed-off-by: Tanner Lewis <lewijacn@amazon.com>
Signed-off-by: Tanner Lewis <lewijacn@amazon.com>
Signed-off-by: Tanner Lewis <lewijacn@amazon.com>
Signed-off-by: Tanner Lewis <lewijacn@amazon.com>
else
echo "RFS Target Cluster password found in RFS_COMMAND; skipping logging of the value"
fi
echo "RFS_COMMAND: $RFS_COMMAND"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will print the password if set by --target-password right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes if passed explicitly as a command argument (say with extraArgs). We already do this though correct?

[ -z "$RFS_COMMAND" ] && \
{ echo "Warning: RFS_COMMAND is empty! Exiting."; exit 1; } || \
while true; do
    echo "Running command $RFS_COMMAND"

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this was an existing bug, have cleaned up now so this script will redact in both places

lewijacn added 2 commits July 2, 2025 11:01
# Conflicts:
#	deployment/cdk/opensearch-service-migration/lib/common-utilities.ts
#	deployment/cdk/opensearch-service-migration/lib/stack-composer.ts
Signed-off-by: Tanner Lewis <lewijacn@amazon.com>
@lewijacn lewijacn temporarily deployed to migrations-cicd July 2, 2025 16:07 — with GitHub Actions Inactive
Signed-off-by: Tanner Lewis <lewijacn@amazon.com>
@lewijacn lewijacn temporarily deployed to migrations-cicd July 2, 2025 16:34 — with GitHub Actions Inactive
@lewijacn lewijacn temporarily deployed to migrations-cicd July 2, 2025 16:34 — with GitHub Actions Inactive
Signed-off-by: Tanner Lewis <lewijacn@amazon.com>
@lewijacn lewijacn temporarily deployed to migrations-cicd July 3, 2025 02:27 — with GitHub Actions Inactive
Signed-off-by: Tanner Lewis <lewijacn@amazon.com>
@gregschohn
Copy link
Collaborator

One other idea - could you scan the cdk's extra args for usernames and passwords and emit a warning?

@@ -38,8 +34,7 @@ export interface OpensearchDomainStackProps extends StackPropsExt {
readonly openAccessPolicyEnabled?: boolean
readonly useUnsignedBasicAuth?: boolean,
readonly fineGrainedManagerUserARN?: string,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is specifically a managed service creation option. I'd like to completely pull out managed service creation in the future and place in a more targeted location.

@lewijacn
Copy link
Collaborator Author

lewijacn commented Jul 7, 2025

Why is there an empty osi_utils.py file? I went through the non-ts code and left some comments. Let me know if you need a thorough review on the CDK parts.

Thanks for catching have removed the file now

Signed-off-by: Tanner Lewis <lewijacn@amazon.com>
@lewijacn lewijacn temporarily deployed to migrations-cicd July 7, 2025 19:39 — with GitHub Actions Inactive
@lewijacn lewijacn temporarily deployed to migrations-cicd July 7, 2025 19:39 — with GitHub Actions Inactive
Signed-off-by: Tanner Lewis <lewijacn@amazon.com>
@gregschohn
Copy link
Collaborator

One other idea - could you scan the cdk's extra args for usernames and passwords and emit a warning?

Since our CDK implementation's days are numbered, we should just do this in the configure commands for the K8s workflow commands... Even better, route those values directly to secrets (even if they come in via a file).

if (args.targetArgs.username == null && System.getenv(TARGET_USERNAME_ENV_VAR) != null) {
args.targetArgs.username = System.getenv(TARGET_USERNAME_ENV_VAR);
addedEnvParams.add(TARGET_USERNAME_ENV_VAR);
if (args.targetArgs.username == null && System.getenv(ArgNameConstants.TARGET_USERNAME_ENV_ARG) != null) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you added the env variables BEFORE the JCommander ones, would you need to check for null?
Should one take precedent over the other or should you throw an error if they're both specified (and different)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My stance (and what seems like should be expected) has been that command line arguments have priority over environment variables. I think we will get to the point where we are using command line arguments less, and they will be more an override than the norm

Signed-off-by: Tanner Lewis <lewijacn@amazon.com>
@lewijacn lewijacn temporarily deployed to migrations-cicd July 9, 2025 17:18 — with GitHub Actions Inactive
@lewijacn lewijacn temporarily deployed to migrations-cicd July 9, 2025 17:18 — with GitHub Actions Inactive
…ands

Signed-off-by: Tanner Lewis <lewijacn@amazon.com>
@lewijacn lewijacn temporarily deployed to migrations-cicd July 9, 2025 19:43 — with GitHub Actions Inactive
@lewijacn lewijacn temporarily deployed to migrations-cicd July 9, 2025 19:43 — with GitHub Actions Inactive
Signed-off-by: Tanner Lewis <lewijacn@amazon.com>
@lewijacn lewijacn temporarily deployed to migrations-cicd July 9, 2025 19:49 — with GitHub Actions Inactive
@lewijacn lewijacn temporarily deployed to migrations-cicd July 9, 2025 19:49 — with GitHub Actions Inactive
Copy link
Collaborator

@gregschohn gregschohn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to see the cdk.context.json comment for the secret arn give more direction, but I'm happy to get this merged in either way.

Comment on lines +38 to +42
SAFE_PRINT_COMMAND="$RFS_COMMAND"
if [[ "$RFS_COMMAND" == *"--target-password"* || "$RFS_COMMAND" == *"--targetPassword"* ]]; then
SAFE_PRINT_COMMAND=$(echo "$RFS_COMMAND" | sed -E 's/--target(-)?[Pp]assword[ =]"?[^"[:space:]]+"?/--target-password=******/g')
fi
echo "Executing RFS_COMMAND: $SAFE_PRINT_COMMAND"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this required if the java application itself unconditionally prints the arguments?

Copy link
Collaborator Author

@lewijacn lewijacn Jul 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have kept it as a debugging reference in case there is some issue with the entrypoint.sh before we start the RFS app

@@ -38,8 +38,7 @@ dependencies {
implementation libs.log4j.slf4j2.impl
implementation libs.slf4j.api
implementation libs.disruptor
implementation libs.aws.arns
implementation libs.aws.secretsmanager
implementation libs.aws.auth
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's this for - or what was included in the previous 2 outgoing libraries that we needed to backfill?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes was included in the previous libraries. Specifically the Replayer still has need of this import for SigV4: import software.amazon.awssdk.auth.credentials.DefaultCredentialsProvider;

"// basic auth documentation": "The next two lines are relevant for basic auth only",
"username": "<USERNAME>",
"passwordFromSecretArn": "<ARN_OF_SECRET_CONTAINING_PASSWORD>",
"// basic auth documentation": "The next line is relevant for basic auth only",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as per the warning below "Provide only userSecretArn or username/password, not both" - we should make the doc/comment more descriptive and explain what else you won't need. Assume that for many, this is the main documentation that they'll interact with to configure their CDK.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've gone back and forth with this. The options.md does list the options for users more intent on using plaintext username/password. However, as a default I'd like to steer users to using a secret. I'm open to changing this though if we find this to be problematic

@lewijacn lewijacn temporarily deployed to migrations-cicd July 11, 2025 18:39 — with GitHub Actions Inactive
@lewijacn lewijacn temporarily deployed to migrations-cicd July 11, 2025 18:39 — with GitHub Actions Inactive
Signed-off-by: Tanner Lewis <lewijacn@amazon.com>
@lewijacn lewijacn temporarily deployed to migrations-cicd July 11, 2025 18:59 — with GitHub Actions Inactive
@lewijacn lewijacn temporarily deployed to migrations-cicd July 11, 2025 18:59 — with GitHub Actions Inactive
Signed-off-by: Tanner Lewis <lewijacn@amazon.com>
@lewijacn lewijacn temporarily deployed to migrations-cicd July 11, 2025 19:14 — with GitHub Actions Inactive
@lewijacn lewijacn temporarily deployed to migrations-cicd July 11, 2025 19:14 — with GitHub Actions Inactive
@lewijacn lewijacn merged commit 0306d23 into opensearch-project:main Jul 11, 2025
58 checks passed
Copy link

codecov bot commented Jul 11, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 0.00%. Comparing base (56c3373) to head (a0ca08d).
Report is 3 commits behind head on main.

Additional details and impacted files
@@     Coverage Diff      @@
##   main   #1620   +/-   ##
============================
============================

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants