Skip to content

Conversation

@frankhereford
Copy link
Member

@frankhereford frankhereford commented Oct 27, 2025

Associated issues

This PR aims to close cityofaustin/atd-data-tech#24398.

Intent

This PR extends our graphql-engine CI such that it monitors the task definitions for our graphql-engine services in our Moped clusters, and installs and deploys them when they change.

Testing

As always, testing CI is a little different.

First, announce that you're testing to the devs. This is a one-person-testing-at-a-time type of PR.

When you are confident that it's just you working on this PR, then fire up your code editor, check out this branch and:

  • Edit staging.graphql-engine.ecs-td.json. We need to make a substantive change, not just white space. I suggest swithcing the "memory" key's value from 2048 to 4096. Remember to change it back when you're done please.
  • Commit this change into this feature branch and go ahead and push it up to GitHub.
  • Check out the actions in the GitHub web UI to see it work. Did it?
  • Confirm the changes in the AWS console in the ECS section. Looking good?

Hard-ish Mode

  • Did the workflow maintain the list of registered TDs to not be longer than the last 3?
  • Can you change the task definition in a non-substantive way (such as adding whitespace) and have the CI detect that no substantive changes have been made and not deploy the TD needlessly?
  • Did the CI properly serialize the tasks of deploying migrations & metadata independently of the TD deployment?
  • Did the CI show a meaningful diff showing just what had changed in the TD when it deployed it?
  • Does the CI know when it's not deployed a TD and know not to purge old TDs nor update the ECS service?
  • Does the CI detect a failed TD creation (such as if it's invalid or otherwise unsuitable) and not try to purge or restart anything?

⚠️ Before merger operations

  • There are two lines, about here & here, which need to be removed and reinstated respectively prior to merger.

Ship list

…orkflow

- Introduced functions for determining and validating ECS task definitions based on the branch name.
- Enhanced the migration metadata deployment workflow to include steps for checking out code, configuring AWS credentials, and validating the ECS task definition.
- Updated comments for clarity and organization in the helper script.
- Replaced the previous AWS CLI installation method with the official installation instructions for AWS CLI v2.
- Commented out the old installation and configuration steps for clarity and future reference.
- Renamed the function for validating ECS task definitions to better reflect its purpose, changing it to register the task definition.
- Updated the logic to register the ECS task definition using AWS CLI instead of validating it.
- Adjusted comments for clarity regarding the registration process and its success or failure.
- Introduced a new function to update the ECS service with the latest task definition.
- Added environment variables for the ECS cluster and service.
- Enhanced logging to provide more context during the update process.
- Updated the main task definition update process to include the service update step.
@netlify
Copy link

netlify bot commented Oct 27, 2025

Deploy Preview for atd-moped-main ready!

Name Link
🔨 Latest commit ebe3748
🔍 Latest deploy log https://app.netlify.com/projects/atd-moped-main/deploys/6903a4463f60850008f38f89
😎 Deploy Preview https://deploy-preview-1707--atd-moped-main.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

echo "GR: ${GITHUB_REF}"
echo "PWD: $(pwd)"
source $(pwd)/.github/workflows/migrations-metadata-deployment-helper.sh
# run_migration_process
Copy link
Member Author

@frankhereford frankhereford Oct 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want to avoid blasting main's database stack with the migrations and metadata in this branch as we dev and test, so this line is commented out. It needs to be commented back in just prior to merger (or as a direct push on main should we forget.)

run: |
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install --update
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was surprised to learn that we use the AWS cli version 1 in our old, existing CI here, but in retrospect, the timing is not too surprising. V2 came out in 2020. V1 is still supported, but V2 is the future.

Anyway, that's why I --update here.

@frankhereford frankhereford self-assigned this Oct 27, 2025
Copy link
Member Author

@frankhereford frankhereford Oct 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

H/T to @mddilley who spent some time with me looking as these task definitions until we were both satisfied that we were comfortable with them being released into the public. Thank you Mike 🙏


# Extract just the taskDefinition object and remove AWS-managed fields
# These fields are added by AWS and shouldn't be compared
jq --sort-keys '.taskDefinition | del(.taskDefinitionArn, .revision, .status, .requiresAttributes, .compatibilities, .registeredAt, .registeredBy, .deregisteredAt, .tags)' \
Copy link
Member Author

@frankhereford frankhereford Oct 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sort keys is important here to make the ordering of KV pairs in the JSON deterministic, because we actually compare these json blobs with diff as if they were code, and not by doing a == inside JS or python or what-have-you.

Anyway - with jq here, we strip out things that are intrinsic (such as those items supplied by AWS) and we only look at what we're actually defining.

An upside of doing it this way is that it becomes trivial for us to include the output of that diff in the logs, which is a super easy way for a dev to figure out what change caused the new task definition to be deployed into the service.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's really cool to see how you solved this. How are you feeling about how future proof this is, and also if concerns about false positives due to minor formatting differences?

Copy link
Collaborator

@mddilley mddilley Oct 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@frankhereford I might be way overthinking, but I'm curious if we could consider deploying the task definition if the version-controlled file changes at all instead of comparing to what is currently deployed. Since we will be reviewing any code changes in a PR, we could give ourselves responsibility for making sure that we understand the changes we are pushing to be deployed in the service (like we do right now during the launch parties).

I'm also thinking about needing to maintain this list of fields that are added by AWS in the future. I really could be worrying to much but wanted to get some discussion going on this in case I am overlooking something.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're doing some renaming of some files here to give them names that better represent what they do, not what they act upon.

branches:
- main
- production
- frank/version-control-ecs-task-definitions
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line needs to be removed before merger, and it is what allows us to test this PR.

@frankhereford frankhereford requested a review from Copilot October 27, 2025 21:16
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR implements version control for ECS task definitions for the graphql-engine service in Moped. The changes enable automatic monitoring and deployment of task definition updates through CI/CD, replacing manual task definition management with a declarative approach.

Key changes:

  • Adds new ECS task definition JSON files for staging and production environments
  • Introduces automated task definition deployment workflow that detects changes, registers updates, and manages service deployments
  • Consolidates and enhances the existing migration workflow to include task definition management

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
moped-database/ecs_task_definitions/staging.graphql-engine.ecs-td.json New task definition for staging graphql-engine service with 2GB memory allocation
moped-database/ecs_task_definitions/production.graphql-engine.ecs-td.json New task definition for production graphql-engine service with 8GB memory allocation
.github/workflows/migrations-metadata-deployment.yml Enhanced workflow that now handles both migrations/metadata and ECS task definition deployments
.github/workflows/migrations-metadata-deployment-helper.sh New comprehensive helper script containing functions for migrations, task definition registration, cleanup, and service updates
.github/workflows/aws-moped-migrations-helper.sh Removed legacy helper script, functionality moved to migrations-metadata-deployment-helper.sh
.github/workflows/atd_moped_database.yml Removed legacy workflow, replaced by migrations-metadata-deployment.yml

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Collaborator

@mddilley mddilley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am releasing my comments before I test the workflow after John. 🥽 🧪 I mentioned you in one question that I'm still thinking through.

},
"memory": 2048,
"memoryReservation": 2048,
"mountPoints": [ ],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't have a team-wide formatter for json that we're set on so I want to get feedback from others. I noticed that when I ran "Prettify JSON" from VSCode it remove the spaces between these brackets. I don't think it is blocking the merge of this since we will always review a code diff when updating this file anyways.

Mostly wondering if there is a good extension that I'm missing out on (or yet another one that doesn't work in my editor like with markdown 🫠 😅)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a side-by-side of the JSON in staging in the AWS console and noticed some small differences with compatibilities and requiresAttributes not present in this file but then I looked at the docs and see that we should be validating for Fargate only and that Fargate doesn't support the requiresAttributes parameter anyways. Makes sense to remove since we've always used Fargate and TIL! 🙏

Comment on lines +68 to +75
{
"name": "MOPED_API_ACTIONS_URL",
"valueFrom": "arn:aws:ssm:us-east-1:295525487728:parameter/MOPED_PRODUCTION_FARGATE_HASURA_ACTIONS_URL"
},
{
"name": "MOPED_API_APIKEY",
"valueFrom": "arn:aws:ssm:us-east-1:295525487728:parameter/MOPED_PRODUCTION_FARGATE_HASURA_API_KEY"
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can make a spin-off issue snackoo to cover this but this reminds me that we have some cleanup to do in these secrets. I believe that these were used by our previous SQS-powered activity log events process.

################################################################################

#
# Download the Hasura settings from the AWS Secrets Manager
Copy link
Collaborator

@mddilley mddilley Oct 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you for cleaning up this last confusing reference to Zappa 😵‍💫🙏 and improving the comments ✨


# Extract just the taskDefinition object and remove AWS-managed fields
# These fields are added by AWS and shouldn't be compared
jq --sort-keys '.taskDefinition | del(.taskDefinitionArn, .revision, .status, .requiresAttributes, .compatibilities, .registeredAt, .registeredBy, .deregisteredAt, .tags)' \
Copy link
Collaborator

@mddilley mddilley Oct 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@frankhereford I might be way overthinking, but I'm curious if we could consider deploying the task definition if the version-controlled file changes at all instead of comparing to what is currently deployed. Since we will be reviewing any code changes in a PR, we could give ourselves responsibility for making sure that we understand the changes we are pushing to be deployed in the service (like we do right now during the launch parties).

I'm also thinking about needing to maintain this list of fields that are added by AWS in the future. I really could be worrying to much but wanted to get some discussion going on this in case I am overlooking something.

Copy link
Member

@johnclary johnclary left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This so cool, Frank. My tests work exactly as expected. The diff logging is super fancy. It is great getting to learn some dev ops magic through your PRs 🙌

Glad to see your big bold reminder to re-enable the migrations job and remove this branch from the paths trigger 😉

🚢 🚢

#
# Deregisters old task definitions, keeping only the last 3 active ones
#
function cleanup_old_task_definitions() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a really nice touch, and i'm mostly taking it for granted that this works as promised 👍

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i see now that it does work! (with my latest revision the most recent of the three) ✨

Image

Copy link
Member

@johnclary johnclary Oct 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@frankhereford one last thought on this. I wonder if we might want to increase to preserving maybe 10 of the most recent revisions. Reason being that although the latest task definition may install successfully, it is still possible that the definition has a faulty configuration that causes the container deployment to fail. If something goes sideways, it might be a relief to have a working task def available in AWS so we can recover quickly while we triage the issue. This would be excessive caution, of course, because we do have task definitions version controlled in Github as well.


# Extract just the taskDefinition object and remove AWS-managed fields
# These fields are added by AWS and shouldn't be compared
jq --sort-keys '.taskDefinition | del(.taskDefinitionArn, .revision, .status, .requiresAttributes, .compatibilities, .registeredAt, .registeredBy, .deregisteredAt, .tags)' \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's really cool to see how you solved this. How are you feeling about how future proof this is, and also if concerns about false positives due to minor formatting differences?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Version control Moped graphl-engine task definitions to automate deployment

3 participants