Adding GPU E2E Test #171

Closed
wants to merge 8 commits into from
Changes from 7 commits
2 changes: 1 addition & 1 deletion .github/workflows/eks-add-on-integ-test.yml
@@ -4,7 +4,6 @@
name: Run EKS addon Integration Tests
env:
TERRAFORM_AWS_ASSUME_ROLE: ${{ secrets.TERRAFORM_AWS_ASSUME_ROLE }}

on:
workflow_dispatch:
inputs:
@@ -24,6 +23,7 @@ on:
default: true
description: "Run in EKS Addon Beta environment"


concurrency:
group: ${{ github.workflow }}-${{ github.ref_name }}
cancel-in-progress: true
145 changes: 145 additions & 0 deletions .github/workflows/gpu-e2e-test.yml
@@ -0,0 +1,145 @@
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: Apache-2.0

name: Run GPU E2E Test
env:
TERRAFORM_AWS_ASSUME_ROLE: ${{ secrets.TERRAFORM_AWS_ASSUME_ROLE }}
on:
workflow_dispatch:
inputs:
addon_name:
required: true
type: string
default: "amazon-cloudwatch-observability"
description: "EKS addon name"
addon_version:
required: true
type: string
default: "v1.1.0-eksbuild.1"
description: "EKS addon version"
run_in_beta:
required: true
type: boolean
default: true
description: "Run in EKS Addon Beta environment"
pull_request:
branches:
- main
concurrency:
group: ${{ github.workflow }}-${{ github.ref_name }}
cancel-in-progress: true

permissions:
id-token: write
contents: read

jobs:
GenerateTestMatrix:
name: 'GenerateTestMatrix'
runs-on: ubuntu-latest
outputs:
eks_addon_matrix: ${{ steps.set-matrix.outputs.eks_addon_matrix }}
steps:
- uses: actions/checkout@v3
with:
fetch-depth: 0
- name: Extract PR Information
id: pr_info
run: |
# Check whether a pull request title exists
if [ -n "${{ github.event.pull_request.title }}" ]; then
addon_name=$(echo "${{ github.event.pull_request.title }}" | cut -d' ' -f1)
addon_version=$(echo "${{ github.event.pull_request.title }}" | cut -d' ' -f2)
else
# Use default values if no title is provided
addon_name="amazon-cloudwatch-observability"
addon_version="v1.6.0-eksbuild.1"
fi
# Set step outputs (the ::set-output workflow command is deprecated)
echo "addon_name=$addon_name" >> "$GITHUB_OUTPUT"
echo "addon_version=$addon_version" >> "$GITHUB_OUTPUT"
echo "run_in_beta=true" >> "$GITHUB_OUTPUT"

- name: Configure AWS Credentials
uses: aws-actions/configure-aws-credentials@v2
with:
role-to-assume: ${{ env.TERRAFORM_AWS_ASSUME_ROLE }}
aws-region: us-west-2

- name: Generate matrix
id: set-matrix
run: |
echo "eks_addon_matrix=$(echo $(cat integration-tests/generator/k8s_versions_matrix.json))" >> "$GITHUB_OUTPUT"

- name: Echo test plan matrix
run: |
echo "eks_addon_matrix: ${{ steps.set-matrix.outputs.eks_addon_matrix }}"
echo "Addon name ${{ steps.pr_info.outputs.addon_name }}, addon version ${{ steps.pr_info.outputs.addon_version }}"
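
Note on the "Extract PR Information" step: interpolating `${{ github.event.pull_request.title }}` directly into the script means the title text is parsed as shell. Below is a minimal sketch of one alternative. It assumes the step passes the title in through an environment variable (named `PR_TITLE` here, a hypothetical name) and that the title has the form `<addon_name> <addon_version>`. The function name `parse_pr_title` is also hypothetical. The fallback values are the defaults already used in the step. Outside of Actions, `GITHUB_OUTPUT` is unset, so the sketch falls back to `/dev/stdout`.

```shell
# Sketch only: PR_TITLE is a hypothetical env var that the workflow step
# would set from the pull request title.

parse_pr_title() {
  # read splits the title on whitespace without evaluating it as shell code.
  printf '%s\n' "$1" | {
    read -r addon_name addon_version _rest
    # Fall back to the workflow's defaults when a field is missing.
    echo "addon_name=${addon_name:-amazon-cloudwatch-observability}"
    echo "addon_version=${addon_version:-v1.6.0-eksbuild.1}"
  }
}

# Append key=value lines to the step output file (stdout when run locally).
parse_pr_title "${PR_TITLE:-}" >> "${GITHUB_OUTPUT:-/dev/stdout}"
```

As with the original `cut` calls, a title that does not follow the assumed form (for example "Adding GPU E2E Test") still yields its first two words as the name and version, so the caller may want to validate the result before use.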
GPUE2ETest:
needs: [GenerateTestMatrix]
name: GPUE2ETest
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
arrays: ${{ fromJson(needs.GenerateTestMatrix.outputs.eks_addon_matrix) }}
permissions:
id-token: write
contents: read
steps:
- uses: actions/checkout@v3
with:
fetch-depth: 0

- name: Configure AWS Credentials
uses: aws-actions/configure-aws-credentials@v2
with:
role-to-assume: ${{ env.TERRAFORM_AWS_ASSUME_ROLE }}
aws-region: us-west-2

- name: Confirm EKS Version Support
run: |
if [[
$(go list -m k8s.io/client-go | cut -d ' ' -f 2 | cut -d '.' -f 2) -lt $( echo ${{ matrix.arrays.k8sVersion }} | cut -d '.' -f 2)
|| $(go list -m k8s.io/apimachinery | cut -d ' ' -f 2 | cut -d '.' -f 2) -lt $( echo ${{ matrix.arrays.k8sVersion }} | cut -d '.' -f 2)
|| $(go list -m k8s.io/component-base | cut -d ' ' -f 2 | cut -d '.' -f 2) -lt $( echo ${{ matrix.arrays.k8sVersion }} | cut -d '.' -f 2)
|| $(go list -m k8s.io/kubectl | cut -d ' ' -f 2 | cut -d '.' -f 2) -lt $( echo ${{ matrix.arrays.k8sVersion }} | cut -d '.' -f 2)
]]; then
echo k8s.io/client-go $(go list -m k8s.io/client-go) is less than ${{ matrix.arrays.k8sVersion }}
echo or k8s.io/apimachinery $(go list -m k8s.io/apimachinery) is less than ${{ matrix.arrays.k8sVersion }}
echo or k8s.io/component-base $(go list -m k8s.io/component-base) is less than ${{ matrix.arrays.k8sVersion }}
echo or k8s.io/kubectl $(go list -m k8s.io/kubectl) is less than ${{ matrix.arrays.k8sVersion }}, fail test
echo "please run go get -u && go mod tidy"
exit 1;
fi

- name: Verify Terraform version
run: terraform --version

- name: Terraform apply
uses: nick-fields/retry@v2
with:
max_attempts: 1
timeout_minutes: 60 # EKS takes about 20 minutes to spin up a cluster and service on the cluster
retry_wait_seconds: 5
command: |
cd integration-tests/terraform/gpu

terraform init
if terraform apply -var="beta=${{ steps.pr_info.outputs.run_in_beta }}" -var="addon_name=${{ steps.pr_info.outputs.addon_name }}" -var="addon_version=${{ steps.pr_info.outputs.addon_version }}" -var="k8s_version=${{ matrix.arrays.k8sVersion }}" --auto-approve; then
terraform destroy -var="beta=${{ steps.pr_info.outputs.run_in_beta }}" -auto-approve
else
terraform destroy -var="beta=${{ steps.pr_info.outputs.run_in_beta }}" -auto-approve && exit 1
fi

- name: Terraform destroy
if: ${{ cancelled() || failure() }}
uses: nick-fields/retry@v2
with:
max_attempts: 3
timeout_minutes: 8
retry_wait_seconds: 5
command: |
cd integration-tests/terraform/gpu

terraform destroy -var="beta=${{ steps.pr_info.outputs.run_in_beta }}" --auto-approve
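
Note on the "Confirm EKS Version Support" step: it repeats the same `cut` pipeline for each of the four `k8s.io` modules. Below is a minimal sketch of how the comparison could be factored into helpers. It assumes version strings shaped like `v0.29.0` (as printed by `go list -m`) and `1.29` (as in `matrix.arrays.k8sVersion`). The names `minor_of`, `module_supports`, and `K8S_VERSION` are hypothetical. The `go list` wiring is left commented out so the sketch runs without a Go module present.

```shell
# minor_of prints the second dot-separated field: v0.29.0 -> 29, 1.29 -> 29.
minor_of() {
  printf '%s\n' "$1" | cut -d '.' -f 2
}

# module_supports succeeds when the module's minor version is at least
# the cluster's minor version.
module_supports() {
  module_version=$1
  k8s_version=$2
  [ "$(minor_of "$module_version")" -ge "$(minor_of "$k8s_version")" ]
}

# Example wiring (assumes it runs inside the repository's Go module and that
# K8S_VERSION holds the matrix value):
# for mod in k8s.io/client-go k8s.io/apimachinery k8s.io/component-base k8s.io/kubectl; do
#   version=$(go list -m "$mod" | cut -d ' ' -f 2)
#   module_supports "$version" "$K8S_VERSION" || {
#     echo "$mod $version is older than $K8S_VERSION; please run go get -u && go mod tidy"
#     exit 1
#   }
# done
```

Looping over the module list keeps the failure message specific to the module that is behind, rather than printing all four candidates.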
13 changes: 6 additions & 7 deletions .github/workflows/operator-integration-test.yml
@@ -70,12 +70,12 @@ jobs:

- name: Test for default instrumentation resources for Java
run: |
kubectl apply -f integration-tests/java/sample-deployment-java.yaml
# sleep is to make sure the app is fully active before restart wait doesn't work as accurately
sleep 5
kubectl get pods -A
kubectl describe pods -n default
go run integration-tests/manifests/cmd/validate_instrumentation_vars.go default integration-tests/java/default_instrumentation_java_env_variables.json
kubectl apply -f integration-tests/java/sample-deployment-java.yaml
# sleep is to make sure the app is fully active before restart wait doesn't work as accurately
sleep 5
kubectl get pods -A
kubectl describe pods -n default
go run integration-tests/manifests/cmd/validate_instrumentation_vars.go default integration-tests/java/default_instrumentation_java_env_variables.json

- name: Test for defined instrumentation resources for Java
run: |
@@ -330,4 +330,3 @@ jobs:
sleep 5
go test -v -run TestAlreadyAutoAnnotatedResourceShouldNotRestart ./integration-tests/manifests/annotations -timeout 30m


61 changes: 49 additions & 12 deletions go.mod
@@ -9,7 +9,9 @@ replace github.com/openshift/api v3.9.0+incompatible => github.com/openshift/api

require (
dario.cat/mergo v1.0.0
github.com/aws/amazon-cloudwatch-agent-test v0.0.0-20240430141315-a273a24f6b33
github.com/go-logr/logr v1.4.1
github.com/google/uuid v1.4.0
github.com/mitchellh/mapstructure v1.5.0
github.com/openshift/api v3.9.0+incompatible
github.com/prometheus-operator/prometheus-operator/pkg/apis/monitoring v0.70.0
@@ -19,7 +21,7 @@ require (
go.opentelemetry.io/collector/featuregate v0.77.0
go.opentelemetry.io/otel v1.21.0
go.uber.org/zap v1.25.0
golang.org/x/exp v0.0.0-20231006140011-7918f672742d
golang.org/x/exp v0.0.0-20231127185646-65229373498e
gopkg.in/yaml.v2 v2.4.0
k8s.io/api v0.29.0
k8s.io/apimachinery v0.29.0
@@ -31,18 +33,51 @@ require (
)

require (
cloud.google.com/go/compute v1.23.0 // indirect
cloud.google.com/go/compute v1.23.3 // indirect
cloud.google.com/go/compute/metadata v0.2.3 // indirect
collectd.org v0.5.0 // indirect
github.com/Azure/azure-sdk-for-go/sdk/azcore v1.8.0 // indirect
github.com/Azure/azure-sdk-for-go/sdk/azidentity v1.4.0 // indirect
github.com/Azure/azure-sdk-for-go/sdk/internal v1.3.0 // indirect
github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/compute/armcompute/v4 v4.2.1 // indirect
github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/network/armnetwork/v2 v2.2.1 // indirect
github.com/AzureAD/microsoft-authentication-library-for-go v1.1.1 // indirect
github.com/DataDog/datadog-go v4.8.3+incompatible // indirect
github.com/Microsoft/go-winio v0.6.1 // indirect
github.com/armon/go-metrics v0.4.1 // indirect
github.com/aws/aws-sdk-go v1.45.25 // indirect
github.com/aws/aws-sdk-go v1.48.12 // indirect
github.com/aws/aws-sdk-go-v2 v1.23.5 // indirect
github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream v1.5.3 // indirect
github.com/aws/aws-sdk-go-v2/config v1.25.11 // indirect
github.com/aws/aws-sdk-go-v2/credentials v1.16.9 // indirect
github.com/aws/aws-sdk-go-v2/feature/dynamodb/attributevalue v1.12.9 // indirect
github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.14.9 // indirect
github.com/aws/aws-sdk-go-v2/feature/s3/manager v1.15.4 // indirect
github.com/aws/aws-sdk-go-v2/internal/configsources v1.2.8 // indirect
github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.5.8 // indirect
github.com/aws/aws-sdk-go-v2/internal/ini v1.7.1 // indirect
github.com/aws/aws-sdk-go-v2/internal/v4a v1.2.8 // indirect
github.com/aws/aws-sdk-go-v2/service/cloudformation v1.42.0 // indirect
github.com/aws/aws-sdk-go-v2/service/cloudwatch v1.31.2 // indirect
github.com/aws/aws-sdk-go-v2/service/cloudwatchlogs v1.29.2 // indirect
github.com/aws/aws-sdk-go-v2/service/dynamodb v1.26.3 // indirect
github.com/aws/aws-sdk-go-v2/service/dynamodbstreams v1.18.2 // indirect
github.com/aws/aws-sdk-go-v2/service/ec2 v1.138.2 // indirect
github.com/aws/aws-sdk-go-v2/service/ecs v1.35.2 // indirect
github.com/aws/aws-sdk-go-v2/service/internal/accept-encoding v1.10.3 // indirect
github.com/aws/aws-sdk-go-v2/service/internal/checksum v1.2.8 // indirect
github.com/aws/aws-sdk-go-v2/service/internal/endpoint-discovery v1.8.9 // indirect
github.com/aws/aws-sdk-go-v2/service/internal/presigned-url v1.10.8 // indirect
github.com/aws/aws-sdk-go-v2/service/internal/s3shared v1.16.8 // indirect
github.com/aws/aws-sdk-go-v2/service/s3 v1.47.2 // indirect
github.com/aws/aws-sdk-go-v2/service/ssm v1.44.2 // indirect
github.com/aws/aws-sdk-go-v2/service/sso v1.18.2 // indirect
github.com/aws/aws-sdk-go-v2/service/ssooidc v1.21.2 // indirect
github.com/aws/aws-sdk-go-v2/service/sts v1.26.2 // indirect
github.com/aws/aws-sdk-go-v2/service/xray v1.23.2 // indirect
github.com/aws/smithy-go v1.18.1 // indirect
github.com/beorn7/perks v1.0.1 // indirect
github.com/cenkalti/backoff/v4 v4.2.1 // indirect
github.com/cespare/xxhash/v2 v2.2.0 // indirect
github.com/cncf/xds/go v0.0.0-20230607035331-e9ce68804cb4 // indirect
github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc // indirect
@@ -75,8 +110,7 @@ require (
github.com/google/go-querystring v1.1.0 // indirect
github.com/google/gofuzz v1.2.0 // indirect
github.com/google/s2a-go v0.1.7 // indirect
github.com/google/uuid v1.3.1 // indirect
github.com/googleapis/enterprise-certificate-proxy v0.3.1 // indirect
github.com/googleapis/enterprise-certificate-proxy v0.3.2 // indirect
github.com/googleapis/gax-go/v2 v2.12.0 // indirect
github.com/gophercloud/gophercloud v1.7.0 // indirect
github.com/gorilla/websocket v1.5.0 // indirect
@@ -125,6 +159,9 @@ require (
github.com/prometheus/client_model v0.5.0 // indirect
github.com/prometheus/common v0.44.0 // indirect
github.com/prometheus/procfs v0.11.1 // indirect
github.com/prozz/aws-embedded-metrics-golang v1.2.0 // indirect
github.com/qri-io/jsonpointer v0.1.1 // indirect
github.com/qri-io/jsonschema v0.2.1 // indirect
github.com/scaleway/scaleway-sdk-go v1.0.0-beta.21 // indirect
github.com/spf13/cobra v1.7.0 // indirect
github.com/stretchr/objx v0.5.0 // indirect
@@ -133,21 +170,21 @@ require (
go.opentelemetry.io/otel/trace v1.21.0 // indirect
go.uber.org/multierr v1.11.0 // indirect
golang.org/x/crypto v0.17.0 // indirect
golang.org/x/mod v0.13.0 // indirect
golang.org/x/mod v0.14.0 // indirect
golang.org/x/net v0.19.0 // indirect
golang.org/x/oauth2 v0.13.0 // indirect
golang.org/x/sys v0.15.0 // indirect
golang.org/x/term v0.15.0 // indirect
golang.org/x/text v0.14.0 // indirect
golang.org/x/time v0.3.0 // indirect
golang.org/x/tools v0.14.0 // indirect
golang.org/x/tools v0.16.0 // indirect
gomodules.xyz/jsonpatch/v2 v2.4.0 // indirect
google.golang.org/api v0.147.0 // indirect
google.golang.org/api v0.149.0 // indirect
google.golang.org/appengine v1.6.7 // indirect
google.golang.org/genproto v0.0.0-20231002182017-d307bd883b97 // indirect
google.golang.org/genproto/googleapis/api v0.0.0-20231012201019-e917dd12ba7a // indirect
google.golang.org/genproto/googleapis/rpc v0.0.0-20231009173412-8bfb1ae86b6c // indirect
google.golang.org/grpc v1.58.3 // indirect
google.golang.org/genproto v0.0.0-20231127180814-3a041ad873d4 // indirect
google.golang.org/genproto/googleapis/api v0.0.0-20231127180814-3a041ad873d4 // indirect
google.golang.org/genproto/googleapis/rpc v0.0.0-20231127180814-3a041ad873d4 // indirect
google.golang.org/grpc v1.59.0 // indirect
google.golang.org/protobuf v1.33.0 // indirect
gopkg.in/inf.v0 v0.9.1 // indirect
gopkg.in/ini.v1 v1.67.0 // indirect