Auto-Configure ADOT SDK Defaults for Genesis #392

Open · wants to merge 11 commits into main
Conversation

yiyuan-he
Contributor

What does this pull request do?

Defaults many environment variables when `AGENT_OBSERVABILITY_ENABLED=true` to streamline the enablement process for customers.

Before:

```
docker run -p 8000:8000 \
           -e "OTEL_METRICS_EXPORTER=awsemf" \
           -e "OTEL_TRACES_EXPORTER=otlp" \
           -e "OTEL_LOGS_EXPORTER=otlp" \
           -e "OTEL_PYTHON_DISTRO=aws_distro" \
           -e "OTEL_PYTHON_CONFIGURATOR=aws_configurator" \
           -e "OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf" \
           -e "OTEL_RESOURCE_ATTRIBUTES=service.name=ticketing-agent" \
           -e "OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED=true" \
           -e "OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=true" \
           -e "OTEL_AWS_APPLICATION_SIGNALS_ENABLED=false" \
           -e "OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=https://xray.us-west-2.amazonaws.com/v1/traces" \
           -e "OTEL_EXPORTER_OTLP_LOGS_ENDPOINT=https://logs.us-west-2.amazonaws.com/v1/logs" \
           -e "OTEL_EXPORTER_OTLP_LOGS_HEADERS=x-aws-log-group=test/genesis,x-aws-log-stream=default,x-aws-metric-namespace=genesis-test" \
           -e "OTEL_PYTHON_DISABLED_INSTRUMENTATIONS=http,sqlalchemy,psycopg2,pymysql,sqlite3,aiopg,asyncpg,mysql_connector,botocore,boto3,urllib3,requests,starlette" \
           -e "AGENT_OBSERVABILITY_ENABLED=true" \
           genesis-poc
```

After:

```
docker run -p 8000:8000 \
           -e "OTEL_PYTHON_DISTRO=aws_distro" \
           -e "OTEL_PYTHON_CONFIGURATOR=aws_configurator" \
           -e "OTEL_RESOURCE_ATTRIBUTES=service.name=ticketing-agent,aws.log.group.names=test/genesis,cloud.resource_id=agent_arn" \
           -e "OTEL_EXPORTER_OTLP_LOGS_HEADERS=x-aws-log-group=test/genesis,x-aws-log-stream=default,x-aws-metric-namespace=genesis" \
           -e "AGENT_OBSERVABILITY_ENABLED=true" \
           genesis-poc
```
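
For illustration, here is a minimal sketch of how such defaulting can be implemented in Python with a `setdefault`-style policy, so any value the customer exports explicitly always wins. The function name, the region parameter, and the exact default set are assumptions for illustration, not the PR's actual implementation:

```python
# Illustrative sketch only (not the PR's actual configurator code).
import os


def apply_agent_observability_defaults(region: str = "us-west-2") -> None:
    """Apply opinionated SDK defaults without overriding user-provided values."""
    if os.environ.get("AGENT_OBSERVABILITY_ENABLED", "false").lower() != "true":
        return

    defaults = {
        "OTEL_METRICS_EXPORTER": "awsemf",
        "OTEL_TRACES_EXPORTER": "otlp",
        "OTEL_LOGS_EXPORTER": "otlp",
        "OTEL_EXPORTER_OTLP_PROTOCOL": "http/protobuf",
        "OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED": "true",
        "OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT": "true",
        "OTEL_AWS_APPLICATION_SIGNALS_ENABLED": "false",
        "OTEL_PYTHON_DISABLED_INSTRUMENTATIONS": "http,sqlalchemy,psycopg2,pymysql,sqlite3,aiopg,asyncpg,mysql_connector,botocore,boto3,urllib3,requests,starlette",
        "OTEL_EXPORTER_OTLP_TRACES_ENDPOINT": f"https://xray.{region}.amazonaws.com/v1/traces",
        "OTEL_EXPORTER_OTLP_LOGS_ENDPOINT": f"https://logs.{region}.amazonaws.com/v1/logs",
    }
    # setdefault keeps anything the customer already set via -e or their shell.
    for key, value in defaults.items():
        os.environ.setdefault(key, value)
```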

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@yiyuan-he yiyuan-he requested a review from a team as a code owner June 12, 2025 00:37
@yiyuan-he changed the title from "Auto-Configure ADOT SDK Defaults for Caton" to "Auto-Configure ADOT SDK Defaults for Genesis" on Jun 12, 2025
@yiyuan-he force-pushed the configure-genesis-defaults branch from 264042e to 2c5ffd6 on June 12, 2025 00:50
yiyuan-he and others added 9 commits June 16, 2025 17:34
## What does this pull request do?
Bumps our OTel Dependency versions to
[1.33.0/0.54b0](https://github.com/open-telemetry/opentelemetry-python/releases/tag/v1.33.0)
to support compatibility with third-party AI Instrumentation
libraries/frameworks such as OpenInference, Traceloop/Openllmetry, and
OpenLit.

We do not bump to the latest upstream version
[1.34.0/0.55b0](https://github.com/open-telemetry/opentelemetry-python/releases/tag/v1.34.0)
because that release includes `BatchLogRecordProcessor` refactoring
that is not compatible with our Caton changes.


By submitting this pull request, I confirm that you can use, modify,
copy, and redistribute this contribution, under the terms of your
choice.
…ility#397)

Reverts aws-observability#388

## Why?
Bumping the OTel dependency versions is currently causing our main build
to fail because spans are not being generated correctly. For example, in
an SNS call, we see that `aws.local.service` is not being populated correctly:
```
{
    "name": "testTopic send",
    "context": {
        "trace_id": "0x684c92d9eecb9548c12f90342875a8f3",
        "span_id": "0xfd714402fb0429f9",
        "trace_state": "[]"
    },
    "kind": "SpanKind.PRODUCER",
    "parent_id": "0xa6868c3dde9d4839",
    "start_time": "2025-06-13T21:06:33.612183Z",
    "end_time": "2025-06-13T21:06:33.920669Z",
    "status": {
        "status_code": "UNSET"
    },
    "attributes": {
        "rpc.system": "aws-api",
        "rpc.service": "SNS",
        "rpc.method": "Publish",
        "aws.region": "us-west-2",
        "server.address": "sns.us-west-2.amazonaws.com",
        "server.port": 443,
        "messaging.system": "aws.sns",
        "messaging.destination_kind": "topic",
        "messaging.destination": "arn:aws:sns:us-west-2:792479605405:testTopic",
        "messaging.destination.name": "arn:aws:sns:us-west-2:792479605405:testTopic",
        "aws.sns.topic.arn": "arn:aws:sns:us-west-2:792479605405:testTopic",
        "aws.request_id": "8184c44e-c6db-5998-a9d2-a48853c2dd94",
        "retry_attempts": 0,
        "http.status_code": 200
    },
    "events": [],
    "links": [],
    "resource": {
        "attributes": {
            "telemetry.sdk.language": "python",
            "telemetry.sdk.name": "opentelemetry",
            "telemetry.sdk.version": "1.33.0",
            "service.name": "unknown_service",
            "cloud.provider": "aws",
            "cloud.platform": "aws_ec2",
            "cloud.account.id": "445567081046",
            "cloud.region": "us-east-1",
            "cloud.availability_zone": "us-east-1b",
            "host.id": "i-09dfcf17712adbde4",
            "host.type": "c5a.12xlarge",
            "host.name": "ip-172-31-43-64.ec2.internal",
            "telemetry.auto.version": "0.9.0.dev0-aws",
            "aws.local.service": "UnknownService"
        },
        "schema_url": ""
    }
}
{
    "name": "GET /server_request",
    "context": {
        "trace_id": "0x684c92d9eecb9548c12f90342875a8f3",
        "span_id": "0xa6868c3dde9d4839",
        "trace_state": "[]"
    },
    "kind": "SpanKind.SERVER",
    "parent_id": null,
    "start_time": "2025-06-13T21:06:33.610724Z",
    "end_time": "2025-06-13T21:06:33.920935Z",
    "status": {
        "status_code": "UNSET"
    },
    "attributes": {
        "http.method": "GET",
        "http.server_name": "127.0.0.1",
        "http.scheme": "http",
        "net.host.name": "localhost:8082",
        "http.host": "localhost:8082",
        "net.host.port": 8082,
        "http.target": "/server_request?param=.%2Fsample-applications%2Fsimple-client-server%2Fclient.py",
        "net.peer.ip": "127.0.0.1",
        "net.peer.port": 34778,
        "http.user_agent": "python-requests/2.32.2",
        "http.flavor": "1.1",
        "http.route": "/server_request",
        "http.status_code": 200
    },
    "events": [],
    "links": [],
    "resource": {
        "attributes": {
            "telemetry.sdk.language": "python",
            "telemetry.sdk.name": "opentelemetry",
            "telemetry.sdk.version": "1.33.0",
            "service.name": "unknown_service",
            "cloud.provider": "aws",
            "cloud.platform": "aws_ec2",
            "cloud.account.id": "445567081046",
            "cloud.region": "us-east-1",
            "cloud.availability_zone": "us-east-1b",
            "host.id": "i-09dfcf17712adbde4",
            "host.type": "c5a.12xlarge",
            "host.name": "ip-172-31-43-64.ec2.internal",
            "telemetry.auto.version": "0.9.0.dev0-aws",
            "aws.local.service": "UnknownService"
        },
        "schema_url": ""
    }
}
```

Previously these contract tests were passing in the PR build as well as
locally with these dependency version bumps, so we are not sure why they
are suddenly failing. As a short-term mitigation, we will revert these
changes while we investigate further.
Same as:
aws-observability/aws-otel-java-instrumentation#1096


By submitting this pull request, I confirm that you can use, modify,
copy, and redistribute this contribution, under the terms of your
choice.
…Setup (aws-observability#398)

## What does this pull request do?
Fixes an issue where
[upgrading](aws-observability#388)
our OTel dependency version from 1.27.0 caused all of our contract tests
to start
[failing](https://github.com/aws-observability/aws-otel-python-instrumentation/actions/runs/15640951584/job/44067918087)
in the main build.

The root cause was that in version
[1.28.0](https://github.com/open-telemetry/opentelemetry-python-contrib/releases/tag/v0.49b0)
the OpenTelemetry Python SDK migrated from `pkg_resources` to
`importlib_metadata` for entry point discovery. This was a [breaking
change](open-telemetry/opentelemetry-python-contrib#2871)
that had significant behavioral implications:
- **Before (pkg_resources):** Entry points were discovered in `sys.path`
order, meaning packages installed in the local test environment (e.g. a
venv) were always prioritized. This made ADOT discovery predictable and
consistent even without explicitly specifying `OTEL_PYTHON_DISTRO` and
`OTEL_PYTHON_CONFIGURATOR` in the contract test setup.
- **After (importlib_metadata):** Entry points are discovered using an
implementation-defined ordering that does not guarantee `sys.path`
precedence. In short, the discovery order depends on factors like
filesystem iteration order, installation timestamps, etc. - things that
can vary between environments. This is why our contract tests passed in
the original PR build for the OTel dependency bump but then started
failing in our main build.

Due to this unpredictable ordering, our ADOT SDK was not able to
instrument the sample apps in our contract tests correctly, which then
resulted in all of the test assertions failing.

The solution is to explicitly configure the OpenTelemetry distro and
configurator in our contract test setup. This approach follows
OpenTelemetry's [official
recommendations](https://pypi.org/project/opentelemetry-instrumentation/)
when multiple distros are present.
> If you have entry points for multiple distros or configurators present
in your environment, you should specify the entry point name of the
distro and configurator you want to be used via the OTEL_PYTHON_DISTRO
and OTEL_PYTHON_CONFIGURATOR environment variables.
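
As a rough sketch of why pinning helps, the snippet below lists the competing entry points and pins ADOT's by name. It assumes Python 3.10+ `importlib.metadata` and the upstream `opentelemetry_distro` / `opentelemetry_configurator` entry point group conventions; treat the group names as assumptions rather than a definitive reference:

```python
# Sketch only: show why entry point discovery order is ambiguous and how the
# environment variables remove that ambiguity.
import os
from importlib.metadata import entry_points

# Both the upstream SDK and ADOT can register a distro under the same group,
# and importlib.metadata does not promise sys.path ordering between packages.
for ep in entry_points(group="opentelemetry_distro"):
    print(ep.name, "->", ep.value)

# Pinning by entry point name makes the selection deterministic, matching the
# OTEL_PYTHON_DISTRO / OTEL_PYTHON_CONFIGURATOR settings in the contract tests.
os.environ["OTEL_PYTHON_DISTRO"] = "aws_distro"
os.environ["OTEL_PYTHON_CONFIGURATOR"] = "aws_configurator"
```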

**This fix will enable us to safely upgrade our OTel dependency version
from 1.27.0, which unblocks the Caton project.**


By submitting this pull request, I confirm that you can use, modify,
copy, and redistribute this contribution, under the terms of your
choice.