Add OTEL metrics to cel input #47014
Draft
Proposed commit message
Adds OTEL metrics to the CEL input to support collecting metrics for each periodic run of an input in agentless environments.
Produces HTTP and CEL input metrics using the OTEL SDK and pushes them either to a configured endpoint or to the console. No metrics are produced if the OTEL environment variables are not set.
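A minimal Go sketch of the on/off decision described above. It assumes, following the test notes later in this description, that the presence of OTEL_RESOURCE_ATTRIBUTES is what enables metrics. It further assumes, without confirmation from this PR, that console output is used when no OTLP endpoint is set. chooseTarget and the exportTarget constants are hypothetical names, not the PR's code.

```go
package main

import (
	"fmt"
	"os"
)

// exportTarget describes where per-run metrics would be sent.
type exportTarget int

const (
	targetNone    exportTarget = iota // metrics disabled
	targetOTLP                        // push to OTEL_EXPORTER_OTLP_ENDPOINT
	targetConsole                     // print to stdout (assumed fallback)
)

// chooseTarget is a hypothetical helper. Assumption 1: OTEL_RESOURCE_ATTRIBUTES
// being present enables metrics. Assumption 2 (not confirmed by the PR):
// console output is used when no OTLP endpoint is configured.
func chooseTarget(getenv func(string) (string, bool)) exportTarget {
	if _, ok := getenv("OTEL_RESOURCE_ATTRIBUTES"); !ok {
		return targetNone
	}
	if ep, ok := getenv("OTEL_EXPORTER_OTLP_ENDPOINT"); ok && ep != "" {
		return targetOTLP
	}
	return targetConsole
}

func main() {
	// os.LookupEnv distinguishes "unset" from "set to empty".
	switch chooseTarget(os.LookupEnv) {
	case targetNone:
		fmt.Println("metrics disabled")
	case targetOTLP:
		fmt.Println("exporting via OTLP")
	case targetConsole:
		fmt.Println("exporting to console")
	}
}
```

Passing the lookup function in, rather than calling os.LookupEnv directly, keeps the decision testable without touching the process environment.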
Defaults to producing a count for each defined metric on every periodic run, instead of using histograms over the entire period that the integration is running. This will allow metrics to be associated with traces in future work.
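To illustrate the per-run counting model, here is a stdlib-only Go sketch: counts accumulate during one run and are reset when the run ends, so each exported value covers exactly one run. runCounters, Add, EndRun and the metric names are hypothetical; the PR itself uses the OTEL SDK.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// runCounters is a hypothetical per-run counter set. Counts accumulate during
// one periodic run and are reset when the run's snapshot is taken.
type runCounters struct {
	mu     sync.Mutex
	counts map[string]int64
}

func newRunCounters() *runCounters {
	return &runCounters{counts: make(map[string]int64)}
}

// Add increments the named counter for the current run.
func (c *runCounters) Add(name string, n int64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[name] += n
}

// EndRun returns the counts collected during the run and starts a fresh run.
func (c *runCounters) EndRun() map[string]int64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	snapshot := c.counts
	c.counts = make(map[string]int64)
	return snapshot
}

func main() {
	c := newRunCounters()
	// Run 1 (hypothetical metric names).
	c.Add("input.cel.periodic.run", 1)
	c.Add("metrics.input.cel.http.request", 3)
	run1 := c.EndRun()
	// Run 2 starts from zero.
	c.Add("input.cel.periodic.run", 1)
	run2 := c.EndRun()

	for _, run := range []map[string]int64{run1, run2} {
		keys := make([]string, 0, len(run))
		for k := range run {
			keys = append(keys, k)
		}
		sort.Strings(keys) // stable output order
		for _, k := range keys {
			fmt.Printf("%s=%d ", k, run[k])
		}
		fmt.Println()
	}
}
```

In OTEL terms this resembles delta aggregation temporality rather than cumulative; whether the PR configures the SDK that way is not stated here.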
If the OTEL environment variables OTEL_EXPORTER_OTLP_ENDPOINT, OTEL_EXPORTER_OTLP_HEADERS and OTEL_RESOURCE_ATTRIBUTES are set in the agent's environment, OTLP metrics are exported after each periodic run to the endpoint defined in OTEL_EXPORTER_OTLP_ENDPOINT.
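OTEL_EXPORTER_OTLP_HEADERS and OTEL_RESOURCE_ATTRIBUTES both use a comma-separated key=value list with percent-encoded values. The sketch below parses that format with the Go standard library. parseKeyValueList is a hypothetical helper, simpler than the OTEL SDK's own parsing, and the header value in main is a placeholder.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// parseKeyValueList parses "k1=v1,k2=v2" as used by OTEL_EXPORTER_OTLP_HEADERS
// and OTEL_RESOURCE_ATTRIBUTES, percent-decoding each value.
// Simplified sketch, not the OTEL SDK's parser.
func parseKeyValueList(s string) (map[string]string, error) {
	out := make(map[string]string)
	for _, item := range strings.Split(s, ",") {
		item = strings.TrimSpace(item)
		if item == "" {
			continue // tolerate empty entries such as a trailing comma
		}
		k, v, ok := strings.Cut(item, "=")
		k = strings.TrimSpace(k)
		if !ok || k == "" {
			return nil, fmt.Errorf("invalid entry %q", item)
		}
		// PathUnescape decodes %XX but leaves '+' untouched.
		dv, err := url.PathUnescape(strings.TrimSpace(v))
		if err != nil {
			return nil, fmt.Errorf("invalid value for %q: %w", k, err)
		}
		out[k] = dv
	}
	return out, nil
}

func main() {
	// Placeholder value, not a real credential.
	h, err := parseKeyValueList("Authorization=ApiKey%20abc123")
	if err != nil {
		panic(err)
	}
	fmt.Println(h["Authorization"])
}
```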
Each metric is associated with a resource carrying the CEL input ID, the agent ID, and the integration name and version, along with the timestamp at which the count was collected.
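A sketch of the data shape described above: one count per run, stamped with its collection time and tied to a resource. The struct names and attribute keys (input.id, agent.id, integration.name, integration.version) are illustrative assumptions, not the keys the PR emits.

```go
package main

import (
	"fmt"
	"time"
)

// runResource holds the identifying attributes each metric is associated with.
// Field names are hypothetical.
type runResource struct {
	InputID            string
	AgentID            string
	IntegrationName    string
	IntegrationVersion string
}

// attributes flattens the resource into key/value pairs.
// The keys are illustrative, not taken from the PR.
func (r runResource) attributes() map[string]string {
	return map[string]string{
		"input.id":            r.InputID,
		"agent.id":            r.AgentID,
		"integration.name":    r.IntegrationName,
		"integration.version": r.IntegrationVersion,
	}
}

// countPoint is one per-run count stamped with the time it was collected.
type countPoint struct {
	Name      string
	Value     int64
	Timestamp time.Time
	Resource  runResource
}

func main() {
	res := runResource{
		InputID:            "cel-example", // hypothetical values
		AgentID:            "agent-1",
		IntegrationName:    "example",
		IntegrationVersion: "0.1.0",
	}
	p := countPoint{Name: "input.cel.periodic.run", Value: 1, Timestamp: time.Now().UTC(), Resource: res}
	fmt.Println(p.Name, p.Value, p.Timestamp.Format(time.RFC3339), p.Resource.attributes())
}
```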
Integration name and version are now preserved from the source information sent by the agent.
Checklist
CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc
Disruptive User Impact
No. By default, no metrics are produced.
Author's Checklist
How to test this PR locally
Reviewing this PR requires building Beats, building Elastic Agent, and then running Elastic Agent standalone against a cluster.
I used a serverless cluster on prod because it has the APM server running.
Check out the branch, cd ../beats, and build Beats:
DEV=true SNAPSHOT=true PLATFORMS=darwin/arm64 make snapshot
Replace PLATFORMS with the correct platform when building on a non-Mac machine.
cd into the elastic-agent repository and build Elastic Agent:
DEV=true EXTERNAL=false SNAPSHOT=true PLATFORMS=darwin/arm64 PACKAGES=tar.gz mage -v package
Replace PLATFORMS with the correct platform when building on a non-Mac machine.
In the elastic-agent repository:
cd ./build/distributions
tar -xvzf <elastic-agent-.tar.gz>
cd into the extracted directory elastic-agent-.
rm elastic-agent.yml (it will be replaced before running elastic-agent)
Create an Observability serverless cluster.
Get the environment variables for APM:
In the bottom left, click "Add Data".
Copy the 3 environment variables. (Note that OTEL_RESOURCE_ATTRIBUTES is sent with the metrics, but its presence in the environment is also what the CEL metrics use to determine whether metrics should be sent.)
Use "Add Agent" to create a policy:
In the left navigation, click Fleet.
In the upper right, click "Add Agent".
Create a new agent policy.
Under Advanced options -> Agent Monitoring -> Advanced Monitoring options, click "Enable HTTP endpoint at /liveness" and change the port to "6793" or another unused port on your machine. This is only required if your machine is already running elastic-agent for monitoring.
Under step 2, "Enroll in Fleet", choose "Run Standalone".
Under step 3, "Configure the agent", click "Create API Key".
Download the policy.
Add an integration. If you want an easy one to use, I have a simple CEL integration package that requires no configuration.
Add this integration to the already created policy and go to the download step. From the updated policy shown there, copy the new input, from its "-id" line to the end of that input, and paste it at the end of the inputs section of the previously downloaded elastic-agent.yml policy file. Don't download the updated policy itself.
Copy the updated elastic-agent.yml policy file into ./build/distributions/elastic-agent*.
Start the agent in development mode from ./build/distributions/elastic-agent*:
sudo OTEL_RESOURCE_ATTRIBUTES="<value>" OTEL_EXPORTER_OTLP_ENDPOINT="<value>" OTEL_EXPORTER_OTLP_HEADERS="<value>" ./elastic-agent run -e --develop &> output.txt
Check for data in the cluster.
In the left navigation, choose "Discover".
In the Data View dropdown, find APM.
Select APM.
Look for metrics whose names begin with:
input.cel.* (CEL processing metrics for each periodic run)
metrics.input.cel.* (HTTP metrics)
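When scanning exported documents, the two prefixes above separate CEL processing metrics from HTTP metrics. A tiny Go sketch of that classification; metricKind and the sample metric names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// metricKind groups a metric name by the two prefixes listed in the test notes.
// "metrics.input.cel." does not start with "input.cel.", so the cases don't overlap.
func metricKind(name string) string {
	switch {
	case strings.HasPrefix(name, "input.cel."):
		return "cel"
	case strings.HasPrefix(name, "metrics.input.cel."):
		return "http"
	default:
		return "other"
	}
}

func main() {
	// Hypothetical metric names for illustration.
	for _, n := range []string{"input.cel.periodic.run", "metrics.input.cel.http.request", "system.cpu"} {
		fmt.Printf("%s -> %s\n", n, metricKind(n))
	}
}
```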
Related issues
Use cases
Screenshots
Logs