[elasticsearchexporter] Support 'encoding.format' scope attribute for dataset routing #42844

tommyers-elastic · 2025-09-23T09:57:33Z

Description

Adds an additional way to set dataset based on scope attributes set by econding extensions. The main use case we are supporting with this right now is awslogsencodingextension which sets encoding.format like aws.cloudtrail, aws.vpcflow, aws.elbaccess. See #42899 for more detail.

Link to tracking issue

Fixes

Testing

Unit test added.

Documentation

README updated.

exporter/elasticsearchexporter/data_stream_router.go

mlunadia · 2025-09-24T16:12:18Z

exporter/elasticsearchexporter/data_stream_router.go

-	// 3. receiver-based routing
-	// 4. use default hardcoded data_stream.*
+	// 3. extension-based routing
+	// 4. receiver-based routing


Is the expected usage that extension-based routing is prioritized before receiver-based?

yeh, this was my hunch - i figured you could have a receiver e.g. filelog, with additional per-log-type encoding type annotations that are more specific.

exporter/elasticsearchexporter/data_stream_router.go

MichaelKatsoulis · 2025-09-26T08:38:06Z

@tommyers-elastic the relevant encoding extension PR is approved and waiting to get merged. What are the next steps on this one?

tommyers-elastic · 2025-09-26T12:54:43Z

@MichaelKatsoulis i updated and refactored a little. ready for review now.

carsonip

thanks, lgtm.

minor Q: do we want this encoding.format in the final document? I assume it is fine to keep them, which is what is happening now. Otherwise, to exclude them, it'll be some changes in exporter/elasticsearchexporter/internal/serializer/otelserializer/common.go:53

exporter/elasticsearchexporter/README.md

exporter/elasticsearchexporter/data_stream_router.go

exporter/elasticsearchexporter/data_stream_router_test.go

exporter/elasticsearchexporter/data_stream_router.go

tommyers-elastic · 2025-09-26T19:41:53Z

do we want this encoding.format in the final document?

not sure. i think it probably makes sense to keep it

carsonip

lgtm, thanks!

ChrsMark · 2025-10-02T09:32:22Z

exporter/elasticsearchexporter/README.md

   1. `data_stream.dataset` or `data_stream.namespace` in attributes (precedence: log record / data point / span attribute > scope attribute > resource attribute)
-   2. Otherwise, if scope name matches regex `/receiver/(\w*receiver)`, `data_stream.dataset` will be capture group #1
-   3. Otherwise, `data_stream.dataset` falls back to `generic` and `data_stream.namespace` falls back to `default`. 
+   2. Otherwise, if a scope attribute with the name `encoding.format` exists and contains a string value, `data_stream.dataset` will be set to this value. 


Isn't the addition of this quite arbitrary? What if users want to use another scope/resource attribute for routing? Also what if there are components that already populate this attribute? Hardcoding this here will break the routing for them, right? In this, why not to make this configurable so as users can define the routing rule based on whatever they would like to i.e. routing_attribute: resource.attributes["someattr"] (similarly to ottl)

can't they already effectively do this just by using the explicit data_stream.* attributes?

Also what if there are components that already populate this attribute?

this is a fair point, but we're not currently aware of any.

why not to make this configurable so as users can define the routing rule based on whatever they would like

the intention here is to provide a default behaviour so encoding extension + es exporter works OOTB.

can't they already effectively do this just by using the explicit data_stream.* attributes?

No without explicitly setting these. Users might want to avoid setting them for several reasons, and instead route based on existing attributes.

the intention here is to provide a default behaviour so encoding extension + es exporter works OOTB.

My concern here is about if a generic exporter like this one, should be coupled that much to specific components like the one you mention. To me this looks weird and isn't really future-proof/extensible. That's why I suggested making this configurable.

However, since this has been approved by a code-owner you can dismiss my review if others agree with proceeding (consider marking this as a breaking change).

I tend to agree that document routing in the Elasticsearch exporter is already complex and is getting even more complex with this change. This makes it harder for users to understand what's going on by just looking at the config or by looking at the data that landed in Elasticsearch. Questions like "Where's my data?" or "Why was my data routed to this index?" are increasingly harder to answer.

I understand the need to for the component to "work out of the box". The problem I see with this is that everyone seems to have their own, different idea of what "out of the box" behavior looks like.

This change proposes to use the encoding.format scope attribute to route documents to Elasticsearch. This attribute is currently only set by one (1) of the ten (10) encoding extensions in Contrib, making it not quite as general as one might assume.

It would be much much simpler if the Elasticsearch exporter only routed based on the data_stream.* attributes. Those attributes may be set in a processor based on any other attribute or scope name. I would prefer this approach.

Having said all this, I'll defer to the component's code owners on this, at least for this change.

Updated myself on the issue and from what I understand so far, there is already an issue to drive upstream semconv to include encoding.format as scope attributes. If/when we have this, if I understand correctly, we would still end up handling this in the ES exporter. In this sense, it is somewhat of an experimental addition - @tommyers-elastic, maybe we can mark it experimental and subject to change until the semconv goes forward?

Regarding treating this as a breaking change, I don't have an opinion, but we should surely have this mentioned in the changelog in some format so that users who might have this attribute in their pipeline are not surprised by the side effects.

Zooming out a bit. Let's consider data_stream.dataset handling is Elasticsearch specific, and ES exporter would like to implement routing based on a standard. And encoding.format would be a proposed standard that es exporter is willing to implement as an experimental feature. I see this as a healthy addition that makes es exporter more operable with other components without requiring the other components to implement an elasticsearch specific attribute, or requiring the user to add a processor just to glue 2 components together.

On the topic of configurable routing attribute, making it a configuration won't make routing less complex. Not all routing are attribute based, e.g. telemetry scope name routing that requires parsing. Not to mention the order between the routing rules themselves. Probably YAGNI and we can refactor the code to make it configurable in the future if we really see the need.

@carsonip do you consider the whole routing behavior as subject to change? If so, that should be documented imo.

Regarding the current addition, if that's experimental as @lahsivjar mentioned, how about adding this behind a feature gate? That's a common practice for the Collector project when it comes to experimental features.

A slightly less ideal alternative to the feature gate would be to just clearly document that the added routing rule is experimental. Though that's a bit weird when it affects the whole default routing behavior 🤔 .

Note that any changes to the routing logic also need to be applied to the Elasticsearch OTLP endpoint.
For this proposed change, I’m a bit worried that the attribute is too specific about encoding extensions. It feels like a slippery slope for adding more similar routing attributes. Can we instead either set an appropriate scope name in the encoding extensions or introduce a more generic concept, something like a dataset or schema name that describes the shape of the data that we can use for routing?

carsonip

putting this PR on hold as we realign on the e2e design

carsonip

reapproving this PR as this PR is revived as an outcome of realignment

andrzej-stencel

The code owners decided to use the encoding.format for now.

otelbot · 2025-10-13T11:14:14Z

Thank you for your contribution @tommyers-elastic! 🎉 We would like to hear from you about your experience contributing to OpenTelemetry by taking a few minutes to fill out this survey. If you are getting started contributing, you can also join the CNCF Slack channel #opentelemetry-new-contributors to ask for guidance and get help.

… dataset routing (open-telemetry#42844)  #### Description Adds an additional way to set dataset based on scope attributes set by econding extensions. The main use case we are supporting with this right now is awslogsencodingextension which sets `encoding.format` like `aws.cloudtrail`, `aws.vpcflow`, `aws.elbaccess`. See open-telemetry#42899 for more detail.  #### Link to tracking issue Fixes  #### Testing Unit test added.  #### Documentation README updated.  --------- Co-authored-by: Carson Ip <carsonip@users.noreply.github.com>

MichaelKatsoulis reviewed Sep 24, 2025

View reviewed changes

exporter/elasticsearchexporter/data_stream_router.go Outdated Show resolved Hide resolved

mlunadia reviewed Sep 24, 2025

View reviewed changes

exporter/elasticsearchexporter/data_stream_router.go Outdated Show resolved Hide resolved

mlunadia reviewed Sep 24, 2025

View reviewed changes

exporter/elasticsearchexporter/data_stream_router.go Outdated Show resolved Hide resolved

vigneshshanmugam reviewed Sep 24, 2025

View reviewed changes

exporter/elasticsearchexporter/data_stream_router.go Outdated Show resolved Hide resolved

tommyers-elastic marked this pull request as ready for review September 26, 2025 12:54

tommyers-elastic requested a review from a team as a code owner September 26, 2025 12:54

tommyers-elastic requested a review from ArthurSens September 26, 2025 12:54

github-actions bot assigned MovieStoreGuy Sep 26, 2025

github-actions bot added the exporter/elasticsearch label Sep 26, 2025

github-actions bot requested review from JaredTan95, carsonip and lahsivjar September 26, 2025 12:54

tommyers-elastic changed the title ~~[elasticsearchexporter] Add dataset rule for awslogsencodingextension using scope attributes~~ [elasticsearchexporter] Support 'encoding.format' scope attribute for dataset routing Sep 26, 2025

MichaelKatsoulis approved these changes Sep 26, 2025

View reviewed changes

carsonip reviewed Sep 26, 2025

View reviewed changes

tommyers-elastic requested a review from carsonip September 26, 2025 19:45

carsonip approved these changes Sep 29, 2025

View reviewed changes

ChrsMark reviewed Oct 2, 2025

View reviewed changes

tommyers-elastic requested a review from ChrsMark October 2, 2025 10:02

theletterf approved these changes Oct 3, 2025

View reviewed changes

carsonip approved these changes Oct 3, 2025

View reviewed changes

carsonip suggested changes Oct 6, 2025

View reviewed changes

joecompute mentioned this pull request Oct 8, 2025

Add generic encoding.format to GCP encoding extension #43320

Open

carsonip approved these changes Oct 10, 2025

View reviewed changes

tommyers-elastic requested review from MovieStoreGuy and crobert-1 as code owners October 10, 2025 10:34

tommyers-elastic requested a review from andrzej-stencel October 13, 2025 10:19

andrzej-stencel approved these changes Oct 13, 2025

View reviewed changes

andrzej-stencel merged commit c15e845 into open-telemetry:main Oct 13, 2025
240 of 248 checks passed

github-actions bot added this to the next release milestone Oct 13, 2025

[elasticsearchexporter] Support 'encoding.format' scope attribute for dataset routing #42844

[elasticsearchexporter] Support 'encoding.format' scope attribute for dataset routing #42844

Uh oh!

Conversation

tommyers-elastic commented Sep 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Link to tracking issue

Testing

Documentation

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

MichaelKatsoulis commented Sep 26, 2025

Uh oh!

tommyers-elastic commented Sep 26, 2025

Uh oh!

carsonip left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

tommyers-elastic commented Sep 26, 2025

Uh oh!

carsonip left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

lahsivjar Oct 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

lahsivjar Oct 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

carsonip Oct 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

carsonip left a comment

Choose a reason for hiding this comment

Uh oh!

carsonip left a comment

Choose a reason for hiding this comment

Uh oh!

andrzej-stencel left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

otelbot bot commented Oct 13, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

tommyers-elastic commented Sep 23, 2025 •

edited

Loading

lahsivjar Oct 3, 2025 •

edited

Loading

lahsivjar Oct 3, 2025 •

edited

Loading

carsonip Oct 3, 2025 •

edited

Loading