New component: Enrichment Processor #41816

@jsvd

Description

The purpose and use-cases of the new component

This issue is a follow-up to a presentation at the Collector SIG meeting (July 23rd) on enhancing the enrichment capabilities of the Collector.
The feedback was to create an issue to get the discussion going, so here it is:

The OpenTelemetry Collector currently supports limited enrichment types, mostly focusing on self-contained parsing and contextual metadata. To make the Collector more versatile relative to other data collectors and transformation tools, we should expand its capabilities to cover additional enrichment types.

The original document “Enrichment in OTel Collector” introduces a taxonomy of enrichment types and their current support in the Collector:
• Type 1: Self-Contained Parsing & Derivation (supported)
• Type 2: Reference Data Lookup (Static or Semi-Static) (very limited support)
• Type 3: Dynamic External Enrichment (Live Lookups) (not supported)
• Type 4: Contextual Metadata Enrichment (supported)
• Type 5: Cross-Event Correlation & Aggregation (not supported)
• Type 6: Analytical & ML-Based Enrichment (not supported)

Of this list, based on a comparison with similar tools (see the original document), Type 2 and Type 3 are the strongest candidates to add to the Collector, as they would facilitate migrating workloads from other tools.

From this problem statement, we could consider introducing a Lookup Processor aimed at handling both static reference data lookups (Type 2) and dynamic external enrichment (Type 3).

The processor would support:

  • Local lookups: Using static or semi-static data sources such as CSV, JSON, or inline key/value pairs.
  • Remote lookups: Dynamic enrichment from APIs, DNS, databases, or cache systems like Redis or Memcached.

Example configuration for the component

processors:
  lookupprocessor/json:
    source: json
    path: "/tmp/file.json"
    json_path: .process.name
    source_attribute: process.name
    target_attribute: aws.log.group.names
    timeout: 25ms
    refresh_interval: 1h

  lookupprocessor/http:
    source: http
    url: "https://my.url/query"
    params:
      org: resource.attributes["foo"]
    target_attribute: org
    timeout: 1s
    cache:
      size: 100
      ttl: 60 # seconds
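
As a further illustration, a local source could also be defined inline and the processor chained into a pipeline. The sketch below follows the pattern of the examples above; the inline source type, its entries field, and the pipeline contents are hypothetical and only meant to show how the processor would compose with the rest of a Collector configuration:

processors:
  lookupprocessor/inline:
    # Hypothetical inline source: static key/value pairs kept directly in the config (Type 2).
    source: inline
    entries:
      "payments-svc": "team-payments"
      "checkout-svc": "team-checkout"
    source_attribute: service.name
    target_attribute: team.owner

service:
  pipelines:
    logs:
      receivers: [otlp]
      # Each named instance handles a single lookup source; multiple enrichment
      # steps compose by chaining instances in the pipeline.
      processors: [lookupprocessor/inline, lookupprocessor/http]
      exporters: [otlp]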

Implementing such a processor requires careful design to stay aligned with the long-term vision, namely:
• Progressive implementation, starting with basic local and then remote lookup capabilities.
• Modular structure facilitating easy addition of new lookup sources.
• Built-in caching and timeout mechanisms for performance optimization.
• Comprehensive and useful observability metrics (success/failure rates, latency percentiles).

This is not an exhaustive list of concerns, nor does it propose solutions; it is an acknowledgment that these will have to be addressed.

Telemetry data types supported

Logs, Metrics, Traces, Profiles

Code Owner(s)

@jsvd

Sponsor (optional)

No response

Additional context

Similar ideas and alternative solutions have been suggested in the past:
