Bal 1657 otel traces support new #3167

alonp99 · 2025-03-17T14:54:10Z

Summary by CodeRabbit

New Features
- Introduced distributed tracing and enhanced observability with improved integration to external monitoring tools.
- Added new configurations that support advanced service monitoring and instrumentation of HTTP requests.
- Implemented a new class for filtering incoming HTTP requests based on specified URL patterns.
- Enhanced error handling for environment variables to ensure robustness.
Refactor
- Simplified logging by streamlining metadata handling and centralizing trace identification.
- Updated request handling to consistently propagate trace information across the application for improved diagnostics.

…-otel-traces-support

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

…o/ballerine into bal-1657-otel-traces-support

changeset-bot · 2025-03-17T14:54:14Z

⚠️ No Changeset found

Latest commit: 6fac7e1

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

coderabbitai · 2025-03-17T14:54:18Z

Walkthrough

This pull request integrates distributed tracing capabilities into the workflows service using OpenTelemetry and Jaeger. It adds a new Docker Compose configuration for a Jaeger service and introduces multiple OpenTelemetry dependencies to enhance instrumentation. The Prisma schema now includes a preview feature for tracing. Additionally, components such as the logger, middleware, and Sentry service have been updated to use trace IDs from the OpenTelemetry context rather than generated request IDs. New modules for HTTP instrumentation and tracer initialization further support the tracing functionality.

Changes

File(s)	Change Summary
`services/workflows-service/docker-compose.otel.yml`	New Docker Compose file defining a Jaeger tracing service with multiple port mappings.
`services/workflows-service/package.json`	Added multiple OpenTelemetry and instrumentation dependencies to enhance observability capabilities.
`services/workflows-service/prisma/schema.prisma`	Added `previewFeatures = ["tracing"]` in the `generator client` block to enable tracing support.
`services/workflows-service/src/common/app-logger/app-logger.service.ts`	Updated `getLogMetadata` by removing the `reqId` property from the logging metadata.
`services/workflows-service/src/common/middlewares/request-id.middleware.ts`	Renamed `setReqId` to `setTraceId` and modified logic to retrieve trace IDs from the OpenTelemetry context instead of generating a random UUID, setting the `X-Request-ID` header accordingly.
`services/workflows-service/src/main.ts`	Imported and invoked `tracingSdk.start()` at the start of the application to initialize tracing.
`services/workflows-service/src/sentry/sentry.service.ts`	Modified `_setExtraData` to retrieve and set a `traceId` from CLS or the active OpenTelemetry span, replacing the use of a request ID.
`services/workflows-service/src/traces/http-intrumentation.ts`	New file adding `BallerineHttpInstrumentation`, extending the OpenTelemetry HTTP instrumentation with filtering for incoming requests based on URL patterns.
`services/workflows-service/src/traces/tracer.ts`	New file setting up OpenTelemetry tracing via the NodeSDK, configuring exporters, processors, resource detectors, and multiple instrumentations, and exporting the tracer instance for application-wide use.
`apps/backoffice-v2/src/common/env/schema.ts`	Enhanced error handling in `EnvSchema` for properties `VITE_AUTH_ENABLED`, `VITE_MOCK_SERVER`, and `VITE_FETCH_SIGNED_URL` with try-catch blocks to log warnings on parsing failures.

Suggested reviewers

tomer-shvadron
MatanYadaev

Poem

In fields of code I softly tread,
Amidst bits and bytes, my worries shed.
I hop with glee on tracing trails,
Leaving footprints where each trace prevails.
With carrots of logic, crisp and bright,
I celebrate changes from morning to night!
🥕🐇 Happy trails in every byte!

Tip

⚡🧪 Multi-step agentic review comment chat (experimental)

We're introducing multi-step agentic chat in review comments. This experimental feature enhances review discussions with the CodeRabbit agentic chat by enabling advanced interactions, including the ability to create pull requests directly from comments.
- To enable this feature, set early_access to true under in the settings.

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bbdbf76 and 6fac7e1.

📒 Files selected for processing (1)

apps/backoffice-v2/src/common/env/schema.ts (2 hunks)

⏰ Context from checks skipped due to timeout of 90000ms (12)

GitHub Check: build-dashboard / build-and-push
GitHub Check: build-kyb / build-and-push
GitHub Check: build-wf-service / build-and-push
GitHub Check: build-backoffice / build-and-push
GitHub Check: test_windows
GitHub Check: test_linux
GitHub Check: spell_check
GitHub Check: build (windows-latest)
GitHub Check: lint
GitHub Check: build (ubuntu-latest)
GitHub Check: Analyze (javascript)
GitHub Check: format

🔇 Additional comments (3)

apps/backoffice-v2/src/common/env/schema.ts (3)

8-15: Great improvement in error handling for VITE_AUTH_ENABLED

Adding a try-catch block to handle parsing errors is a valuable defensive programming technique. This change prevents potential runtime exceptions that could occur when parsing invalid input values, and provides a sensible default value with appropriate logging.

16-23: Good error handling for VITE_MOCK_SERVER

This try-catch implementation follows the same robust pattern as the previous property, ensuring consistent error handling throughout the schema. The warning message is clear and helpful for debugging.

35-42: Error handling improvement for VITE_FETCH_SIGNED_URL

The addition of error handling with appropriate warnings and fallback values completes the consistent pattern across all boolean environment variables. This change enhances the robustness of the application's configuration system.

These changes align well with the PR's focus on observability improvements, as proper error handling is a foundation for good system observability.

✨ Finishing Touches

📝 Generate Docstrings

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Generate unit testing code for this file.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit testing code for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and generate unit testing code.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai generate docstrings to generate docstrings for this PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (5)

services/workflows-service/src/main.ts (1)
1-3: Early initialization of OpenTelemetry tracer is good practice

Initializing the tracing SDK at the very beginning of the application bootstrap process ensures that all components and behaviors are properly instrumented and captured in traces.

Consider adding error handling to prevent application startup failures if tracing initialization fails:
 import { tracingSdk } from './traces/tracer';

-tracingSdk.start();
+try {
+  tracingSdk.start();
+  console.log('OpenTelemetry tracing initialized successfully');
+} catch (error) {
+  console.error('Failed to initialize OpenTelemetry tracing:', error);
+  // Continue application startup even if tracing fails
+}
services/workflows-service/src/sentry/sentry.service.ts (1)
47-54: Effective correlation of Sentry errors with OpenTelemetry traces

The implementation correctly retrieves the trace ID from either the CLS context or the active OpenTelemetry span, which enables correlation between error reports and distributed traces - a critical capability for debugging in a microservice architecture.

Consider adding a brief comment explaining the significance of this correlation for future maintainers:
 private _setExtraData(scope: Sentry.Scope, request?: Request) {
+  // Retrieve trace ID from CLS or active OpenTelemetry span to correlate Sentry errors with distributed traces
   let traceId = this.cls.get('traceId');
   if (!traceId) {
     const activeSpan = api.trace.getSpan(api.context.active());
     traceId = activeSpan?.spanContext().traceId;
   }

-  scope.setExtra('traceId', traceId ?? 'N/A');
+  // Ensure the trace ID is in a valid format before using it
+  const isValidTraceId = traceId && typeof traceId === 'string' && traceId.length > 0;
+  scope.setExtra('traceId', isValidTraceId ? traceId : 'N/A');
   scope.setExtra('assigneeId', this.cls.get('assigneeId'));
   scope.setExtra('workflowId', this.cls.get('workflowId'));
services/workflows-service/src/common/app-logger/app-logger.service.ts (1)
53-53: Consider logging the active trace ID for improved correlation.
Currently, the logger metadata no longer includes a request or trace correlation ID. Integrating the OpenTelemetry trace ID into metadata can greatly aid in debugging distributed systems.

A possible approach (using OpenTelemetry API) is:
+ import { context, trace } from '@opentelemetry/api';
...
private getLogMetadata() {
  const metadata = {};
+ const span = trace.getSpan(context.active());
+ if (span) {
+   const traceId = span.spanContext().traceId;
+   metadata.traceId = traceId;
+ }
  ...
  return metadata;
}
services/workflows-service/src/traces/tracer.ts (2)

1-25: Question the default tracer name.
You are creating a tracer with the name 'default'. For clarity, you might consider providing a discipline-specific name, such as 'wf-service', to reflect the codebase context.

35-63: Review Express instrumentation layers for completeness.
Ignoring request handlers and middleware can reduce overhead, but it might also skip essential route-level trace spans. Ensure this behavior aligns with engineering goals for inbound request visibility.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 862f811 and 812ef9e.

⛔ Files ignored due to path filters (1)

pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml

📒 Files selected for processing (9)

services/workflows-service/docker-compose.otel.yml (1 hunks)
services/workflows-service/package.json (1 hunks)
services/workflows-service/prisma/schema.prisma (1 hunks)
services/workflows-service/src/common/app-logger/app-logger.service.ts (1 hunks)
services/workflows-service/src/common/middlewares/request-id.middleware.ts (3 hunks)
services/workflows-service/src/main.ts (1 hunks)
services/workflows-service/src/sentry/sentry.service.ts (2 hunks)
services/workflows-service/src/traces/http-intrumentation.ts (1 hunks)
services/workflows-service/src/traces/tracer.ts (1 hunks)

⏰ Context from checks skipped due to timeout of 90000ms (5)

GitHub Check: build-wf-service / build-and-push
GitHub Check: build-dashboard / build-and-push
GitHub Check: build-backoffice / build-and-push
GitHub Check: build-kyb / build-and-push
GitHub Check: Analyze (javascript)

🔇 Additional comments (11)

services/workflows-service/docker-compose.otel.yml (1)

1-15: OpenTelemetry integration with Jaeger is well-configured

The Docker Compose configuration for Jaeger tracing service properly exposes all necessary ports for the various Jaeger functionalities, including UDP ports for trace ingestion (6831, 6832), UI access (16686), and OpenTelemetry Protocol ports (4317, 4318).

Consider adding these improvements for production readiness:

A container name for easier reference in logs and commands

Resource limits to prevent resource contention

Health checks for better orchestration

Volume mounts for persistent storage of traces
 version: '3'
 services:
   tracing:
     image: jaegertracing/all-in-one:1.55
+    container_name: jaeger-tracing
     ports:
       - 6831:6831/udp
       - 6832:6832/udp
       - 5778:5778
       - 16686:16686
       - 4317:4317
       - 4318:4318
       - 14250:14250
       - 14268:14268
       - 14269:14269
       - 9411:9411
+    volumes:
+      - jaeger-data:/badger
+    healthcheck:
+      test: ["CMD", "wget", "--spider", "-q", "http://localhost:16686"]
+      interval: 10s
+      timeout: 5s
+      retries: 3
+    restart: unless-stopped
+    deploy:
+      resources:
+        limits:
+          cpus: '1'
+          memory: 1G
+
+volumes:
+  jaeger-data:
services/workflows-service/prisma/schema.prisma (1)

8-8: Good addition of Prisma tracing preview feature

Adding the tracing preview feature to Prisma is a key component for end-to-end distributed tracing with OpenTelemetry. This enables capturing of database query performance metrics and correlation with traces.

Keep in mind this is a preview feature in Prisma, so it may have limitations or behavior changes in future releases.

services/workflows-service/src/sentry/sentry.service.ts (1)

1-1: Good integration of OpenTelemetry API

Adding the OpenTelemetry API import facilitates accessing the current trace context.

services/workflows-service/src/traces/http-intrumentation.ts (2)

4-18: Looks good for creating a flexible exclusion filter.
The createFilter function neatly converts the provided filters into regular expressions and checks for URL matches. This approach should be efficient and readable, as it’s only initialized once in the constructor.

20-33: Confirm the decision to ignore all outgoing requests.
By returning true in ignoreOutgoingRequestHook, no outgoing requests will be traced. If that’s intentional (e.g., you only want to focus on inbound requests), this is fine. If you intended to trace external calls, consider adjusting this logic.

Please verify if ignoring all outgoing requests is desired. If not, you could remove or modify the hook to capture relevant outbound traffic.
services/workflows-service/src/traces/tracer.ts (2)

65-71: Graceful tracing shutdown is well-implemented.
The SIGTERM handling logic cleanly ends tracing. This approach helps ensure no pending spans are left unexported.

27-34:
✅ Verification successful

Verify environment naming and configuration.
Currently, you apply a SimpleSpanProcessor only when env.ENVIRONMENT_NAME === 'local'. Confirm that your environment variable aligns with your local development environment naming. Otherwise, the intended local logging behavior may not occur.

🏁 Script executed:
#!/bin/bash
# Check for references to ENVIRONMENT_NAME in the codebase to ensure consistency
rg 'ENVIRONMENT_NAME'
Length of output: 6173
ENVIRONMENT_NAME Configuration Verified

The code in services/workflows-service/src/traces/tracer.ts correctly uses env.ENVIRONMENT_NAME === 'local' to trigger the SimpleSpanProcessor for local logging. Our repository search confirms that:

The local environment is defined as local in .env.example and in the environment schema (src/env.ts).

This naming is used consistently throughout the codebase (e.g., in swagger, tests, and other modules).

No discrepancies were found. Please ensure that local development environments continue to use local for ENVIRONMENT_NAME so that this behavior remains intact.
services/workflows-service/src/common/middlewares/request-id.middleware.ts (3)

1-1: Adding OpenTelemetry imports for distributed tracing

These imports enable the middleware to access the trace context and integrate with the OpenTelemetry tracing system.

24-24: Method name updated to reflect new tracing functionality

The method name change from setReqId to setTraceId accurately reflects its new purpose of using OpenTelemetry trace IDs instead of randomly generated request IDs.

68-77: Properly implemented trace ID extraction from OpenTelemetry

The implementation correctly retrieves the trace ID from the active span context and uses it for request tracking. This enables distributed tracing across services while maintaining backward compatibility by continuing to use the same header name.

A few observations:

Good null safety check with the optional chaining operator and the conditional check

The trace ID is correctly stored in both the CLS service and response header

Maintains the same X-Request-ID header name for backward compatibility

services/workflows-service/package.json (1)

69-85: Comprehensive OpenTelemetry instrumentation dependencies added

The added dependencies provide a complete distributed tracing solution for the application:

Core OpenTelemetry packages for API and SDK functionality

Multiple exporters (Jaeger, OTLP) for flexible backend integration

Specific instrumentations for various components (AWS SDK, Express, HTTP, NestJS, Winston)

Resource detectors for cloud and container environments

Propagators for cross-service trace context sharing

Prisma instrumentation for database operation tracing

This comprehensive approach will enable end-to-end tracing across the entire application stack.

- Improve parsing logic for VITE_AUTH_ENABLED, VITE_MOCK_SERVER, and VITE_FETCH_SIGNED_URL - Add console warnings for failed JSON parsing and default to true

liorzblrn and others added 21 commits March 21, 2024 17:39

feat: otel support

bd87c13

fix: when no trace id for sentry use N/A

1f6c4f6

Merge branch 'dev' of github.com:ballerine-io/ballerine into bal-1657…

b32fa45

…-otel-traces-support

fix: presist jaegertracing/all-in-one to 1.55

8db8fa6

fix: remove url property

20bc3e9

Merge remote-tracking branch 'origin' into bal-1657-otel-traces-support

3c1d79f

Update services/workflows-service/src/traces/tracer.ts

62e7f62

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

chore: data migration

99c3328

Merge remote-tracking branch 'origin' into bal-1657-otel-traces-support

347c0cc

fix: http instrumentation

e4607ee

fix: lock

bd02e7a

Update services/workflows-service/src/traces/http-intrumentation.ts

7b020bf

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

fix: ballerine http instrumentaiton

c4e1e1d

Merge branch 'bal-1657-otel-traces-support' of github.com:ballerine-i…

8738727

…o/ballerine into bal-1657-otel-traces-support

chore: update from dev

f12c6a8

fix: improve tracing

f2fbedf

chore: merge dev changes

9882b09

feat: added new script to just run server

b3a41da

chore: merge from dev

0fdb7d6

Merge branch 'bal-1657-otel-traces-support' of github.com:ballerine-i…

60ba7ad

…o/ballerine into bal-1657-otel-traces-support

fix(workflows-plugins): individuals sanaction plugin fixes

812ef9e

vercel bot deployed to Preview March 17, 2025 14:54 View deployment

alonp99 added the deploy-pr For running PR environment label Mar 17, 2025

coderabbitai bot reviewed Mar 17, 2025

View reviewed changes

fix(workflows-plugins): individuals sanaction plugin fixes

d4dce91

vercel bot had a problem deploying to Preview March 17, 2025 15:10 Failure

fix(workflows-plugins): individuals sanaction plugin fixes

bbdbf76

vercel bot had a problem deploying to Preview March 17, 2025 16:07 Failure

fix(env): enhance error handling for environment variables

6fac7e1

- Improve parsing logic for VITE_AUTH_ENABLED, VITE_MOCK_SERVER, and VITE_FETCH_SIGNED_URL - Add console warnings for failed JSON parsing and default to true

vercel bot deployed to Preview March 17, 2025 19:32 View deployment

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bal 1657 otel traces support new #3167

Bal 1657 otel traces support new #3167

alonp99 commented Mar 17, 2025 •

edited by coderabbitai bot

Loading

changeset-bot bot commented Mar 17, 2025 •

edited

Loading

coderabbitai bot commented Mar 17, 2025 •

edited

Loading

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (`.coderabbit.yaml`)

Documentation and Community

coderabbitai bot left a comment

Bal 1657 otel traces support new #3167

Are you sure you want to change the base?

Bal 1657 otel traces support new #3167

Conversation

alonp99 commented Mar 17, 2025 • edited by coderabbitai bot Loading

Summary by CodeRabbit

changeset-bot bot commented Mar 17, 2025 • edited Loading

⚠️ No Changeset found

coderabbitai bot commented Mar 17, 2025 • edited Loading

Walkthrough

Changes

Suggested reviewers

Poem

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

coderabbitai bot left a comment

Choose a reason for hiding this comment

alonp99 commented Mar 17, 2025 •

edited by coderabbitai bot

Loading

changeset-bot bot commented Mar 17, 2025 •

edited

Loading

coderabbitai bot commented Mar 17, 2025 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)