Skip to content

Bal 1657 otel traces support new #3167

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 24 commits into
base: dev
Choose a base branch
from

Conversation

alonp99
Copy link
Collaborator

@alonp99 alonp99 commented Mar 17, 2025

Summary by CodeRabbit

  • New Features

    • Introduced distributed tracing and enhanced observability with improved integration to external monitoring tools.
    • Added new configurations that support advanced service monitoring and instrumentation of HTTP requests.
    • Implemented a new class for filtering incoming HTTP requests based on specified URL patterns.
    • Enhanced error handling for environment variables to ensure robustness.
  • Refactor

    • Simplified logging by streamlining metadata handling and centralizing trace identification.
    • Updated request handling to consistently propagate trace information across the application for improved diagnostics.

Copy link

changeset-bot bot commented Mar 17, 2025

⚠️ No Changeset found

Latest commit: 6fac7e1

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

Copy link
Contributor

coderabbitai bot commented Mar 17, 2025

Walkthrough

This pull request integrates distributed tracing capabilities into the workflows service using OpenTelemetry and Jaeger. It adds a new Docker Compose configuration for a Jaeger service and introduces multiple OpenTelemetry dependencies to enhance instrumentation. The Prisma schema now includes a preview feature for tracing. Additionally, components such as the logger, middleware, and Sentry service have been updated to use trace IDs from the OpenTelemetry context rather than generated request IDs. New modules for HTTP instrumentation and tracer initialization further support the tracing functionality.

Changes

File(s) Change Summary
services/workflows-service/docker-compose.otel.yml New Docker Compose file defining a Jaeger tracing service with multiple port mappings.
services/workflows-service/package.json Added multiple OpenTelemetry and instrumentation dependencies to enhance observability capabilities.
services/workflows-service/prisma/schema.prisma Added previewFeatures = ["tracing"] in the generator client block to enable tracing support.
services/workflows-service/src/common/app-logger/app-logger.service.ts Updated getLogMetadata by removing the reqId property from the logging metadata.
services/workflows-service/src/common/middlewares/request-id.middleware.ts Renamed setReqId to setTraceId and modified logic to retrieve trace IDs from the OpenTelemetry context instead of generating a random UUID, setting the X-Request-ID header accordingly.
services/workflows-service/src/main.ts Imported and invoked tracingSdk.start() at the start of the application to initialize tracing.
services/workflows-service/src/sentry/sentry.service.ts Modified _setExtraData to retrieve and set a traceId from CLS or the active OpenTelemetry span, replacing the use of a request ID.
services/workflows-service/src/traces/http-intrumentation.ts New file adding BallerineHttpInstrumentation, extending the OpenTelemetry HTTP instrumentation with filtering for incoming requests based on URL patterns.
services/workflows-service/src/traces/tracer.ts New file setting up OpenTelemetry tracing via the NodeSDK, configuring exporters, processors, resource detectors, and multiple instrumentations, and exporting the tracer instance for application-wide use.
apps/backoffice-v2/src/common/env/schema.ts Enhanced error handling in EnvSchema for properties VITE_AUTH_ENABLED, VITE_MOCK_SERVER, and VITE_FETCH_SIGNED_URL with try-catch blocks to log warnings on parsing failures.

Suggested reviewers

  • tomer-shvadron
  • MatanYadaev

Poem

In fields of code I softly tread,
Amidst bits and bytes, my worries shed.
I hop with glee on tracing trails,
Leaving footprints where each trace prevails.
With carrots of logic, crisp and bright,
I celebrate changes from morning to night!
🥕🐇 Happy trails in every byte!

Tip

⚡🧪 Multi-step agentic review comment chat (experimental)
  • We're introducing multi-step agentic chat in review comments. This experimental feature enhances review discussions with the CodeRabbit agentic chat by enabling advanced interactions, including the ability to create pull requests directly from comments.
    - To enable this feature, set early_access to true under in the settings.

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bbdbf76 and 6fac7e1.

📒 Files selected for processing (1)
  • apps/backoffice-v2/src/common/env/schema.ts (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (12)
  • GitHub Check: build-dashboard / build-and-push
  • GitHub Check: build-kyb / build-and-push
  • GitHub Check: build-wf-service / build-and-push
  • GitHub Check: build-backoffice / build-and-push
  • GitHub Check: test_windows
  • GitHub Check: test_linux
  • GitHub Check: spell_check
  • GitHub Check: build (windows-latest)
  • GitHub Check: lint
  • GitHub Check: build (ubuntu-latest)
  • GitHub Check: Analyze (javascript)
  • GitHub Check: format
🔇 Additional comments (3)
apps/backoffice-v2/src/common/env/schema.ts (3)

8-15: Great improvement in error handling for VITE_AUTH_ENABLED

Adding a try-catch block to handle parsing errors is a valuable defensive programming technique. This change prevents potential runtime exceptions that could occur when parsing invalid input values, and provides a sensible default value with appropriate logging.


16-23: Good error handling for VITE_MOCK_SERVER

This try-catch implementation follows the same robust pattern as the previous property, ensuring consistent error handling throughout the schema. The warning message is clear and helpful for debugging.


35-42: Error handling improvement for VITE_FETCH_SIGNED_URL

The addition of error handling with appropriate warnings and fallback values completes the consistent pattern across all boolean environment variables. This change enhances the robustness of the application's configuration system.

These changes align well with the PR's focus on observability improvements, as proper error handling is a foundation for good system observability.

✨ Finishing Touches
  • 📝 Generate Docstrings

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (5)
services/workflows-service/src/main.ts (1)

1-3: Early initialization of OpenTelemetry tracer is good practice

Initializing the tracing SDK at the very beginning of the application bootstrap process ensures that all components and behaviors are properly instrumented and captured in traces.

Consider adding error handling to prevent application startup failures if tracing initialization fails:

 import { tracingSdk } from './traces/tracer';

-tracingSdk.start();
+try {
+  tracingSdk.start();
+  console.log('OpenTelemetry tracing initialized successfully');
+} catch (error) {
+  console.error('Failed to initialize OpenTelemetry tracing:', error);
+  // Continue application startup even if tracing fails
+}
services/workflows-service/src/sentry/sentry.service.ts (1)

47-54: Effective correlation of Sentry errors with OpenTelemetry traces

The implementation correctly retrieves the trace ID from either the CLS context or the active OpenTelemetry span, which enables correlation between error reports and distributed traces - a critical capability for debugging in a microservice architecture.

Consider adding a brief comment explaining the significance of this correlation for future maintainers:

 private _setExtraData(scope: Sentry.Scope, request?: Request) {
+  // Retrieve trace ID from CLS or active OpenTelemetry span to correlate Sentry errors with distributed traces
   let traceId = this.cls.get('traceId');
   if (!traceId) {
     const activeSpan = api.trace.getSpan(api.context.active());
     traceId = activeSpan?.spanContext().traceId;
   }

-  scope.setExtra('traceId', traceId ?? 'N/A');
+  // Ensure the trace ID is in a valid format before using it
+  const isValidTraceId = traceId && typeof traceId === 'string' && traceId.length > 0;
+  scope.setExtra('traceId', isValidTraceId ? traceId : 'N/A');
   scope.setExtra('assigneeId', this.cls.get('assigneeId'));
   scope.setExtra('workflowId', this.cls.get('workflowId'));
services/workflows-service/src/common/app-logger/app-logger.service.ts (1)

53-53: Consider logging the active trace ID for improved correlation.
Currently, the logger metadata no longer includes a request or trace correlation ID. Integrating the OpenTelemetry trace ID into metadata can greatly aid in debugging distributed systems.

A possible approach (using OpenTelemetry API) is:

+ import { context, trace } from '@opentelemetry/api';
...
private getLogMetadata() {
  const metadata = {};
+ const span = trace.getSpan(context.active());
+ if (span) {
+   const traceId = span.spanContext().traceId;
+   metadata.traceId = traceId;
+ }
  ...
  return metadata;
}
services/workflows-service/src/traces/tracer.ts (2)

1-25: Question the default tracer name.
You are creating a tracer with the name 'default'. For clarity, you might consider providing a discipline-specific name, such as 'wf-service', to reflect the codebase context.


35-63: Review Express instrumentation layers for completeness.
Ignoring request handlers and middleware can reduce overhead, but it might also skip essential route-level trace spans. Ensure this behavior aligns with engineering goals for inbound request visibility.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 862f811 and 812ef9e.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (9)
  • services/workflows-service/docker-compose.otel.yml (1 hunks)
  • services/workflows-service/package.json (1 hunks)
  • services/workflows-service/prisma/schema.prisma (1 hunks)
  • services/workflows-service/src/common/app-logger/app-logger.service.ts (1 hunks)
  • services/workflows-service/src/common/middlewares/request-id.middleware.ts (3 hunks)
  • services/workflows-service/src/main.ts (1 hunks)
  • services/workflows-service/src/sentry/sentry.service.ts (2 hunks)
  • services/workflows-service/src/traces/http-intrumentation.ts (1 hunks)
  • services/workflows-service/src/traces/tracer.ts (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (5)
  • GitHub Check: build-wf-service / build-and-push
  • GitHub Check: build-dashboard / build-and-push
  • GitHub Check: build-backoffice / build-and-push
  • GitHub Check: build-kyb / build-and-push
  • GitHub Check: Analyze (javascript)
🔇 Additional comments (11)
services/workflows-service/docker-compose.otel.yml (1)

1-15: OpenTelemetry integration with Jaeger is well-configured

The Docker Compose configuration for Jaeger tracing service properly exposes all necessary ports for the various Jaeger functionalities, including UDP ports for trace ingestion (6831, 6832), UI access (16686), and OpenTelemetry Protocol ports (4317, 4318).

Consider adding these improvements for production readiness:

  1. A container name for easier reference in logs and commands
  2. Resource limits to prevent resource contention
  3. Health checks for better orchestration
  4. Volume mounts for persistent storage of traces
 version: '3'
 services:
   tracing:
     image: jaegertracing/all-in-one:1.55
+    container_name: jaeger-tracing
     ports:
       - 6831:6831/udp
       - 6832:6832/udp
       - 5778:5778
       - 16686:16686
       - 4317:4317
       - 4318:4318
       - 14250:14250
       - 14268:14268
       - 14269:14269
       - 9411:9411
+    volumes:
+      - jaeger-data:/badger
+    healthcheck:
+      test: ["CMD", "wget", "--spider", "-q", "http://localhost:16686"]
+      interval: 10s
+      timeout: 5s
+      retries: 3
+    restart: unless-stopped
+    deploy:
+      resources:
+        limits:
+          cpus: '1'
+          memory: 1G
+
+volumes:
+  jaeger-data:
services/workflows-service/prisma/schema.prisma (1)

8-8: Good addition of Prisma tracing preview feature

Adding the tracing preview feature to Prisma is a key component for end-to-end distributed tracing with OpenTelemetry. This enables capturing of database query performance metrics and correlation with traces.

Keep in mind this is a preview feature in Prisma, so it may have limitations or behavior changes in future releases.

services/workflows-service/src/sentry/sentry.service.ts (1)

1-1: Good integration of OpenTelemetry API

Adding the OpenTelemetry API import facilitates accessing the current trace context.

services/workflows-service/src/traces/http-intrumentation.ts (2)

4-18: Looks good for creating a flexible exclusion filter.
The createFilter function neatly converts the provided filters into regular expressions and checks for URL matches. This approach should be efficient and readable, as it’s only initialized once in the constructor.


20-33: Confirm the decision to ignore all outgoing requests.
By returning true in ignoreOutgoingRequestHook, no outgoing requests will be traced. If that’s intentional (e.g., you only want to focus on inbound requests), this is fine. If you intended to trace external calls, consider adjusting this logic.

Please verify if ignoring all outgoing requests is desired. If not, you could remove or modify the hook to capture relevant outbound traffic.

services/workflows-service/src/traces/tracer.ts (2)

65-71: Graceful tracing shutdown is well-implemented.
The SIGTERM handling logic cleanly ends tracing. This approach helps ensure no pending spans are left unexported.


27-34:

✅ Verification successful

Verify environment naming and configuration.
Currently, you apply a SimpleSpanProcessor only when env.ENVIRONMENT_NAME === 'local'. Confirm that your environment variable aligns with your local development environment naming. Otherwise, the intended local logging behavior may not occur.


🏁 Script executed:

#!/bin/bash
# Check for references to ENVIRONMENT_NAME in the codebase to ensure consistency
rg 'ENVIRONMENT_NAME'

Length of output: 6173


ENVIRONMENT_NAME Configuration Verified

The code in services/workflows-service/src/traces/tracer.ts correctly uses env.ENVIRONMENT_NAME === 'local' to trigger the SimpleSpanProcessor for local logging. Our repository search confirms that:

  • The local environment is defined as local in .env.example and in the environment schema (src/env.ts).
  • This naming is used consistently throughout the codebase (e.g., in swagger, tests, and other modules).

No discrepancies were found. Please ensure that local development environments continue to use local for ENVIRONMENT_NAME so that this behavior remains intact.

services/workflows-service/src/common/middlewares/request-id.middleware.ts (3)

1-1: Adding OpenTelemetry imports for distributed tracing

These imports enable the middleware to access the trace context and integrate with the OpenTelemetry tracing system.


24-24: Method name updated to reflect new tracing functionality

The method name change from setReqId to setTraceId accurately reflects its new purpose of using OpenTelemetry trace IDs instead of randomly generated request IDs.


68-77: Properly implemented trace ID extraction from OpenTelemetry

The implementation correctly retrieves the trace ID from the active span context and uses it for request tracking. This enables distributed tracing across services while maintaining backward compatibility by continuing to use the same header name.

A few observations:

  • Good null safety check with the optional chaining operator and the conditional check
  • The trace ID is correctly stored in both the CLS service and response header
  • Maintains the same X-Request-ID header name for backward compatibility
services/workflows-service/package.json (1)

69-85: Comprehensive OpenTelemetry instrumentation dependencies added

The added dependencies provide a complete distributed tracing solution for the application:

  • Core OpenTelemetry packages for API and SDK functionality
  • Multiple exporters (Jaeger, OTLP) for flexible backend integration
  • Specific instrumentations for various components (AWS SDK, Express, HTTP, NestJS, Winston)
  • Resource detectors for cloud and container environments
  • Propagators for cross-service trace context sharing
  • Prisma instrumentation for database operation tracing

This comprehensive approach will enable end-to-end tracing across the entire application stack.

- Improve parsing logic for VITE_AUTH_ENABLED, VITE_MOCK_SERVER, and VITE_FETCH_SIGNED_URL
- Add console warnings for failed JSON parsing and default to true
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
deploy-pr For running PR environment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants