-
Notifications
You must be signed in to change notification settings - Fork 222
Bal 1657 otel traces support new #3167
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Are you sure you want to change the base?
Conversation
…-otel-traces-support
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
…o/ballerine into bal-1657-otel-traces-support
…o/ballerine into bal-1657-otel-traces-support
|
WalkthroughThis pull request integrates distributed tracing capabilities into the workflows service using OpenTelemetry and Jaeger. It adds a new Docker Compose configuration for a Jaeger service and introduces multiple OpenTelemetry dependencies to enhance instrumentation. The Prisma schema now includes a preview feature for tracing. Additionally, components such as the logger, middleware, and Sentry service have been updated to use trace IDs from the OpenTelemetry context rather than generated request IDs. New modules for HTTP instrumentation and tracer initialization further support the tracing functionality. Changes
Suggested reviewers
Poem
Tip ⚡🧪 Multi-step agentic review comment chat (experimental)
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
⏰ Context from checks skipped due to timeout of 90000ms (12)
🔇 Additional comments (3)
✨ Finishing Touches
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (5)
services/workflows-service/src/main.ts (1)
1-3
: Early initialization of OpenTelemetry tracer is good practiceInitializing the tracing SDK at the very beginning of the application bootstrap process ensures that all components and behaviors are properly instrumented and captured in traces.
Consider adding error handling to prevent application startup failures if tracing initialization fails:
import { tracingSdk } from './traces/tracer'; -tracingSdk.start(); +try { + tracingSdk.start(); + console.log('OpenTelemetry tracing initialized successfully'); +} catch (error) { + console.error('Failed to initialize OpenTelemetry tracing:', error); + // Continue application startup even if tracing fails +}services/workflows-service/src/sentry/sentry.service.ts (1)
47-54
: Effective correlation of Sentry errors with OpenTelemetry tracesThe implementation correctly retrieves the trace ID from either the CLS context or the active OpenTelemetry span, which enables correlation between error reports and distributed traces - a critical capability for debugging in a microservice architecture.
Consider adding a brief comment explaining the significance of this correlation for future maintainers:
private _setExtraData(scope: Sentry.Scope, request?: Request) { + // Retrieve trace ID from CLS or active OpenTelemetry span to correlate Sentry errors with distributed traces let traceId = this.cls.get('traceId'); if (!traceId) { const activeSpan = api.trace.getSpan(api.context.active()); traceId = activeSpan?.spanContext().traceId; } - scope.setExtra('traceId', traceId ?? 'N/A'); + // Ensure the trace ID is in a valid format before using it + const isValidTraceId = traceId && typeof traceId === 'string' && traceId.length > 0; + scope.setExtra('traceId', isValidTraceId ? traceId : 'N/A'); scope.setExtra('assigneeId', this.cls.get('assigneeId')); scope.setExtra('workflowId', this.cls.get('workflowId'));services/workflows-service/src/common/app-logger/app-logger.service.ts (1)
53-53
: Consider logging the active trace ID for improved correlation.
Currently, the logger metadata no longer includes a request or trace correlation ID. Integrating the OpenTelemetry trace ID intometadata
can greatly aid in debugging distributed systems.A possible approach (using OpenTelemetry API) is:
+ import { context, trace } from '@opentelemetry/api'; ... private getLogMetadata() { const metadata = {}; + const span = trace.getSpan(context.active()); + if (span) { + const traceId = span.spanContext().traceId; + metadata.traceId = traceId; + } ... return metadata; }services/workflows-service/src/traces/tracer.ts (2)
1-25
: Question the default tracer name.
You are creating a tracer with the name'default'
. For clarity, you might consider providing a discipline-specific name, such as'wf-service'
, to reflect the codebase context.
35-63
: Review Express instrumentation layers for completeness.
Ignoring request handlers and middleware can reduce overhead, but it might also skip essential route-level trace spans. Ensure this behavior aligns with engineering goals for inbound request visibility.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
pnpm-lock.yaml
is excluded by!**/pnpm-lock.yaml
📒 Files selected for processing (9)
services/workflows-service/docker-compose.otel.yml
(1 hunks)services/workflows-service/package.json
(1 hunks)services/workflows-service/prisma/schema.prisma
(1 hunks)services/workflows-service/src/common/app-logger/app-logger.service.ts
(1 hunks)services/workflows-service/src/common/middlewares/request-id.middleware.ts
(3 hunks)services/workflows-service/src/main.ts
(1 hunks)services/workflows-service/src/sentry/sentry.service.ts
(2 hunks)services/workflows-service/src/traces/http-intrumentation.ts
(1 hunks)services/workflows-service/src/traces/tracer.ts
(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (5)
- GitHub Check: build-wf-service / build-and-push
- GitHub Check: build-dashboard / build-and-push
- GitHub Check: build-backoffice / build-and-push
- GitHub Check: build-kyb / build-and-push
- GitHub Check: Analyze (javascript)
🔇 Additional comments (11)
services/workflows-service/docker-compose.otel.yml (1)
1-15
: OpenTelemetry integration with Jaeger is well-configuredThe Docker Compose configuration for Jaeger tracing service properly exposes all necessary ports for the various Jaeger functionalities, including UDP ports for trace ingestion (6831, 6832), UI access (16686), and OpenTelemetry Protocol ports (4317, 4318).
Consider adding these improvements for production readiness:
- A container name for easier reference in logs and commands
- Resource limits to prevent resource contention
- Health checks for better orchestration
- Volume mounts for persistent storage of traces
version: '3' services: tracing: image: jaegertracing/all-in-one:1.55 + container_name: jaeger-tracing ports: - 6831:6831/udp - 6832:6832/udp - 5778:5778 - 16686:16686 - 4317:4317 - 4318:4318 - 14250:14250 - 14268:14268 - 14269:14269 - 9411:9411 + volumes: + - jaeger-data:/badger + healthcheck: + test: ["CMD", "wget", "--spider", "-q", "http://localhost:16686"] + interval: 10s + timeout: 5s + retries: 3 + restart: unless-stopped + deploy: + resources: + limits: + cpus: '1' + memory: 1G + +volumes: + jaeger-data:services/workflows-service/prisma/schema.prisma (1)
8-8
: Good addition of Prisma tracing preview featureAdding the tracing preview feature to Prisma is a key component for end-to-end distributed tracing with OpenTelemetry. This enables capturing of database query performance metrics and correlation with traces.
Keep in mind this is a preview feature in Prisma, so it may have limitations or behavior changes in future releases.
services/workflows-service/src/sentry/sentry.service.ts (1)
1-1
: Good integration of OpenTelemetry APIAdding the OpenTelemetry API import facilitates accessing the current trace context.
services/workflows-service/src/traces/http-intrumentation.ts (2)
4-18
: Looks good for creating a flexible exclusion filter.
ThecreateFilter
function neatly converts the provided filters into regular expressions and checks for URL matches. This approach should be efficient and readable, as it’s only initialized once in the constructor.
20-33
: Confirm the decision to ignore all outgoing requests.
By returningtrue
inignoreOutgoingRequestHook
, no outgoing requests will be traced. If that’s intentional (e.g., you only want to focus on inbound requests), this is fine. If you intended to trace external calls, consider adjusting this logic.Please verify if ignoring all outgoing requests is desired. If not, you could remove or modify the hook to capture relevant outbound traffic.
services/workflows-service/src/traces/tracer.ts (2)
65-71
: Graceful tracing shutdown is well-implemented.
The SIGTERM handling logic cleanly ends tracing. This approach helps ensure no pending spans are left unexported.
27-34
:✅ Verification successful
Verify environment naming and configuration.
Currently, you apply aSimpleSpanProcessor
only whenenv.ENVIRONMENT_NAME === 'local'
. Confirm that your environment variable aligns with your local development environment naming. Otherwise, the intended local logging behavior may not occur.
🏁 Script executed:
#!/bin/bash # Check for references to ENVIRONMENT_NAME in the codebase to ensure consistency rg 'ENVIRONMENT_NAME'Length of output: 6173
ENVIRONMENT_NAME Configuration Verified
The code in
services/workflows-service/src/traces/tracer.ts
correctly usesenv.ENVIRONMENT_NAME === 'local'
to trigger theSimpleSpanProcessor
for local logging. Our repository search confirms that:
- The local environment is defined as
local
in.env.example
and in the environment schema (src/env.ts
).- This naming is used consistently throughout the codebase (e.g., in swagger, tests, and other modules).
No discrepancies were found. Please ensure that local development environments continue to use
local
forENVIRONMENT_NAME
so that this behavior remains intact.services/workflows-service/src/common/middlewares/request-id.middleware.ts (3)
1-1
: Adding OpenTelemetry imports for distributed tracingThese imports enable the middleware to access the trace context and integrate with the OpenTelemetry tracing system.
24-24
: Method name updated to reflect new tracing functionalityThe method name change from
setReqId
tosetTraceId
accurately reflects its new purpose of using OpenTelemetry trace IDs instead of randomly generated request IDs.
68-77
: Properly implemented trace ID extraction from OpenTelemetryThe implementation correctly retrieves the trace ID from the active span context and uses it for request tracking. This enables distributed tracing across services while maintaining backward compatibility by continuing to use the same header name.
A few observations:
- Good null safety check with the optional chaining operator and the conditional check
- The trace ID is correctly stored in both the CLS service and response header
- Maintains the same
X-Request-ID
header name for backward compatibilityservices/workflows-service/package.json (1)
69-85
: Comprehensive OpenTelemetry instrumentation dependencies addedThe added dependencies provide a complete distributed tracing solution for the application:
- Core OpenTelemetry packages for API and SDK functionality
- Multiple exporters (Jaeger, OTLP) for flexible backend integration
- Specific instrumentations for various components (AWS SDK, Express, HTTP, NestJS, Winston)
- Resource detectors for cloud and container environments
- Propagators for cross-service trace context sharing
- Prisma instrumentation for database operation tracing
This comprehensive approach will enable end-to-end tracing across the entire application stack.
- Improve parsing logic for VITE_AUTH_ENABLED, VITE_MOCK_SERVER, and VITE_FETCH_SIGNED_URL - Add console warnings for failed JSON parsing and default to true
Summary by CodeRabbit
New Features
Refactor