Skip to content

Conversation

@manas-narra
Copy link
Collaborator

@manas-narra manas-narra commented Sep 16, 2025

Summary by CodeRabbit

  • Chores

    • Added optional telemetry-backed structured logging to improve observability in production; disabled by default and configurable via environment variables.
    • Enhanced production logging pipeline to emit structured logs for better diagnostics without affecting normal operation.
    • Added OTLP HTTP logs exporter dependency to support remote log collection.
  • Documentation

    • Updated dependency notes to reflect support for OpenTelemetry-based logging.

@coderabbitai
Copy link

coderabbitai bot commented Sep 16, 2025

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

Adds configuration flags for OpenTelemetry logging, integrates an optional OTEL logging sink into the logger with lazy initialization, and updates dependencies to support OTLP HTTP log export. Production logging conditionally routes structured logs to both the existing JSON sink and a new OTEL sink when enabled.

Changes

Cohort / File(s) Summary
Config: OTEL flags
app/core/config.py
Adds ENABLE_OTEL_LOGGING (bool), OTEL_COLLECTOR_LOGS_ENDPOINT (str), OTEL_SERVICE_NAME (str). Emits a runtime log noting OTEL logging enablement status.
Logger: OTEL integration
app/core/logger.py
Introduces lazy OTEL logger initialization (provider, resource with service.name, OTLP HTTP exporter, batch processor, logging handler). Adds a Loguru sink (otel_sink) emitting structured JSON via Python logging to OTEL when enabled. Updates production sink setup to include OTEL sink. Handles init failures by disabling OTEL and logging errors.
Deps: OTEL logs exporter
requirements.txt
Adds opentelemetry-exporter-otlp-proto-http[logs]. Updates comment to include logging.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant App as Application
  participant Logger as Logger Setup
  participant OTEL as OTEL Init
  participant Sinks as Sinks (json_sink, otel_sink)
  participant PyLog as Python logging.Handler
  participant Collector as OTLP Collector

  App->>Logger: Initialize logger (production)
  Logger->>Logger: Check ENABLE_OTEL_LOGGING
  alt OTEL enabled
    Logger->>OTEL: _get_otel_logger()
    OTEL->>OTEL: Create Resource(service.name)
    OTEL->>OTEL: Set/Get LoggerProvider
    OTEL->>OTEL: Configure OTLP HTTP Exporter + Batch Processor
    OTEL->>Logger: Return LoggingHandler
    Logger->>Sinks: Add otel_sink
  else OTEL disabled
    Logger->>Sinks: Skip otel_sink
  end
  Logger->>Sinks: Add json_sink

  App->>Logger: Emit log via Loguru
  Logger->>Sinks: Dispatch to sinks
  par json path
    Sinks-->>Collector: N/A (app internal JSON handling)
  and OTEL path (if enabled)
    Sinks->>PyLog: LogRecord(message=structured JSON)
    PyLog->>Collector: OTLP HTTP export (Batch)
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

I twitch my ears at logs that sing,
A carrot-colored trace they bring—
Through OTLP they hop in line,
With JSON friends, they both align.
If toggled off, I’ll softly wait;
When flags say go, I celebrate! 🥕🐇

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title "feat: add otel sink for clairvoyance to push logs to crane" is a concise single-sentence summary that accurately captures the primary change (adding an OpenTelemetry/OTEL sink to Clairvoyance to forward logs to Crane) and aligns with the modified config, logger, and dependency files in the changeset.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (9)
requirements.txt (2)

25-29: Avoid duplicate OTLP exporter installs; use the [logs] extra only.

You list both the base opentelemetry-exporter-otlp-proto-http and the [logs] extra. The extra already pulls the base; keeping both is redundant and can cause resolver churn.

Apply this diff:

 opentelemetry-api
 opentelemetry-sdk
-opentelemetry-exporter-otlp-proto-http
 opentelemetry-exporter-otlp-proto-http[logs]

26-29: Pin a compatible OTEL version range to avoid surprise breaks.

Unpinned opentelemetry-* packages are brittle. Consider a conservative range across API/SDK/exporter to keep them in lockstep.

Example:

-opentelemetry-api
-opentelemetry-sdk
-opentelemetry-exporter-otlp-proto-http[logs]
+opentelemetry-api>=1.25,<2
+opentelemetry-sdk>=1.25,<2
+opentelemetry-exporter-otlp-proto-http[logs]>=1.25,<2
app/core/config.py (1)

147-147: Optional: log an explicit warning when OTEL is enabled with an empty endpoint.

If you don’t hard-fail, at least warn loudly to avoid silent misconfig.

-logger.info(f"OTEL logging enabled: {ENABLE_OTEL_LOGGING}")
+logger.info(f"OTEL logging enabled: {ENABLE_OTEL_LOGGING}")
+if ENABLE_OTEL_LOGGING and not OTEL_COLLECTOR_LOGS_ENDPOINT:
+    logger.warning("OTEL is enabled but OTEL_COLLECTOR_LOGS_ENDPOINT is empty")
app/core/logger.py (6)

26-31: Duplicate import of LoggingHandler; remove the second import.

Redefinition (Ruff F811). Keep one import only.

Apply this diff:

-            from opentelemetry.sdk._logs import LoggerProvider, LoggingHandler
+            from opentelemetry.sdk._logs import LoggerProvider, LoggingHandler
@@
-            from opentelemetry.sdk._logs import LoggingHandler
-            _otel_handler = LoggingHandler(logger_provider=_otel_logger_provider)
+            _otel_handler = LoggingHandler(logger_provider=_otel_logger_provider)

Also applies to: 61-63


52-58: Batch settings: consider lower schedule delay for near‑real‑time logs.

schedule_delay_millis=5000 adds up to 5s latency. If you need faster log visibility, reduce to 500–1000ms.

Apply this diff:

-            log_processor = BatchLogRecordProcessor(
+            log_processor = BatchLogRecordProcessor(
                 log_exporter,
                 max_export_batch_size=100,
                 export_timeout_millis=30000,
-                schedule_delay_millis=5000
+                schedule_delay_millis=1000
             )

68-75: Use logger.exception instead of manual traceback; narrow exception scope if possible.

This addresses Ruff BLE001/S110 and keeps stack traces without manual formatting.

Apply this diff:

-        except Exception as e:
-            logger.error(f"Failed to initialize OTEL logger: {e}")
-            # Add more detailed error info
-            import traceback
-            logger.error(f"OTEL initialization traceback: {traceback.format_exc()}")
+        except Exception:
+            logger.exception("Failed to initialize OTEL logger")

81-139: Harden sink: avoid re-imports, check handler presence, and don’t swallow exceptions silently.

  • Remove redundant imports.
  • Guard _otel_handler being None.
  • Replace try/except: pass with a minimal stderr write to avoid recursion.

Apply this diff:

 def otel_sink(message):
     """Custom OTEL sink that preserves full structured data like JSON sink"""
-    if not _otel_enabled:
+    if not _otel_enabled:
         return
@@
-    try:
-        import json
-        import logging
+    try:
@@
-        _otel_handler.emit(log_record)
+        if _otel_handler:
+            _otel_handler.emit(log_record)
+        else:
+            return
@@
-    except Exception as e:
-        # Don't let OTEL issues break the application
-        pass
+    except Exception as _:
+        # Don't let OTEL issues break the application; avoid logger recursion.
+        try:
+            sys.stderr.write("otel_sink error; log dropped\n")
+        except Exception:
+            pass

255-265: Optional: apply the spam filter to the OTEL sink too (or document why not).

If the collector shouldn’t ingest websockets/daily_core noise, reuse filter_spam_logs here to cut volume. If you intend to ship everything to Crane, ignore this.

Apply this diff:

-            logger.add(
+            logger.add(
                 otel_sink,  # Use custom sink that preserves structured data
                 level=PROD_LOG_LEVEL,
-                enqueue=True,
+                enqueue=True,
                 backtrace=False,
                 diagnose=False,
+                # filter=filter_spam_logs,  # uncomment if you want to suppress spam in OTEL as well
             )

64-66: Minor: _otel_logger isn’t used outside init; gate on _otel_handler instead.

You can avoid keeping _otel_logger and just treat _otel_handler as readiness.

Apply this diff:

-            # Get logger instance (for direct emission if needed)
-            _otel_logger = _otel_logger_provider.get_logger(__name__)
+            # Presence of handler is sufficient to emit records
+            _otel_logger = _otel_logger_provider.get_logger(__name__)  # optional; can be removed
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c615f17 and c07d707.

📒 Files selected for processing (3)
  • app/core/config.py (2 hunks)
  • app/core/logger.py (2 hunks)
  • requirements.txt (1 hunks)
🧰 Additional context used
🪛 Ruff (0.12.2)
app/core/logger.py

40-40: Do not catch blind exception: Exception

(BLE001)


61-61: Redefinition of unused LoggingHandler from line 28

Remove definition: LoggingHandler

(F811)


70-70: Do not catch blind exception: Exception

(BLE001)


136-138: try-except-pass detected, consider logging the exception

(S110)


136-136: Do not catch blind exception: Exception

(BLE001)


136-136: Local variable e is assigned to but never used

Remove assignment to unused variable e

(F841)

🔇 Additional comments (1)
app/core/logger.py (1)

47-51: Remove headers={} — it overrides env-based OTLP headers

Passing headers={} programmatically clobbers OTEL_EXPORTER_OTLP_HEADERS / OTEL_EXPORTER_OTLP_LOGS_HEADERS (used for Authorization); remove the headers argument so the exporter reads env/config.

Apply this diff:

-            log_exporter = OTLPLogExporter(
-                endpoint=f"{OTEL_COLLECTOR_LOGS_ENDPOINT}/v1/logs",
-                headers={},
-                timeout=30
-            )
+            log_exporter = OTLPLogExporter(
+                endpoint=f"{OTEL_COLLECTOR_LOGS_ENDPOINT}/v1/logs",
+                timeout=30,
+            )

Also confirm OTEL_COLLECTOR_LOGS_ENDPOINT is a base URL (e.g. https://collector:4318); the SDK will append /v1/logs. If the variable is already a full logs URL, do not append /v1/logs or set OTEL_EXPORTER_OTLP_LOGS_ENDPOINT explicitly.

Comment on lines 107 to 133
# OpenTelemetry Logging (NEW)
ENABLE_OTEL_LOGGING = os.environ.get("ENABLE_OTEL_LOGGING", "false").lower() == "true"
OTEL_COLLECTOR_LOGS_ENDPOINT = os.environ.get("OTEL_COLLECTOR_LOGS_ENDPOINT", "https://crane.beta.breeze.in")
OTEL_SERVICE_NAME = "clairvoyance"

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Don’t default a release build to a beta collector; require explicit endpoint when OTEL is enabled.

Defaulting to https://crane.beta.breeze.in risks accidentally shipping prod logs to beta. Make the endpoint mandatory when ENABLE_OTEL_LOGGING=true and allow OTEL_SERVICE_NAME to be overridden.

Apply this diff:

-ENABLE_OTEL_LOGGING = os.environ.get("ENABLE_OTEL_LOGGING", "false").lower() == "true"
-OTEL_COLLECTOR_LOGS_ENDPOINT = os.environ.get("OTEL_COLLECTOR_LOGS_ENDPOINT", "https://crane.beta.breeze.in")
-OTEL_SERVICE_NAME = "clairvoyance"
+ENABLE_OTEL_LOGGING = os.environ.get("ENABLE_OTEL_LOGGING", "false").lower() == "true"
+OTEL_COLLECTOR_LOGS_ENDPOINT = os.environ.get("OTEL_COLLECTOR_LOGS_ENDPOINT", "")
+OTEL_SERVICE_NAME = os.environ.get("OTEL_SERVICE_NAME", "clairvoyance")
+if ENABLE_OTEL_LOGGING and not OTEL_COLLECTOR_LOGS_ENDPOINT:
+    logger.error("ENABLE_OTEL_LOGGING is true but OTEL_COLLECTOR_LOGS_ENDPOINT is not set")
+    raise ValueError("OTEL_COLLECTOR_LOGS_ENDPOINT is required when ENABLE_OTEL_LOGGING=true")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# OpenTelemetry Logging (NEW)
ENABLE_OTEL_LOGGING = os.environ.get("ENABLE_OTEL_LOGGING", "false").lower() == "true"
OTEL_COLLECTOR_LOGS_ENDPOINT = os.environ.get("OTEL_COLLECTOR_LOGS_ENDPOINT", "https://crane.beta.breeze.in")
OTEL_SERVICE_NAME = "clairvoyance"
# OpenTelemetry Logging (NEW)
ENABLE_OTEL_LOGGING = os.environ.get("ENABLE_OTEL_LOGGING", "false").lower() == "true"
OTEL_COLLECTOR_LOGS_ENDPOINT = os.environ.get("OTEL_COLLECTOR_LOGS_ENDPOINT", "")
OTEL_SERVICE_NAME = os.environ.get("OTEL_SERVICE_NAME", "clairvoyance")
if ENABLE_OTEL_LOGGING and not OTEL_COLLECTOR_LOGS_ENDPOINT:
logger.error("ENABLE_OTEL_LOGGING is true but OTEL_COLLECTOR_LOGS_ENDPOINT is not set")
raise ValueError("OTEL_COLLECTOR_LOGS_ENDPOINT is required when ENABLE_OTEL_LOGGING=true")
🤖 Prompt for AI Agents
In app/core/config.py around lines 107 to 111, currently
OTEL_COLLECTOR_LOGS_ENDPOINT defaults to the beta URL and OTEL_SERVICE_NAME is
hardcoded; change this so OTEL_COLLECTOR_LOGS_ENDPOINT is read from the
environment without a beta default (e.g. None or empty) and OTEL_SERVICE_NAME is
read from the environment with a sensible default, and add a startup-time check
that if ENABLE_OTEL_LOGGING is true and the endpoint is not explicitly provided,
raise an error (or exit) with a clear message instructing the operator to set
OTEL_COLLECTOR_LOGS_ENDPOINT.

Comment on lines 9 to 15
from app.core.config import ENVIRONMENT, PROD_LOG_LEVEL, ENABLE_OTEL_LOGGING

# NEW: Custom OTEL sink for structured logging
_otel_logger_provider = None
_otel_logger = None
_otel_handler = None

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Don’t mutate imported config flags; track local OTEL enablement in this module.

Toggling globals()['ENABLE_OTEL_LOGGING'] = False after importing the flag by value is brittle and confusing. Use a local _otel_enabled gate and reference it consistently.

Apply this diff:

-from app.core.config import ENVIRONMENT, PROD_LOG_LEVEL, ENABLE_OTEL_LOGGING
+from app.core.config import ENVIRONMENT, PROD_LOG_LEVEL, ENABLE_OTEL_LOGGING
+
+# Local gate so we can disable OTEL on runtime errors without mutating imported config
+_otel_enabled = ENABLE_OTEL_LOGGING
@@
-    if _otel_logger is None and ENABLE_OTEL_LOGGING:
+    if _otel_logger is None and _otel_enabled:
@@
-            globals()['ENABLE_OTEL_LOGGING'] = False
+            global _otel_enabled
+            _otel_enabled = False
@@
-    if not ENABLE_OTEL_LOGGING:
+    if not _otel_enabled:
         return
@@
-        if ENABLE_OTEL_LOGGING:
+        if _otel_enabled:
             logger.add(
                 otel_sink,  # Use custom sink that preserves structured data
                 level=PROD_LOG_LEVEL,
                 enqueue=True,
                 backtrace=False,
                 diagnose=False,
             )

Also applies to: 24-25, 76-76, 83-84, 255-265

🤖 Prompt for AI Agents
In app/core/logger.py around lines 9-15 (and also affecting lines 24-25, 76,
83-84, 255-265): the code currently mutates the imported config flag
ENABLE_OTEL_LOGGING via globals() which is brittle; create a module-level
boolean _otel_enabled initialized to the imported ENABLE_OTEL_LOGGING value and
use _otel_enabled everywhere instead of mutating the imported name, replace any
globals()['ENABLE_OTEL_LOGGING'] = ... assignments with updates to
_otel_enabled, update all conditionals and handlers to check _otel_enabled, and
remove any direct mutation of the imported config flag so the module tracks OTEL
state locally and consistently.

Comment on lines 16 to 40
def _get_otel_logger():
"""Get or create the global OTEL logger (singleton pattern)"""
global _otel_logger_provider, _otel_logger, _otel_handler

# Return existing logger if already initialized
if _otel_logger is not None:
return _otel_logger

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Guard OTEL initialization with a lock to avoid racy double-init.

Concurrent first-use can race and produce multiple providers/handlers. Add a lock and re-check inside the critical section.

Apply this diff:

+import threading
@@
+_otel_init_lock = threading.Lock()
@@
-    # Return existing logger if already initialized
-    if _otel_logger is not None:
-        return _otel_logger
+    # Return existing logger if already initialized
+    if _otel_logger is not None:
+        return _otel_logger
@@
-    if _otel_logger is None and _otel_enabled:
+    if _otel_logger is None and _otel_enabled:
-        try:
+        try:
+            with _otel_init_lock:
+                if _otel_logger is not None:
+                    return _otel_logger

Also applies to: 36-45

🤖 Prompt for AI Agents
In app/core/logger.py around lines 16-23 (and also apply the same change to the
init path at lines 36-45), wrap the OTEL logger/provider/handler initialization
in a module-level threading.Lock (or RLock): declare the lock at module scope,
acquire it before performing initialization, re-check if _otel_logger is already
set inside the critical section to avoid double init, then create and assign
_otel_logger, _otel_logger_provider and _otel_handler while holding the lock,
and finally release the lock; use the same lock and re-check pattern for the
other initialization block at 36-45.

Comment on lines 95 to 96
import json
import logging
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move to top

Comment on lines 26 to 31
from opentelemetry.exporter.otlp.proto.http._log_exporter import OTLPLogExporter
from opentelemetry.sdk._logs.export import BatchLogRecordProcessor
from opentelemetry.sdk._logs import LoggerProvider, LoggingHandler
from opentelemetry.sdk.resources import Resource
from opentelemetry._logs import set_logger_provider
from app.core.config import OTEL_COLLECTOR_LOGS_ENDPOINT, OTEL_SERVICE_NAME
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move to top

@badri-singhal
Copy link
Contributor

Rebase with release, and run ./scripts/setup.sh file before committing

@manas-narra manas-narra force-pushed the BZ-44137-add-otel-logging-for-clairvoyance branch 4 times, most recently from ff17769 to 3d7a089 Compare September 26, 2025 06:36
@@ -0,0 +1,88 @@
from loguru import logger
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why a different logger?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is causing a circular dependency because otel.py needs to log during the setup of our logging system. Our configured logger doesn't exist yet when otel.py is setting up!

logger.warning(f"OTEL logging components not available: {e}")

# Global OTEL components
_otel_logger_provider = None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we really need global variables

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a logger instance is already initialized then that will be returned instead of initializing multiple times.

@manas-narra manas-narra force-pushed the BZ-44137-add-otel-logging-for-clairvoyance branch from 3d7a089 to 6878784 Compare November 3, 2025 06:16
@manas-narra manas-narra force-pushed the BZ-44137-add-otel-logging-for-clairvoyance branch from 6878784 to 80c4e3f Compare November 5, 2025 05:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants