-
Notifications
You must be signed in to change notification settings - Fork 46
feat: add otel sink for clairvoyance to push logs to crane #217
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: release
Are you sure you want to change the base?
feat: add otel sink for clairvoyance to push logs to crane #217
Conversation
|
Important Review skippedAuto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the WalkthroughAdds configuration flags for OpenTelemetry logging, integrates an optional OTEL logging sink into the logger with lazy initialization, and updates dependencies to support OTLP HTTP log export. Production logging conditionally routes structured logs to both the existing JSON sink and a new OTEL sink when enabled. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant App as Application
participant Logger as Logger Setup
participant OTEL as OTEL Init
participant Sinks as Sinks (json_sink, otel_sink)
participant PyLog as Python logging.Handler
participant Collector as OTLP Collector
App->>Logger: Initialize logger (production)
Logger->>Logger: Check ENABLE_OTEL_LOGGING
alt OTEL enabled
Logger->>OTEL: _get_otel_logger()
OTEL->>OTEL: Create Resource(service.name)
OTEL->>OTEL: Set/Get LoggerProvider
OTEL->>OTEL: Configure OTLP HTTP Exporter + Batch Processor
OTEL->>Logger: Return LoggingHandler
Logger->>Sinks: Add otel_sink
else OTEL disabled
Logger->>Sinks: Skip otel_sink
end
Logger->>Sinks: Add json_sink
App->>Logger: Emit log via Loguru
Logger->>Sinks: Dispatch to sinks
par json path
Sinks-->>Collector: N/A (app internal JSON handling)
and OTEL path (if enabled)
Sinks->>PyLog: LogRecord(message=structured JSON)
PyLog->>Collector: OTLP HTTP export (Batch)
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (9)
requirements.txt (2)
25-29: Avoid duplicate OTLP exporter installs; use the[logs]extra only.You list both the base
opentelemetry-exporter-otlp-proto-httpand the[logs]extra. The extra already pulls the base; keeping both is redundant and can cause resolver churn.Apply this diff:
opentelemetry-api opentelemetry-sdk -opentelemetry-exporter-otlp-proto-http opentelemetry-exporter-otlp-proto-http[logs]
26-29: Pin a compatible OTEL version range to avoid surprise breaks.Unpinned
opentelemetry-*packages are brittle. Consider a conservative range across API/SDK/exporter to keep them in lockstep.Example:
-opentelemetry-api -opentelemetry-sdk -opentelemetry-exporter-otlp-proto-http[logs] +opentelemetry-api>=1.25,<2 +opentelemetry-sdk>=1.25,<2 +opentelemetry-exporter-otlp-proto-http[logs]>=1.25,<2app/core/config.py (1)
147-147: Optional: log an explicit warning when OTEL is enabled with an empty endpoint.If you don’t hard-fail, at least warn loudly to avoid silent misconfig.
-logger.info(f"OTEL logging enabled: {ENABLE_OTEL_LOGGING}") +logger.info(f"OTEL logging enabled: {ENABLE_OTEL_LOGGING}") +if ENABLE_OTEL_LOGGING and not OTEL_COLLECTOR_LOGS_ENDPOINT: + logger.warning("OTEL is enabled but OTEL_COLLECTOR_LOGS_ENDPOINT is empty")app/core/logger.py (6)
26-31: Duplicate import of LoggingHandler; remove the second import.Redefinition (Ruff F811). Keep one import only.
Apply this diff:
- from opentelemetry.sdk._logs import LoggerProvider, LoggingHandler + from opentelemetry.sdk._logs import LoggerProvider, LoggingHandler @@ - from opentelemetry.sdk._logs import LoggingHandler - _otel_handler = LoggingHandler(logger_provider=_otel_logger_provider) + _otel_handler = LoggingHandler(logger_provider=_otel_logger_provider)Also applies to: 61-63
52-58: Batch settings: consider lower schedule delay for near‑real‑time logs.
schedule_delay_millis=5000adds up to 5s latency. If you need faster log visibility, reduce to 500–1000ms.Apply this diff:
- log_processor = BatchLogRecordProcessor( + log_processor = BatchLogRecordProcessor( log_exporter, max_export_batch_size=100, export_timeout_millis=30000, - schedule_delay_millis=5000 + schedule_delay_millis=1000 )
68-75: Uselogger.exceptioninstead of manual traceback; narrow exception scope if possible.This addresses Ruff BLE001/S110 and keeps stack traces without manual formatting.
Apply this diff:
- except Exception as e: - logger.error(f"Failed to initialize OTEL logger: {e}") - # Add more detailed error info - import traceback - logger.error(f"OTEL initialization traceback: {traceback.format_exc()}") + except Exception: + logger.exception("Failed to initialize OTEL logger")
81-139: Harden sink: avoid re-imports, check handler presence, and don’t swallow exceptions silently.
- Remove redundant imports.
- Guard
_otel_handlerbeingNone.- Replace
try/except: passwith a minimal stderr write to avoid recursion.Apply this diff:
def otel_sink(message): """Custom OTEL sink that preserves full structured data like JSON sink""" - if not _otel_enabled: + if not _otel_enabled: return @@ - try: - import json - import logging + try: @@ - _otel_handler.emit(log_record) + if _otel_handler: + _otel_handler.emit(log_record) + else: + return @@ - except Exception as e: - # Don't let OTEL issues break the application - pass + except Exception as _: + # Don't let OTEL issues break the application; avoid logger recursion. + try: + sys.stderr.write("otel_sink error; log dropped\n") + except Exception: + pass
255-265: Optional: apply the spam filter to the OTEL sink too (or document why not).If the collector shouldn’t ingest
websockets/daily_corenoise, reusefilter_spam_logshere to cut volume. If you intend to ship everything to Crane, ignore this.Apply this diff:
- logger.add( + logger.add( otel_sink, # Use custom sink that preserves structured data level=PROD_LOG_LEVEL, - enqueue=True, + enqueue=True, backtrace=False, diagnose=False, + # filter=filter_spam_logs, # uncomment if you want to suppress spam in OTEL as well )
64-66: Minor:_otel_loggerisn’t used outside init; gate on_otel_handlerinstead.You can avoid keeping
_otel_loggerand just treat_otel_handleras readiness.Apply this diff:
- # Get logger instance (for direct emission if needed) - _otel_logger = _otel_logger_provider.get_logger(__name__) + # Presence of handler is sufficient to emit records + _otel_logger = _otel_logger_provider.get_logger(__name__) # optional; can be removed
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
app/core/config.py(2 hunks)app/core/logger.py(2 hunks)requirements.txt(1 hunks)
🧰 Additional context used
🪛 Ruff (0.12.2)
app/core/logger.py
40-40: Do not catch blind exception: Exception
(BLE001)
61-61: Redefinition of unused LoggingHandler from line 28
Remove definition: LoggingHandler
(F811)
70-70: Do not catch blind exception: Exception
(BLE001)
136-138: try-except-pass detected, consider logging the exception
(S110)
136-136: Do not catch blind exception: Exception
(BLE001)
136-136: Local variable e is assigned to but never used
Remove assignment to unused variable e
(F841)
🔇 Additional comments (1)
app/core/logger.py (1)
47-51: Remove headers={} — it overrides env-based OTLP headersPassing headers={} programmatically clobbers OTEL_EXPORTER_OTLP_HEADERS / OTEL_EXPORTER_OTLP_LOGS_HEADERS (used for Authorization); remove the headers argument so the exporter reads env/config.
Apply this diff:
- log_exporter = OTLPLogExporter( - endpoint=f"{OTEL_COLLECTOR_LOGS_ENDPOINT}/v1/logs", - headers={}, - timeout=30 - ) + log_exporter = OTLPLogExporter( + endpoint=f"{OTEL_COLLECTOR_LOGS_ENDPOINT}/v1/logs", + timeout=30, + )Also confirm OTEL_COLLECTOR_LOGS_ENDPOINT is a base URL (e.g. https://collector:4318); the SDK will append /v1/logs. If the variable is already a full logs URL, do not append /v1/logs or set OTEL_EXPORTER_OTLP_LOGS_ENDPOINT explicitly.
| # OpenTelemetry Logging (NEW) | ||
| ENABLE_OTEL_LOGGING = os.environ.get("ENABLE_OTEL_LOGGING", "false").lower() == "true" | ||
| OTEL_COLLECTOR_LOGS_ENDPOINT = os.environ.get("OTEL_COLLECTOR_LOGS_ENDPOINT", "https://crane.beta.breeze.in") | ||
| OTEL_SERVICE_NAME = "clairvoyance" | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don’t default a release build to a beta collector; require explicit endpoint when OTEL is enabled.
Defaulting to https://crane.beta.breeze.in risks accidentally shipping prod logs to beta. Make the endpoint mandatory when ENABLE_OTEL_LOGGING=true and allow OTEL_SERVICE_NAME to be overridden.
Apply this diff:
-ENABLE_OTEL_LOGGING = os.environ.get("ENABLE_OTEL_LOGGING", "false").lower() == "true"
-OTEL_COLLECTOR_LOGS_ENDPOINT = os.environ.get("OTEL_COLLECTOR_LOGS_ENDPOINT", "https://crane.beta.breeze.in")
-OTEL_SERVICE_NAME = "clairvoyance"
+ENABLE_OTEL_LOGGING = os.environ.get("ENABLE_OTEL_LOGGING", "false").lower() == "true"
+OTEL_COLLECTOR_LOGS_ENDPOINT = os.environ.get("OTEL_COLLECTOR_LOGS_ENDPOINT", "")
+OTEL_SERVICE_NAME = os.environ.get("OTEL_SERVICE_NAME", "clairvoyance")
+if ENABLE_OTEL_LOGGING and not OTEL_COLLECTOR_LOGS_ENDPOINT:
+ logger.error("ENABLE_OTEL_LOGGING is true but OTEL_COLLECTOR_LOGS_ENDPOINT is not set")
+ raise ValueError("OTEL_COLLECTOR_LOGS_ENDPOINT is required when ENABLE_OTEL_LOGGING=true")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # OpenTelemetry Logging (NEW) | |
| ENABLE_OTEL_LOGGING = os.environ.get("ENABLE_OTEL_LOGGING", "false").lower() == "true" | |
| OTEL_COLLECTOR_LOGS_ENDPOINT = os.environ.get("OTEL_COLLECTOR_LOGS_ENDPOINT", "https://crane.beta.breeze.in") | |
| OTEL_SERVICE_NAME = "clairvoyance" | |
| # OpenTelemetry Logging (NEW) | |
| ENABLE_OTEL_LOGGING = os.environ.get("ENABLE_OTEL_LOGGING", "false").lower() == "true" | |
| OTEL_COLLECTOR_LOGS_ENDPOINT = os.environ.get("OTEL_COLLECTOR_LOGS_ENDPOINT", "") | |
| OTEL_SERVICE_NAME = os.environ.get("OTEL_SERVICE_NAME", "clairvoyance") | |
| if ENABLE_OTEL_LOGGING and not OTEL_COLLECTOR_LOGS_ENDPOINT: | |
| logger.error("ENABLE_OTEL_LOGGING is true but OTEL_COLLECTOR_LOGS_ENDPOINT is not set") | |
| raise ValueError("OTEL_COLLECTOR_LOGS_ENDPOINT is required when ENABLE_OTEL_LOGGING=true") |
🤖 Prompt for AI Agents
In app/core/config.py around lines 107 to 111, currently
OTEL_COLLECTOR_LOGS_ENDPOINT defaults to the beta URL and OTEL_SERVICE_NAME is
hardcoded; change this so OTEL_COLLECTOR_LOGS_ENDPOINT is read from the
environment without a beta default (e.g. None or empty) and OTEL_SERVICE_NAME is
read from the environment with a sensible default, and add a startup-time check
that if ENABLE_OTEL_LOGGING is true and the endpoint is not explicitly provided,
raise an error (or exit) with a clear message instructing the operator to set
OTEL_COLLECTOR_LOGS_ENDPOINT.
app/core/logger.py
Outdated
| from app.core.config import ENVIRONMENT, PROD_LOG_LEVEL, ENABLE_OTEL_LOGGING | ||
|
|
||
| # NEW: Custom OTEL sink for structured logging | ||
| _otel_logger_provider = None | ||
| _otel_logger = None | ||
| _otel_handler = None | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don’t mutate imported config flags; track local OTEL enablement in this module.
Toggling globals()['ENABLE_OTEL_LOGGING'] = False after importing the flag by value is brittle and confusing. Use a local _otel_enabled gate and reference it consistently.
Apply this diff:
-from app.core.config import ENVIRONMENT, PROD_LOG_LEVEL, ENABLE_OTEL_LOGGING
+from app.core.config import ENVIRONMENT, PROD_LOG_LEVEL, ENABLE_OTEL_LOGGING
+
+# Local gate so we can disable OTEL on runtime errors without mutating imported config
+_otel_enabled = ENABLE_OTEL_LOGGING
@@
- if _otel_logger is None and ENABLE_OTEL_LOGGING:
+ if _otel_logger is None and _otel_enabled:
@@
- globals()['ENABLE_OTEL_LOGGING'] = False
+ global _otel_enabled
+ _otel_enabled = False
@@
- if not ENABLE_OTEL_LOGGING:
+ if not _otel_enabled:
return
@@
- if ENABLE_OTEL_LOGGING:
+ if _otel_enabled:
logger.add(
otel_sink, # Use custom sink that preserves structured data
level=PROD_LOG_LEVEL,
enqueue=True,
backtrace=False,
diagnose=False,
)Also applies to: 24-25, 76-76, 83-84, 255-265
🤖 Prompt for AI Agents
In app/core/logger.py around lines 9-15 (and also affecting lines 24-25, 76,
83-84, 255-265): the code currently mutates the imported config flag
ENABLE_OTEL_LOGGING via globals() which is brittle; create a module-level
boolean _otel_enabled initialized to the imported ENABLE_OTEL_LOGGING value and
use _otel_enabled everywhere instead of mutating the imported name, replace any
globals()['ENABLE_OTEL_LOGGING'] = ... assignments with updates to
_otel_enabled, update all conditionals and handlers to check _otel_enabled, and
remove any direct mutation of the imported config flag so the module tracks OTEL
state locally and consistently.
app/core/logger.py
Outdated
| def _get_otel_logger(): | ||
| """Get or create the global OTEL logger (singleton pattern)""" | ||
| global _otel_logger_provider, _otel_logger, _otel_handler | ||
|
|
||
| # Return existing logger if already initialized | ||
| if _otel_logger is not None: | ||
| return _otel_logger | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Guard OTEL initialization with a lock to avoid racy double-init.
Concurrent first-use can race and produce multiple providers/handlers. Add a lock and re-check inside the critical section.
Apply this diff:
+import threading
@@
+_otel_init_lock = threading.Lock()
@@
- # Return existing logger if already initialized
- if _otel_logger is not None:
- return _otel_logger
+ # Return existing logger if already initialized
+ if _otel_logger is not None:
+ return _otel_logger
@@
- if _otel_logger is None and _otel_enabled:
+ if _otel_logger is None and _otel_enabled:
- try:
+ try:
+ with _otel_init_lock:
+ if _otel_logger is not None:
+ return _otel_loggerAlso applies to: 36-45
🤖 Prompt for AI Agents
In app/core/logger.py around lines 16-23 (and also apply the same change to the
init path at lines 36-45), wrap the OTEL logger/provider/handler initialization
in a module-level threading.Lock (or RLock): declare the lock at module scope,
acquire it before performing initialization, re-check if _otel_logger is already
set inside the critical section to avoid double init, then create and assign
_otel_logger, _otel_logger_provider and _otel_handler while holding the lock,
and finally release the lock; use the same lock and re-check pattern for the
other initialization block at 36-45.
app/core/logger.py
Outdated
| import json | ||
| import logging |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move to top
app/core/logger.py
Outdated
| from opentelemetry.exporter.otlp.proto.http._log_exporter import OTLPLogExporter | ||
| from opentelemetry.sdk._logs.export import BatchLogRecordProcessor | ||
| from opentelemetry.sdk._logs import LoggerProvider, LoggingHandler | ||
| from opentelemetry.sdk.resources import Resource | ||
| from opentelemetry._logs import set_logger_provider | ||
| from app.core.config import OTEL_COLLECTOR_LOGS_ENDPOINT, OTEL_SERVICE_NAME |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move to top
|
Rebase with release, and run ./scripts/setup.sh file before committing |
ff17769 to
3d7a089
Compare
| @@ -0,0 +1,88 @@ | |||
| from loguru import logger | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why a different logger?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is causing a circular dependency because otel.py needs to log during the setup of our logging system. Our configured logger doesn't exist yet when otel.py is setting up!
| logger.warning(f"OTEL logging components not available: {e}") | ||
|
|
||
| # Global OTEL components | ||
| _otel_logger_provider = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we really need global variables
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If a logger instance is already initialized then that will be returned instead of initializing multiple times.
3d7a089 to
6878784
Compare
6878784 to
80c4e3f
Compare
Summary by CodeRabbit
Chores
Documentation