Add OpenTelemetry integration #1027
Replies: 3 comments
-
Hello! It's an excellent initiative to open this discussion. I'm dissatisfied with the current state of observability in Procrastinate. I think investing in making it easier to integrate OpenTelemetry or other tools would definitely be a step in the right direction, for traces but also for metrics and logs (as far as I can tell, there's no code to modify to integrate logs properly, but it's worth testing that auto-instrumentation works well, and maybe pointing to the relevant part of OTel's docs). Of course, this work can be split into multiple parts. I've never played with OpenTelemetry before, so it's all very abstract for me, but I'll do my best to answer your questions:
OpenTelemetry in general, yes, though I believe it's probably sane to keep it optional. I believe some people won't have the rest of the infra ready for this.
Let's discuss what we can do in procrastinate to make this less invasive, whether we decide to ship optional OpenTelemetry code or just make it easier for people to integrate this kind of library without committing to OTel in particular.
(damn, I'm answering questions in advance)
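To make the "less invasive, no commitment to OTel" option more concrete, here is a minimal sketch of what a vendor-neutral hook could look like. Everything in it is hypothetical: `JobInfo`, `JobObserver`, `RecordingObserver` and `run_job` are illustration names, not existing Procrastinate APIs, and the real worker does much more than this.

```python
import asyncio
import time
from dataclasses import dataclass, field
from typing import Any, Awaitable, Callable, List, Optional, Protocol


@dataclass
class JobInfo:
    """Hypothetical, simplified view of a job (not Procrastinate's real class)."""

    id: int
    task_name: str
    queue: str = "default"


class JobObserver(Protocol):
    """Hypothetical hook interface that an OTel (or other) adapter could implement."""

    def job_started(self, job: JobInfo) -> None: ...

    def job_finished(
        self, job: JobInfo, duration: float, error: Optional[BaseException]
    ) -> None: ...


@dataclass
class RecordingObserver:
    """Toy observer that records events; a tracing adapter would open/close spans."""

    events: List[tuple] = field(default_factory=list)

    def job_started(self, job: JobInfo) -> None:
        self.events.append(("started", job.id))

    def job_finished(
        self, job: JobInfo, duration: float, error: Optional[BaseException]
    ) -> None:
        status = "failed" if error is not None else "succeeded"
        self.events.append((status, job.id))


async def run_job(
    job: JobInfo,
    func: Callable[[], Awaitable[Any]],
    observers: List[JobObserver],
) -> Any:
    """Run one job and notify every observer before and after execution."""
    for observer in observers:
        observer.job_started(job)
    start = time.perf_counter()
    error: Optional[BaseException] = None
    try:
        return await func()
    except Exception as exc:
        error = exc
        raise
    finally:
        duration = time.perf_counter() - start
        for observer in observers:
            observer.job_finished(job, duration, error)
```

With something along these lines, an OpenTelemetry adapter could live in contrib and only be imported by people who opt in, while the core would only know about the observer interface.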
I'm perfectly ok with that too, and I'll add multiple steps in between:
That said, for now, I'm ok having it live in the contrib section, with potential optional dependencies (
Aha, that's a complex question. I don't know OTel well enough to know whether it's simple or not, so it's hard to say. In procrastinate, we usually try to make things async at core, and use

Let's keep this discussion moving. It's a bit blurry in my mind so far, and I'll let this sink in a bit. I guess I'd love to gather opinions from other frequent users and occasional contributors, and get a sense of how we can work together on this, to manage expectations and avoid spending time on code before we know it's going to get merged.

Something that could help us understand this better would be to dust off the docker-compose file and set up a small observability stack that gathers logs, metrics and traces from a toy app (such as the Django app in the demos folder). That could also give us a playground for testing an equivalent async stack later. Let us play :)

I don't expect an immediate decision, but it's an opportunity to evolve the app in a good, future-friendly way.
-
I'd be keen to see this as it was on my todo list, so +1 from me. One suggestion: it would be useful to go a level higher and wrap the execution of the job, so that the job.id can be included in the span attributes (and perhaps also whether the job will retry).
-
Hello everybody! My take on an OpenTelemetry integration is the following:
This is the example setup:

def _set_task_attributes(span, task) -> None:
    """Set span attributes of the Procrastinate task.

    Args:
        span (opentelemetry.trace.Span): Current recording OTel span.
        task (procrastinate.tasks.Task): Running Procrastinate task in the span.
    """
    attributes = {
        "task.name": task.name,
        "task.priority": task.priority,
    }
    if task.lock is not None:
        attributes["task.lock"] = task.lock
    if task.queueing_lock is not None:
        attributes["task.queueing_lock"] = task.queueing_lock
    span.set_attributes(attributes)


def _set_job_context_attributes(span, context) -> None:
    """Set span attributes of the Procrastinate job context, including its task.

    Args:
        span (opentelemetry.trace.Span): Current recording OTel span.
        context (procrastinate.job_context.JobContext): Job context.
    """
    job = context.job
    attributes = {
        "job.id": job.id,
        "job.priority": job.priority,
        "job.queue": job.queue,
        "worker.name": context.worker_name,
    }
    if job.queueing_lock is not None:
        attributes["job.queueing_lock"] = job.queueing_lock
    if job.scheduled_at is not None:
        # OTel attribute values must be primitives, so serialize the datetime
        attributes["job.scheduled_at"] = job.scheduled_at.isoformat()
    span.set_attributes(attributes)
    _set_task_attributes(span, context.task)


def setup_worker_tracing():
    """Set up tracing of worker jobs with OpenTelemetry.

    Monkey patches internal Procrastinate Worker functions.
    """
    from opentelemetry import trace
    from procrastinate import exceptions, job_context, jobs, worker

    tracer = trace.get_tracer("worker")

    # monkey patch job processing for starting spans
    original_process_job = worker.Worker._process_job
    # monkey patch job outcome for recording exceptions and duration
    original_log_job_outcome = worker.Worker._log_job_outcome

    async def traced_process_job(self, context: job_context.JobContext):
        # start OTel span with context
        with tracer.start_as_current_span(
            context.task.name,
            kind=trace.SpanKind.CONSUMER,
            record_exception=True,
            end_on_exit=False,  # ends in traced_log_job_outcome
        ) as span:
            if span.is_recording():
                _set_job_context_attributes(span, context)
            await original_process_job(self, context)

    def traced_log_job_outcome(
        self,
        status: jobs.Status,
        context: job_context.JobContext,
        job_result: job_context.JobResult | None,
        job_retry: exceptions.JobRetry | None,
        exc_info: bool | BaseException = False,
    ):
        # called from within _process_job, so the current span is the job span
        task_span = trace.get_current_span()
        if isinstance(exc_info, BaseException):
            task_span.record_exception(exc_info)
            task_span.set_status(trace.StatusCode.ERROR)
        if job_retry is not None:
            task_span.set_attribute("job.retry", str(job_retry.retry_decision.retry_at))
        # the current time is precise enough for task duration
        task_span.end()
        original_log_job_outcome(self, status, context, job_result, job_retry, exc_info)

    worker.Worker._process_job = traced_process_job
    worker.Worker._log_job_outcome = traced_log_job_outcome
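
One detail worth keeping in mind with a setup like this: OpenTelemetry attribute values are limited to `str`, `bool`, `int`, `float` (and homogeneous sequences of those), so values such as a `scheduled_at` datetime have to be converted before they are set on a span. A small standard-library sketch of such a conversion follows; `to_span_attributes` is a hypothetical helper name, not part of OpenTelemetry or Procrastinate.

```python
import datetime
from typing import Any, Dict, Mapping

# Scalar types that OpenTelemetry accepts as attribute values.
_PRIMITIVES = (str, bool, int, float)


def to_span_attributes(raw: Mapping[str, Any]) -> Dict[str, Any]:
    """Return a copy of `raw` that only holds OTel-compatible scalar values.

    Hypothetical helper: None values are dropped, dates and datetimes are
    serialized as ISO 8601 strings, and anything else falls back to str().
    """
    clean: Dict[str, Any] = {}
    for key, value in raw.items():
        if value is None:
            continue  # unset fields are simply omitted from the span
        if isinstance(value, _PRIMITIVES):
            clean[key] = value
        elif isinstance(value, (datetime.datetime, datetime.date)):
            clean[key] = value.isoformat()
        else:
            clean[key] = str(value)
    return clean
```

The result could then be passed to `span.set_attributes(...)`, which would also remove the need for the individual `is not None` checks.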
-
Hi!
I'm currently evaluating procrastinate for my company as a replacement for Celery, and I needed an OpenTelemetry integration where a span is created when a task is deferred and when a task is run.
Like many other OpenTelemetry integrations, it relies on monkey patching, which is not very satisfying to me, but hey, "it works™"!
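For readers less familiar with the pattern, here is a generic, standard-library-only sketch of monkey patching a method in a way that can be undone (handy in tests). The `Worker` class, `patched_method` and `with_recording` are hypothetical stand-ins written for this illustration; this is not the integration described in this comment.

```python
import contextlib
import functools


class Worker:
    """Hypothetical stand-in for a class whose private method gets patched."""

    def _process_job(self, job_id):
        return f"processed {job_id}"


@contextlib.contextmanager
def patched_method(cls, name, make_wrapper):
    """Temporarily replace cls.<name> with make_wrapper(original)."""
    original = getattr(cls, name)
    setattr(cls, name, make_wrapper(original))
    try:
        yield
    finally:
        # always restore the original, even if the body raised
        setattr(cls, name, original)


calls = []


def with_recording(original):
    """Wrap a method so each call is recorded before delegating to it."""

    @functools.wraps(original)
    def wrapper(self, job_id):
        calls.append(job_id)  # a tracing integration would start a span here
        return original(self, job_id)

    return wrapper
```

Usage would look like `with patched_method(Worker, "_process_job", with_recording): ...`; outside the `with` block the original method is back in place.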
I just wanted to share the task middleware that I made, so we can discuss it and I can get your feedback:
django
?procrastinate-opentelemetry
or something)?
Have a nice day 🌴