Skip to content

Commit 7f746cf

Browse files
authored
OpenTelemetry tracing support (#234)
Fixes #189
1 parent 6218c1f commit 7f746cf

25 files changed

+1298
-78
lines changed

README.md

Lines changed: 120 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -67,6 +67,10 @@ opinions. Please communicate with us on [Slack](https://t.mp/slack) in the `#rub
6767
- [Activity Worker Shutdown](#activity-worker-shutdown)
6868
- [Activity Concurrency and Executors](#activity-concurrency-and-executors)
6969
- [Activity Testing](#activity-testing)
70+
- [Telemetry](#telemetry)
71+
- [Metrics](#metrics)
72+
- [OpenTelemetry Tracing](#opentelemetry-tracing)
73+
- [OpenTelemetry Tracing in Workflows](#opentelemetry-tracing-in-workflows)
7074
- [Ractors](#ractors)
7175
- [Platform Support](#platform-support)
7276
- [Development](#development)
@@ -1034,6 +1038,122 @@ it will raise the error raised in the activity.
10341038
The constructor of the environment has multiple keyword arguments that can be set to affect the activity context for the
10351039
activity.
10361040

1041+
### Telemetry
1042+
1043+
#### Metrics
1044+
1045+
Metrics can be configured on a `Temporalio::Runtime`. Only one runtime is expected to be created for the entire
1046+
application and it should be created before any clients are created. For example, this configures Prometheus to export
1047+
metrics at `http://127.0.0.1:9000/metrics`:
1048+
1049+
```ruby
1050+
require 'temporalio/runtime'
1051+
1052+
Temporalio::Runtime.default = Temporalio::Runtime.new(
1053+
telemetry: Temporalio::Runtime::TelemetryOptions.new(
1054+
metrics: Temporalio::Runtime::MetricsOptions.new(
1055+
prometheus: Temporalio::Runtime::PrometheusMetricsOptions.new(
1056+
bind_address: '127.0.0.1:9000'
1057+
)
1058+
)
1059+
)
1060+
)
1061+
```
1062+
1063+
Now every client created will use this runtime. Setting the default will fail if a runtime has already been requested or
1064+
a default already set. Technically a runtime can be created without setting the default and be set on each client via
1065+
the `runtime` parameter, but this is discouraged because a runtime represents a heavy internal engine not meant to be
1066+
created multiple times.
1067+
1068+
OpenTelemetry metrics can be configured instead by passing `Temporalio::Runtime::OpenTelemetryMetricsOptions` as the
1069+
`opentelemetry` parameter to the metrics options. See API documentation for details.
1070+
1071+
#### OpenTelemetry Tracing
1072+
1073+
OpenTelemetry tracing for clients, activities, and workflows can be enabled using the
1074+
`Temporalio::Contrib::OpenTelemetry::TracingInterceptor`. Specifically, when creating a client, set the interceptor like
1075+
so:
1076+
1077+
```ruby
1078+
require 'opentelemetry/api'
1079+
require 'opentelemetry/sdk'
1080+
require 'temporalio/client'
1081+
require 'temporalio/contrib/open_telemetry'
1082+
1083+
# ... assumes my_otel_tracer_provider is a tracer provider created by the user
1084+
my_tracer = my_otel_tracer_provider.tracer('my-otel-tracer')
1085+
1086+
my_client = Temporalio::Client.connect(
1087+
'localhost:7233', 'my-namespace',
1088+
interceptors: [Temporalio::Contrib::OpenTelemetry::TracingInterceptor.new(my_tracer)]
1089+
)
1090+
```
1091+
1092+
Now many high-level client calls and activities/workflows on workers using this client will have spans created on that
1093+
OpenTelemetry tracer.
1094+
1095+
##### OpenTelemetry Tracing in Workflows
1096+
1097+
OpenTelemetry works by creating spans as necessary and in some cases serializing them to Temporal headers to be
1098+
deserialized by workflows/activities to be set on the context. However, OpenTelemetry requires spans to be finished
1099+
where they start, so spans cannot be resumed. This is fine for client calls and activity attempts, but Temporal
1100+
workflows are resumable functions that may start on a different machine than they complete. Due to this, spans created
1101+
by workflows are immediately closed since there is no way for the span to actually span machines. They are also not
1102+
created during replay. The spans still become the proper parents of other spans if they are created.
1103+
1104+
Custom spans can be created inside of workflows using class methods on the
1105+
`Temporalio::Contrib::OpenTelemetry::Workflow` module. For example:
1106+
1107+
```ruby
1108+
class MyWorkflow < Temporalio::Workflow::Definition
1109+
def execute
1110+
# Sleep for a bit
1111+
Temporalio::Workflow.sleep(10)
1112+
# Run activity in span
1113+
Temporalio::Contrib::OpenTelemetry::Workflow.with_completed_span(
1114+
'my-span',
1115+
attributes: { 'my-attr' => 'some val' }
1116+
) do
1117+
# Execute an activity
1118+
Temporalio::Workflow.execute_activity(MyActivity, start_to_close_timeout: 10)
1119+
end
1120+
end
1121+
end
1122+
```
1123+
1124+
If this all executes on one worker (because Temporal has a concept of stickiness that caches instances), the span tree
1125+
may look like:
1126+
1127+
```
1128+
StartWorkflow:MyWorkflow <-- created by client outbound
1129+
RunWorkflow:MyWorkflow <-- created inside workflow on first task
1130+
my-span <-- created inside workflow by code
1131+
StartActivity:MyActivity <-- created inside workflow when first called
1132+
RunActivity:MyActivity <-- created inside activity attempt 1
1133+
CompleteWorkflow:MyWorkflow <-- created inside workflow on last task
1134+
```
1135+
1136+
However if, say, the worker crashed during the 10s sleep and the workflow was resumed (i.e. replayed) on another worker,
1137+
the span tree may look like:
1138+
1139+
```
1140+
StartWorkflow:MyWorkflow <-- created by client outbound
1141+
RunWorkflow:MyWorkflow <-- created by workflow inbound on first task
1142+
my-span <-- created inside the workflow
1143+
StartActivity:MyActivity <-- created by workflow outbound
1144+
RunActivity:MyActivity <-- created by activity attempt 1 inbound
1145+
CompleteWorkflow:MyWorkflow <-- created by workflow inbound on last task
1146+
```
1147+
1148+
Notice how the spans are no longer under `RunWorkflow`. This is because spans inside the workflow are not created on
1149+
replay, so there is no parent on replay. But there are no orphans because we still have the overarching parent of
1150+
`StartWorkflow` that was created by the client and is serialized into Temporal headers so it can always be the parent.
1151+
1152+
And reminder that `StartWorkflow` and `RunActivity` spans do last the length of their calls (so time to start the
1153+
workflow and time to run the activity attempt respectively), but the other spans have no measurable time because they
1154+
are created in workflows and closed immediately since long-lived spans cannot work for durable software that may resume
1155+
on other machines.
1156+
10371157
### Ractors
10381158

10391159
It was an original goal to have workflows actually be Ractors for deterministic state isolation and have the library

temporalio/Gemfile

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -12,6 +12,10 @@ group :development do
1212
gem 'grpc', '~> 1.69'
1313
gem 'grpc-tools', '~> 1.69'
1414
gem 'minitest'
15+
# We are intentionally not pinning OTel versions here so that CI tests the latest. This also means that the OTel
16+
# contrib library also does not require specific versions, we are relying on the compatibility rigor of OTel.
17+
gem 'opentelemetry-api'
18+
gem 'opentelemetry-sdk'
1519
gem 'rake'
1620
gem 'rake-compiler'
1721
gem 'rbs', '~> 3.5.3'

temporalio/lib/temporalio/activity/info.rb

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -59,6 +59,9 @@ module Activity
5959
# @return [String] Workflow run ID that started this activity.
6060
# @!attribute workflow_type
6161
# @return [String] Workflow type name that started this activity.
62+
#
63+
# @note WARNING: This class may have required parameters added to its constructor. Users should not instantiate this
64+
# class or it may break in incompatible ways.
6265
class Info; end # rubocop:disable Lint/EmptyClass
6366
end
6467
end

temporalio/lib/temporalio/client.rb

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -242,7 +242,7 @@ def start_workflow(
242242
rpc_options: nil
243243
)
244244
@impl.start_workflow(Interceptor::StartWorkflowInput.new(
245-
workflow:,
245+
workflow: Workflow::Definition._workflow_type_from_workflow_parameter(workflow),
246246
args:,
247247
workflow_id: id,
248248
task_queue:,
@@ -386,7 +386,7 @@ def start_update_with_start_workflow(
386386
@impl.start_update_with_start_workflow(
387387
Interceptor::StartUpdateWithStartWorkflowInput.new(
388388
update_id: id,
389-
update:,
389+
update: Workflow::Definition::Update._name_from_parameter(update),
390390
args:,
391391
wait_for_stage:,
392392
start_workflow_operation:,
@@ -449,7 +449,7 @@ def signal_with_start_workflow(
449449
)
450450
@impl.signal_with_start_workflow(
451451
Interceptor::SignalWithStartWorkflowInput.new(
452-
signal:,
452+
signal: Workflow::Definition::Signal._name_from_parameter(signal),
453453
args:,
454454
start_workflow_operation:,
455455
rpc_options:

temporalio/lib/temporalio/client/with_start_workflow_operation.rb

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -58,7 +58,7 @@ def initialize(
5858
headers: {}
5959
)
6060
@options = Options.new(
61-
workflow:,
61+
workflow: Workflow::Definition._workflow_type_from_workflow_parameter(workflow),
6262
args:,
6363
id:,
6464
task_queue:,

temporalio/lib/temporalio/client/workflow_handle.rb

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -222,7 +222,7 @@ def signal(signal, *args, rpc_options: nil)
222222
@client._impl.signal_workflow(Interceptor::SignalWorkflowInput.new(
223223
workflow_id: id,
224224
run_id:,
225-
signal:,
225+
signal: Workflow::Definition::Signal._name_from_parameter(signal),
226226
args:,
227227
headers: {},
228228
rpc_options:
@@ -254,7 +254,7 @@ def query(
254254
@client._impl.query_workflow(Interceptor::QueryWorkflowInput.new(
255255
workflow_id: id,
256256
run_id:,
257-
query:,
257+
query: Workflow::Definition::Query._name_from_parameter(query),
258258
args:,
259259
reject_condition:,
260260
headers: {},
@@ -291,7 +291,7 @@ def start_update(
291291
workflow_id: self.id,
292292
run_id:,
293293
update_id: id,
294-
update:,
294+
update: Workflow::Definition::Update._name_from_parameter(update),
295295
args:,
296296
wait_for_stage:,
297297
headers: {},

0 commit comments

Comments
 (0)