
Commit 24b41e1

pavolloffay authored and max-cx committed
OBSDOCS-1910: Add docs for Tail Sampling Processor
Signed-off-by: Pavol Loffay <p.loffay@gmail.com>
1 parent 1545132 commit 24b41e1

1 file changed (+353, -1 lines)

observability/otel/otel-collector/otel-collector-processors.adoc

Lines changed: 353 additions & 1 deletion
@@ -22,6 +22,7 @@ Currently, the following General Availability and Technology Preview processors
 - xref:../../../observability/otel/otel-collector/otel-collector-processors.adoc#cumulativetodelta-processor_otel-collector-processors[Cumulative-to-Delta Processor]
 - xref:../../../observability/otel/otel-collector/otel-collector-processors.adoc#groupbyattrsprocessor-processor_otel-collector-processors[Group-by-Attributes Processor]
 - xref:../../../observability/otel/otel-collector/otel-collector-processors.adoc#transform-processor_otel-collector-processors[Transform Processor]
+- xref:../../../observability/otel/otel-collector/otel-collector-processors.adoc#tail-sampling-processor_otel-collector-processors[Tail Sampling Processor]
 
 [id="batch-processor_{context}"]
 == Batch Processor
@@ -450,6 +451,7 @@ include::snippets/technology-preview.adoc[]
 
 At minimum, configuring this processor involves specifying an array of attribute keys to be used to group spans, log records, or metric datapoints together, as in the following example:
 
+.Example of the OpenTelemetry Collector custom resource when using the Group-by-Attributes Processor
 [source,yaml]
 ----
 # ...
@@ -507,7 +509,7 @@ config:
 <3> See the following table "Values for the `context` field".
 <4> Optional: Conditions for performing a transformation.
 
-.Configuration example
+.Example of the OpenTelemetry Collector custom resource when using the Transform Processor
 [source,yaml]
 ----
 # ...
@@ -568,6 +570,356 @@ config:
 |Returns errors up the pipeline and drops the payload. Implicit default.
 
 |===

[id="tail-sampling-processor_{context}"]
== Tail Sampling Processor

The Tail Sampling Processor samples traces according to user-defined policies after it has collected the spans of each trace for a configurable decision delay. Because the processor bases each sampling decision on all of the spans it received for a trace rather than only on the first span, tail-based sampling enables you to keep the traces of interest and reduce your data ingestion and storage costs.

:FeatureName: The Tail Sampling Processor
include::snippets/technology-preview.adoc[]

This processor reassembles spans into new batches and strips spans of their original context.

[TIP]
====
* In pipelines, place this processor downstream of any processors that rely on context, for example, after the Kubernetes Attributes Processor.
* If you scale the Collector, ensure that one Collector instance receives all spans of the same trace so that this processor makes correct sampling decisions based on the specified sampling policies. You can achieve this by setting up two layers of Collectors: a first layer of Collectors with the Load Balancing Exporter, and a second layer of Collectors with the Tail Sampling Processor.
====
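The second item in the tip above depends on routing every span of a trace to the same second-layer Collector. The following Python sketch shows the underlying idea under simplifying assumptions: it chooses a backend from the trace ID alone, so spans that share a trace ID always reach the same instance. The `pick_backend` function and the `sampler-0` to `sampler-2` backend names are hypothetical, and the modulo over a SHA-256 digest is only a stand-in; it is not the routing algorithm of the Load Balancing Exporter.

```python
import hashlib


def pick_backend(trace_id: bytes, backends: list) -> str:
    """Hypothetical helper: map a trace ID to one backend.

    The result depends only on the trace ID, so every span of a trace is
    routed to the same backend. A modulo over a SHA-256 digest stands in
    for the Load Balancing Exporter's own routing algorithm.
    """
    digest = hashlib.sha256(trace_id).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]


# Hypothetical second-layer Collector instances that run the Tail Sampling Processor.
BACKENDS = ["sampler-0", "sampler-1", "sampler-2"]

# Spans from two traces, as (trace_id, span_name) pairs, arriving interleaved.
spans = [
    (bytes.fromhex("4bf92f3577b34da6a3ce929d0e0e4736"), "GET /cart"),
    (bytes.fromhex("0af7651916cd43dd8448eb211c80319c"), "GET /health"),
    (bytes.fromhex("4bf92f3577b34da6a3ce929d0e0e4736"), "SELECT cart_items"),
    (bytes.fromhex("0af7651916cd43dd8448eb211c80319c"), "render_health"),
]

# Group span names by the backend that each span is routed to.
routed = {}
for trace_id, name in spans:
    routed.setdefault(pick_backend(trace_id, BACKENDS), []).append(name)

for backend in sorted(routed):
    print(backend, routed[backend])
```

Because the chosen backend never depends on which first-layer Collector received a span, each second-layer instance sees all of the spans of the traces routed to it. A real deployment must also cope with backends being added or removed, which a plain modulo hash handles poorly because most trace IDs change backend when the list changes.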

.Example of the OpenTelemetry Collector custom resource when using the Tail Sampling Processor
[source,yaml]
----
# ...
config:
  processors:
    tail_sampling: # <1>
      decision_wait: 30s # <2>
      num_traces: 50000 # <3>
      expected_new_traces_per_sec: 10 # <4>
      policies: # <5>
        [
          {
            <definition_of_policy_1>
          },
          {
            <definition_of_policy_2>
          },
          {
            <definition_of_policy_3>
          },
        ]
# ...
----
<1> Processor name.
<2> Optional: The time to wait, counted from when the processor receives the first span of a trace, before the processor makes a sampling decision for that trace. Defaults to `30s`.
<3> Optional: The number of traces kept in memory. Defaults to `50000`.
<4> Optional: The expected number of new traces per second, which is helpful for allocating data structures. Defaults to `0`.
<5> Definitions of the policies for trace evaluation. The processor evaluates each trace against all of the specified policies and then either samples or drops the trace.
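The `decision_wait` and `num_traces` settings describe a buffer: spans are held per trace until the decision delay has elapsed since the first span of the trace arrived, and only then is the trace evaluated against the policies. The following Python sketch models that behavior with explicit timestamps instead of a real clock. The `TraceBuffer` class and its methods are hypothetical, and evicting the oldest pending trace when `num_traces` is reached is an assumption made for illustration; this is not the processor's implementation.

```python
from typing import Callable, Dict, List, Tuple

Span = dict  # hypothetical span representation: {"trace_id": ..., other fields}


class TraceBuffer:
    """Hypothetical model of tail-sampling buffering, not the processor's code."""

    def __init__(self, decision_wait: float, num_traces: int,
                 policy: Callable[[List[Span]], bool]):
        self.decision_wait = decision_wait  # seconds to wait after a trace's first span
        self.num_traces = num_traces        # maximum number of traces held in memory
        self.policy = policy                # returns True to sample the whole trace
        # trace_id -> (arrival time of the first span, spans received so far);
        # dicts keep insertion order, so the first key is the oldest trace.
        self.pending: Dict[str, Tuple[float, List[Span]]] = {}

    def add_span(self, now: float, span: Span) -> None:
        trace_id = span["trace_id"]
        if trace_id not in self.pending:
            if len(self.pending) >= self.num_traces:
                # Assumption: drop the oldest pending trace to make room.
                del self.pending[next(iter(self.pending))]
            self.pending[trace_id] = (now, [])
        self.pending[trace_id][1].append(span)

    def tick(self, now: float) -> List[str]:
        """Decide every trace whose wait has elapsed; return the sampled trace IDs."""
        sampled = []
        for trace_id in list(self.pending):
            first_seen, spans = self.pending[trace_id]
            if now - first_seen >= self.decision_wait:
                del self.pending[trace_id]
                if self.policy(spans):
                    sampled.append(trace_id)
        return sampled


# Example policy: keep traces that contain at least one error span.
buffer = TraceBuffer(decision_wait=30.0, num_traces=50000,
                     policy=lambda spans: any(s.get("error") for s in spans))
buffer.add_span(0.0, {"trace_id": "a", "error": False})
buffer.add_span(5.0, {"trace_id": "a", "error": True})   # arrives later, same trace
buffer.add_span(10.0, {"trace_id": "b", "error": False})
print(buffer.tick(29.0))  # []: no trace has waited 30 seconds yet
print(buffer.tick(30.0))  # ['a']: the later error span is part of the decision
print(buffer.tick(40.0))  # []: trace "b" is decided and not sampled
```

The example shows why the delay matters: if trace `a` had been decided when its first span arrived, the error span that arrived five seconds later could not have influenced the decision.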

You can choose and combine policies from the following list:

* The following policy samples all traces:
+
[source,yaml]
----
# ...
      policies:
        [
          {
            name: <always_sample_policy>,
            type: always_sample,
          },
        ]
# ...
----

* The following policy samples only traces whose duration is within a specified range:
+
[source,yaml]
----
# ...
      policies:
        [
          {
            name: <latency_policy>,
            type: latency,
            latency: {threshold_ms: 5000, upper_threshold_ms: 10000} # <1>
          },
        ]
# ...
----
<1> The provided `5000` and `10000` values are examples. The processor calculates the duration of a trace from the earliest start time to the latest end time of its spans. If you omit the `upper_threshold_ms` field, this policy samples all traces whose duration is greater than the specified `threshold_ms` value.

* The following policy samples traces by numeric value matches for resource and record attributes:
+
[source,yaml]
----
# ...
      policies:
        [
          {
            name: <numeric_attribute_policy>,
            type: numeric_attribute,
            numeric_attribute: {key: <key1>, min_value: 50, max_value: 100} # <1>
          },
        ]
# ...
----
<1> The provided `50` and `100` values are examples.

* The following policy samples only a percentage of traces:
+
[source,yaml]
----
# ...
      policies:
        [
          {
            name: <probabilistic_policy>,
            type: probabilistic,
            probabilistic: {sampling_percentage: 10} # <1>
          },
        ]
# ...
----
<1> The provided `10` value is an example.

* The following policy samples traces by status code, which is `OK`, `ERROR`, or `UNSET`:
+
[source,yaml]
----
# ...
      policies:
        [
          {
            name: <status_code_policy>,
            type: status_code,
            status_code: {status_codes: [ERROR, UNSET]}
          },
        ]
# ...
----

* The following policy samples traces by string value matches for resource and record attributes:
+
[source,yaml]
----
# ...
      policies:
        [
          {
            name: <string_attribute_policy>,
            type: string_attribute,
            string_attribute: {key: <key2>, values: [<value1>, <val>*], enabled_regex_matching: true, cache_max_size: 10} # <1>
          },
        ]
# ...
----
<1> This policy supports both exact and regular-expression value matches. Setting the `enabled_regex_matching` field to `true` enables regular-expression matching, and the `cache_max_size` field sets the size of the cache that stores regular-expression match results. The provided `10` value in the `cache_max_size` field is an example.

* The following policy samples traces up to a specified number of spans per second:
+
[source,yaml]
----
# ...
      policies:
        [
          {
            name: <rate_limiting_policy>,
            type: rate_limiting,
            rate_limiting: {spans_per_second: 35} # <1>
          },
        ]
# ...
----
<1> The provided `35` value is an example.

* The following policy samples traces whose number of spans is within the specified minimum and maximum values, inclusive:
+
[source,yaml]
----
# ...
      policies:
        [
          {
            name: <span_count_policy>,
            type: span_count,
            span_count: {min_spans: 2, max_spans: 20} # <1>
          },
        ]
# ...
----
<1> If the number of spans in the trace is outside this range, the trace is not sampled. The provided `2` and `20` values are examples.

* The following policy samples traces by `TraceState` value matches:
+
[source,yaml]
----
# ...
      policies:
        [
          {
            name: <trace_state_policy>,
            type: trace_state,
            trace_state: { key: <key3>, values: [<value1>, <value2>] }
          },
        ]
# ...
----

* The following policy samples traces by the value of a boolean resource or record attribute:
+
[source,yaml]
----
# ...
      policies:
        [
          {
            name: <bool_attribute_policy>,
            type: boolean_attribute,
            boolean_attribute: {key: <key4>, value: true}
          },
        ]
# ...
----

* The following policy samples traces by boolean OpenTelemetry Transformation Language (OTTL) conditions for spans or span events:
+
[source,yaml]
----
# ...
      policies:
        [
          {
            name: <ottl_policy>,
            type: ottl_condition,
            ottl_condition: {
              error_mode: ignore,
              span: [
                "attributes[\"<test_attr_key_1>\"] == \"<test_attr_value_1>\"",
                "attributes[\"<test_attr_key_2>\"] != \"<test_attr_value_1>\"",
              ],
              spanevent: [
                "name != \"<test_span_event_name>\"",
                "attributes[\"<test_event_attr_key_2>\"] != \"<test_event_attr_value_1>\"",
              ]
            }
          },
        ]
# ...
----

* The following is an `AND` policy that samples traces only when all of its sub-policies match:
+
[source,yaml]
----
# ...
      policies:
        [
          {
            name: <and_policy>,
            type: and,
            and: {
              and_sub_policy:
                [
                  {
                    name: <and_policy_1>,
                    type: numeric_attribute,
                    numeric_attribute: { key: <key1>, min_value: 50, max_value: 100 } # <1>
                  },
                  {
                    name: <and_policy_2>,
                    type: string_attribute,
                    string_attribute: { key: <key2>, values: [ <value1>, <value2> ] }
                  },
                ]
            }
          },
        ]
# ...
----
<1> The provided `50` and `100` values are examples.

* The following is a `DROP` policy that excludes traces from sampling when all of its sub-policies match. In this example, the policy drops traces with a `url.path` attribute that matches the `\/health` or `\/metrics` regular expression:
+
[source,yaml]
----
# ...
      policies:
        [
          {
            name: <drop_policy>,
            type: drop,
            drop: {
              drop_sub_policy:
                [
                  {
                    name: <drop_policy_1>,
                    type: string_attribute,
                    string_attribute: {key: url.path, values: [\/health, \/metrics], enabled_regex_matching: true}
                  }
                ]
            }
          },
        ]
# ...
----

* The following policy samples traces by a combination of the previous samplers, with ordering and rate allocation per sampler:
+
[source,yaml]
----
# ...
      policies:
        [
          {
            name: <composite_policy>,
            type: composite,
            composite:
              {
                max_total_spans_per_second: 100, # <1>
                policy_order: [<composite_policy_1>, <composite_policy_2>, <composite_policy_3>],
                composite_sub_policy:
                  [
                    {
                      name: <composite_policy_1>,
                      type: numeric_attribute,
                      numeric_attribute: {key: <key1>, min_value: 50}
                    },
                    {
                      name: <composite_policy_2>,
                      type: string_attribute,
                      string_attribute: {key: <key2>, values: [<value1>, <value2>]}
                    },
                    {
                      name: <composite_policy_3>,
                      type: always_sample
                    }
                  ],
                rate_allocation:
                  [
                    {
                      policy: <composite_policy_1>,
                      percent: 50 # <1>
                    },
                    {
                      policy: <composite_policy_2>,
                      percent: 25
                    }
                  ]
              }
          },
        ]
# ...
----
<1> Allocates percentages of spans according to the order of applied policies. For example, if you set the `max_total_spans_per_second` field to `100`, you can set the following values in the `rate_allocation` section: `percent: 50` for the `<composite_policy_1>` policy to allocate 50 spans per second, and `percent: 25` for the `<composite_policy_2>` policy to allocate 25 spans per second. To fill the remaining capacity, you can set the `type` field of the `<composite_policy_3>` policy to `always_sample`.
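The policy types above differ mainly in which property of a trace they test. The following Python sketch evaluates several of them against in-memory traces to make those semantics concrete. It is a simplified model under stated assumptions, not the processor's code: the `Span` class and all function names are hypothetical; only span attributes are modeled, not resource attributes; range bounds are treated as inclusive; regular expressions use unanchored `re.search` matching; SHA-256 stands in for the processor's own trace ID hash in the probabilistic policy; and the `AND`, `DROP`, and composite examples reduce each policy to the Boolean logic or arithmetic described in this section.

```python
import hashlib
import re
from dataclasses import dataclass, field


@dataclass
class Span:
    """Hypothetical minimal span model; times are in milliseconds."""
    start_ms: int
    end_ms: int
    status: str = "UNSET"                          # "OK", "ERROR", or "UNSET"
    attributes: dict = field(default_factory=dict)
    trace_state: str = ""                          # W3C tracestate, e.g. "env=prod"


def latency(spans, threshold_ms, upper_threshold_ms=None):
    # Duration runs from the earliest span start to the latest span end.
    duration = max(s.end_ms for s in spans) - min(s.start_ms for s in spans)
    if upper_threshold_ms is None:
        return duration >= threshold_ms
    return threshold_ms <= duration <= upper_threshold_ms


def numeric_attribute(spans, key, min_value, max_value):
    values = (s.attributes.get(key) for s in spans)
    return any(isinstance(v, (int, float)) and min_value <= v <= max_value for v in values)


def probabilistic(trace_id: bytes, sampling_percentage: float) -> bool:
    # SHA-256 stands in for the processor's hash; one trace ID, one decision.
    bucket = int.from_bytes(hashlib.sha256(trace_id).digest()[:8], "big")
    return bucket < sampling_percentage / 100 * 2**64


def status_code(spans, status_codes):
    return any(s.status in status_codes for s in spans)


def string_attribute(spans, key, values, enabled_regex_matching=False):
    found = [s.attributes[key] for s in spans if isinstance(s.attributes.get(key), str)]
    if enabled_regex_matching:
        return any(re.search(pattern, v) for pattern in values for v in found)
    return any(v in values for v in found)


def span_count(spans, min_spans, max_spans):
    return min_spans <= len(spans) <= max_spans


def trace_state(spans, key, values):
    for s in spans:
        # tracestate is a comma-separated list of key=value members.
        members = dict(m.strip().split("=", 1) for m in s.trace_state.split(",") if "=" in m)
        if members.get(key) in values:
            return True
    return False


def rate_budgets(max_total_spans_per_second, rate_allocation):
    """Spans per second for each listed policy, plus the capacity left over."""
    budgets = {p: max_total_spans_per_second * pct // 100 for p, pct in rate_allocation.items()}
    budgets["<remaining>"] = max_total_spans_per_second - sum(budgets.values())
    return budgets


checkout = [
    Span(0, 6200, status="ERROR", attributes={"http.status_code": 503, "url.path": "/cart"},
         trace_state="env=prod"),
    Span(100, 4000, attributes={"http.status_code": 200, "url.path": "/cart"}),
]
health = [Span(0, 3, attributes={"url.path": "/health"})]

print(latency(checkout, 5000, 10000))                             # True: duration is 6200 ms
print(numeric_attribute(checkout, "http.status_code", 500, 599))  # True
print(status_code(checkout, ["ERROR", "UNSET"]))                  # True
print(span_count(checkout, 2, 20))                                # True
print(trace_state(checkout, "env", ["prod"]))                     # True
print(probabilistic(bytes(16), 100), probabilistic(bytes(16), 0)) # True False

# AND: sample only when every sub-policy samples.
print(all([numeric_attribute(checkout, "http.status_code", 500, 599),
           string_attribute(checkout, "url.path", ["/cart"])]))   # True

# DROP: exclude the trace when every sub-policy matches (here, one regex policy).
health_paths = [r"\/health", r"\/metrics"]
print(all([string_attribute(health, "url.path", health_paths, enabled_regex_matching=True)]))    # True
print(all([string_attribute(checkout, "url.path", health_paths, enabled_regex_matching=True)]))  # False

# Composite: 50% and 25% of 100 spans per second.
print(rate_budgets(100, {"<composite_policy_1>": 50, "<composite_policy_2>": 25}))
# {'<composite_policy_1>': 50, '<composite_policy_2>': 25, '<remaining>': 25}
```

The `<remaining>` entry is only the capacity left after the explicit allocations, which the composite example above fills with the `always_sample` sub-policy; how the processor enforces these budgets over time is not modeled here.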

[role="_additional-resources"]
.Additional resources
* link:https://opentelemetry.io/blog/2022/tail-sampling/[Tail Sampling with OpenTelemetry: Why it’s useful, how to do it, and what to consider] (OpenTelemetry Blog)
* link:https://opentelemetry.io/docs/collector/deployment/gateway/[Gateway] (OpenTelemetry Documentation)

 [role="_additional-resources"]
 [id="additional-resources_{context}"]
 == Additional resources
