Commit d17f1c6

Merge pull request #93378 from max-cx/OBSDOCS-1910
OBSDOCS-1910: Add docs for Tail Sampling Processor
2 parents 2034e9d + 24b41e1 commit d17f1c6

1 file changed

+353
-1
lines changed

observability/otel/otel-collector/otel-collector-processors.adoc

Lines changed: 353 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -22,6 +22,7 @@ Currently, the following General Availability and Technology Preview processors
2222
- xref:../../../observability/otel/otel-collector/otel-collector-processors.adoc#cumulativetodelta-processor_otel-collector-processors[Cumulative-to-Delta Processor]
2323
- xref:../../../observability/otel/otel-collector/otel-collector-processors.adoc#groupbyattrsprocessor-processor_otel-collector-processors[Group-by-Attributes Processor]
2424
- xref:../../../observability/otel/otel-collector/otel-collector-processors.adoc#transform-processor_otel-collector-processors[Transform Processor]
25+
- xref:../../../observability/otel/otel-collector/otel-collector-processors.adoc#tail-sampling-processor_otel-collector-processors[Tail Sampling Processor]
2526
2627
[id="batch-processor_{context}"]
2728
== Batch Processor
@@ -454,6 +455,7 @@ include::snippets/technology-preview.adoc[]
454455

455456
At minimum, configuring this processor involves specifying an array of attribute keys to be used to group spans, log records, or metric datapoints together, as in the following example:
456457

458+
.Example of the OpenTelemetry Collector custom resource when using the Group-by-Attributes Processor
457459
[source,yaml]
458460
----
459461
# ...
@@ -511,7 +513,7 @@ config:
511513
<3> See the following table "Values for the `context` field".
512514
<4> Optional: Conditions for performing a transformation.
513515

514-
.Configuration example
516+
.Example of the OpenTelemetry Collector custom resource when using the Transform Processor
515517
[source,yaml]
516518
----
517519
# ...
@@ -572,6 +574,356 @@ config:
572574
|Returns errors up the pipeline and drops the payload. Implicit default.
573575

574576
|===
577+
578+
[id="tail-sampling-processor_{context}"]
579+
== Tail Sampling Processor
580+
581+
The Tail Sampling Processor samples traces according to user-defined policies when all of the spans are completed. Tail-based sampling enables you to filter the traces of interest and reduce your data ingestion and storage costs.
582+
583+
:FeatureName: The Tail Sampling Processor
584+
include::snippets/technology-preview.adoc[]
585+
586+
This processor reassembles spans into new batches and strips spans of their original context.
587+
588+
[TIP]
589+
====
590+
* In pipelines, place this processor downstream of any processors that rely on context: for example, after the Kubernetes Attributes Processor.
591+
* If scaling the Collector, ensure that one Collector instance receives all spans of the same trace so that this processor makes correct sampling decisions based on the specified sampling policies. You can achieve this by setting up two layers of Collectors: the first layer of Collectors with the Load Balancing Exporter, and the second layer of Collectors with the Tail Sampling Processor.
592+
====
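
The two-layer setup described in the preceding tip might look like the following minimal sketch of a first-layer Collector. It assumes the Load Balancing Exporter with the `routing_key` field set to `traceID` and a DNS resolver. The `<second_layer_headless_service>` value is a placeholder, and the `insecure` TLS setting is an illustrative assumption only, not a recommendation.

[source,yaml]
----
# ...
config:
  receivers:
    otlp:
      protocols:
        grpc: {}
  exporters:
    loadbalancing:
      routing_key: traceID # routes all spans of one trace to the same backend
      protocol:
        otlp:
          tls:
            insecure: true # assumption for illustration only
      resolver:
        dns:
          hostname: <second_layer_headless_service> # placeholder
  service:
    pipelines:
      traces:
        receivers: [otlp]
        exporters: [loadbalancing]
# ...
----

The second-layer Collectors then run the Tail Sampling Processor in their traces pipeline, so each instance sees complete traces before making a sampling decision.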
593+
594+
.Example of the OpenTelemetry Collector custom resource when using the Tail Sampling Processor
595+
[source,yaml]
596+
----
597+
# ...
598+
config:
599+
processors:
600+
tail_sampling: # <1>
601+
decision_wait: 30s # <2>
602+
num_traces: 50000 # <3>
603+
expected_new_traces_per_sec: 10 # <4>
604+
policies: # <5>
605+
[
606+
{
607+
<definition_of_policy_1>
608+
},
609+
{
610+
<definition_of_policy_2>
611+
},
612+
{
613+
<definition_of_policy_3>
614+
},
615+
]
616+
# ...
617+
----
618+
<1> Processor name.
619+
<2> Optional: Decision delay time, counted from the time of the first span, before the processor makes a sampling decision on each trace. Defaults to `30s`.
620+
<3> Optional: The number of traces kept in memory. Defaults to `50000`.
621+
<4> Optional: The expected number of new traces per second, which is helpful for allocating data structures. Defaults to `0`.
622+
<5> Definitions of the policies for trace evaluation. The processor evaluates each trace against all of the specified policies and then either samples or drops the trace.
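
As a sketch of how the processor might be wired into a pipeline, the following fragment lists it after the Kubernetes Attributes Processor in a traces pipeline. The `otlp` receiver, the `otlp` exporter, and the `<backend_endpoint>` placeholder are assumptions for illustration. Processors run in the order in which the pipeline lists them.

[source,yaml]
----
# ...
config:
  receivers:
    otlp:
      protocols:
        grpc: {}
  processors:
    k8sattributes: {}
    tail_sampling:
      policies:
        [
          {
            name: <always_sample_policy>,
            type: always_sample
          }
        ]
  exporters:
    otlp:
      endpoint: <backend_endpoint> # placeholder
  service:
    pipelines:
      traces:
        receivers: [otlp]
        processors: [k8sattributes, tail_sampling] # context-dependent processor first
        exporters: [otlp]
# ...
----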
623+
624+
You can choose and combine policies from the following list:
625+
626+
* The following policy samples all traces:
627+
+
628+
[source,yaml]
629+
----
630+
# ...
631+
policies:
632+
[
633+
{
634+
name: <always_sample_policy>,
635+
type: always_sample,
636+
},
637+
]
638+
# ...
639+
----
640+
641+
* The following policy samples only traces of a duration that is within a specified range:
642+
+
643+
[source,yaml]
644+
----
645+
# ...
646+
policies:
647+
[
648+
{
649+
name: <latency_policy>,
650+
type: latency,
651+
latency: {threshold_ms: 5000, upper_threshold_ms: 10000} # <1>
652+
},
653+
]
654+
# ...
655+
----
656+
<1> The provided `5000` and `10000` values are examples. You can estimate suitable latency values from the difference between the earliest span start time and the latest span end time in your traces. If you omit the `upper_threshold_ms` field, this policy samples all traces with a latency greater than the specified `threshold_ms` value.
657+
658+
* The following policy samples traces whose resource or record attributes have a numeric value within a specified range:
659+
+
660+
[source,yaml]
661+
----
662+
# ...
663+
policies:
664+
[
665+
{
666+
name: <numeric_attribute_policy>,
667+
type: numeric_attribute,
668+
numeric_attribute: {key: <key1>, min_value: 50, max_value: 100} # <1>
669+
},
670+
]
671+
# ...
672+
----
673+
<1> The provided `50` and `100` values are examples.
674+
675+
* The following policy samples only a percentage of traces:
676+
+
677+
[source,yaml]
678+
----
679+
# ...
680+
policies:
681+
[
682+
{
683+
name: <probabilistic_policy>,
684+
type: probabilistic,
685+
probabilistic: {sampling_percentage: 10} # <1>
686+
},
687+
]
688+
# ...
689+
----
690+
<1> The provided `10` value is an example.
691+
692+
* The following policy samples traces by the status code: `OK`, `ERROR`, or `UNSET`:
693+
+
694+
[source,yaml]
695+
----
696+
# ...
697+
policies:
698+
[
699+
{
700+
name: <status_code_policy>,
701+
type: status_code,
702+
status_code: {status_codes: [ERROR, UNSET]}
703+
},
704+
]
705+
# ...
706+
----
707+
708+
* The following policy samples traces by string value matches for resource and record attributes:
709+
+
710+
[source,yaml]
711+
----
712+
# ...
713+
policies:
714+
[
715+
{
716+
name: <string_attribute_policy>,
717+
type: string_attribute,
718+
string_attribute: {key: <key2>, values: [<value1>, <val>*], enabled_regex_matching: true, cache_max_size: 10} # <1>
719+
},
720+
]
721+
# ...
722+
----
723+
<1> This policy definition supports both exact and regular-expression value matches. The provided `10` value in the `cache_max_size` field is an example.
724+
725+
* The following policy samples traces by the rate of spans per second:
726+
+
727+
[source,yaml]
728+
----
729+
# ...
730+
policies:
731+
[
732+
{
733+
name: <rate_limiting_policy>,
734+
type: rate_limiting,
735+
rate_limiting: {spans_per_second: 35} # <1>
736+
},
737+
]
738+
# ...
739+
----
740+
<1> The provided `35` value is an example.
741+
742+
* The following policy samples traces whose number of spans is within a specified range, inclusive of the minimum and maximum values:
743+
+
744+
[source,yaml]
745+
----
746+
# ...
747+
policies:
748+
[
749+
{
750+
name: <span_count_policy>,
751+
type: span_count,
752+
span_count: {min_spans: 2, max_spans: 20} # <1>
753+
},
754+
]
755+
# ...
756+
----
757+
<1> If the total number of spans in the trace is outside the specified range, the trace is not sampled. The provided `2` and `20` values are examples.
758+
759+
* The following policy samples traces by `TraceState` value matches:
760+
+
761+
[source,yaml]
762+
----
763+
# ...
764+
policies:
765+
[
766+
{
767+
name: <trace_state_policy>,
768+
type: trace_state,
769+
trace_state: { key: <key3>, values: [<value1>, <value2>] }
770+
},
771+
]
772+
# ...
773+
----
774+
775+
* The following policy samples traces by the value of a boolean resource or record attribute:
776+
+
777+
[source,yaml]
778+
----
779+
# ...
780+
policies:
781+
[
782+
{
783+
name: <bool_attribute_policy>,
784+
type: boolean_attribute,
785+
boolean_attribute: {key: <key4>, value: true}
786+
},
787+
]
788+
# ...
789+
----
790+
791+
* The following policy samples traces by a given boolean OTTL condition for a span or span event:
792+
+
793+
[source,yaml]
794+
----
795+
# ...
796+
policies:
797+
[
798+
{
799+
name: <ottl_policy>,
800+
type: ottl_condition,
801+
ottl_condition: {
802+
error_mode: ignore,
803+
span: [
804+
"attributes[\"<test_attr_key_1>\"] == \"<test_attr_value_1>\"",
805+
"attributes[\"<test_attr_key_2>\"] != \"<test_attr_value_1>\"",
806+
],
807+
spanevent: [
808+
"name != \"<test_span_event_name>\"",
809+
"attributes[\"<test_event_attr_key_2>\"] != \"<test_event_attr_value_1>\"",
810+
]
811+
}
812+
},
813+
]
814+
# ...
815+
----
816+
817+
* The following is an `AND` policy that samples traces based on a combination of multiple policies:
818+
+
819+
[source,yaml]
820+
----
821+
# ...
822+
policies:
823+
[
824+
{
825+
name: <and_policy>,
826+
type: and,
827+
and: {
828+
and_sub_policy:
829+
[
830+
{
831+
name: <and_policy_1>,
832+
type: numeric_attribute,
833+
numeric_attribute: { key: <key1>, min_value: 50, max_value: 100 } # <1>
834+
},
835+
{
836+
name: <and_policy_2>,
837+
type: string_attribute,
838+
string_attribute: { key: <key2>, values: [ <value1>, <value2> ] }
839+
},
840+
]
841+
}
842+
},
843+
]
844+
# ...
845+
----
846+
<1> The provided `50` and `100` values are examples.
847+
848+
* The following is a `DROP` policy that drops traces from sampling based on a combination of multiple policies:
849+
+
850+
[source,yaml]
851+
----
852+
# ...
853+
policies:
854+
[
855+
{
856+
name: <drop_policy>,
857+
type: drop,
858+
drop: {
859+
drop_sub_policy:
860+
[
861+
{
862+
name: <drop_policy_1>,
863+
type: string_attribute,
864+
string_attribute: {key: url.path, values: [\/health, \/metrics], enabled_regex_matching: true}
865+
}
866+
]
867+
}
868+
},
869+
]
870+
# ...
871+
----
872+
873+
* The following policy samples traces by a combination of the previous samplers, with ordering and rate allocation per sampler:
874+
+
875+
[source,yaml]
876+
----
877+
# ...
878+
policies:
879+
[
880+
{
881+
name: <composite_policy>,
882+
type: composite,
883+
composite:
884+
{
885+
max_total_spans_per_second: 100, # <1>
886+
policy_order: [<composite_policy_1>, <composite_policy_2>, <composite_policy_3>],
887+
composite_sub_policy:
888+
[
889+
{
890+
name: <composite_policy_1>,
891+
type: numeric_attribute,
892+
numeric_attribute: {key: <key1>, min_value: 50}
893+
},
894+
{
895+
name: <composite_policy_2>,
896+
type: string_attribute,
897+
string_attribute: {key: <key2>, values: [<value1>, <value2>]}
898+
},
899+
{
900+
name: <composite_policy_3>,
901+
type: always_sample
902+
}
903+
],
904+
rate_allocation:
905+
[
906+
{
907+
policy: <composite_policy_1>,
908+
percent: 50 # <1>
909+
},
910+
{
911+
policy: <composite_policy_2>,
912+
percent: 25
913+
}
914+
]
915+
}
916+
},
917+
]
918+
# ...
919+
----
920+
<1> Allocates percentages of spans according to the order of applied policies. For example, if you set the `100` value in the `max_total_spans_per_second` field, you can set the following values in the `rate_allocation` section: the `50` percent value in the `policy: <composite_policy_1>` section to allocate 50 spans per second, and the `25` percent value in the `policy: <composite_policy_2>` section to allocate 25 spans per second. To fill the remaining capacity, you can set the `always_sample` value in the `type` field of the `name: <composite_policy_3>` section.
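
Several of the preceding policies can be combined in one `policies` list. The following sketch assumes the upstream behavior in which a trace is sampled when at least one policy decides to sample it and no `drop` policy matches. The policy names and the `5000` and `10` values are illustrative only.

[source,yaml]
----
# ...
policies:
  [
    {
      name: <keep_errors_policy>, # keeps every trace that contains an error
      type: status_code,
      status_code: {status_codes: [ERROR]}
    },
    {
      name: <keep_slow_traces_policy>, # keeps traces slower than 5 seconds
      type: latency,
      latency: {threshold_ms: 5000}
    },
    {
      name: <baseline_policy>, # keeps about 10 percent of all traces
      type: probabilistic,
      probabilistic: {sampling_percentage: 10}
    }
  ]
# ...
----

With this combination, traces that are neither failed nor slow are still represented through the probabilistic baseline.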
921+
922+
[role="_additional-resources"]
923+
.Additional resources
924+
* link:https://opentelemetry.io/blog/2022/tail-sampling/[Tail Sampling with OpenTelemetry: Why it’s useful, how to do it, and what to consider] (OpenTelemetry Blog)
925+
* link:https://opentelemetry.io/docs/collector/deployment/gateway/[Gateway] (OpenTelemetry Documentation)
926+
575927
[role="_additional-resources"]
576928
[id="additional-resources_{context}"]
577929
== Additional resources
