refactor: split collect_chunks into two methods #1021

paullegranddc · 2025-04-14T18:23:54Z

What does this PR do?

This refactor splits the logic in collect_trace_chunks between the trace exporter spans (v04 and v05) and the mini agent spans (pb::Spans).
It completely removes usage of the TraceCollection struct from data-pipeline and instead introduces the TraceChunks enum to differentiate between v04 and v05.
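
As a rough illustration of the shape described above, here is a minimal sketch of an exporter-only enum with one variant per format. Everything in it is an assumption made for the example, not the code from this PR: the `Span` and `SpanV05` types, the variant payloads, the `intern` helper and the `collect_exporter_chunks` name are all hypothetical. It only assumes that a v05-style payload stores strings once in a shared table and refers to them by index.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified span; real spans carry many more fields.
#[derive(Debug, Clone, PartialEq)]
pub struct Span {
    pub service: String,
    pub name: String,
}

/// Hypothetical v05-style span: each field is an index into a shared string table.
#[derive(Debug, Clone, PartialEq)]
pub struct SpanV05 {
    pub service: u32,
    pub name: u32,
}

/// One variant per exporter format, so callers match on the format once.
#[derive(Debug, PartialEq)]
pub enum TraceChunks {
    V04(Vec<Vec<Span>>),
    V05 { dict: Vec<String>, chunks: Vec<Vec<SpanV05>> },
}

/// Returns the index of `s` in `dict`, appending it the first time it is seen.
fn intern(dict: &mut Vec<String>, index: &mut HashMap<String, u32>, s: &str) -> u32 {
    if let Some(&i) = index.get(s) {
        return i;
    }
    let i = dict.len() as u32;
    dict.push(s.to_owned());
    index.insert(s.to_owned(), i);
    i
}

/// Exporter-only collection: protobuf spans are not accepted here at all,
/// so there is no "impossible" branch left to panic on.
pub fn collect_exporter_chunks(traces: Vec<Vec<Span>>, use_v05: bool) -> TraceChunks {
    if !use_v05 {
        return TraceChunks::V04(traces);
    }
    let mut dict: Vec<String> = Vec::new();
    let mut index: HashMap<String, u32> = HashMap::new();
    let mut chunks: Vec<Vec<SpanV05>> = Vec::with_capacity(traces.len());
    for trace in &traces {
        let mut chunk = Vec::with_capacity(trace.len());
        for span in trace {
            let service = intern(&mut dict, &mut index, &span.service);
            let name = intern(&mut dict, &mut index, &span.name);
            chunk.push(SpanV05 { service, name });
        }
        chunks.push(chunk);
    }
    TraceChunks::V05 { dict, chunks }
}
```

Because the mini agent spans would go through their own function, a caller of this one only ever has to match on `V04` and `V05`.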

Currently, the way the code is structured makes replacing ByteString with a slice harder because of the shared lifetime.
Furthermore, the enum encodes two different code paths, the SpanBytes spans and the pb spans, which never interact with each other. So having functions that handle both is pure complexity overhead.
This refactor also removes a bunch of panics and lines of code that were there only to handle the overlap between "fake" pb spans and trace exporter spans, which in practice never happens.

Lastly, this removes the TracerParams struct. Every occurrence of it was created and then immediately converted with TryInto<TracerCollection>. Replacing it with a simple function is a lot less complex for the same feature set.
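
To make the trade-off concrete, here is a small sketch contrasting the two styles. It is not the code touched by this PR: `PayloadParams`, `Collection`, `DecodeError`, `decode_collection` and the toy "one span name per line" payload are all hypothetical names and formats chosen for the example.

```rust
use std::convert::TryFrom;

/// Hypothetical error type for the example.
#[derive(Debug, PartialEq)]
pub struct DecodeError(pub String);

/// Hypothetical decoded payload: one span name per input line.
#[derive(Debug, PartialEq)]
pub struct Collection(pub Vec<String>);

/// "Before" style: a struct that exists only to be converted right away.
pub struct PayloadParams<'a> {
    pub data: &'a [u8],
    pub skip_empty: bool,
}

impl<'a> TryFrom<PayloadParams<'a>> for Collection {
    type Error = DecodeError;

    fn try_from(params: PayloadParams<'a>) -> Result<Self, Self::Error> {
        decode_collection(params.data, params.skip_empty)
    }
}

/// "After" style: the same inputs passed directly to a plain function.
pub fn decode_collection(data: &[u8], skip_empty: bool) -> Result<Collection, DecodeError> {
    // Reject payloads that are not valid UTF-8 instead of panicking.
    let text = std::str::from_utf8(data).map_err(|e| DecodeError(e.to_string()))?;
    let names: Vec<String> = text
        .lines()
        .filter(|line| !(skip_empty && line.is_empty()))
        .map(str::to_owned)
        .collect();
    Ok(Collection(names))
}
```

In this sketch both entry points return the same result, but the function version needs no intermediate type and no borrowed field whose lifetime has to be threaded through a struct.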

Motivation

Prepare for using SpanSlice<'a> instead of SpanBytes in the trace exporter.

Additional Notes

Anything else we should know when reviewing?

How to test the change?

Describe here in detail how the change can be validated.

pr-commenter · 2025-04-14T18:35:56Z

Benchmarks

Comparison

Benchmark execution time: 2025-04-16 12:02:48

Comparing candidate commit edfc921 in PR branch paullgdc/data-pipeline/split_trace_collect with baseline commit daf50ad in branch main.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 52 metrics, 2 unstable metrics.

Candidate

Candidate benchmark details

Group 1

cpu_model	git_commit_sha	git_commit_date	git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	`edfc921`	1744804265	paullgdc/data-pipeline/split_trace_collect

scenario	metric	min	mean ± sd	median ± mad	p75	p95	p99	max	peak_to_median_ratio	skewness	kurtosis	cv	sem	runs	sample_size
normalization/normalize_service/normalize_service/A0000000000000000000000000000000000000000000000000...	execution_time	503.697µs	505.205µs ± 0.781µs	505.200µs ± 0.263µs	505.436µs	505.760µs	506.119µs	514.639µs	1.87%	8.753	104.601	0.15%	0.055µs	1	200
normalization/normalize_service/normalize_service/A0000000000000000000000000000000000000000000000000...	throughput	1943108.378op/s	1979398.440op/s ± 3021.024op/s	1979414.746op/s ± 1030.018op/s	1980502.993op/s	1982269.187op/s	1984041.573op/s	1985321.635op/s	0.30%	-8.615	102.465	0.15%	213.619op/s	1	200
normalization/normalize_service/normalize_service/Data🐨dog🐶 繋がっ⛰てて	execution_time	453.838µs	454.903µs ± 0.468µs	454.883µs ± 0.313µs	455.230µs	455.615µs	455.951µs	456.198µs	0.29%	0.099	-0.352	0.10%	0.033µs	1	200
normalization/normalize_service/normalize_service/Data🐨dog🐶 繋がっ⛰てて	throughput	2192030.173op/s	2198272.717op/s ± 2262.255op/s	2198365.559op/s ± 1511.630op/s	2199726.959op/s	2201880.407op/s	2203232.537op/s	2203429.175op/s	0.23%	-0.094	-0.354	0.10%	159.966op/s	1	200
normalization/normalize_service/normalize_service/Test Conversion 0f Weird !@#$%^&**() Characters	execution_time	176.782µs	177.389µs ± 0.254µs	177.418µs ± 0.202µs	177.596µs	177.733µs	177.822µs	177.991µs	0.32%	-0.316	-0.669	0.14%	0.018µs	1	200
normalization/normalize_service/normalize_service/Test Conversion 0f Weird !@#$%^&**() Characters	throughput	5618264.604op/s	5637326.719op/s ± 8076.045op/s	5636405.171op/s ± 6420.246op/s	5643434.127op/s	5652315.420op/s	5655443.442op/s	5656692.885op/s	0.36%	0.322	-0.666	0.14%	571.063op/s	1	200
normalization/normalize_service/normalize_service/[empty string]	execution_time	37.580µs	37.682µs ± 0.041µs	37.677µs ± 0.027µs	37.711µs	37.748µs	37.799µs	37.840µs	0.43%	0.556	0.655	0.11%	0.003µs	1	200
normalization/normalize_service/normalize_service/[empty string]	throughput	26426718.039op/s	26537709.420op/s ± 28663.920op/s	26541456.970op/s ± 19292.746op/s	26557877.406op/s	26580409.385op/s	26588719.840op/s	26609910.884op/s	0.26%	-0.548	0.638	0.11%	2026.845op/s	1	200
normalization/normalize_service/normalize_service/test_ASCII	execution_time	48.085µs	48.294µs ± 0.221µs	48.118µs ± 0.031µs	48.531µs	48.618µs	48.684µs	48.739µs	1.29%	0.365	-1.665	0.46%	0.016µs	1	200
normalization/normalize_service/normalize_service/test_ASCII	throughput	20517524.108op/s	20707027.035op/s ± 94798.346op/s	20782397.927op/s ± 13488.053op/s	20792430.367op/s	20795040.086op/s	20795944.720op/s	20796377.865op/s	0.07%	-0.362	-1.669	0.46%	6703.255op/s	1	200

scenario	metric	95% CI mean	Shapiro-Wilk pvalue	Ljung-Box pvalue (lag=1)	Dip test pvalue
normalization/normalize_service/normalize_service/A0000000000000000000000000000000000000000000000000...	execution_time	[505.097µs; 505.313µs] or [-0.021%; +0.021%]	None	None	None
normalization/normalize_service/normalize_service/A0000000000000000000000000000000000000000000000000...	throughput	[1978979.756op/s; 1979817.125op/s] or [-0.021%; +0.021%]	None	None	None
normalization/normalize_service/normalize_service/Data🐨dog🐶 繋がっ⛰てて	execution_time	[454.838µs; 454.968µs] or [-0.014%; +0.014%]	None	None	None
normalization/normalize_service/normalize_service/Data🐨dog🐶 繋がっ⛰てて	throughput	[2197959.190op/s; 2198586.244op/s] or [-0.014%; +0.014%]	None	None	None
normalization/normalize_service/normalize_service/Test Conversion 0f Weird !@#$%^&**() Characters	execution_time	[177.354µs; 177.425µs] or [-0.020%; +0.020%]	None	None	None
normalization/normalize_service/normalize_service/Test Conversion 0f Weird !@#$%^&**() Characters	throughput	[5636207.457op/s; 5638445.981op/s] or [-0.020%; +0.020%]	None	None	None
normalization/normalize_service/normalize_service/[empty string]	execution_time	[37.677µs; 37.688µs] or [-0.015%; +0.015%]	None	None	None
normalization/normalize_service/normalize_service/[empty string]	throughput	[26533736.876op/s; 26541681.963op/s] or [-0.015%; +0.015%]	None	None	None
normalization/normalize_service/normalize_service/test_ASCII	execution_time	[48.263µs; 48.324µs] or [-0.064%; +0.064%]	None	None	None
normalization/normalize_service/normalize_service/test_ASCII	throughput	[20693888.895op/s; 20720165.174op/s] or [-0.063%; +0.063%]	None	None	None

Group 2

cpu_model	git_commit_sha	git_commit_date	git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	`edfc921`	1744804265	paullgdc/data-pipeline/split_trace_collect

scenario	metric	min	mean ± sd	median ± mad	p75	p95	p99	max	peak_to_median_ratio	skewness	kurtosis	cv	sem	runs	sample_size
two way interface	execution_time	17.720µs	26.170µs ± 11.089µs	17.921µs ± 0.096µs	35.158µs	44.296µs	46.728µs	90.155µs	403.06%	1.660	5.320	42.27%	0.784µs	1	200

scenario	metric	95% CI mean	Shapiro-Wilk pvalue	Ljung-Box pvalue (lag=1)	Dip test pvalue
two way interface	execution_time	[24.633µs; 27.707µs] or [-5.872%; +5.872%]	None	None	None

Group 3

cpu_model	git_commit_sha	git_commit_date	git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	`edfc921`	1744804265	paullgdc/data-pipeline/split_trace_collect

scenario	metric	min	mean ± sd	median ± mad	p75	p95	p99	max	peak_to_median_ratio	skewness	kurtosis	cv	sem	runs	sample_size
write only interface	execution_time	1.170µs	3.204µs ± 1.425µs	3.011µs ± 0.028µs	3.034µs	3.666µs	13.845µs	14.953µs	396.69%	7.386	55.572	44.37%	0.101µs	1	200

scenario	metric	95% CI mean	Shapiro-Wilk pvalue	Ljung-Box pvalue (lag=1)	Dip test pvalue
write only interface	execution_time	[3.007µs; 3.402µs] or [-6.165%; +6.165%]	None	None	None

Group 4

cpu_model	git_commit_sha	git_commit_date	git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	`edfc921`	1744804265	paullgdc/data-pipeline/split_trace_collect

scenario	metric	min	mean ± sd	median ± mad	p75	p95	p99	max	peak_to_median_ratio	skewness	kurtosis	cv	sem	runs	sample_size
redis/obfuscate_redis_string	execution_time	33.042µs	33.926µs ± 1.097µs	33.263µs ± 0.144µs	35.344µs	35.701µs	35.986µs	36.565µs	9.93%	0.918	-1.034	3.23%	0.078µs	1	200

scenario	metric	95% CI mean	Shapiro-Wilk pvalue	Ljung-Box pvalue (lag=1)	Dip test pvalue
redis/obfuscate_redis_string	execution_time	[33.774µs; 34.079µs] or [-0.448%; +0.448%]	None	None	None

Group 5

cpu_model	git_commit_sha	git_commit_date	git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	`edfc921`	1744804265	paullgdc/data-pipeline/split_trace_collect

scenario	metric	min	mean ± sd	median ± mad	p75	p95	p99	max	peak_to_median_ratio	skewness	kurtosis	cv	sem	runs	sample_size
concentrator/add_spans_to_concentrator	execution_time	5.976ms	5.989ms ± 0.011ms	5.987ms ± 0.003ms	5.991ms	5.999ms	6.029ms	6.098ms	1.84%	6.064	48.938	0.19%	0.001ms	1	200

scenario	metric	95% CI mean	Shapiro-Wilk pvalue	Ljung-Box pvalue (lag=1)	Dip test pvalue
concentrator/add_spans_to_concentrator	execution_time	[5.988ms; 5.991ms] or [-0.026%; +0.026%]	None	None	None

Group 6

cpu_model	git_commit_sha	git_commit_date	git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	`edfc921`	1744804265	paullgdc/data-pipeline/split_trace_collect

scenario	metric	min	mean ± sd	median ± mad	p75	p95	p99	max	peak_to_median_ratio	skewness	kurtosis	cv	sem	runs	sample_size
benching string interning on wordpress profile	execution_time	148.564µs	149.530µs ± 0.274µs	149.525µs ± 0.136µs	149.641µs	149.936µs	150.277µs	150.739µs	0.81%	0.346	2.950	0.18%	0.019µs	1	200

scenario	metric	95% CI mean	Shapiro-Wilk pvalue	Ljung-Box pvalue (lag=1)	Dip test pvalue
benching string interning on wordpress profile	execution_time	[149.492µs; 149.568µs] or [-0.025%; +0.025%]	None	None	None

Group 7

cpu_model	git_commit_sha	git_commit_date	git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	`edfc921`	1744804265	paullgdc/data-pipeline/split_trace_collect

scenario	metric	min	mean ± sd	median ± mad	p75	p95	p99	max	peak_to_median_ratio	skewness	kurtosis	cv	sem	runs	sample_size
credit_card/is_card_number/	execution_time	3.894µs	3.913µs ± 0.003µs	3.913µs ± 0.001µs	3.914µs	3.917µs	3.919µs	3.921µs	0.21%	-1.482	9.380	0.07%	0.000µs	1	200
credit_card/is_card_number/	throughput	255036013.360op/s	255575810.444op/s ± 184088.588op/s	255569012.258op/s ± 89910.473op/s	255657363.122op/s	255851285.739op/s	256009354.430op/s	256811766.319op/s	0.49%	1.502	9.518	0.07%	13017.029op/s	1	200
credit_card/is_card_number/ 3782-8224-6310-005	execution_time	76.578µs	77.627µs ± 0.468µs	77.613µs ± 0.331µs	77.904µs	78.474µs	78.716µs	78.935µs	1.70%	0.324	-0.285	0.60%	0.033µs	1	200
credit_card/is_card_number/ 3782-8224-6310-005	throughput	12668714.874op/s	12882587.146op/s ± 77507.656op/s	12884368.727op/s ± 55222.645op/s	12941337.556op/s	12998702.070op/s	13024951.649op/s	13058649.533op/s	1.35%	-0.295	-0.312	0.60%	5480.619op/s	1	200
credit_card/is_card_number/ 378282246310005	execution_time	70.929µs	71.416µs ± 0.289µs	71.391µs ± 0.181µs	71.569µs	71.973µs	72.322µs	72.374µs	1.38%	0.777	0.650	0.40%	0.020µs	1	200
credit_card/is_card_number/ 378282246310005	throughput	13817154.556op/s	14002739.373op/s ± 56581.608op/s	14007436.061op/s ± 35600.469op/s	14044691.304op/s	14080610.980op/s	14095272.954op/s	14098635.252op/s	0.65%	-0.752	0.590	0.40%	4000.924op/s	1	200
credit_card/is_card_number/37828224631	execution_time	3.893µs	3.912µs ± 0.003µs	3.913µs ± 0.002µs	3.914µs	3.918µs	3.919µs	3.919µs	0.15%	-1.216	6.721	0.08%	0.000µs	1	200
credit_card/is_card_number/37828224631	throughput	255185393.136op/s	255591667.256op/s ± 204265.381op/s	255572367.169op/s ± 114541.448op/s	255694630.501op/s	255912974.642op/s	255976972.340op/s	256885320.245op/s	0.51%	1.233	6.841	0.08%	14443.744op/s	1	200
credit_card/is_card_number/378282246310005	execution_time	67.079µs	67.661µs ± 0.331µs	67.631µs ± 0.200µs	67.860µs	68.256µs	68.416µs	69.096µs	2.17%	0.616	0.957	0.49%	0.023µs	1	200
credit_card/is_card_number/378282246310005	throughput	14472701.307op/s	14779847.882op/s ± 72047.837op/s	14786036.808op/s ± 43643.418op/s	14825226.613op/s	14893568.547op/s	14907154.835op/s	14907823.817op/s	0.82%	-0.579	0.841	0.49%	5094.551op/s	1	200
credit_card/is_card_number/37828224631000521389798	execution_time	51.745µs	51.835µs ± 0.040µs	51.838µs ± 0.025µs	51.860µs	51.903µs	51.927µs	51.942µs	0.20%	0.024	-0.165	0.08%	0.003µs	1	200
credit_card/is_card_number/37828224631000521389798	throughput	19252118.057op/s	19291866.555op/s ± 14818.334op/s	19290852.282op/s ± 9387.288op/s	19300650.433op/s	19318856.943op/s	19323975.257op/s	19325529.024op/s	0.18%	-0.020	-0.167	0.08%	1047.814op/s	1	200
credit_card/is_card_number/x371413321323331	execution_time	6.026µs	6.057µs ± 0.030µs	6.041µs ± 0.008µs	6.075µs	6.116µs	6.153µs	6.191µs	2.48%	1.453	2.139	0.49%	0.002µs	1	200
credit_card/is_card_number/x371413321323331	throughput	161530562.248op/s	165091701.019op/s ± 812731.235op/s	165541997.133op/s ± 231590.836op/s	165670972.590op/s	165818165.139op/s	165926772.292op/s	165948894.977op/s	0.25%	-1.424	1.993	0.49%	57468.777op/s	1	200
credit_card/is_card_number_no_luhn/	execution_time	3.896µs	3.913µs ± 0.002µs	3.913µs ± 0.001µs	3.914µs	3.916µs	3.917µs	3.919µs	0.16%	-2.579	19.336	0.05%	0.000µs	1	200
credit_card/is_card_number_no_luhn/	throughput	255143726.474op/s	255560423.569op/s ± 140920.351op/s	255545083.376op/s ± 67066.592op/s	255623709.099op/s	255772148.804op/s	255876530.416op/s	256686528.406op/s	0.45%	2.603	19.562	0.06%	9964.574op/s	1	200
credit_card/is_card_number_no_luhn/ 3782-8224-6310-005	execution_time	65.880µs	66.330µs ± 0.212µs	66.312µs ± 0.123µs	66.437µs	66.719µs	66.922µs	67.038µs	1.09%	0.685	0.675	0.32%	0.015µs	1	200
credit_card/is_card_number_no_luhn/ 3782-8224-6310-005	throughput	14916867.114op/s	15076199.137op/s ± 48142.835op/s	15080134.938op/s ± 28006.033op/s	15106720.225op/s	15149219.080op/s	15168664.652op/s	15179076.919op/s	0.66%	-0.664	0.635	0.32%	3404.213op/s	1	200
credit_card/is_card_number_no_luhn/ 378282246310005	execution_time	59.484µs	59.667µs ± 0.070µs	59.661µs ± 0.040µs	59.710µs	59.776µs	59.796µs	60.177µs	0.86%	1.860	12.701	0.12%	0.005µs	1	200
credit_card/is_card_number_no_luhn/ 378282246310005	throughput	16617674.202op/s	16759653.418op/s ± 19630.006op/s	16761365.890op/s ± 11118.305op/s	16771319.368op/s	16787281.539op/s	16797783.512op/s	16811327.472op/s	0.30%	-1.821	12.371	0.12%	1388.051op/s	1	200
credit_card/is_card_number_no_luhn/37828224631	execution_time	3.896µs	3.913µs ± 0.003µs	3.914µs ± 0.002µs	3.915µs	3.918µs	3.920µs	3.921µs	0.19%	-1.100	6.372	0.07%	0.000µs	1	200
credit_card/is_card_number_no_luhn/37828224631	throughput	255032654.357op/s	255526716.100op/s ± 190642.518op/s	255520022.293op/s ± 112408.078op/s	255634612.960op/s	255815003.567op/s	255959196.486op/s	256702047.560op/s	0.46%	1.116	6.471	0.07%	13480.462op/s	1	200
credit_card/is_card_number_no_luhn/378282246310005	execution_time	56.179µs	56.410µs ± 0.118µs	56.395µs ± 0.074µs	56.473µs	56.615µs	56.702µs	56.837µs	0.78%	0.606	0.141	0.21%	0.008µs	1	200
credit_card/is_card_number_no_luhn/378282246310005	throughput	17594303.241op/s	17727513.397op/s ± 37082.880op/s	17732136.454op/s ± 23343.877op/s	17752608.967op/s	17779502.900op/s	17792568.669op/s	17800119.261op/s	0.38%	-0.594	0.118	0.21%	2622.156op/s	1	200
credit_card/is_card_number_no_luhn/37828224631000521389798	execution_time	51.718µs	51.829µs ± 0.035µs	51.830µs ± 0.019µs	51.849µs	51.878µs	51.911µs	51.972µs	0.27%	0.042	1.526	0.07%	0.002µs	1	200
credit_card/is_card_number_no_luhn/37828224631000521389798	throughput	19241062.701op/s	19294403.068op/s ± 13101.078op/s	19293942.667op/s ± 6944.031op/s	19300387.938op/s	19318585.252op/s	19325093.982op/s	19335602.645op/s	0.22%	-0.035	1.516	0.07%	926.386op/s	1	200
credit_card/is_card_number_no_luhn/x371413321323331	execution_time	6.026µs	6.058µs ± 0.025µs	6.043µs ± 0.011µs	6.076µs	6.113µs	6.120µs	6.134µs	1.50%	0.875	-0.193	0.42%	0.002µs	1	200
credit_card/is_card_number_no_luhn/x371413321323331	throughput	163035025.292op/s	165082001.677op/s ± 690477.846op/s	165475558.185op/s ± 305163.394op/s	165641035.822op/s	165780902.083op/s	165881350.383op/s	165958013.225op/s	0.29%	-0.862	-0.230	0.42%	48824.157op/s	1	200

scenario	metric	95% CI mean	Shapiro-Wilk pvalue	Ljung-Box pvalue (lag=1)	Dip test pvalue
credit_card/is_card_number/	execution_time	[3.912µs; 3.913µs] or [-0.010%; +0.010%]	None	None	None
credit_card/is_card_number/	throughput	[255550297.536op/s; 255601323.352op/s] or [-0.010%; +0.010%]	None	None	None
credit_card/is_card_number/ 3782-8224-6310-005	execution_time	[77.562µs; 77.692µs] or [-0.084%; +0.084%]	None	None	None
credit_card/is_card_number/ 3782-8224-6310-005	throughput	[12871845.331op/s; 12893328.962op/s] or [-0.083%; +0.083%]	None	None	None
credit_card/is_card_number/ 378282246310005	execution_time	[71.376µs; 71.456µs] or [-0.056%; +0.056%]	None	None	None
credit_card/is_card_number/ 378282246310005	throughput	[13994897.706op/s; 14010581.039op/s] or [-0.056%; +0.056%]	None	None	None
credit_card/is_card_number/37828224631	execution_time	[3.912µs; 3.913µs] or [-0.011%; +0.011%]	None	None	None
credit_card/is_card_number/37828224631	throughput	[255563358.038op/s; 255619976.473op/s] or [-0.011%; +0.011%]	None	None	None
credit_card/is_card_number/378282246310005	execution_time	[67.615µs; 67.707µs] or [-0.068%; +0.068%]	None	None	None
credit_card/is_card_number/378282246310005	throughput	[14769862.744op/s; 14789833.019op/s] or [-0.068%; +0.068%]	None	None	None
credit_card/is_card_number/37828224631000521389798	execution_time	[51.830µs; 51.841µs] or [-0.011%; +0.011%]	None	None	None
credit_card/is_card_number/37828224631000521389798	throughput	[19289812.876op/s; 19293920.233op/s] or [-0.011%; +0.011%]	None	None	None
credit_card/is_card_number/x371413321323331	execution_time	[6.053µs; 6.062µs] or [-0.069%; +0.069%]	None	None	None
credit_card/is_card_number/x371413321323331	throughput	[164979064.286op/s; 165204337.752op/s] or [-0.068%; +0.068%]	None	None	None
credit_card/is_card_number_no_luhn/	execution_time	[3.913µs; 3.913µs] or [-0.008%; +0.008%]	None	None	None
credit_card/is_card_number_no_luhn/	throughput	[255540893.364op/s; 255579953.775op/s] or [-0.008%; +0.008%]	None	None	None
credit_card/is_card_number_no_luhn/ 3782-8224-6310-005	execution_time	[66.301µs; 66.360µs] or [-0.044%; +0.044%]	None	None	None
credit_card/is_card_number_no_luhn/ 3782-8224-6310-005	throughput	[15069527.004op/s; 15082871.271op/s] or [-0.044%; +0.044%]	None	None	None
credit_card/is_card_number_no_luhn/ 378282246310005	execution_time	[59.657µs; 59.677µs] or [-0.016%; +0.016%]	None	None	None
credit_card/is_card_number_no_luhn/ 378282246310005	throughput	[16756932.888op/s; 16762373.948op/s] or [-0.016%; +0.016%]	None	None	None
credit_card/is_card_number_no_luhn/37828224631	execution_time	[3.913µs; 3.914µs] or [-0.010%; +0.010%]	None	None	None
credit_card/is_card_number_no_luhn/37828224631	throughput	[255500294.881op/s; 255553137.320op/s] or [-0.010%; +0.010%]	None	None	None
credit_card/is_card_number_no_luhn/378282246310005	execution_time	[56.393µs; 56.426µs] or [-0.029%; +0.029%]	None	None	None
credit_card/is_card_number_no_luhn/378282246310005	throughput	[17722374.066op/s; 17732652.727op/s] or [-0.029%; +0.029%]	None	None	None
credit_card/is_card_number_no_luhn/37828224631000521389798	execution_time	[51.824µs; 51.833µs] or [-0.009%; +0.009%]	None	None	None
credit_card/is_card_number_no_luhn/37828224631000521389798	throughput	[19292587.384op/s; 19296218.751op/s] or [-0.009%; +0.009%]	None	None	None
credit_card/is_card_number_no_luhn/x371413321323331	execution_time	[6.054µs; 6.061µs] or [-0.058%; +0.058%]	None	None	None
credit_card/is_card_number_no_luhn/x371413321323331	throughput	[164986308.088op/s; 165177695.266op/s] or [-0.058%; +0.058%]	None	None	None

Group 8

cpu_model	git_commit_sha	git_commit_date	git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	`edfc921`	1744804265	paullgdc/data-pipeline/split_trace_collect

scenario	metric	min	mean ± sd	median ± mad	p75	p95	p99	max	peak_to_median_ratio	skewness	kurtosis	cv	sem	runs	sample_size
normalization/normalize_trace/test_trace	execution_time	244.468ns	255.016ns ± 12.963ns	249.003ns ± 2.299ns	256.046ns	282.057ns	300.107ns	301.790ns	21.20%	2.073	3.759	5.07%	0.917ns	1	200

scenario	metric	95% CI mean	Shapiro-Wilk pvalue	Ljung-Box pvalue (lag=1)	Dip test pvalue
normalization/normalize_trace/test_trace	execution_time	[253.220ns; 256.813ns] or [-0.704%; +0.704%]	None	None	None

Group 9

cpu_model	git_commit_sha	git_commit_date	git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	`edfc921`	1744804265	paullgdc/data-pipeline/split_trace_collect

scenario	metric	min	mean ± sd	median ± mad	p75	p95	p99	max	peak_to_median_ratio	skewness	kurtosis	cv	sem	runs	sample_size
ip_address/quantize_peer_ip_address_benchmark	execution_time	4.941µs	5.026µs ± 0.044µs	5.015µs ± 0.018µs	5.035µs	5.105µs	5.108µs	5.113µs	1.95%	0.734	-0.636	0.88%	0.003µs	1	200

scenario	metric	95% CI mean	Shapiro-Wilk pvalue	Ljung-Box pvalue (lag=1)	Dip test pvalue
ip_address/quantize_peer_ip_address_benchmark	execution_time	[5.020µs; 5.033µs] or [-0.123%; +0.123%]	None	None	None

Group 10

cpu_model	git_commit_sha	git_commit_date	git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	`edfc921`	1744804265	paullgdc/data-pipeline/split_trace_collect

scenario	metric	min	mean ± sd	median ± mad	p75	p95	p99	max	peak_to_median_ratio	skewness	kurtosis	cv	sem	runs	sample_size
benching deserializing traces from msgpack to their internal representation	execution_time	73.390ms	73.894ms ± 0.281ms	73.950ms ± 0.164ms	74.051ms	74.289ms	74.838ms	75.023ms	1.45%	0.703	1.582	0.38%	0.020ms	1	200

scenario	metric	95% CI mean	Shapiro-Wilk pvalue	Ljung-Box pvalue (lag=1)	Dip test pvalue
benching deserializing traces from msgpack to their internal representation	execution_time	[73.855ms; 73.933ms] or [-0.053%; +0.053%]	None	None	None

Group 11

cpu_model	git_commit_sha	git_commit_date	git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	`edfc921`	1744804265	paullgdc/data-pipeline/split_trace_collect

scenario	metric	min	mean ± sd	median ± mad	p75	p95	p99	max	peak_to_median_ratio	skewness	kurtosis	cv	sem	runs	sample_size
tags/replace_trace_tags	execution_time	2.357µs	2.409µs ± 0.027µs	2.404µs ± 0.005µs	2.408µs	2.490µs	2.497µs	2.502µs	4.11%	1.894	4.130	1.12%	0.002µs	1	200

scenario	metric	95% CI mean	Shapiro-Wilk pvalue	Ljung-Box pvalue (lag=1)	Dip test pvalue
tags/replace_trace_tags	execution_time	[2.406µs; 2.413µs] or [-0.156%; +0.156%]	None	None	None

Group 12

cpu_model	git_commit_sha	git_commit_date	git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	`edfc921`	1744804265	paullgdc/data-pipeline/split_trace_collect

scenario	metric	min	mean ± sd	median ± mad	p75	p95	p99	max	peak_to_median_ratio	skewness	kurtosis	cv	sem	runs	sample_size
normalization/normalize_name/normalize_name/Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Lo...	execution_time	208.626µs	209.037µs ± 0.150µs	209.023µs ± 0.087µs	209.140µs	209.283µs	209.432µs	209.444µs	0.20%	0.035	0.529	0.07%	0.011µs	1	200
normalization/normalize_name/normalize_name/Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Lo...	throughput	4774549.107op/s	4783834.252op/s ± 3442.295op/s	4784164.416op/s ± 1994.015op/s	4785758.540op/s	4789275.253op/s	4792709.905op/s	4793270.887op/s	0.19%	-0.030	0.530	0.07%	243.407op/s	1	200
normalization/normalize_name/normalize_name/bad-name	execution_time	18.619µs	18.681µs ± 0.091µs	18.667µs ± 0.017µs	18.694µs	18.724µs	18.776µs	19.846µs	6.31%	10.731	131.253	0.49%	0.006µs	1	200
normalization/normalize_name/normalize_name/bad-name	throughput	50389149.423op/s	53532695.934op/s ± 249198.845op/s	53569547.815op/s ± 50148.394op/s	53609638.013op/s	53666724.515op/s	53699963.928op/s	53708161.683op/s	0.26%	-10.467	126.398	0.46%	17621.019op/s	1	200
normalization/normalize_name/normalize_name/good	execution_time	10.926µs	10.998µs ± 0.048µs	10.993µs ± 0.041µs	11.034µs	11.081µs	11.101µs	11.138µs	1.32%	0.397	-0.728	0.43%	0.003µs	1	200
normalization/normalize_name/normalize_name/good	throughput	89780803.621op/s	90931069.865op/s ± 392402.529op/s	90967520.533op/s ± 334511.713op/s	91260779.961op/s	91443736.223op/s	91470476.576op/s	91524738.785op/s	0.61%	-0.383	-0.753	0.43%	27747.049op/s	1	200

scenario	metric	95% CI mean	Shapiro-Wilk pvalue	Ljung-Box pvalue (lag=1)	Dip test pvalue
normalization/normalize_name/normalize_name/Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Lo...	execution_time	[209.017µs; 209.058µs] or [-0.010%; +0.010%]	None	None	None
normalization/normalize_name/normalize_name/Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Lo...	throughput	[4783357.183op/s; 4784311.321op/s] or [-0.010%; +0.010%]	None	None	None
normalization/normalize_name/normalize_name/bad-name	execution_time	[18.668µs; 18.693µs] or [-0.068%; +0.068%]	None	None	None
normalization/normalize_name/normalize_name/bad-name	throughput	[53498159.371op/s; 53567232.497op/s] or [-0.065%; +0.065%]	None	None	None
normalization/normalize_name/normalize_name/good	execution_time	[10.991µs; 11.004µs] or [-0.060%; +0.060%]	None	None	None
normalization/normalize_name/normalize_name/good	throughput	[90876686.649op/s; 90985453.082op/s] or [-0.060%; +0.060%]	None	None	None

Group 13

cpu_model	git_commit_sha	git_commit_date	git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	`edfc921`	1744804265	paullgdc/data-pipeline/split_trace_collect

scenario	metric	min	mean ± sd	median ± mad	p75	p95	p99	max	peak_to_median_ratio	skewness	kurtosis	cv	sem	runs	sample_size
sql/obfuscate_sql_string	execution_time	67.081µs	67.277µs ± 0.226µs	67.244µs ± 0.049µs	67.301µs	67.447µs	67.702µs	69.900µs	3.95%	8.653	92.283	0.34%	0.016µs	1	200

scenario	metric	95% CI mean	Shapiro-Wilk pvalue	Ljung-Box pvalue (lag=1)	Dip test pvalue
sql/obfuscate_sql_string	execution_time	[67.245µs; 67.308µs] or [-0.047%; +0.047%]	None	None	None

Baseline

Omitted due to size.

This refactor is needed because of the shared logic in collect_trace_chunks and TracerPayloadParams. The way these structs are created makes replacing ByteString with a slice harder because of the shared lifetime. Furthermore, the enum encodes two different code paths, the SpanBytes spans and the pb spans, which never interact with each other. So having functions that handle both is pure complexity overhead. This refactor also has the advantage of removing a bunch of panics and lines of code that were there because of the overlap between "fake" pb spans and trace exporter spans.

codecov-commenter · 2025-04-15T09:46:34Z

Codecov Report

Attention: Patch coverage is 92.95302% with 21 lines in your changes missing coverage. Please review.

Project coverage is 71.52%. Comparing base (daf50ad) to head (edfc921).

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1021      +/-   ##
==========================================
- Coverage   71.52%   71.52%   -0.01%     
==========================================
  Files         339      339              
  Lines       50751    50628     -123     
==========================================
- Hits        36302    36214      -88     
+ Misses      14449    14414      -35

Components	Coverage Δ
crashtracker	`42.82% <ø> (-0.03%)`	⬇️
crashtracker-ffi	`6.30% <ø> (ø)`
datadog-alloc	`98.73% <ø> (ø)`
data-pipeline	`91.00% <88.46%> (+0.06%)`	⬆️
data-pipeline-ffi	`90.35% <ø> (ø)`
ddcommon	`78.57% <ø> (ø)`
ddcommon-ffi	`66.37% <ø> (ø)`
ddtelemetry	`60.29% <ø> (ø)`
ddtelemetry-ffi	`21.43% <ø> (ø)`
dogstatsd-client	`82.57% <ø> (ø)`
ipc	`82.41% <ø> (ø)`
profiling	`77.49% <ø> (ø)`
profiling-ffi	`62.12% <ø> (ø)`
serverless	`0.00% <ø> (ø)`
sidecar	`41.18% <0.00%> (+0.03%)`	⬆️
sidecar-ffi	`2.05% <ø> (ø)`
spawn-worker	`54.37% <ø> (ø)`
tinybytes	`89.86% <ø> (ø)`
trace-mini-agent	`73.80% <100.00%> (-0.03%)`	⬇️
trace-normalization	`98.24% <ø> (ø)`
trace-obfuscation	`96.00% <ø> (ø)`
trace-protobuf	`78.50% <ø> (ø)`
trace-utils	`93.13% <96.18%> (+0.30%)`	⬆️


r1viollet · 2025-04-15T10:00:47Z

Artifact Size Benchmark Report

aarch64-alpine-linux-musl

Artifact	Baseline	Commit	Change
/aarch64-alpine-linux-musl/lib/libdatadog_profiling.so	7.78 MB	7.77 MB	-0.14% (-11.57 KB) 💪
/aarch64-alpine-linux-musl/lib/libdatadog_profiling.so.debug	24.21 MB	24.06 MB	-0.61% (-151.58 KB) 💪
/aarch64-alpine-linux-musl/lib/libdatadog_profiling.a	78.03 MB	77.61 MB	-0.53% (-429.60 KB) 💪

aarch64-unknown-linux-gnu

Artifact	Baseline	Commit	Change
/aarch64-unknown-linux-gnu/lib/libdatadog_profiling.so.debug	22.82 MB	22.67 MB	-0.64% (-150.05 KB) 💪
/aarch64-unknown-linux-gnu/lib/libdatadog_profiling.a	72.41 MB	71.99 MB	-0.57% (-429.12 KB) 💪
/aarch64-unknown-linux-gnu/lib/libdatadog_profiling.so	7.72 MB	7.70 MB	-0.15% (-12.14 KB) 💪

libdatadog-x64-windows

Artifact	Baseline	Commit	Change
/libdatadog-x64-windows/debug/dynamic/datadog_profiling_ffi.dll	17.01 MB	16.89 MB	-0.71% (-124.00 KB) 💪
/libdatadog-x64-windows/debug/dynamic/datadog_profiling_ffi.lib	61.83 KB	61.83 KB	0% (0 B) 👌
/libdatadog-x64-windows/debug/dynamic/datadog_profiling_ffi.pdb	114.25 MB	113.76 MB	-0.42% (-496.00 KB) 💪
/libdatadog-x64-windows/debug/static/datadog_profiling_ffi.lib	633.71 MB	633.28 MB	-0.06% (-437.70 KB) 💪
/libdatadog-x64-windows/release/dynamic/datadog_profiling_ffi.dll	5.06 MB	5.02 MB	-0.74% (-38.50 KB) 💪
/libdatadog-x64-windows/release/dynamic/datadog_profiling_ffi.lib	61.83 KB	61.83 KB	0% (0 B) 👌
/libdatadog-x64-windows/release/dynamic/datadog_profiling_ffi.pdb	16.15 MB	16.03 MB	-0.72% (-120.00 KB) 💪
/libdatadog-x64-windows/release/static/datadog_profiling_ffi.lib	26.84 MB	26.64 MB	-0.73% (-202.58 KB) 💪

libdatadog-x86-windows

Artifact	Baseline	Commit	Change
/libdatadog-x86-windows/debug/dynamic/datadog_profiling_ffi.dll	14.40 MB	14.29 MB	-0.73% (-109.00 KB) 💪
/libdatadog-x86-windows/debug/dynamic/datadog_profiling_ffi.lib	62.78 KB	62.78 KB	0% (0 B) 👌
/libdatadog-x86-windows/debug/dynamic/datadog_profiling_ffi.pdb	116.21 MB	115.72 MB	-0.42% (-504.00 KB) 💪
/libdatadog-x86-windows/debug/static/datadog_profiling_ffi.lib	625.80 MB	625.38 MB	-0.06% (-429.45 KB) 💪
/libdatadog-x86-windows/release/dynamic/datadog_profiling_ffi.dll	3.83 MB	3.80 MB	-0.70% (-27.50 KB) 💪
/libdatadog-x86-windows/release/dynamic/datadog_profiling_ffi.lib	62.78 KB	62.78 KB	0% (0 B) 👌
/libdatadog-x86-windows/release/dynamic/datadog_profiling_ffi.pdb	16.78 MB	16.66 MB	-0.74% (-128.00 KB) 💪
/libdatadog-x86-windows/release/static/datadog_profiling_ffi.lib	24.74 MB	24.56 MB	-0.74% (-188.35 KB) 💪

x86_64-alpine-linux-musl

Artifact	Baseline	Commit	Change
/x86_64-alpine-linux-musl/lib/libdatadog_profiling.a	67.28 MB	66.92 MB	-0.53% (-371.86 KB) 💪
/x86_64-alpine-linux-musl/lib/libdatadog_profiling.so	8.27 MB	8.22 MB	-0.50% (-43.17 KB) 💪
/x86_64-alpine-linux-musl/lib/libdatadog_profiling.so.debug	23.35 MB	23.22 MB	-0.54% (-130.46 KB) 💪

x86_64-unknown-linux-gnu

Artifact	Baseline	Commit	Change
/x86_64-unknown-linux-gnu/lib/libdatadog_profiling.a	68.23 MB	67.86 MB	-0.54% (-380.20 KB) 💪
/x86_64-unknown-linux-gnu/lib/libdatadog_profiling.so	8.14 MB	8.11 MB	-0.37% (-31.49 KB) 💪
/x86_64-unknown-linux-gnu/lib/libdatadog_profiling.so.debug	20.98 MB	20.85 MB	-0.61% (-132.32 KB) 💪

VianneyRuhlmann

LGTM

github-actions bot added mini-agent sidecar data-pipeline labels Apr 14, 2025

paullegranddc force-pushed the paullgdc/data-pipeline/split_trace_collect branch from 0c8f800 to f6aae6b Compare April 15, 2025 09:34

paullegranddc marked this pull request as ready for review April 15, 2025 13:33

paullegranddc requested review from a team as code owners April 15, 2025 13:33

VianneyRuhlmann approved these changes Apr 15, 2025

paullegranddc added 2 commits April 15, 2025 15:36

Remove v0.7 trace encoding

8f9c6d9

Make TraceChunk generic on the span text for easier migration

5db3bf2

paullegranddc force-pushed the paullgdc/data-pipeline/split_trace_collect branch from fe35ab6 to 5db3bf2 Compare April 15, 2025 14:50

Merge branch 'main' into paullgdc/data-pipeline/split_trace_collect

edfc921

paullegranddc enabled auto-merge (squash) April 16, 2025 11:58

paullegranddc disabled auto-merge April 16, 2025 12:33

paullegranddc merged commit 3dab0be into main Apr 16, 2025
35 checks passed

paullegranddc deleted the paullgdc/data-pipeline/split_trace_collect branch April 16, 2025 12:33
