Skip to content

Tracing: MsgPack encoder serializes arrays incorrectly #12224

@ArniDagur

Description

@ArniDagur

The symptom

FoundationDB sends packets like this (base64 encoded)

nM/dJJIzyEzz4c8v5+Nzr2Pj689zKCZt4XKrfc8AAAAAAAAAALpHUDpnZXRMaXZlQ29tbWl0dGVkVmVyc2lvbstB2hrK+I1wjctB2hrK+I1+cMwCzAGRz0lxldEziDEAz4+RKRdlyryiz2h8XhpEvLsXkIGnYWRkcmVzc7MyMi4xNzEuMTA2LjEzMDo0NTAx

However, this is not valid MsgPack. These are the outputs of a Python parser and Rust parser

Python parser output
>>> import msgpack
>>> import base64
>>> msgpack.loads(base64.b64decode('nM/dJJIzyEzz4c8v5+Nzr2Pj689zKCZt4XKrfc8AAAAAAAAAALpHUDpnZXRMaXZlQ29tbWl0dGVkVmVyc2lvbstB2hrK+I1wjctB2hrK+I\
1+cMwCzAGRz0lxldEziDEAz4+RKRdlyryiz2h8XhpEvLsXkIGnYWRkcmVzc7MyMi4xNzEuMTA2LjEzMDo0NTAx'))
Traceback (most recent call last):
  File "<python-input-2>", line 1, in <module>
    msgpack.loads(base64.b64decode('nM/dJJIzyEzz4c8v5+Nzr2Pj689zKCZt4XKrfc8AAAAAAAAAALpHUDpnZXRMaXZlQ29tbWl0dGVkVmVyc2lvbstB2hrK+I1wjctB2hrK+I1+cMwCzAGRz0lxldEziDEAz4+RKRdlyryiz2h8XhpEvLsXkIGnYWRkcmVzc7MyMi4xNzEuMTA2LjEzMDo0NTAx'))
    ~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "msgpack/_unpacker.pyx", line 201, in msgpack._cmsgpack.unpackb
msgpack.exceptions.ExtraData: unpack(b) received extra data.
Rust parser output
% echo "nM/dJJIzyEzz4c8v5+Nzr2Pj689zKCZt4XKrfc8AAAAAAAAAALpHUDpnZXRMaXZlQ29tbWl0dGVkVmVyc2lvbstB2hrK+I1wjctB2hrK+I1+cMwCzAGRz0lxldEziDEAz4+RKRdlyryiz2h8XhpEvLsXkIGnYWRkcmVzc7MyMi4xNzEuMTA2LjEzMDo0NTAx" | base64 -d > packet.msgpack
% cargo run --example inspect packet.msgpack
   Compiling rmp v0.8.14 (/Users/arni/src/msgpack-rust/rmp)
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.46s
     Running `target/debug/examples/inspect packet.msgpack`
Array0[
  U64(15935022132645131233),
  U64(3451977725406077931),
  U64(8297924566805818237),
  U64(0),
  Str0("GP:getLiveCommittedVersion"),
  F64(1751854050.2099946),
  F64(1751854050.2108421),
  U8(2),
  U8(1),
  Array0[ U64(5292175762882310400)],
  U64(10345094999515511970),
  U64(7528996143972006679),
]

This website is also useful.

The cause

I have parsed the message by hand. The bytes are hex encoded:

9c                 // array of length 0x9C ^ 0x90 = 12
cf8a214955fc706127 // u64 (span ID)
cfaeb1522bcb868d3a // u64 (trace ID first half)
cfe84abfad4bb47c08 // u64 (trace ID second half)
cf0000000000000000 // parent span ID
ba                 // string of length 0xBA ^ 0xA0 = 26
47503a6765744c697665436f6d6d697474656456657273696f6e // "GP:getLiveCommittedVersion"
cb41da1b0d19bdffa5 // f64 (start time)
cb41da1b0d19be0841 // f64 (end time)
cc02               // u8 (kind)
cc01               // u8 (status)

91                 // array of length 0x91 ^ 0x90 = 1 
^-- INCORRECT!
    Should be 93, since each link consists of 3 msgpack elements

cf7db1cf5c26274cbd // u64 (link trace ID first half)
cff8b81b74edcfcc62 // u64 (link trace ID second half)
cf5fd2c092980042c8 // u64 (link span ID)

90             // array of length 0 (events)
81             // 81 = map of size 1
a7             // string of length 0xA7 ^ 0xA0 = 7
61646472657373 // "address"
   
b3             // string of length 0xB3 ^ 0xA0 = 19
32322e3137312e3233352e3230393a34353031 // "22.171.235.209:4501"

As can be seen, the array length is wrong. It should be 3, since each trace link consists of 3 elements, but it's encoded as 1. This can be confirmed by looking at the code:

int size = vec.size();
if (size <= 15) {
buf.write_byte(static_cast<uint8_t>(size) | 0b10010000);
} else if (size <= 65535) {
buf.write_byte(0xdc);
buf.write_byte(reinterpret_cast<const uint8_t*>(&size)[1]);
buf.write_byte(reinterpret_cast<const uint8_t*>(&size)[0]);
} else {
TraceEvent(SevWarn, "TracingSpanSerializeVector").detail("Failed to MessagePack encode large vector", size);
ASSERT_WE_THINK(false);
}
for (const auto& link : vec) {
serialize_value(link.traceID.first(), buf, 0xcf); // trace id
serialize_value(link.traceID.second(), buf, 0xcf); // trace id
serialize_value(link.spanID, buf, 0xcf); // spanid
}

On line 166, we encode vec.size() which is 1, but then in the for loop on line 178 we produce 3 msgpack elements for each element of vec.

Discussion

Fixing this would be a backwards incompatible change. The trace receiver from the original announcement knows how to parse these corrupt MsgPack packets

	linksLen, err := dec.DecodeArrayLen()
	if err != nil {
		return err
	}

	var links []SpanContext
	for i := 0; i < linksLen; i++ {
		linkFirst, err := dec.DecodeUint64()
		if err != nil {
			return err
		}
		linkSecond, err := dec.DecodeUint64()
		if err != nil {
			return err
		}

		linkTraceId := getTraceId(linkFirst, linkSecond)
		spanId, err := dec.DecodeUint64()
		if err != nil {
			return err
		}
		links = append(links, SpanContext{TraceID: linkTraceId, SpanID: spanId})
	}

If everyone is using that implementation, that may be a reason why no-one has noticed this before. We should consider fixing this in the next major version of FoundationDB.

Is it known whether big users of FoundationDB such as Apple and Snowflake use FoundationDB tracing in production, and whether they use the collector from the original announcement?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions