-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Description
The symptom
FoundationDB sends packets like this (base64 encoded)
nM/dJJIzyEzz4c8v5+Nzr2Pj689zKCZt4XKrfc8AAAAAAAAAALpHUDpnZXRMaXZlQ29tbWl0dGVkVmVyc2lvbstB2hrK+I1wjctB2hrK+I1+cMwCzAGRz0lxldEziDEAz4+RKRdlyryiz2h8XhpEvLsXkIGnYWRkcmVzc7MyMi4xNzEuMTA2LjEzMDo0NTAx
However, this is not valid MsgPack. These are the outputs of a Python parser and Rust parser
Python parser output
>>> import msgpack
>>> import base64
>>> msgpack.loads(base64.b64decode('nM/dJJIzyEzz4c8v5+Nzr2Pj689zKCZt4XKrfc8AAAAAAAAAALpHUDpnZXRMaXZlQ29tbWl0dGVkVmVyc2lvbstB2hrK+I1wjctB2hrK+I\
1+cMwCzAGRz0lxldEziDEAz4+RKRdlyryiz2h8XhpEvLsXkIGnYWRkcmVzc7MyMi4xNzEuMTA2LjEzMDo0NTAx'))
Traceback (most recent call last):
File "<python-input-2>", line 1, in <module>
msgpack.loads(base64.b64decode('nM/dJJIzyEzz4c8v5+Nzr2Pj689zKCZt4XKrfc8AAAAAAAAAALpHUDpnZXRMaXZlQ29tbWl0dGVkVmVyc2lvbstB2hrK+I1wjctB2hrK+I1+cMwCzAGRz0lxldEziDEAz4+RKRdlyryiz2h8XhpEvLsXkIGnYWRkcmVzc7MyMi4xNzEuMTA2LjEzMDo0NTAx'))
~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "msgpack/_unpacker.pyx", line 201, in msgpack._cmsgpack.unpackb
msgpack.exceptions.ExtraData: unpack(b) received extra data.
Rust parser output
% echo "nM/dJJIzyEzz4c8v5+Nzr2Pj689zKCZt4XKrfc8AAAAAAAAAALpHUDpnZXRMaXZlQ29tbWl0dGVkVmVyc2lvbstB2hrK+I1wjctB2hrK+I1+cMwCzAGRz0lxldEziDEAz4+RKRdlyryiz2h8XhpEvLsXkIGnYWRkcmVzc7MyMi4xNzEuMTA2LjEzMDo0NTAx" | base64 -d > packet.msgpack
% cargo run --example inspect packet.msgpack
Compiling rmp v0.8.14 (/Users/arni/src/msgpack-rust/rmp)
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.46s
Running `target/debug/examples/inspect packet.msgpack`
Array0[
U64(15935022132645131233),
U64(3451977725406077931),
U64(8297924566805818237),
U64(0),
Str0("GP:getLiveCommittedVersion"),
F64(1751854050.2099946),
F64(1751854050.2108421),
U8(2),
U8(1),
Array0[ U64(5292175762882310400)],
U64(10345094999515511970),
U64(7528996143972006679),
]
This website is also useful.
The cause
I have parsed the message by hand. The bytes are hex encoded:
9c // array of length 0x9C ^ 0x90 = 12
cf8a214955fc706127 // u64 (span ID)
cfaeb1522bcb868d3a // u64 (trace ID first half)
cfe84abfad4bb47c08 // u64 (trace ID second half)
cf0000000000000000 // parent span ID
ba // string of length 0xBA ^ 0xA0 = 26
47503a6765744c697665436f6d6d697474656456657273696f6e // "GP:getLiveCommittedVersion"
cb41da1b0d19bdffa5 // f64 (start time)
cb41da1b0d19be0841 // f64 (end time)
cc02 // u8 (kind)
cc01 // u8 (status)
91 // array of length 0x91 ^ 0x90 = 1
^-- INCORRECT!
Should be 93, since each link consists of 3 msgpack elements
cf7db1cf5c26274cbd // u64 (link trace ID first half)
cff8b81b74edcfcc62 // u64 (link trace ID second half)
cf5fd2c092980042c8 // u64 (link span ID)
90 // array of length 0 (events)
81 // 81 = map of size 1
a7 // string of length 0xA7 ^ 0xA0 = 7
61646472657373 // "address"
b3 // string of length 0xB3 ^ 0xA0 = 19
32322e3137312e3233352e3230393a34353031 // "22.171.235.209:4501"
As can be seen, the array length is wrong. It should be 3, since each trace link consists of 3 elements, but it's encoded as 1. This can be confirmed by looking at the code:
foundationdb/fdbclient/Tracing.actor.cpp
Lines 166 to 182 in 52db7f9
int size = vec.size(); | |
if (size <= 15) { | |
buf.write_byte(static_cast<uint8_t>(size) | 0b10010000); | |
} else if (size <= 65535) { | |
buf.write_byte(0xdc); | |
buf.write_byte(reinterpret_cast<const uint8_t*>(&size)[1]); | |
buf.write_byte(reinterpret_cast<const uint8_t*>(&size)[0]); | |
} else { | |
TraceEvent(SevWarn, "TracingSpanSerializeVector").detail("Failed to MessagePack encode large vector", size); | |
ASSERT_WE_THINK(false); | |
} | |
for (const auto& link : vec) { | |
serialize_value(link.traceID.first(), buf, 0xcf); // trace id | |
serialize_value(link.traceID.second(), buf, 0xcf); // trace id | |
serialize_value(link.spanID, buf, 0xcf); // spanid | |
} |
On line 166, we encode vec.size()
which is 1, but then in the for
loop on line 178 we produce 3 msgpack elements for each element of vec
.
Discussion
Fixing this would be a backwards incompatible change. The trace receiver from the original announcement knows how to parse these corrupt MsgPack packets
linksLen, err := dec.DecodeArrayLen()
if err != nil {
return err
}
var links []SpanContext
for i := 0; i < linksLen; i++ {
linkFirst, err := dec.DecodeUint64()
if err != nil {
return err
}
linkSecond, err := dec.DecodeUint64()
if err != nil {
return err
}
linkTraceId := getTraceId(linkFirst, linkSecond)
spanId, err := dec.DecodeUint64()
if err != nil {
return err
}
links = append(links, SpanContext{TraceID: linkTraceId, SpanID: spanId})
}
If everyone is using that implementation, that may be a reason why no-one has noticed this before. We should consider fixing this in the next major version of FoundationDB.
Is it known whether big users of FoundationDB such as Apple and Snowflake use FoundationDB tracing in production, and whether they use the collector from the original announcement?