optimization: IPC: pass FieldNodes around by value instead of reference #543
Conversation
By the way, I've got another half dozen optimizations coming, at least. Most of them are small, non-behavioral changes like this. If it would be easier, I can send a single larger PR?
A single larger PR (within reason) would be fine. Ideally, could you also include any relevant benchmark numbers?
Unfortunately, most of this has been measured against a larger binary - a database - where the wins show up as reductions in profile times. I can include screenshots of the flame-graph portions that are relevant to Arrow, if that helps? I'm likely not authorized to share the profiles themselves, as they also contain proprietary symbols. I can write up fresh benchmarks?
Also, for optimizations like this one, short-lived benchmarks don't always catch the improvement - the actual change in runtime per invocation is pretty trivial. The real win is that it reduces the number of live objects on the heap, which makes every GC pass a little faster and slightly reduces GC frequency...
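For what it's worth, allocation wins like this can still show up deterministically in Go benchmarks even when wall-clock numbers are noisy, since allocs/op is an exact counter. A minimal sketch, where readAllMessages is a hypothetical stand-in for the IPC read path rather than a real arrow/ipc function:

```go
package ipc_test

import "testing"

// readAllMessages is a hypothetical stand-in for the message-reading path
// under test; it is not part of the actual arrow/ipc API.
func readAllMessages() {}

func BenchmarkMessageRead(b *testing.B) {
	b.ReportAllocs() // always report allocs/op and B/op, even without -benchmem
	for i := 0; i < b.N; i++ {
		readAllMessages()
	}
}
```

Run with go test -bench=MessageRead; the improvement appears in the allocs/op column rather than in ns/op.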
I'd be surprised if running the existing benchmark suite didn't show the improvement.
Hadn't realized there was an existing benchmark suite, will do!
Before: [benchmark output] After: [benchmark output]
Writer code should be unaffected; I think it's just inter-run variance (possibly running on a different CCD?)
CI failure looks like flakiness, since it's passing on the Ubuntu image? That test is passing locally.
With that last commit as well: [benchmark output]
This avoids an unnecessary heap allocation, reducing GC pressure and improving the latency and throughput of anything that processes messages.
This is a drive-by patch; the test was failing for the other optimization, and I noticed this when I went to take a look and figured I'd just send the patch.
The Table() method is unused, so it's fine to embed the table as is.
Drops another allocation per ipc/Message, improves throughput further.
This is a bit bloated, but I wasn't sure whether the API for NewMessageReader and NewReaderFromMessageReader could be changed to take the config struct directly; it would be a lot cleaner to expose both APIs with Config directly. This drops another allocation from every call to NewReader, which can be noticeable when a new reader must be created for every API call.
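For context, a rough sketch of the trade-off being described, with simplified names that are not the actual arrow/ipc signatures: a variadic functional-options constructor usually pays for an options slice (subject to escape analysis), while a constructor taking a config struct by value adds no allocation of its own.

```go
package ipc

// Config is an illustrative stand-in for the reader configuration; the real
// ipc options are richer.
type Config struct {
	MaxRecursionDepth int
}

// Option is the functional-options form of the same configuration.
type Option func(*Config)

// Variadic form: building the Config walks an opts slice that the call site
// often has to heap-allocate.
func newConfig(opts ...Option) Config {
	var cfg Config
	for _, o := range opts {
		o(&cfg)
	}
	return cfg
}

// Struct form: the Config is passed by value; no per-call allocation.
func newConfigDirect(cfg Config) Config {
	return cfg
}
```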
In branch 1, shift is defined as next-addr. We then slice buf with shift as the offset, and with size+shift as length and capacity. In branch 2, addr == next, and thus shift == 0. size+shift == size. The slice statements are thus equivalent; there's no reason to be branching here.
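In code form, a compressed sketch of that argument (buf, addr, next, and size stand in for the real reader's fields):

```go
package ipc

// window shows why the two branches collapse into one statement.
func window(buf []byte, addr, next, size int) []byte {
	// Branch 1 computed:
	//   shift := next - addr
	//   buf[shift : size+shift : size+shift]
	// Branch 2 had addr == next, so shift == 0, which reduces to
	//   buf[0 : size : size]
	// i.e. branch 1 evaluated at shift == 0. One statement covers both:
	shift := next - addr
	return buf[shift : size+shift : size+shift]
}
```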
Avoids an allocation per Message call.
Before: cannotInlineFunction: complexity 148 exceeds budget
After: canInlineFunction (cost: 68)
Removes four allocations from every NewReader call.
The meta object in messageReader is a wrapper around flatbuf.Message, which is static: there's no allocator usage. As such, we can reuse the memory.Buffer heap object to at least avoid re-allocating it on each call, since it never really changes.
Ideally, we'd reuse the underlying allocation instead of freeing and reallocating it on each call, but that seems to expose a bug in the buffering logic. Still, this is a few more allocations gone...
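A sketch of the reuse pattern being described; the wrapper type here is illustrative rather than the real memory.Buffer API:

```go
package ipc

// metaBuffer stands in for a memory.Buffer-style wrapper around raw bytes;
// the real type is richer, but the reuse idea is the same.
type metaBuffer struct {
	data []byte
}

// messageReader keeps one wrapper alive for its whole lifetime.
type messageReader struct {
	meta metaBuffer
}

// setMeta repoints the embedded wrapper at the new metadata bytes instead of
// allocating a fresh wrapper for every message.
func (r *messageReader) setMeta(buf []byte) *metaBuffer {
	r.meta.data = buf
	return &r.meta
}
```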
Avoid branching in Message()
....Hmmm, I'd assumed that since the Message is invalidated on each call to Next(), nothing would try to hold on to the underlying buffers. From some of the failures, it appears that may be false?
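For clarity, a sketch of the contract in question, with method names assumed rather than taken from the actual arrow/ipc API: anything handed out by Message() stays valid only until the next call to Next(), so callers that keep buffers must copy them out first.

```go
package ipc

// messageSource is a hypothetical reader with the assumed invalidation
// contract: buffers from Message() are only valid until the next Next().
type messageSource interface {
	Next() bool
	Message() []byte
}

// consume copies each message body out before advancing, since advancing may
// recycle the previous message's buffers.
func consume(r messageSource) [][]byte {
	var kept [][]byte
	for r.Next() {
		body := r.Message()
		cp := make([]byte, len(body))
		copy(cp, body) // copy before the next iteration invalidates body
		kept = append(kept, cp)
	}
	return kept
}
```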
Avoids refetching Message flatbuf
(force-pushed 2f05df2 to 8a0a500)
.... So I only just noticed this message. Sigh. I'll have to send a patch to the code generation even further upstream, I'm guessing? 😓
Yup. Did you re-generate the flatbuffer code for this?
I tested it with a patch to flatc - the problem is that some of these changes work by changing the interfaces that are generated, which is a breaking change. I'm not sure they'll want those changes, and they don't seem to look at patches much anyways :/
diff --git a/src/idl_gen_go.cpp b/src/idl_gen_go.cpp
index f4a03cec..0c2b14fc 100644
--- a/src/idl_gen_go.cpp
+++ b/src/idl_gen_go.cpp
@@ -186,7 +186,7 @@ class GoGenerator : public BaseGenerator {
// Most field accessors need to retrieve and test the field offset first,
// this is the prefix code for that.
std::string OffsetPrefix(const FieldDef& field) {
- return "{\n\to := flatbuffers.UOffsetT(rcv._tab.Offset(" +
+ return "{\n\to := flatbuffers.UOffsetT(rcv.Offset(" +
NumToString(field.value.offset) + "))\n\tif o != 0 {\n";
}
@@ -198,7 +198,7 @@ class GoGenerator : public BaseGenerator {
// _ is reserved in flatbuffers field names, so no chance of name
// conflict:
- code += "_tab ";
+ code += "";
code += struct_def.fixed ? "flatbuffers.Struct" : "flatbuffers.Table";
code += "\n}\n\n";
}
@@ -323,9 +323,9 @@ class GoGenerator : public BaseGenerator {
for (int i = 0; i < 2; i++) {
code += "func Get" + size_prefix[i] + "RootAs" + struct_type;
- code += "(buf []byte, offset flatbuffers.UOffsetT) ";
- code += "*" + struct_type + "";
- code += " {\n";
+ code += "(buf []byte, offset flatbuffers.UOffsetT) (x ";
+ code += struct_type + "";
+ code += ") {\n";
if (i == 0) {
code += "\tn := flatbuffers.GetUOffsetT(buf[offset:])\n";
} else {
@@ -333,11 +333,10 @@ class GoGenerator : public BaseGenerator {
"\tn := "
"flatbuffers.GetUOffsetT(buf[offset+flatbuffers.SizeUint32:])\n";
}
- code += "\tx := &" + struct_type + "{}\n";
if (i == 0) {
- code += "\tx.Init(buf, n+offset)\n";
+ code += "\tx.Table = flatbuffers.Table{Bytes: buf, Pos: n+offset}\n";
} else {
- code += "\tx.Init(buf, n+offset+flatbuffers.SizeUint32)\n";
+ code += "\tx.Table = flatbuffers.Table{Bytes: buf, Pos: n+offset+flatbuffers.SizeUint32}\n";
}
code += "\treturn x\n";
code += "}\n\n";
@@ -371,24 +370,8 @@ class GoGenerator : public BaseGenerator {
GenReceiver(struct_def, code_ptr);
code += " Init(buf []byte, i flatbuffers.UOffsetT) ";
code += "{\n";
- code += "\trcv._tab.Bytes = buf\n";
- code += "\trcv._tab.Pos = i\n";
- code += "}\n\n";
- }
-
- // Implement the table accessor
- void GenTableAccessor(const StructDef& struct_def, std::string* code_ptr) {
- std::string& code = *code_ptr;
-
- GenReceiver(struct_def, code_ptr);
- code += " Table() flatbuffers.Table ";
- code += "{\n";
-
- if (struct_def.fixed) {
- code += "\treturn rcv._tab.Table\n";
- } else {
- code += "\treturn rcv._tab\n";
- }
+ code += "\trcv.Bytes = buf\n";
+ code += "\trcv.Pos = i\n";
code += "}\n\n";
}
@@ -400,7 +383,7 @@ class GoGenerator : public BaseGenerator {
GenReceiver(struct_def, code_ptr);
code += " " + namer_.Function(field) + "Length(";
code += ") int " + OffsetPrefix(field);
- code += "\t\treturn rcv._tab.VectorLen(o)\n\t}\n";
+ code += "\t\treturn rcv.VectorLen(o)\n\t}\n";
code += "\treturn 0\n}\n\n";
}
@@ -412,7 +395,7 @@ class GoGenerator : public BaseGenerator {
GenReceiver(struct_def, code_ptr);
code += " " + namer_.Function(field) + "Bytes(";
code += ") []byte " + OffsetPrefix(field);
- code += "\t\treturn rcv._tab.ByteVector(o + rcv._tab.Pos)\n\t}\n";
+ code += "\t\treturn rcv.ByteVector(o + rcv.Pos)\n\t}\n";
code += "\treturn nil\n}\n\n";
}
@@ -426,7 +409,7 @@ class GoGenerator : public BaseGenerator {
code += "() " + TypeName(field) + " {\n";
code += "\treturn " +
CastToEnum(field.value.type,
- getter + "(rcv._tab.Pos + flatbuffers.UOffsetT(" +
+ getter + "(rcv.Pos + flatbuffers.UOffsetT(" +
NumToString(field.value.offset) + "))");
code += "\n}\n";
}
@@ -445,7 +428,7 @@ class GoGenerator : public BaseGenerator {
} else {
code += "\t\treturn ";
}
- code += CastToEnum(field.value.type, getter + "(o + rcv._tab.Pos)");
+ code += CastToEnum(field.value.type, getter + "(o + rcv.Pos)");
if (field.IsScalarOptional()) {
code += "\n\t\treturn &v";
}
@@ -467,7 +450,7 @@ class GoGenerator : public BaseGenerator {
code += "\tif obj == nil {\n";
code += "\t\tobj = new(" + TypeName(field) + ")\n";
code += "\t}\n";
- code += "\tobj.Init(rcv._tab.Bytes, rcv._tab.Pos+";
+ code += "\tobj.Init(rcv.Bytes, rcv.Pos+";
code += NumToString(field.value.offset) + ")";
code += "\n\treturn obj\n";
code += "}\n";
@@ -480,19 +463,14 @@ class GoGenerator : public BaseGenerator {
std::string& code = *code_ptr;
GenReceiver(struct_def, code_ptr);
code += " " + namer_.Function(field);
- code += "(obj *";
- code += TypeName(field);
- code += ") *" + TypeName(field) + " " + OffsetPrefix(field);
+ code += "() (obj " + TypeName(field) + ", ok bool) " + OffsetPrefix(field);
if (field.value.type.struct_def->fixed) {
- code += "\t\tx := o + rcv._tab.Pos\n";
+ code += "\t\tx := o + rcv.Pos\n";
} else {
- code += "\t\tx := rcv._tab.Indirect(o + rcv._tab.Pos)\n";
+ code += "\t\tx := rcv.Indirect(o + rcv.Pos)\n";
}
- code += "\t\tif obj == nil {\n";
- code += "\t\t\tobj = new(" + TypeName(field) + ")\n";
- code += "\t\t}\n";
- code += "\t\tobj.Init(rcv._tab.Bytes, x)\n";
- code += "\t\treturn obj\n\t}\n\treturn nil\n";
+ code += "\t\tobj.Init(rcv.Bytes, x)\n\t\tok = true\n";
+ code += "\t}\n\treturn\n";
code += "}\n\n";
}
@@ -504,7 +482,7 @@ class GoGenerator : public BaseGenerator {
code += " " + namer_.Function(field);
code += "() " + TypeName(field) + " ";
code += OffsetPrefix(field) + "\t\treturn " + GenGetter(field.value.type);
- code += "(o + rcv._tab.Pos)\n\t}\n\treturn nil\n";
+ code += "(o + rcv.Pos)\n\t}\n\treturn nil\n";
code += "}\n\n";
}
@@ -532,13 +510,13 @@ class GoGenerator : public BaseGenerator {
code += " " + namer_.Function(field);
code += "(obj *" + TypeName(field);
code += ", j int) bool " + OffsetPrefix(field);
- code += "\t\tx := rcv._tab.Vector(o)\n";
+ code += "\t\tx := rcv.Vector(o)\n";
code += "\t\tx += flatbuffers.UOffsetT(j) * ";
code += NumToString(InlineSize(vectortype)) + "\n";
if (!(vectortype.struct_def->fixed)) {
- code += "\t\tx = rcv._tab.Indirect(x)\n";
+ code += "\t\tx = rcv.Indirect(x)\n";
}
- code += "\t\tobj.Init(rcv._tab.Bytes, x)\n";
+ code += "\t\tobj.Init(rcv.Bytes, x)\n";
code += "\t\treturn true\n\t}\n";
code += "\treturn false\n";
code += "}\n\n";
@@ -566,9 +544,9 @@ class GoGenerator : public BaseGenerator {
code += "(obj *" + TypeName(field);
code += ", key " + NativeType(key_field.value.type) + ") bool " +
OffsetPrefix(field);
- code += "\t\tx := rcv._tab.Vector(o)\n";
+ code += "\t\tx := rcv.Vector(o)\n";
code += "\t\treturn ";
- code += "obj.LookupByKey(key, x, rcv._tab.Bytes)\n";
+ code += "obj.LookupByKey(key, x, rcv.Bytes)\n";
code += "\t}\n";
code += "\treturn false\n";
code += "}\n\n";
@@ -585,7 +563,7 @@ class GoGenerator : public BaseGenerator {
code += " " + namer_.Function(field);
code += "(j int) " + TypeName(field) + " ";
code += OffsetPrefix(field);
- code += "\t\ta := rcv._tab.Vector(o)\n";
+ code += "\t\ta := rcv.Vector(o)\n";
code += "\t\treturn " +
CastToEnum(field.value.type,
GenGetter(field.value.type) +
@@ -806,12 +784,12 @@ class GoGenerator : public BaseGenerator {
const FieldDef& field, std::string* code_ptr) {
std::string& code = *code_ptr;
std::string setter =
- "rcv._tab.Mutate" + namer_.Method(GenTypeBasic(field.value.type));
+ "rcv.Mutate" + namer_.Method(GenTypeBasic(field.value.type));
GenReceiver(struct_def, code_ptr);
code += " Mutate" + namer_.Function(field);
code +=
"(n " + GenTypeGet(field.value.type) + ") bool {\n\treturn " + setter;
- code += "(rcv._tab.Pos+flatbuffers.UOffsetT(";
+ code += "(rcv.Pos+flatbuffers.UOffsetT(";
code += NumToString(field.value.offset) + "), ";
code += CastToBaseType(field.value.type, "n") + ")\n}\n\n";
}
@@ -820,7 +798,7 @@ class GoGenerator : public BaseGenerator {
void MutateScalarFieldOfTable(const StructDef& struct_def,
const FieldDef& field, std::string* code_ptr) {
std::string& code = *code_ptr;
- std::string setter = "rcv._tab.Mutate" +
+ std::string setter = "rcv.Mutate" +
namer_.Method(GenTypeBasic(field.value.type)) + "Slot";
GenReceiver(struct_def, code_ptr);
code += " Mutate" + namer_.Function(field);
@@ -837,12 +815,12 @@ class GoGenerator : public BaseGenerator {
std::string& code = *code_ptr;
auto vectortype = field.value.type.VectorType();
std::string setter =
- "rcv._tab.Mutate" + namer_.Method(GenTypeBasic(vectortype));
+ "rcv.Mutate" + namer_.Method(GenTypeBasic(vectortype));
GenReceiver(struct_def, code_ptr);
code += " Mutate" + namer_.Function(field);
code += "(j int, n " + TypeName(field) + ") bool ";
code += OffsetPrefix(field);
- code += "\t\ta := rcv._tab.Vector(o)\n";
+ code += "\t\ta := rcv.Vector(o)\n";
code += "\t\treturn " + setter + "(";
code += "a+flatbuffers.UOffsetT(j*";
code += NumToString(InlineSize(vectortype)) + "), ";
@@ -907,8 +885,6 @@ class GoGenerator : public BaseGenerator {
// Generate the Init method that sets the field in a pre-existing
// accessor object. This is to allow object reuse.
InitializeExisting(struct_def, code_ptr);
- // Generate _tab accessor
- GenTableAccessor(struct_def, code_ptr);
// Generate struct fields accessors
for (auto it = struct_def.fields.vec.begin();
@@ -1282,13 +1258,13 @@ class GoGenerator : public BaseGenerator {
code +=
"\tt." + field_field + " = rcv." + field_field + "(nil).UnPack()\n";
} else if (field.value.type.base_type == BASE_TYPE_UNION) {
- const std::string field_table = field_var + "Table";
- code += "\t" + field_table + " := flatbuffers.Table{}\n";
+ const std::string fielde = field_var + "Table";
+ code += "\t" + fielde + " := flatbuffers.Table{}\n";
code +=
- "\tif rcv." + namer_.Method(field) + "(&" + field_table + ") {\n";
+ "\tif rcv." + namer_.Method(field) + "(&" + fielde + ") {\n";
code += "\t\tt." + field_field + " = rcv." +
namer_.Method(field.name + UnionTypeFieldSuffix()) +
- "().UnPack(" + field_table + ")\n";
+ "().UnPack(" + fielde + ")\n";
code += "\t}\n";
} else {
FLATBUFFERS_ASSERT(0);
@@ -1399,13 +1375,13 @@ class GoGenerator : public BaseGenerator {
std::string GenGetter(const Type& type) {
switch (type.base_type) {
case BASE_TYPE_STRING:
- return "rcv._tab.ByteVector";
+ return "rcv.ByteVector";
case BASE_TYPE_UNION:
- return "rcv._tab.Union";
+ return "rcv.Union";
case BASE_TYPE_VECTOR:
return GenGetter(type.VectorType());
default:
- return "rcv._tab.Get" + namer_.Function(GenTypeBasic(type));
+ return "rcv.Get" + namer_.Function(GenTypeBasic(type));
}
}
Regenerated the files with that patch to flatc, and updated the usage slightly to conform, and that's another alloc per NewReader and another alloc per Message call gone...
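To make the generator change concrete, here is roughly what the emitted Go looks like before and after. This is reconstructed from the patch above, with Before/After suffixes added so the two versions can coexist in one sketch; treat the details as approximate.

```go
package flatbuf

import flatbuffers "github.com/google/flatbuffers/go"

// Before (sketch): the accessor wraps the table in a named field, and the
// root getter heap-allocates it.
type MessageBefore struct {
	_tab flatbuffers.Table
}

func (rcv *MessageBefore) Init(buf []byte, i flatbuffers.UOffsetT) {
	rcv._tab.Bytes = buf
	rcv._tab.Pos = i
}

func GetRootAsMessageBefore(buf []byte, offset flatbuffers.UOffsetT) *MessageBefore {
	n := flatbuffers.GetUOffsetT(buf[offset:])
	x := &MessageBefore{}
	x.Init(buf, n+offset)
	return x
}

// After (sketch): the accessor embeds the table, and the root getter returns
// it by value, so nothing escapes to the heap.
type MessageAfter struct {
	flatbuffers.Table
}

func GetRootAsMessageAfter(buf []byte, offset flatbuffers.UOffsetT) (x MessageAfter) {
	n := flatbuffers.GetUOffsetT(buf[offset:])
	x.Table = flatbuffers.Table{Bytes: buf, Pos: n + offset}
	return x
}
```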
...ah. Regenerated files are missing the licenses.
Before: [benchmark output] Current: [benchmark output]
I also haven't been able to look into the arrow/flight failure, as that test has never passed for me locally anyways...
I have more optimizations locally, but unfortunately most of them deliberately break functionality (e.g. arrowflight). Going to take a look at profiles of the benchmarks and try to improve this as much as I can without those :/
I think I'm going to split this into a few small PRs, if that makes sense. That okay?
Splitting it up sounds good.
Rationale for this change
This avoids an unnecessary heap allocation, reducing GC pressure and improving the latency and throughput of anything that processes messages.
What changes are included in this PR?
fieldMetadata returns the flatbuf.FieldNode that it loads from metadata directly, instead of a pointer to it. The three callers that pass through the pointer also now return the FieldNode directly.
Are these changes tested?
Yes.
Are there any user-facing changes?
No; the pointer was not exposed anyways.
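A simplified sketch of the shape of this change; the struct layout here is illustrative rather than the generated flatbuf type, but the allocation behavior is the point: returning the small struct by value keeps it on the caller's stack, whereas returning a pointer forces a heap allocation whenever the result escapes.

```go
package ipc

// FieldNode is simplified to a plain two-field struct for illustration.
type FieldNode struct {
	Length    int64
	NullCount int64
}

// Before (sketch): returning a pointer makes the node escape, costing one
// heap allocation - and one live GC-tracked object - per call.
func fieldMetadataByPtr(nodes []FieldNode, i int) *FieldNode {
	n := nodes[i]
	return &n
}

// After (sketch): returning by value; no allocation, no GC pressure.
func fieldMetadata(nodes []FieldNode, i int) FieldNode {
	return nodes[i]
}
```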