-
Notifications
You must be signed in to change notification settings - Fork 14.4k
Add support in LLVM BitstreamWriter to automatically choose abbrevs. #147191
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This adds a new API `BitstreamWriter::EmitRecordAutoAbbrev`, which chooses a valid abbreviation that can encode the provided record, if one exists, otherwise emits an unabbreviated record. It then uses this new functionality in Clang's ASTWriter, eliminating all error-prone manual specification of abbrevs. This PR was created as an alternative to the proposal to stop using abbrevs in Clang record emission, in https://discourse.llvm.org/t/rfc-c-modules-stop-using-abbrev-and-drop-the-maintainance. (Note, only after starting this did I discover that records encoded with a "Blob" abbrev vs those written as unabbreviated records are not interchangeable to the current BitstreamReader code. Given that, I'm wondering if I should remove blob support from EmitRecordAutoAbbrev, and switch the blob-encoding callers back to EmitRecordWithBlob.)
@llvm/pr-subscribers-clang-modules @llvm/pr-subscribers-clang Author: James Y Knight (jyknight) ChangesThis adds a new API It then uses this new functionality in Clang's ASTWriter, eliminating all error-prone manual specification of abbrevs. This PR was created as an alternative to the proposal to stop using abbrevs in Clang record emission, in https://discourse.llvm.org/t/rfc-c-modules-stop-using-abbrev-and-drop-the-maintainance. (Note, only after starting this did I discover that records encoded with a "Blob" abbrev are not interchangeable to the current BitstreamReader code with an unabbreviated record (or an abbrev with e.g. an Array). Given that, I'm wondering if I should remove blob support from EmitRecordAutoAbbrev, and switch the blob-encoding callers back to EmitRecordWithBlob.) Patch is 93.95 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/147191.diff 6 Files Affected:
diff --git a/clang/include/clang/Serialization/ASTRecordWriter.h b/clang/include/clang/Serialization/ASTRecordWriter.h
index ee005ec287708..3639ad5828ce6 100644
--- a/clang/include/clang/Serialization/ASTRecordWriter.h
+++ b/clang/include/clang/Serialization/ASTRecordWriter.h
@@ -91,20 +91,19 @@ class ASTRecordWriter
/// Emit the record to the stream, followed by its substatements, and
/// return its offset.
- // FIXME: Allow record producers to suggest Abbrevs.
- uint64_t Emit(unsigned Code, unsigned Abbrev = 0) {
+ uint64_t Emit(unsigned Code) {
uint64_t Offset = Writer->Stream.GetCurrentBitNo();
PrepareToEmit(Offset);
- Writer->Stream.EmitRecord(Code, *Record, Abbrev);
+ Writer->Stream.EmitRecord(Code, *Record);
FlushStmts();
return Offset;
}
/// Emit the record to the stream, preceded by its substatements.
- uint64_t EmitStmt(unsigned Code, unsigned Abbrev = 0) {
+ uint64_t EmitStmt(unsigned Code) {
FlushSubStmts();
PrepareToEmit(Writer->Stream.GetCurrentBitNo());
- Writer->Stream.EmitRecord(Code, *Record, Abbrev);
+ Writer->Stream.EmitRecordAutoAbbrev(Code, *Record);
return Writer->Stream.GetCurrentBitNo();
}
diff --git a/clang/include/clang/Serialization/ASTWriter.h b/clang/include/clang/Serialization/ASTWriter.h
index c86019f01d9f3..5ee94514d85e2 100644
--- a/clang/include/clang/Serialization/ASTWriter.h
+++ b/clang/include/clang/Serialization/ASTWriter.h
@@ -638,44 +638,6 @@ class ASTWriter : public ASTDeserializationListener,
void WriteModuleFileExtension(Sema &SemaRef,
ModuleFileExtensionWriter &Writer);
- unsigned DeclParmVarAbbrev = 0;
- unsigned DeclContextLexicalAbbrev = 0;
- unsigned DeclContextVisibleLookupAbbrev = 0;
- unsigned DeclModuleLocalVisibleLookupAbbrev = 0;
- unsigned DeclTULocalLookupAbbrev = 0;
- unsigned UpdateVisibleAbbrev = 0;
- unsigned ModuleLocalUpdateVisibleAbbrev = 0;
- unsigned TULocalUpdateVisibleAbbrev = 0;
- unsigned DeclRecordAbbrev = 0;
- unsigned DeclTypedefAbbrev = 0;
- unsigned DeclVarAbbrev = 0;
- unsigned DeclFieldAbbrev = 0;
- unsigned DeclEnumAbbrev = 0;
- unsigned DeclObjCIvarAbbrev = 0;
- unsigned DeclCXXMethodAbbrev = 0;
- unsigned DeclSpecializationsAbbrev = 0;
- unsigned DeclPartialSpecializationsAbbrev = 0;
-
- unsigned DeclDependentNonTemplateCXXMethodAbbrev = 0;
- unsigned DeclTemplateCXXMethodAbbrev = 0;
- unsigned DeclMemberSpecializedCXXMethodAbbrev = 0;
- unsigned DeclTemplateSpecializedCXXMethodAbbrev = 0;
- unsigned DeclDependentSpecializationCXXMethodAbbrev = 0;
- unsigned DeclTemplateTypeParmAbbrev = 0;
- unsigned DeclUsingShadowAbbrev = 0;
-
- unsigned DeclRefExprAbbrev = 0;
- unsigned CharacterLiteralAbbrev = 0;
- unsigned IntegerLiteralAbbrev = 0;
- unsigned ExprImplicitCastAbbrev = 0;
- unsigned BinaryOperatorAbbrev = 0;
- unsigned CompoundAssignOperatorAbbrev = 0;
- unsigned CallExprAbbrev = 0;
- unsigned CXXOperatorCallExprAbbrev = 0;
- unsigned CXXMemberCallExprAbbrev = 0;
-
- unsigned CompoundStmtAbbrev = 0;
-
void WriteDeclAbbrevs();
void WriteDecl(ASTContext &Context, Decl *D);
@@ -844,53 +806,6 @@ class ASTWriter : public ASTDeserializationListener,
void ClearSwitchCaseIDs();
- unsigned getTypeExtQualAbbrev() const {
- return TypeExtQualAbbrev;
- }
-
- unsigned getDeclParmVarAbbrev() const { return DeclParmVarAbbrev; }
- unsigned getDeclRecordAbbrev() const { return DeclRecordAbbrev; }
- unsigned getDeclTypedefAbbrev() const { return DeclTypedefAbbrev; }
- unsigned getDeclVarAbbrev() const { return DeclVarAbbrev; }
- unsigned getDeclFieldAbbrev() const { return DeclFieldAbbrev; }
- unsigned getDeclEnumAbbrev() const { return DeclEnumAbbrev; }
- unsigned getDeclObjCIvarAbbrev() const { return DeclObjCIvarAbbrev; }
- unsigned getDeclCXXMethodAbbrev(FunctionDecl::TemplatedKind Kind) const {
- switch (Kind) {
- case FunctionDecl::TK_NonTemplate:
- return DeclCXXMethodAbbrev;
- case FunctionDecl::TK_FunctionTemplate:
- return DeclTemplateCXXMethodAbbrev;
- case FunctionDecl::TK_MemberSpecialization:
- return DeclMemberSpecializedCXXMethodAbbrev;
- case FunctionDecl::TK_FunctionTemplateSpecialization:
- return DeclTemplateSpecializedCXXMethodAbbrev;
- case FunctionDecl::TK_DependentNonTemplate:
- return DeclDependentNonTemplateCXXMethodAbbrev;
- case FunctionDecl::TK_DependentFunctionTemplateSpecialization:
- return DeclDependentSpecializationCXXMethodAbbrev;
- }
- llvm_unreachable("Unknwon Template Kind!");
- }
- unsigned getDeclTemplateTypeParmAbbrev() const {
- return DeclTemplateTypeParmAbbrev;
- }
- unsigned getDeclUsingShadowAbbrev() const { return DeclUsingShadowAbbrev; }
-
- unsigned getDeclRefExprAbbrev() const { return DeclRefExprAbbrev; }
- unsigned getCharacterLiteralAbbrev() const { return CharacterLiteralAbbrev; }
- unsigned getIntegerLiteralAbbrev() const { return IntegerLiteralAbbrev; }
- unsigned getExprImplicitCastAbbrev() const { return ExprImplicitCastAbbrev; }
- unsigned getBinaryOperatorAbbrev() const { return BinaryOperatorAbbrev; }
- unsigned getCompoundAssignOperatorAbbrev() const {
- return CompoundAssignOperatorAbbrev;
- }
- unsigned getCallExprAbbrev() const { return CallExprAbbrev; }
- unsigned getCXXOperatorCallExprAbbrev() { return CXXOperatorCallExprAbbrev; }
- unsigned getCXXMemberCallExprAbbrev() { return CXXMemberCallExprAbbrev; }
-
- unsigned getCompoundStmtAbbrev() const { return CompoundStmtAbbrev; }
-
bool hasChain() const { return Chain; }
ASTReader *getChain() const { return Chain; }
diff --git a/clang/lib/Serialization/ASTWriter.cpp b/clang/lib/Serialization/ASTWriter.cpp
index 874b24b532b06..886e4140ceb52 100644
--- a/clang/lib/Serialization/ASTWriter.cpp
+++ b/clang/lib/Serialization/ASTWriter.cpp
@@ -305,27 +305,25 @@ GetAffectingModuleMaps(const Preprocessor &PP, Module *RootModule) {
}
class ASTTypeWriter {
- ASTWriter &Writer;
ASTWriter::RecordData Record;
ASTRecordWriter BasicWriter;
public:
ASTTypeWriter(ASTContext &Context, ASTWriter &Writer)
- : Writer(Writer), BasicWriter(Context, Writer, Record) {}
+ : BasicWriter(Context, Writer, Record) {}
uint64_t write(QualType T) {
if (T.hasLocalNonFastQualifiers()) {
Qualifiers Qs = T.getLocalQualifiers();
BasicWriter.writeQualType(T.getLocalUnqualifiedType());
BasicWriter.writeQualifiers(Qs);
- return BasicWriter.Emit(TYPE_EXT_QUAL, Writer.getTypeExtQualAbbrev());
+ return BasicWriter.Emit(TYPE_EXT_QUAL);
}
const Type *typePtr = T.getTypePtr();
serialization::AbstractTypeWriter<ASTRecordWriter> atw(BasicWriter);
atw.write(typePtr);
- return BasicWriter.Emit(getTypeCodeForTypeClass(typePtr->getTypeClass()),
- /*abbrev*/ 0);
+ return BasicWriter.Emit(getTypeCodeForTypeClass(typePtr->getTypeClass()));
}
};
@@ -711,7 +709,7 @@ void ASTWriter::WriteTypeAbbrevs() {
Abv->Add(BitCodeAbbrevOp(serialization::TYPE_EXT_QUAL));
Abv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 6)); // Type
Abv->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 3)); // Quals
- TypeExtQualAbbrev = Stream.EmitAbbrev(std::move(Abv));
+ Stream.EmitAbbrev(std::move(Abv));
}
//===----------------------------------------------------------------------===//
@@ -1336,10 +1334,9 @@ void ASTWriter::writeUnhashedControlBlock(Preprocessor &PP) {
auto Abbrev = std::make_shared<BitCodeAbbrev>();
Abbrev->Add(BitCodeAbbrevOp(AST_BLOCK_HASH));
Abbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Blob));
- unsigned ASTBlockHashAbbrev = Stream.EmitAbbrev(std::move(Abbrev));
+ Stream.EmitAbbrev(std::move(Abbrev));
- Record.push_back(AST_BLOCK_HASH);
- Stream.EmitRecordWithBlob(ASTBlockHashAbbrev, Record, Blob);
+ Stream.EmitRecordAutoAbbrev(AST_BLOCK_HASH, Record, Blob);
ASTBlockHashOffset = Stream.GetCurrentBitNo() - Blob.size() * 8;
Record.clear();
}
@@ -1347,10 +1344,9 @@ void ASTWriter::writeUnhashedControlBlock(Preprocessor &PP) {
auto Abbrev = std::make_shared<BitCodeAbbrev>();
Abbrev->Add(BitCodeAbbrevOp(SIGNATURE));
Abbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Blob));
- unsigned SignatureAbbrev = Stream.EmitAbbrev(std::move(Abbrev));
+ Stream.EmitAbbrev(std::move(Abbrev));
- Record.push_back(SIGNATURE);
- Stream.EmitRecordWithBlob(SignatureAbbrev, Record, Blob);
+ Stream.EmitRecordAutoAbbrev(SIGNATURE, Record, Blob);
SignatureOffset = Stream.GetCurrentBitNo() - Blob.size() * 8;
Record.clear();
}
@@ -1414,10 +1410,10 @@ void ASTWriter::writeUnhashedControlBlock(Preprocessor &PP) {
Abbrev->Add(BitCodeAbbrevOp(HEADER_SEARCH_ENTRY_USAGE));
Abbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 32)); // Number of bits.
Abbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Blob)); // Bit vector.
- unsigned HSUsageAbbrevCode = Stream.EmitAbbrev(std::move(Abbrev));
- RecordData::value_type Record[] = {HEADER_SEARCH_ENTRY_USAGE,
- HSEntryUsage.size()};
- Stream.EmitRecordWithBlob(HSUsageAbbrevCode, Record, bytes(HSEntryUsage));
+ Stream.EmitAbbrev(std::move(Abbrev));
+ RecordData::value_type Record[] = {HSEntryUsage.size()};
+ Stream.EmitRecordAutoAbbrev(HEADER_SEARCH_ENTRY_USAGE, Record,
+ bytes(HSEntryUsage));
}
// VFS usage.
@@ -1427,9 +1423,9 @@ void ASTWriter::writeUnhashedControlBlock(Preprocessor &PP) {
Abbrev->Add(BitCodeAbbrevOp(VFS_USAGE));
Abbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 32)); // Number of bits.
Abbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Blob)); // Bit vector.
- unsigned VFSUsageAbbrevCode = Stream.EmitAbbrev(std::move(Abbrev));
- RecordData::value_type Record[] = {VFS_USAGE, VFSUsage.size()};
- Stream.EmitRecordWithBlob(VFSUsageAbbrevCode, Record, bytes(VFSUsage));
+ Stream.EmitAbbrev(std::move(Abbrev));
+ RecordData::value_type Record[] = {VFSUsage.size()};
+ Stream.EmitRecordAutoAbbrev(VFS_USAGE, Record, bytes(VFSUsage));
}
// Leave the options block.
@@ -1460,21 +1456,16 @@ void ASTWriter::WriteControlBlock(Preprocessor &PP, StringRef isysroot) {
MetadataAbbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 1)); // Timestamps
MetadataAbbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 1)); // Errors
MetadataAbbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Blob)); // SVN branch/tag
- unsigned MetadataAbbrevCode = Stream.EmitAbbrev(std::move(MetadataAbbrev));
+ Stream.EmitAbbrev(std::move(MetadataAbbrev));
assert((!WritingModule || isysroot.empty()) &&
"writing module as a relocatable PCH?");
{
- RecordData::value_type Record[] = {METADATA,
- VERSION_MAJOR,
- VERSION_MINOR,
- CLANG_VERSION_MAJOR,
- CLANG_VERSION_MINOR,
- !isysroot.empty(),
- isWritingStdCXXNamedModules(),
- IncludeTimestamps,
- ASTHasCompilerErrors};
- Stream.EmitRecordWithBlob(MetadataAbbrevCode, Record,
- getClangFullRepositoryVersion());
+ RecordData::value_type Record[] = {
+ VERSION_MAJOR, VERSION_MINOR, CLANG_VERSION_MAJOR,
+ CLANG_VERSION_MINOR, !isysroot.empty(), isWritingStdCXXNamedModules(),
+ IncludeTimestamps, ASTHasCompilerErrors};
+ Stream.EmitRecordAutoAbbrev(METADATA, Record,
+ getClangFullRepositoryVersion());
}
if (WritingModule) {
@@ -1482,9 +1473,9 @@ void ASTWriter::WriteControlBlock(Preprocessor &PP, StringRef isysroot) {
auto Abbrev = std::make_shared<BitCodeAbbrev>();
Abbrev->Add(BitCodeAbbrevOp(MODULE_NAME));
Abbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Blob)); // Name
- unsigned AbbrevCode = Stream.EmitAbbrev(std::move(Abbrev));
- RecordData::value_type Record[] = {MODULE_NAME};
- Stream.EmitRecordWithBlob(AbbrevCode, Record, WritingModule->Name);
+ Stream.EmitAbbrev(std::move(Abbrev));
+ Stream.EmitRecordAutoAbbrev(MODULE_NAME, ArrayRef<uint32_t>(),
+ WritingModule->Name);
auto BaseDir = [&]() -> std::optional<SmallString<128>> {
if (PP.getHeaderSearchInfo().getHeaderSearchOpts().ModuleFileHomeIsCwd) {
@@ -1512,10 +1503,10 @@ void ASTWriter::WriteControlBlock(Preprocessor &PP, StringRef isysroot) {
auto Abbrev = std::make_shared<BitCodeAbbrev>();
Abbrev->Add(BitCodeAbbrevOp(MODULE_DIRECTORY));
Abbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Blob)); // Directory
- unsigned AbbrevCode = Stream.EmitAbbrev(std::move(Abbrev));
+ Stream.EmitAbbrev(std::move(Abbrev));
- RecordData::value_type Record[] = {MODULE_DIRECTORY};
- Stream.EmitRecordWithBlob(AbbrevCode, Record, *BaseDir);
+ Stream.EmitRecordAutoAbbrev(MODULE_DIRECTORY, ArrayRef<uint32_t>(),
+ *BaseDir);
}
// Write out all other paths relative to the base directory if possible.
@@ -1567,7 +1558,7 @@ void ASTWriter::WriteControlBlock(Preprocessor &PP, StringRef isysroot) {
Abbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 6)); // File timestamp
Abbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 6)); // File name len
Abbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Blob)); // Strings
- unsigned AbbrevCode = Stream.EmitAbbrev(std::move(Abbrev));
+ Stream.EmitAbbrev(std::move(Abbrev));
SmallString<128> Blob;
@@ -1579,7 +1570,6 @@ void ASTWriter::WriteControlBlock(Preprocessor &PP, StringRef isysroot) {
Record.clear();
Blob.clear();
- Record.push_back(IMPORT);
Record.push_back((unsigned)M.Kind); // FIXME: Stable encoding
AddSourceLocation(M.ImportLoc, Record);
AddStringBlob(M.ModuleName, Record, Blob);
@@ -1602,7 +1592,7 @@ void ASTWriter::WriteControlBlock(Preprocessor &PP, StringRef isysroot) {
AddPathBlob(M.FileName, Record, Blob);
}
- Stream.EmitRecordWithBlob(AbbrevCode, Record, Blob);
+ Stream.EmitRecordAutoAbbrev(IMPORT, Record, Blob);
}
}
@@ -1737,12 +1727,11 @@ void ASTWriter::WriteControlBlock(Preprocessor &PP, StringRef isysroot) {
FileAbbrev->Add(BitCodeAbbrevOp(ORIGINAL_FILE));
FileAbbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 6)); // File ID
FileAbbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Blob)); // File name
- unsigned FileAbbrevCode = Stream.EmitAbbrev(std::move(FileAbbrev));
+ Stream.EmitAbbrev(std::move(FileAbbrev));
Record.clear();
- Record.push_back(ORIGINAL_FILE);
AddFileID(SourceMgr.getMainFileID(), Record);
- EmitRecordWithPath(FileAbbrevCode, Record, MainFile->getName());
+ EmitRecordWithPath(ORIGINAL_FILE, Record, MainFile->getName());
}
Record.clear();
@@ -1822,14 +1811,14 @@ void ASTWriter::WriteInputFiles(SourceManager &SourceMgr) {
IFAbbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 1)); // Module map
IFAbbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 16)); // Name as req. len
IFAbbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Blob)); // Name as req. + name
- unsigned IFAbbrevCode = Stream.EmitAbbrev(std::move(IFAbbrev));
+ Stream.EmitAbbrev(std::move(IFAbbrev));
// Create input file hash abbreviation.
auto IFHAbbrev = std::make_shared<BitCodeAbbrev>();
IFHAbbrev->Add(BitCodeAbbrevOp(INPUT_FILE_HASH));
IFHAbbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 32));
IFHAbbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 32));
- unsigned IFHAbbrevCode = Stream.EmitAbbrev(std::move(IFHAbbrev));
+ Stream.EmitAbbrev(std::move(IFHAbbrev));
uint64_t InputFilesOffsetBase = Stream.GetCurrentBitNo();
@@ -1931,7 +1920,6 @@ void ASTWriter::WriteInputFiles(SourceManager &SourceMgr) {
Name.clear();
RecordData::value_type Record[] = {
- INPUT_FILE,
InputFileOffsets.size(),
(uint64_t)Entry.File.getSize(),
(uint64_t)getTimestampForOutput(Entry.File),
@@ -1941,15 +1929,15 @@ void ASTWriter::WriteInputFiles(SourceManager &SourceMgr) {
Entry.IsModuleMap,
NameAsRequested.size()};
- Stream.EmitRecordWithBlob(IFAbbrevCode, Record,
- (NameAsRequested + Name).str());
+ Stream.EmitRecordAutoAbbrev(INPUT_FILE, Record,
+ (NameAsRequested + Name).str());
}
// Emit content hash for this file.
{
- RecordData::value_type Record[] = {INPUT_FILE_HASH, Entry.ContentHash[0],
+ RecordData::value_type Record[] = {Entry.ContentHash[0],
Entry.ContentHash[1]};
- Stream.EmitRecordWithAbbrev(IFHAbbrevCode, Record);
+ Stream.EmitRecordAutoAbbrev(INPUT_FILE_HASH, Record);
}
}
@@ -1962,12 +1950,12 @@ void ASTWriter::WriteInputFiles(SourceManager &SourceMgr) {
OffsetsAbbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 6)); // # non-system
// input files
OffsetsAbbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Blob)); // Array
- unsigned OffsetsAbbrevCode = Stream.EmitAbbrev(std::move(OffsetsAbbrev));
+ Stream.EmitAbbrev(std::move(OffsetsAbbrev));
// Write input file offsets.
- RecordData::value_type Record[] = {INPUT_FILE_OFFSETS,
- InputFileOffsets.size(), UserFilesNum};
- Stream.EmitRecordWithBlob(OffsetsAbbrevCode, Record, bytes(InputFileOffsets));
+ RecordData::value_type Record[] = {InputFileOffsets.size(), UserFilesNum};
+ Stream.EmitRecordAutoAbbrev(INPUT_FILE_OFFSETS, Record,
+ bytes(InputFileOffsets));
}
//===----------------------------------------------------------------------===//
@@ -1976,7 +1964,7 @@ void ASTWriter::WriteInputFiles(SourceManager &SourceMgr) {
/// Create an abbreviation for the SLocEntry that refers to a
/// file.
-static unsigned CreateSLocFileAbbrev(llvm::BitstreamWriter &Stream) {
+static void CreateSLocFileAbbrev(llvm::BitstreamWriter &Stream) {
using namespace llvm;
auto Abbrev = std::make_shared<BitCodeAbbrev>();
@@ -1990,12 +1978,12 @@ static unsigned CreateSLocFileAbbrev(llvm::BitstreamWriter &Stream) {
Abbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); // NumCreatedFIDs
Abbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 24)); // FirstDeclIndex
Abbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); // NumDecls
- return Stream.EmitAbbrev(std::move(Abbrev));
+ Stream.EmitAbbrev(std::move(Abbrev));
}
/// Create an abbreviation for the SLocEntry that refers to a
/// buffer.
-static unsigned CreateSLocBufferAbbrev(llvm::BitstreamWriter &Stream) {
+static void CreateSLocBufferAbbrev(llvm::BitstreamWriter &Stream) {
using namespace llvm;
auto Abbrev = std::make_shared<BitCodeAbbrev>();
@@ -2005,13 +1993,13 @@ static unsigned CreateSLocBufferAbbrev(llvm::BitstreamWriter &Stream) {
Abbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 3)); // Characteristic
Abbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Fixed, 1)); // Line directives
Abbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Blob)); // Buffer name blob
- return Stream.EmitAbbrev(std::move(Abbrev));
+ Stream.EmitAbbrev(std::move(Abbrev));
}
/// Create an abbreviation for the SLocEntry that refers to a
/// buffer's blob.
-static unsigned CreateSLocBufferBlobAbbrev(llvm::BitstreamWriter &Stream,
- bool Compressed) {
+static void CreateSLocBufferBlobAbbrev(llvm::BitstreamWriter &Stream,
+ bool Compressed) {
using namespace llvm;
auto Abbrev = std::make_shared<BitCodeAbbrev>();
@@ -2020,12 +2008,12 @@ static unsigned CreateSLocBufferBlobAbbrev(llvm::BitstreamWriter &Stream,
if (Compressed)
Abbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::VBR, 8)); // Uncompressed size
Abbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Blob)); // Blob
- return Stream.EmitAbbrev(std::move(Abbrev));
+ Stream.EmitAbbrev(std::move(Abbrev));
}
/// Create an abbreviation for the SLocEntry tha...
[truncated]
|
15bde22
to
3afcc3d
Compare
Yes, this is an improvement over the status quo, but non-withstanding the fact that many of the suggestions in the discourse thread are still applicable. Do we have sufficient test coverage that we emit the existing abbreviations? I am afraid that if we don't have test coverage, then this patch would be a moral equivalent of just removing all the abbreviations altogether, as it will happen silently over the course of development, as the code will converge such that all existing abbreviations will become non-functional, and this will just become dead code. |
template <typename uintty> | ||
unsigned FindValidAbbrev(unsigned Code, ArrayRef<uintty> Vals, | ||
StringRef Blob) { | ||
if (auto it = CodeAbbrevIndex.find(Code); it != CodeAbbrevIndex.end()) { | ||
for (unsigned AbbrevToTry : it->second) { | ||
const BitCodeAbbrev *Abbv = | ||
CurAbbrevs[AbbrevToTry - bitc::FIRST_APPLICATION_ABBREV].get(); | ||
if (CheckAbbrevValidity(Vals, Blob, Abbv)) | ||
return AbbrevToTry; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that if we went with the tablegen route, then we would be generating code similar to the existing manual approach, for performance reasons.
I am assuming this passes the google monorepo build without any significant CPU time performance regressions, is that right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we ought to; tven if we did it with tablegen, it's still questionable to me to emit code which chooses abbrevs based on high-level properties that we hope result in record data matching the abbrev, instead of basing the decision on the abbrev definition. We emit the same record data in either case, so we might as well make the decision afterward emitting it, instead of before emitting it!
As I said, I have done NO performance testing of this whatsoever. I just threw it together in a couple hours and posted it. It could have a negative performance impact, but also it may not, since it also removes a bunch of conditions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My guess is that this is much cheaper to decide before emitting, and probably the reason why it was done that way.
Agree, I think there are many suggestions there that would be good to do in addition to this change. E.g. autogenerating abbrevs based on a corpus of common usage would be nice. (Though, at some point it might be worth re-evaluating if a generic compression algorithm could suffice)
As far as I can tell, we have no test coverage that abbrevs are ever used.
That outcome is a definite possibility. But today, we may already be effectively failing to emit abbrevs because the usage conditions have changed such that all the common cases for a given type are being excluded. If this goes in, people who care about the serialized AST file size can significantly more easily fix that by adjusting JUST the abbrevs, on their own, without needing to modify anything else. So, potentially the situation can be better than it is now, as it's easier to fix size issues. |
Right, there is a difference between the abbrevs being functional and them being effective. As it is right now, the abbrevs become less effective over time as usage patterns change, and they are emitted less often, but these abbrevs remain functional, they are not dead code as they can still be emitted. With this patch, when someone makes a fundamental change to an AST node and doesn't update the abbrev, it will silently become non-functional, non-emittable, instead of the assertion violation we get today which prevents these changes from landing.
Though it will be harder to detect we have regressed on PCH file size, and it will take longer to track down the change which caused it. We will be effectively chucking all of the burden on the people who care about these sizes. But I thought that would be you! If you are okay with that burden, I can't complain there. |
This adds a new API
BitstreamWriter::EmitRecordAutoAbbrev
, which chooses a valid abbreviation that can encode the provided record, if one exists, otherwise emits an unabbreviated record.It then uses this new functionality in Clang's ASTWriter, eliminating all error-prone manual specification of abbrevs.
This PR was created as an alternative to the proposal to stop using abbrevs in Clang record emission, in https://discourse.llvm.org/t/rfc-c-modules-stop-using-abbrev-and-drop-the-maintainance.
(Note, only after starting this did I discover that records encoded with a "Blob" abbrev are not interchangeable to the current BitstreamReader code with an unabbreviated record (or an abbrev with e.g. an Array). Given that, I'm wondering if I should remove blob support from EmitRecordAutoAbbrev, and switch the blob-encoding callers back to EmitRecordWithBlob.)