Skip to content

[NFC][IR2Vec] Minor refactoring of opcode access in vocabulary #147585

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: users/svkeerthy/06-20-vocab_changes
Choose a base branch
from

Conversation

svkeerthy
Copy link
Contributor

@svkeerthy svkeerthy commented Jul 8, 2025

Refactored IR2Vec vocabulary handling to improve code organization and error handling. This would help in upcoming PRs related to the IR2Vec tool.

(Tracking issue - #141817)

@svkeerthy svkeerthy force-pushed the users/svkeerthy/07-07-_nfc_ir2vec_minor_refactoring_of_opcode_access_in_vocabulary branch from 5408737 to 80c29d9 Compare July 8, 2025 19:33
@svkeerthy svkeerthy force-pushed the users/svkeerthy/06-20-vocab_changes branch from 31590b9 to 70dcd29 Compare July 8, 2025 19:33
@svkeerthy svkeerthy marked this pull request as ready for review July 9, 2025 22:47
@llvmbot llvmbot added mlgo llvm:analysis Includes value tracking, cost tables and constant folding labels Jul 9, 2025
@llvmbot
Copy link
Member

llvmbot commented Jul 9, 2025

@llvm/pr-subscribers-llvm-analysis

@llvm/pr-subscribers-mlgo

Author: S. VenkataKeerthy (svkeerthy)

Changes

Refactored IR2Vec vocabulary handling to improve code organization and error handling. This would help in upcoming PRs related to the IR2Vec tool.

(Tracking issue - #141817)


Full diff: https://github.com/llvm/llvm-project/pull/147585.diff

2 Files Affected:

  • (modified) llvm/include/llvm/Analysis/IR2Vec.h (+6-3)
  • (modified) llvm/lib/Analysis/IR2Vec.cpp (+24-21)
diff --git a/llvm/include/llvm/Analysis/IR2Vec.h b/llvm/include/llvm/Analysis/IR2Vec.h
index f5a4e450cf160..176cdaf7b5378 100644
--- a/llvm/include/llvm/Analysis/IR2Vec.h
+++ b/llvm/include/llvm/Analysis/IR2Vec.h
@@ -163,15 +163,18 @@ class Vocabulary {
   static constexpr unsigned MaxOperandKinds =
       static_cast<unsigned>(OperandKind::MaxOperandKind);
 
+  /// Helper function to get vocabulary key for a given Opcode
+  static StringRef getVocabKeyForOpcode(unsigned Opcode);
+
+  /// Helper function to get vocabulary key for a given TypeID
+  static StringRef getVocabKeyForTypeID(Type::TypeID TypeID);
+
   /// Helper function to get vocabulary key for a given OperandKind
   static StringRef getVocabKeyForOperandKind(OperandKind Kind);
 
   /// Helper function to classify an operand into OperandKind
   static OperandKind getOperandKind(const Value *Op);
 
-  /// Helper function to get vocabulary key for a given TypeID
-  static StringRef getVocabKeyForTypeID(Type::TypeID TypeID);
-
 public:
   Vocabulary() = default;
   Vocabulary(VocabVector &&Vocab);
diff --git a/llvm/lib/Analysis/IR2Vec.cpp b/llvm/lib/Analysis/IR2Vec.cpp
index d3dc2e36fd23e..f97644b93a3d4 100644
--- a/llvm/lib/Analysis/IR2Vec.cpp
+++ b/llvm/lib/Analysis/IR2Vec.cpp
@@ -243,6 +243,17 @@ const ir2vec::Embedding &Vocabulary::operator[](const Value *Arg) const {
   return Vocab[MaxOpcodes + MaxTypeIDs + static_cast<unsigned>(ArgKind)];
 }
 
+StringRef Vocabulary::getVocabKeyForOpcode(unsigned Opcode) {
+  assert(Opcode >= 1 && Opcode <= MaxOpcodes && "Invalid opcode");
+#define HANDLE_INST(NUM, OPCODE, CLASS)                                        \
+  if (Opcode == NUM) {                                                         \
+    return #OPCODE;                                                            \
+  }
+#include "llvm/IR/Instruction.def"
+#undef HANDLE_INST
+  return "UnknownOpcode";
+}
+
 StringRef Vocabulary::getVocabKeyForTypeID(Type::TypeID TypeID) {
   switch (TypeID) {
   case Type::VoidTyID:
@@ -280,6 +291,7 @@ StringRef Vocabulary::getVocabKeyForTypeID(Type::TypeID TypeID) {
   default:
     return "UnknownTy";
   }
+  return "UnknownTy";
 }
 
 // Operand kinds supported by IR2Vec - string mappings
@@ -297,9 +309,9 @@ StringRef Vocabulary::getVocabKeyForOperandKind(Vocabulary::OperandKind Kind) {
     OPERAND_KINDS
 #undef OPERAND_KIND
   case Vocabulary::OperandKind::MaxOperandKind:
-    llvm_unreachable("Invalid OperandKind");
+    return "UnknownOperand";
   }
-  llvm_unreachable("Unknown OperandKind");
+  return "UnknownOperand";
 }
 
 #undef OPERAND_KINDS
@@ -332,14 +344,8 @@ StringRef Vocabulary::getStringKey(unsigned Pos) {
   assert(Pos < MaxOpcodes + MaxTypeIDs + MaxOperandKinds &&
          "Position out of bounds in vocabulary");
   // Opcode
-  if (Pos < MaxOpcodes) {
-#define HANDLE_INST(NUM, OPCODE, CLASS)                                        \
-  if (Pos == NUM - 1) {                                                        \
-    return #OPCODE;                                                            \
-  }
-#include "llvm/IR/Instruction.def"
-#undef HANDLE_INST
-  }
+  if (Pos < MaxOpcodes)
+    return getVocabKeyForOpcode(Pos + 1);
   // Type
   if (Pos < MaxOpcodes + MaxTypeIDs)
     return getVocabKeyForTypeID(static_cast<Type::TypeID>(Pos - MaxOpcodes));
@@ -447,21 +453,18 @@ void IR2VecVocabAnalysis::generateNumMappedVocab() {
   // Handle Opcodes
   std::vector<Embedding> NumericOpcodeEmbeddings(Vocabulary::MaxOpcodes,
                                                  Embedding(Dim, 0));
-#define HANDLE_INST(NUM, OPCODE, CLASS)                                        \
-  {                                                                            \
-    auto It = OpcVocab.find(#OPCODE);                                          \
-    if (It != OpcVocab.end())                                                  \
-      NumericOpcodeEmbeddings[NUM - 1] = It->second;                           \
-    else                                                                       \
-      handleMissingEntity(#OPCODE);                                            \
+  for (unsigned Opcode : seq(0u, Vocabulary::MaxOpcodes)) {
+    StringRef VocabKey = Vocabulary::getVocabKeyForOpcode(Opcode + 1);
+    auto It = OpcVocab.find(VocabKey.str());
+    if (It != OpcVocab.end())
+      NumericOpcodeEmbeddings[Opcode] = It->second;
+    else
+      handleMissingEntity(VocabKey.str());
   }
-#include "llvm/IR/Instruction.def"
-#undef HANDLE_INST
   Vocab.insert(Vocab.end(), NumericOpcodeEmbeddings.begin(),
                NumericOpcodeEmbeddings.end());
 
-  // Handle Types using direct iteration through TypeID enum
-  // We iterate through all possible TypeID values and map them to embeddings
+  // Handle Types
   std::vector<Embedding> NumericTypeEmbeddings(Vocabulary::MaxTypeIDs,
                                                Embedding(Dim, 0));
   for (unsigned TypeID : seq(0u, Vocabulary::MaxTypeIDs)) {

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
llvm:analysis Includes value tracking, cost tables and constant folding mlgo
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants