Skip to content

[LifetimeSafety] Introduce intra-procedural analysis in Clang #142313

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

usx95
Copy link
Contributor

@usx95 usx95 commented Jun 1, 2025

This patch introduces the initial implementation of the intra-procedural, flow-sensitive lifetime analysis for Clang, as proposed in the recent RFC: https://discourse.llvm.org/t/rfc-intra-procedural-lifetime-analysis-in-clang/86291

The primary goal of this initial submission is to establish the core dataflow framework and gather feedback on the overall design, fact representation, and testing strategy. The focus is on the dataflow mechanism itself rather than exhaustively covering all C++ AST edge cases, which will be addressed in subsequent patches.

Key Components

  • Conceptual Model: Introduces the fundamental concepts of Loan, Origin, and Path to model memory borrows and the lifetime of pointers.
  • Fact Generation: A frontend pass traverses the Clang CFG to generate a representation of lifetime-relevant events, such as pointer assignments, taking an address, and variables going out of scope.
  • Testing: llvm-lit tests validate the analysis by checking the generated facts.

Next Steps

(Not covered in this PR but planned for subsequent patches)

The following functionality is planned for the upcoming patches to build upon this foundation and make the analysis usable in practice:

  • Dataflow Lattice: A dataflow lattice used to map each pointer's symbolic Origin to the set of Loans it may contain at any given program point.
  • Fixed-Point Analysis: A worklist-based, flow-sensitive analysis that propagates the lattice state across the CFG to a fixed point.
  • Placeholder Loans: Introduce placeholder loans to represent the lifetimes of function parameters, forming the basis for analysis involving function calls.
  • Annotation and Opaque Call Handling: Use placeholder loans to correctly model function calls, both by respecting [[clang::lifetimebound]] annotations and by conservatively handling opaque/un-annotated functions.
  • Error Reporting: Implement the final analysis phase that consumes the dataflow results to generate user-facing diagnostics. This will likely require liveness analysis to identify live origins holding expired loans.
  • Strict vs. Permissive Modes: Add the logic to support both high-confidence (permissive) and more comprehensive (strict) warning levels.
  • Expanded C++ Coverage: Broaden support for common patterns, including the lifetimes of temporary objects and pointers within aggregate types (structs/containers).
  • Performance benchmarking
  • Capping number of iterations or number of times a CFGBlock is processed.

Performance on pathological test cases:

The pathological case arise when we have N origins initially holding N loans and we have a cyclic assignment of these origins in a loop. The fixed point reaches after N iterations when all the origins contain all the loans.
For N = 4, the test case would like like:

struct MyObj {
  int id;
  ~MyObj() {}
};

void long_cycle(bool condition) {
  MyObj v1{1};
  MyObj v2{1};
  MyObj v3{1};
  MyObj v4{1};

  MyObj* p1 = &v1;
  MyObj* p2 = &v2;
  MyObj* p3 = &v3;
  MyObj* p4 = &v4;

  while (condition) {
    MyObj* temp = p1;
    p1 = p2;
    p2 = p3;
    p3 = p4;
    p4 = temp;
  }
}

@usx95 usx95 force-pushed the dangling-references-latest branch from c8f6277 to 3624359 Compare June 2, 2025 13:06
Copy link

github-actions bot commented Jun 2, 2025

✅ With the latest revision this PR passed the Python code formatter.

@usx95 usx95 force-pushed the dangling-references-latest branch 4 times, most recently from 668e329 to 7b929ee Compare June 2, 2025 23:41
@usx95 usx95 changed the title Introduce Intra-procedural lifetime analysis in Clang Introduce intra-procedural lifetime analysis in Clang Jun 2, 2025
@usx95 usx95 force-pushed the dangling-references-latest branch 2 times, most recently from 136238f to 942648e Compare June 3, 2025 13:52
@usx95 usx95 requested review from ymand, gribozavr and jvoung June 4, 2025 14:32
@usx95 usx95 force-pushed the dangling-references-latest branch from 942648e to cb7bd7f Compare June 5, 2025 13:52
@usx95 usx95 force-pushed the dangling-references-latest branch from cb7bd7f to 0cd187b Compare June 5, 2025 13:57
@usx95 usx95 marked this pull request as ready for review June 10, 2025 05:49
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:analysis labels Jun 10, 2025
@llvmbot
Copy link
Member

llvmbot commented Jun 10, 2025

@llvm/pr-subscribers-clang-analysis

Author: Utkarsh Saxena (usx95)

Changes

This patch introduces the initial implementation of the intra-procedural, flow-sensitive lifetime analysis for Clang, as proposed in the recent RFC: https://discourse.llvm.org/t/rfc-intra-procedural-lifetime-analysis-in-clang/86291

The primary goal of this initial submission is to establish the core dataflow framework and gather feedback on the overall design, fact representation, and testing strategy. The focus is on the dataflow mechanism itself rather than exhaustively covering all C++ AST edge cases, which will be addressed in subsequent patches.

Key Components

  • Conceptual Model: Introduces the fundamental concepts of Loan, Origin, and Path to model memory borrows and the lifetime of pointers.
  • Fact Generation: A frontend pass traverses the Clang CFG to generate a representation of lifetime-relevant events, such as pointer assignments, taking an address, and variables going out of scope.
  • Dataflow Lattice: A dataflow lattice used to map each pointer's symbolic Origin to the set of Loans it may contain at any given program point.
  • Fixed-Point Analysis: A worklist-based, flow-sensitive analysis that propagates the lattice state across the CFG to a fixed point.
  • Testing: llvm-lit tests validate the analysis by checking the generated facts and final dataflow state.

Next Steps

(Not covered in this PR but planned for subsequent patches)

The following functionality is planned for the upcoming patches to build upon this foundation and make the analysis usable in practice:

  • Placeholder Loans: Introduce placeholder loans to represent the lifetimes of function parameters, forming the basis for analysis involving function calls.
  • Annotation and Opaque Call Handling: Use placeholder loans to correctly model function calls, both by respecting [[clang::lifetimebound]] annotations and by conservatively handling opaque/un-annotated functions.
  • Error Reporting: Implement the final analysis phase that consumes the dataflow results to generate user-facing diagnostics.
  • Strict vs. Permissive Modes: Add the logic to support both high-confidence (permissive) and more comprehensive (strict) warning levels.
  • Expanded C++ Coverage: Broaden support for common patterns, including the lifetimes of temporary objects and pointers within aggregate types (structs/containers).
  • Performance benchmarking
  • Capping number of iterations or number of times a CFGBlock is processed.

<details>
<summary>

Performance on pathological test cases:

</summary>

The pathological case arise when we have N origins initially holding N loans and we have a cyclic assignment of these origins in a loop. The fixed point reaches after N iterations when all the origins contain all the loans.
For N = 4, the test case would like like:

struct MyObj {
  int id;
  ~MyObj() {}
};

void long_cycle(bool condition) {
  MyObj v1{1};
  MyObj v2{1};
  MyObj v3{1};
  MyObj v4{1};

  MyObj* p1 = &amp;v1;
  MyObj* p2 = &amp;v2;
  MyObj* p3 = &amp;v3;
  MyObj* p4 = &amp;v4;

  while (condition) {
    MyObj* temp = p1;
    p1 = p2;
    p2 = p3;
    p3 = p4;
    p4 = temp;
  }
}

</details>


Patch is 41.22 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/142313.diff

5 Files Affected:

  • (added) clang/include/clang/Analysis/Analyses/LifetimeSafety.h (+13)
  • (modified) clang/lib/Analysis/CMakeLists.txt (+1)
  • (added) clang/lib/Analysis/LifetimeSafety.cpp (+728)
  • (modified) clang/lib/Sema/AnalysisBasedWarnings.cpp (+8)
  • (added) clang/test/Sema/warn-lifetime-safety-dataflow.cpp (+362)
diff --git a/clang/include/clang/Analysis/Analyses/LifetimeSafety.h b/clang/include/clang/Analysis/Analyses/LifetimeSafety.h
new file mode 100644
index 0000000000000..daf24fff72b9b
--- /dev/null
+++ b/clang/include/clang/Analysis/Analyses/LifetimeSafety.h
@@ -0,0 +1,13 @@
+#ifndef LLVM_CLANG_ANALYSIS_ANALYSES_LIFETIME_SAFETY_H
+#define LLVM_CLANG_ANALYSIS_ANALYSES_LIFETIME_SAFETY_H
+#include "clang/AST/DeclBase.h"
+#include "clang/Analysis/AnalysisDeclContext.h"
+#include "clang/Analysis/CFG.h"
+namespace clang {
+
+void runLifetimeAnalysis(const DeclContext &DC, const CFG &Cfg,
+                         AnalysisDeclContext &AC);
+
+} // namespace clang
+
+#endif // LLVM_CLANG_ANALYSIS_ANALYSES_LIFETIME_SAFETY_H
diff --git a/clang/lib/Analysis/CMakeLists.txt b/clang/lib/Analysis/CMakeLists.txt
index 8cd3990db4c3e..0523d92480cb3 100644
--- a/clang/lib/Analysis/CMakeLists.txt
+++ b/clang/lib/Analysis/CMakeLists.txt
@@ -21,6 +21,7 @@ add_clang_library(clangAnalysis
   FixitUtil.cpp
   IntervalPartition.cpp
   IssueHash.cpp
+  LifetimeSafety.cpp
   LiveVariables.cpp
   MacroExpansionContext.cpp
   ObjCNoReturn.cpp
diff --git a/clang/lib/Analysis/LifetimeSafety.cpp b/clang/lib/Analysis/LifetimeSafety.cpp
new file mode 100644
index 0000000000000..8dbb4dc37026c
--- /dev/null
+++ b/clang/lib/Analysis/LifetimeSafety.cpp
@@ -0,0 +1,728 @@
+#include "clang/Analysis/Analyses/LifetimeSafety.h"
+#include "clang/AST/Decl.h"
+#include "clang/AST/Expr.h"
+#include "clang/AST/StmtVisitor.h"
+#include "clang/AST/Type.h"
+#include "clang/Analysis/AnalysisDeclContext.h"
+#include "clang/Analysis/CFG.h"
+#include "clang/Analysis/FlowSensitive/DataflowWorklist.h"
+#include "llvm/ADT/ImmutableMap.h"
+#include "llvm/ADT/ImmutableSet.h"
+#include "llvm/ADT/PointerUnion.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/Support/Debug.h"
+#include "llvm/Support/TimeProfiler.h"
+#include <vector>
+
+namespace clang {
+namespace {
+
+struct Point {
+  const clang::CFGBlock *Block;
+  /// Index into Block->Elements().
+  unsigned ElementIndex;
+
+  Point(const clang::CFGBlock *B = nullptr, unsigned Idx = 0)
+      : Block(B), ElementIndex(Idx) {}
+
+  bool operator==(const Point &Other) const {
+    return Block == Other.Block && ElementIndex == Other.ElementIndex;
+  }
+};
+
+/// Represents the storage location being borrowed, e.g., a specific stack
+/// variable.
+/// TODO: Handle member accesseslike `s.y`.
+struct Path {
+  const clang::ValueDecl *D;
+
+  enum class Kind : uint8_t {
+    StackVariable,
+    Heap,            // TODO: Handle.
+    Field,           // TODO: Handle.
+    ArrayElement,    // TODO: Handle.
+    TemporaryObject, // TODO: Handle.
+    StaticOrGlobal,  // TODO: Handle.
+  };
+
+  Kind PathKind;
+
+  Path(const clang::ValueDecl *D, Kind K) : D(D), PathKind(K) {}
+};
+
+using LoanID = uint32_t;
+using OriginID = uint32_t;
+
+/// Information about a single borrow, or "Loan". A loan is created when a
+/// reference or pointer is taken.
+struct LoanInfo {
+  /// TODO: Represent opaque loans.
+  /// TODO: Represent nullptr: loans to no path. Accessing it UB! Currently it
+  /// is represented as empty LoanSet
+  LoanID ID;
+  Path SourcePath;
+  SourceLocation IssueLoc;
+
+  LoanInfo(LoanID id, Path path, SourceLocation loc)
+      : ID(id), SourcePath(path), IssueLoc(loc) {}
+};
+
+enum class OriginKind : uint8_t { Variable, ExpressionResult };
+
+/// An Origin is a symbolic identifier that represents the set of possible
+/// loans a pointer-like object could hold at any given time.
+/// TODO: Also represent Origins of complex types (fields, inner types).
+struct OriginInfo {
+  OriginID ID;
+  OriginKind Kind;
+  union {
+    const clang::ValueDecl *Decl;
+    const clang::Expr *Expression;
+  };
+  OriginInfo(OriginID id, OriginKind kind, const clang::ValueDecl *D)
+      : ID(id), Kind(kind), Decl(D) {}
+  OriginInfo(OriginID id, OriginKind kind, const clang::Expr *E)
+      : ID(id), Kind(kind), Expression(E) {}
+};
+
+class LoanManager {
+public:
+  LoanManager() = default;
+
+  LoanInfo &addLoanInfo(Path path, SourceLocation loc) {
+    NextLoanIDVal++;
+    AllLoans.emplace_back(NextLoanIDVal, path, loc);
+    return AllLoans.back();
+  }
+
+  const LoanInfo *getLoanInfo(LoanID id) const {
+    if (id < AllLoans.size())
+      return &AllLoans[id];
+    return nullptr;
+  }
+  llvm::ArrayRef<LoanInfo> getLoanInfos() const { return AllLoans; }
+
+private:
+  LoanID NextLoanIDVal = 0;
+  llvm::SmallVector<LoanInfo> AllLoans;
+};
+
+class OriginManager {
+public:
+  OriginManager() = default;
+
+  OriginID getNextOriginID() { return NextOriginIDVal++; }
+  OriginInfo &addOriginInfo(OriginID id, const clang::ValueDecl *D) {
+    assert(D != nullptr);
+    AllOrigins.emplace_back(id, OriginKind::Variable, D);
+    return AllOrigins.back();
+  }
+  OriginInfo &addOriginInfo(OriginID id, const clang::Expr *E) {
+    assert(E != nullptr);
+    AllOrigins.emplace_back(id, OriginKind::ExpressionResult, E);
+    return AllOrigins.back();
+  }
+
+  OriginID getOrCreate(const Expr *E) {
+    auto It = ExprToOriginID.find(E);
+    if (It != ExprToOriginID.end())
+      return It->second;
+
+    if (const auto *DRE = dyn_cast<DeclRefExpr>(E)) {
+      // Origin of DeclRefExpr is that of the declaration it refers to.
+      return getOrCreate(DRE->getDecl());
+    }
+    OriginID NewID = getNextOriginID();
+    addOriginInfo(NewID, E);
+    ExprToOriginID[E] = NewID;
+    return NewID;
+  }
+
+  const OriginInfo *getOriginInfo(OriginID id) const {
+    if (id < AllOrigins.size())
+      return &AllOrigins[id];
+    return nullptr;
+  }
+
+  llvm::ArrayRef<OriginInfo> getOriginInfos() const { return AllOrigins; }
+
+  OriginID getOrCreate(const ValueDecl *D) {
+    auto It = DeclToOriginID.find(D);
+    if (It != DeclToOriginID.end())
+      return It->second;
+    OriginID NewID = getNextOriginID();
+    addOriginInfo(NewID, D);
+    DeclToOriginID[D] = NewID;
+    return NewID;
+  }
+
+private:
+  OriginID NextOriginIDVal = 0;
+  llvm::SmallVector<OriginInfo> AllOrigins;
+  llvm::DenseMap<const clang::ValueDecl *, OriginID> DeclToOriginID;
+  llvm::DenseMap<const clang::Expr *, OriginID> ExprToOriginID;
+};
+
+/// An abstract base class for a single, atomic lifetime-relevant event.
+class Fact {
+
+public:
+  enum class Kind : uint8_t {
+    /// A new loan is issued from a borrow expression (e.g., &x).
+    Issue,
+    /// A loan expires as its underlying storage is freed (e.g., variable goes
+    /// out of scope).
+    Expire,
+    /// An origin is propagated from a source to a destination (e.g., p = q).
+    AssignOrigin,
+    /// An origin is part of a function's return value.
+    ReturnOfOrigin
+  };
+
+private:
+  Kind K;
+
+protected:
+  Point P;
+  Fact(Kind K, Point Pt) : K(K), P(Pt) {}
+
+public:
+  virtual ~Fact() = default;
+  Kind getKind() const { return K; }
+  Point getPoint() const { return P; }
+
+  template <typename T> const T *getAs() const {
+    if (T::classof(this))
+      return static_cast<const T *>(this);
+    return nullptr;
+  }
+
+  virtual void dump(llvm::raw_ostream &OS) const {
+    OS << "Fact (Kind: " << static_cast<int>(K) << ", Point: B"
+       << P.Block->getBlockID() << ":" << P.ElementIndex << ")\n";
+  }
+};
+
+class IssueFact : public Fact {
+  LoanID LID;
+  OriginID OID;
+
+public:
+  IssueFact(LoanID LID, OriginID OID, Point Pt)
+      : Fact(Kind::Issue, Pt), LID(LID), OID(OID) {}
+  LoanID getLoanID() const { return LID; }
+  OriginID getOriginID() const { return OID; }
+  static bool classof(const Fact *F) { return F->getKind() == Kind::Issue; }
+  void dump(llvm::raw_ostream &OS) const override {
+    OS << "Issue (LoanID: " << getLoanID() << ", OriginID: " << getOriginID()
+       << ")\n";
+  }
+};
+
+class ExpireFact : public Fact {
+  LoanID LID;
+
+public:
+  ExpireFact(LoanID LID, Point Pt) : Fact(Kind::Expire, Pt), LID(LID) {}
+  LoanID getLoanID() const { return LID; }
+  static bool classof(const Fact *F) { return F->getKind() == Kind::Expire; }
+  void dump(llvm::raw_ostream &OS) const override {
+    OS << "Expire (LoanID: " << getLoanID() << ")\n";
+  }
+};
+
+class AssignOriginFact : public Fact {
+  OriginID OIDDest;
+  OriginID OIDSrc;
+
+public:
+  AssignOriginFact(OriginID OIDDest, OriginID OIDSrc, Point Pt)
+      : Fact(Kind::AssignOrigin, Pt), OIDDest(OIDDest), OIDSrc(OIDSrc) {}
+  OriginID getDestOriginID() const { return OIDDest; }
+  OriginID getSrcOriginID() const { return OIDSrc; }
+  static bool classof(const Fact *F) {
+    return F->getKind() == Kind::AssignOrigin;
+  }
+  void dump(llvm::raw_ostream &OS) const override {
+    OS << "AssignOrigin (DestID: " << getDestOriginID()
+       << ", SrcID: " << getSrcOriginID() << ")\n";
+  }
+};
+
+class ReturnOfOriginFact : public Fact {
+  OriginID OID;
+
+public:
+  ReturnOfOriginFact(OriginID OID, Point Pt)
+      : Fact(Kind::ReturnOfOrigin, Pt), OID(OID) {}
+  OriginID getReturnedOriginID() const { return OID; }
+  static bool classof(const Fact *F) {
+    return F->getKind() == Kind::ReturnOfOrigin;
+  }
+  void dump(llvm::raw_ostream &OS) const override {
+    OS << "ReturnOfOrigin (OriginID: " << getReturnedOriginID() << ")\n";
+  }
+};
+
+class FactManager {
+public:
+  llvm::ArrayRef<Fact *> getFacts(const CFGBlock *B) const {
+    auto It = BlockToFactsMap.find(B);
+    if (It != BlockToFactsMap.end())
+      return It->second;
+    return {};
+  }
+
+  void addFact(const CFGBlock *B, Fact *NewFact) {
+    BlockToFactsMap[B].push_back(NewFact);
+  }
+
+  template <typename FactType, typename... Args>
+  FactType *createFact(Args &&...args) {
+    void *Mem = FactAllocator.Allocate<FactType>();
+    return new (Mem) FactType(std::forward<Args>(args)...);
+  }
+
+  void dump(const CFG &Cfg, AnalysisDeclContext &AC) const {
+    llvm::dbgs() << "==========================================\n";
+    llvm::dbgs() << "       Lifetime Analysis Facts:\n";
+    llvm::dbgs() << "==========================================\n";
+    if (const Decl *D = AC.getDecl()) {
+      if (const auto *ND = dyn_cast<NamedDecl>(D))
+        llvm::dbgs() << "Function: " << ND->getQualifiedNameAsString() << "\n";
+    }
+    // Print blocks in the order as they appear in code for a stable ordering.
+    ForwardDataflowWorklist worklist(Cfg, AC);
+    for (const CFGBlock *B : Cfg.const_nodes())
+      worklist.enqueueBlock(B);
+    while (const CFGBlock *B = worklist.dequeue()) {
+      llvm::dbgs() << "  Block B" << B->getBlockID() << ":\n";
+      auto It = BlockToFactsMap.find(B);
+      if (It != BlockToFactsMap.end()) {
+        for (const Fact *F : It->second) {
+          llvm::dbgs() << "    ";
+          F->dump(llvm::dbgs());
+        }
+      }
+      llvm::dbgs() << "  End of Block\n";
+    }
+  }
+
+private:
+  llvm::DenseMap<const clang::CFGBlock *, llvm::SmallVector<Fact *>>
+      BlockToFactsMap;
+  llvm::BumpPtrAllocator FactAllocator;
+};
+
+struct FactsContext {
+  FactManager Facts;
+  LoanManager Loans;
+  OriginManager Origins;
+};
+
+class FactGenerator : public ConstStmtVisitor<FactGenerator> {
+
+public:
+  FactGenerator(const CFG &Cfg, FactsContext &FactsCtx, AnalysisDeclContext &AC)
+      : FactsCtx(FactsCtx), Cfg(Cfg), AC(AC), CurrentBlock(nullptr) {}
+
+  void run() {
+    llvm::TimeTraceScope TimeProfile("Fact Generation");
+    // Iterate through the CFG blocks in pre-order to ensure that
+    // initializations and destructions are processed in the correct sequence.
+    // TODO: A pre-order traversal utility should be provided by Dataflow
+    // framework.
+    ForwardDataflowWorklist Worklist(Cfg, AC);
+    for (const CFGBlock *B : Cfg.const_nodes())
+      Worklist.enqueueBlock(B);
+    while (const CFGBlock *Block = Worklist.dequeue()) {
+      CurrentBlock = Block;
+      for (unsigned I = 0; I < Block->size(); ++I) {
+        const CFGElement &Element = Block->Elements[I];
+        CurrentPoint = Point(Block, I);
+        if (std::optional<CFGStmt> CS = Element.getAs<CFGStmt>())
+          Visit(CS->getStmt());
+        else if (std::optional<CFGAutomaticObjDtor> DtorOpt =
+                     Element.getAs<CFGAutomaticObjDtor>())
+          handleDestructor(*DtorOpt);
+      }
+    }
+  }
+
+  void VisitDeclStmt(const DeclStmt *DS) {
+    for (const Decl *D : DS->decls()) {
+      if (const auto *VD = dyn_cast<VarDecl>(D)) {
+        if (hasOrigin(VD->getType())) {
+          if (const Expr *InitExpr = VD->getInit()) {
+            OriginID DestOID = FactsCtx.Origins.getOrCreate(VD);
+            OriginID SrcOID = FactsCtx.Origins.getOrCreate(InitExpr);
+            FactsCtx.Facts.addFact(CurrentBlock,
+                                   FactsCtx.Facts.createFact<AssignOriginFact>(
+                                       DestOID, SrcOID, CurrentPoint));
+          }
+        }
+      }
+    }
+  }
+
+  void VisitImplicitCastExpr(const ImplicitCastExpr *ICE) {
+    if (!hasOrigin(ICE->getType()))
+      return;
+    // An ImplicitCastExpr node itself gets an origin, which flows from the
+    // origin of its sub-expression (after stripping its own parens/casts).
+    if (ICE->getCastKind() == CK_LValueToRValue) {
+      OriginID IceOID = FactsCtx.Origins.getOrCreate(ICE);
+      OriginID SubExprOID = FactsCtx.Origins.getOrCreate(ICE->getSubExpr());
+      FactsCtx.Facts.addFact(CurrentBlock,
+                             FactsCtx.Facts.createFact<AssignOriginFact>(
+                                 IceOID, SubExprOID, CurrentPoint));
+    }
+  }
+
+  void VisitUnaryOperator(const UnaryOperator *UO) {
+    if (UO->getOpcode() == UO_AddrOf) {
+      const Expr *SubExpr = UO->getSubExpr();
+      if (const auto *DRE = dyn_cast<DeclRefExpr>(SubExpr)) {
+        if (const auto *VD = dyn_cast<VarDecl>(DRE->getDecl())) {
+          // Check if it's a local variable.
+          if (VD->hasLocalStorage()) {
+            OriginID OID = FactsCtx.Origins.getOrCreate(UO);
+            Path AddrOfLocalVarPath(VD, Path::Kind::StackVariable);
+            LoanInfo &Loan = FactsCtx.Loans.addLoanInfo(AddrOfLocalVarPath,
+                                                        UO->getOperatorLoc());
+            FactsCtx.Facts.addFact(CurrentBlock,
+                                   FactsCtx.Facts.createFact<IssueFact>(
+                                       Loan.ID, OID, CurrentPoint));
+          }
+        }
+      }
+    }
+  }
+
+  void VisitReturnStmt(const ReturnStmt *RS) {
+    if (const Expr *RetExpr = RS->getRetValue()) {
+      if (hasOrigin(RetExpr->getType())) {
+        OriginID OID = FactsCtx.Origins.getOrCreate(RetExpr);
+        FactsCtx.Facts.addFact(
+            CurrentBlock,
+            FactsCtx.Facts.createFact<ReturnOfOriginFact>(OID, CurrentPoint));
+      }
+    }
+  }
+
+  void VisitBinaryOperator(const BinaryOperator *BO) {
+    if (BO->isAssignmentOp()) {
+      const Expr *LHSExpr = BO->getLHS();
+      const Expr *RHSExpr = BO->getRHS();
+
+      // We are interested in assignments like `ptr1 = ptr2` or `ptr = &var`
+      // LHS must be a pointer/reference type that can be an origin.
+      // RHS must also represent an origin (either another pointer/ref or an
+      // address-of).
+      if (const auto *DRE_LHS = dyn_cast<DeclRefExpr>(LHSExpr)) {
+        if (const auto *VD_LHS =
+                dyn_cast<ValueDecl>(DRE_LHS->getDecl()->getCanonicalDecl());
+            VD_LHS && hasOrigin(VD_LHS->getType())) {
+          OriginID DestOID = FactsCtx.Origins.getOrCreate(VD_LHS);
+          OriginID SrcOID = FactsCtx.Origins.getOrCreate(RHSExpr);
+          FactsCtx.Facts.addFact(CurrentBlock,
+                                 FactsCtx.Facts.createFact<AssignOriginFact>(
+                                     DestOID, SrcOID, CurrentPoint));
+        }
+      }
+    }
+  }
+
+private:
+  // Check if a type have an origin.
+  bool hasOrigin(QualType QT) { return QT->isPointerOrReferenceType(); }
+
+  void handleDestructor(const CFGAutomaticObjDtor &DtorOpt) {
+    /// TODO: Also handle trivial destructors (e.g., for `int`
+    // variables) which will never have a CFGAutomaticObjDtor node.
+    /// TODO: Handle loans to temporaries.
+    const VarDecl *DestructedVD = DtorOpt.getVarDecl();
+    if (!DestructedVD)
+      return;
+    // Iterate through all loans to see if any expire.
+    for (const LoanInfo &Loan : FactsCtx.Loans.getLoanInfos()) {
+      const Path &LoanPath = Loan.SourcePath;
+      // Check if the loan is for a stack variable and if that variable
+      // is the one being destructed.
+      if (LoanPath.PathKind == Path::Kind::StackVariable) {
+        if (LoanPath.D == DestructedVD) {
+          FactsCtx.Facts.addFact(
+              CurrentBlock,
+              FactsCtx.Facts.createFact<ExpireFact>(Loan.ID, CurrentPoint));
+        }
+      }
+    }
+  }
+
+  FactsContext &FactsCtx;
+  const CFG &Cfg;
+  AnalysisDeclContext &AC;
+  const CFGBlock *CurrentBlock;
+  Point CurrentPoint;
+};
+
+// ========================================================================= //
+//                              The Dataflow Lattice
+// ========================================================================= //
+
+// Using LLVM's immutable collections is efficient for dataflow analysis
+// as it avoids deep copies during state transitions.
+// TODO(opt): Consider using a bitset to represent the set of loans.
+using LoanSet = llvm::ImmutableSet<LoanID>;
+using OriginLoanMap = llvm::ImmutableMap<OriginID, LoanSet>;
+
+/// A context object to hold the factories for immutable collections, ensuring
+/// that all created states share the same underlying memory management.
+struct LifetimeFactory {
+  OriginLoanMap::Factory OriginMapFact;
+  LoanSet::Factory LoanSetFact;
+
+  LoanSet createLoanSet(LoanID LID) {
+    return LoanSetFact.add(LoanSetFact.getEmptySet(), LID);
+  }
+};
+
+/// LifetimeLattice represents the state of our analysis at a given program
+/// point. It is an immutable object, and all operations produce a new
+/// instance rather than modifying the existing one.
+struct LifetimeLattice {
+  /// The map from an origin to the set of loans it contains.
+  /// TODO(opt): To reduce the lattice size, propagate origins of declarations,
+  /// not expressions, because expressions are not visible across blocks.
+  OriginLoanMap Origins = OriginLoanMap(nullptr);
+
+  explicit LifetimeLattice(const OriginLoanMap &S) : Origins(S) {}
+  LifetimeLattice() = default;
+
+  bool operator==(const LifetimeLattice &Other) const {
+    return Origins == Other.Origins;
+  }
+  bool operator!=(const LifetimeLattice &Other) const {
+    return !(*this == Other);
+  }
+
+  LoanSet getLoans(OriginID OID, LifetimeFactory &Factory) const {
+    if (auto *Loans = Origins.lookup(OID))
+      return *Loans;
+    return Factory.LoanSetFact.getEmptySet();
+  }
+
+  /// Computes the union of two lattices by performing a key-wise merge of
+  /// their OriginLoanMaps.
+  LifetimeLattice merge(const LifetimeLattice &Other,
+                        LifetimeFactory &Factory) const {
+    /// Merge the smaller map into the larger one ensuring we iterate over the
+    /// smaller map.
+    if (Origins.getHeight() < Other.Origins.getHeight())
+      return Other.merge(*this, Factory);
+
+    OriginLoanMap MergedState = Origins;
+    // For each origin in the other map, union its loan set with ours.
+    for (const auto &Entry : Other.Origins) {
+      OriginID OID = Entry.first;
+      LoanSet OtherLoanSet = Entry.second;
+      MergedState = Factory.OriginMapFact.add(
+          MergedState, OID,
+          merge(getLoans(OID, Factory), OtherLoanSet, Factory));
+    }
+    return LifetimeLattice(MergedState);
+  }
+
+  LoanSet merge(LoanSet a, LoanSet b, LifetimeFactory &Factory) const {
+    /// Merge the smaller set into the larger one ensuring we iterate over the
+    /// smaller set.
+    if (a.getHeight() < b.getHeight())
+      std::swap(a, b);
+    LoanSet Result = a;
+    for (LoanID LID : b) {
+      /// TODO(opt): Profiling shows that this loop is a major performance
+      /// bottleneck. Investigate using a BitVector to represent the set of
+      /// loans for improved merge performance.
+      Result = Factory.LoanSetFact.add(Result, LID);
+    }
+    return Result;
+  }
+
+  void dump(llvm::raw_ostream &OS) const {
+    OS << "LifetimeLattice State:\n";
+    if (Origins.isEmpty())
+      OS << "  <empty>\n";
+    for (const auto &Entry : Origi...
[truncated]

@llvmbot
Copy link
Member

llvmbot commented Jun 10, 2025

@llvm/pr-subscribers-clang

Author: Utkarsh Saxena (usx95)

Changes

This patch introduces the initial implementation of the intra-procedural, flow-sensitive lifetime analysis for Clang, as proposed in the recent RFC: https://discourse.llvm.org/t/rfc-intra-procedural-lifetime-analysis-in-clang/86291

The primary goal of this initial submission is to establish the core dataflow framework and gather feedback on the overall design, fact representation, and testing strategy. The focus is on the dataflow mechanism itself rather than exhaustively covering all C++ AST edge cases, which will be addressed in subsequent patches.

Key Components

  • Conceptual Model: Introduces the fundamental concepts of Loan, Origin, and Path to model memory borrows and the lifetime of pointers.
  • Fact Generation: A frontend pass traverses the Clang CFG to generate a representation of lifetime-relevant events, such as pointer assignments, taking an address, and variables going out of scope.
  • Dataflow Lattice: A dataflow lattice used to map each pointer's symbolic Origin to the set of Loans it may contain at any given program point.
  • Fixed-Point Analysis: A worklist-based, flow-sensitive analysis that propagates the lattice state across the CFG to a fixed point.
  • Testing: llvm-lit tests validate the analysis by checking the generated facts and final dataflow state.

Next Steps

(Not covered in this PR but planned for subsequent patches)

The following functionality is planned for the upcoming patches to build upon this foundation and make the analysis usable in practice:

  • Placeholder Loans: Introduce placeholder loans to represent the lifetimes of function parameters, forming the basis for analysis involving function calls.
  • Annotation and Opaque Call Handling: Use placeholder loans to correctly model function calls, both by respecting [[clang::lifetimebound]] annotations and by conservatively handling opaque/un-annotated functions.
  • Error Reporting: Implement the final analysis phase that consumes the dataflow results to generate user-facing diagnostics.
  • Strict vs. Permissive Modes: Add the logic to support both high-confidence (permissive) and more comprehensive (strict) warning levels.
  • Expanded C++ Coverage: Broaden support for common patterns, including the lifetimes of temporary objects and pointers within aggregate types (structs/containers).
  • Performance benchmarking
  • Capping number of iterations or number of times a CFGBlock is processed.

<details>
<summary>

Performance on pathological test cases:

</summary>

The pathological case arise when we have N origins initially holding N loans and we have a cyclic assignment of these origins in a loop. The fixed point reaches after N iterations when all the origins contain all the loans.
For N = 4, the test case would like like:

struct MyObj {
  int id;
  ~MyObj() {}
};

void long_cycle(bool condition) {
  MyObj v1{1};
  MyObj v2{1};
  MyObj v3{1};
  MyObj v4{1};

  MyObj* p1 = &amp;v1;
  MyObj* p2 = &amp;v2;
  MyObj* p3 = &amp;v3;
  MyObj* p4 = &amp;v4;

  while (condition) {
    MyObj* temp = p1;
    p1 = p2;
    p2 = p3;
    p3 = p4;
    p4 = temp;
  }
}

</details>


Patch is 41.22 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/142313.diff

5 Files Affected:

  • (added) clang/include/clang/Analysis/Analyses/LifetimeSafety.h (+13)
  • (modified) clang/lib/Analysis/CMakeLists.txt (+1)
  • (added) clang/lib/Analysis/LifetimeSafety.cpp (+728)
  • (modified) clang/lib/Sema/AnalysisBasedWarnings.cpp (+8)
  • (added) clang/test/Sema/warn-lifetime-safety-dataflow.cpp (+362)
diff --git a/clang/include/clang/Analysis/Analyses/LifetimeSafety.h b/clang/include/clang/Analysis/Analyses/LifetimeSafety.h
new file mode 100644
index 0000000000000..daf24fff72b9b
--- /dev/null
+++ b/clang/include/clang/Analysis/Analyses/LifetimeSafety.h
@@ -0,0 +1,13 @@
+#ifndef LLVM_CLANG_ANALYSIS_ANALYSES_LIFETIME_SAFETY_H
+#define LLVM_CLANG_ANALYSIS_ANALYSES_LIFETIME_SAFETY_H
+#include "clang/AST/DeclBase.h"
+#include "clang/Analysis/AnalysisDeclContext.h"
+#include "clang/Analysis/CFG.h"
+namespace clang {
+
+void runLifetimeAnalysis(const DeclContext &DC, const CFG &Cfg,
+                         AnalysisDeclContext &AC);
+
+} // namespace clang
+
+#endif // LLVM_CLANG_ANALYSIS_ANALYSES_LIFETIME_SAFETY_H
diff --git a/clang/lib/Analysis/CMakeLists.txt b/clang/lib/Analysis/CMakeLists.txt
index 8cd3990db4c3e..0523d92480cb3 100644
--- a/clang/lib/Analysis/CMakeLists.txt
+++ b/clang/lib/Analysis/CMakeLists.txt
@@ -21,6 +21,7 @@ add_clang_library(clangAnalysis
   FixitUtil.cpp
   IntervalPartition.cpp
   IssueHash.cpp
+  LifetimeSafety.cpp
   LiveVariables.cpp
   MacroExpansionContext.cpp
   ObjCNoReturn.cpp
diff --git a/clang/lib/Analysis/LifetimeSafety.cpp b/clang/lib/Analysis/LifetimeSafety.cpp
new file mode 100644
index 0000000000000..8dbb4dc37026c
--- /dev/null
+++ b/clang/lib/Analysis/LifetimeSafety.cpp
@@ -0,0 +1,728 @@
+#include "clang/Analysis/Analyses/LifetimeSafety.h"
+#include "clang/AST/Decl.h"
+#include "clang/AST/Expr.h"
+#include "clang/AST/StmtVisitor.h"
+#include "clang/AST/Type.h"
+#include "clang/Analysis/AnalysisDeclContext.h"
+#include "clang/Analysis/CFG.h"
+#include "clang/Analysis/FlowSensitive/DataflowWorklist.h"
+#include "llvm/ADT/ImmutableMap.h"
+#include "llvm/ADT/ImmutableSet.h"
+#include "llvm/ADT/PointerUnion.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/Support/Debug.h"
+#include "llvm/Support/TimeProfiler.h"
+#include <vector>
+
+namespace clang {
+namespace {
+
+struct Point {
+  const clang::CFGBlock *Block;
+  /// Index into Block->Elements().
+  unsigned ElementIndex;
+
+  Point(const clang::CFGBlock *B = nullptr, unsigned Idx = 0)
+      : Block(B), ElementIndex(Idx) {}
+
+  bool operator==(const Point &Other) const {
+    return Block == Other.Block && ElementIndex == Other.ElementIndex;
+  }
+};
+
+/// Represents the storage location being borrowed, e.g., a specific stack
+/// variable.
+/// TODO: Handle member accesseslike `s.y`.
+struct Path {
+  const clang::ValueDecl *D;
+
+  enum class Kind : uint8_t {
+    StackVariable,
+    Heap,            // TODO: Handle.
+    Field,           // TODO: Handle.
+    ArrayElement,    // TODO: Handle.
+    TemporaryObject, // TODO: Handle.
+    StaticOrGlobal,  // TODO: Handle.
+  };
+
+  Kind PathKind;
+
+  Path(const clang::ValueDecl *D, Kind K) : D(D), PathKind(K) {}
+};
+
+using LoanID = uint32_t;
+using OriginID = uint32_t;
+
+/// Information about a single borrow, or "Loan". A loan is created when a
+/// reference or pointer is taken.
+struct LoanInfo {
+  /// TODO: Represent opaque loans.
+  /// TODO: Represent nullptr: loans to no path. Accessing it UB! Currently it
+  /// is represented as empty LoanSet
+  LoanID ID;
+  Path SourcePath;
+  SourceLocation IssueLoc;
+
+  LoanInfo(LoanID id, Path path, SourceLocation loc)
+      : ID(id), SourcePath(path), IssueLoc(loc) {}
+};
+
+enum class OriginKind : uint8_t { Variable, ExpressionResult };
+
+/// An Origin is a symbolic identifier that represents the set of possible
+/// loans a pointer-like object could hold at any given time.
+/// TODO: Also represent Origins of complex types (fields, inner types).
+struct OriginInfo {
+  OriginID ID;
+  OriginKind Kind;
+  union {
+    const clang::ValueDecl *Decl;
+    const clang::Expr *Expression;
+  };
+  OriginInfo(OriginID id, OriginKind kind, const clang::ValueDecl *D)
+      : ID(id), Kind(kind), Decl(D) {}
+  OriginInfo(OriginID id, OriginKind kind, const clang::Expr *E)
+      : ID(id), Kind(kind), Expression(E) {}
+};
+
+class LoanManager {
+public:
+  LoanManager() = default;
+
+  LoanInfo &addLoanInfo(Path path, SourceLocation loc) {
+    NextLoanIDVal++;
+    AllLoans.emplace_back(NextLoanIDVal, path, loc);
+    return AllLoans.back();
+  }
+
+  const LoanInfo *getLoanInfo(LoanID id) const {
+    if (id < AllLoans.size())
+      return &AllLoans[id];
+    return nullptr;
+  }
+  llvm::ArrayRef<LoanInfo> getLoanInfos() const { return AllLoans; }
+
+private:
+  LoanID NextLoanIDVal = 0;
+  llvm::SmallVector<LoanInfo> AllLoans;
+};
+
+class OriginManager {
+public:
+  OriginManager() = default;
+
+  OriginID getNextOriginID() { return NextOriginIDVal++; }
+  OriginInfo &addOriginInfo(OriginID id, const clang::ValueDecl *D) {
+    assert(D != nullptr);
+    AllOrigins.emplace_back(id, OriginKind::Variable, D);
+    return AllOrigins.back();
+  }
+  OriginInfo &addOriginInfo(OriginID id, const clang::Expr *E) {
+    assert(E != nullptr);
+    AllOrigins.emplace_back(id, OriginKind::ExpressionResult, E);
+    return AllOrigins.back();
+  }
+
+  OriginID getOrCreate(const Expr *E) {
+    auto It = ExprToOriginID.find(E);
+    if (It != ExprToOriginID.end())
+      return It->second;
+
+    if (const auto *DRE = dyn_cast<DeclRefExpr>(E)) {
+      // Origin of DeclRefExpr is that of the declaration it refers to.
+      return getOrCreate(DRE->getDecl());
+    }
+    OriginID NewID = getNextOriginID();
+    addOriginInfo(NewID, E);
+    ExprToOriginID[E] = NewID;
+    return NewID;
+  }
+
+  const OriginInfo *getOriginInfo(OriginID id) const {
+    if (id < AllOrigins.size())
+      return &AllOrigins[id];
+    return nullptr;
+  }
+
+  llvm::ArrayRef<OriginInfo> getOriginInfos() const { return AllOrigins; }
+
+  OriginID getOrCreate(const ValueDecl *D) {
+    auto It = DeclToOriginID.find(D);
+    if (It != DeclToOriginID.end())
+      return It->second;
+    OriginID NewID = getNextOriginID();
+    addOriginInfo(NewID, D);
+    DeclToOriginID[D] = NewID;
+    return NewID;
+  }
+
+private:
+  OriginID NextOriginIDVal = 0;
+  llvm::SmallVector<OriginInfo> AllOrigins;
+  llvm::DenseMap<const clang::ValueDecl *, OriginID> DeclToOriginID;
+  llvm::DenseMap<const clang::Expr *, OriginID> ExprToOriginID;
+};
+
+/// An abstract base class for a single, atomic lifetime-relevant event.
+class Fact {
+
+public:
+  enum class Kind : uint8_t {
+    /// A new loan is issued from a borrow expression (e.g., &x).
+    Issue,
+    /// A loan expires as its underlying storage is freed (e.g., variable goes
+    /// out of scope).
+    Expire,
+    /// An origin is propagated from a source to a destination (e.g., p = q).
+    AssignOrigin,
+    /// An origin is part of a function's return value.
+    ReturnOfOrigin
+  };
+
+private:
+  Kind K;
+
+protected:
+  Point P;
+  Fact(Kind K, Point Pt) : K(K), P(Pt) {}
+
+public:
+  virtual ~Fact() = default;
+  Kind getKind() const { return K; }
+  Point getPoint() const { return P; }
+
+  template <typename T> const T *getAs() const {
+    if (T::classof(this))
+      return static_cast<const T *>(this);
+    return nullptr;
+  }
+
+  virtual void dump(llvm::raw_ostream &OS) const {
+    OS << "Fact (Kind: " << static_cast<int>(K) << ", Point: B"
+       << P.Block->getBlockID() << ":" << P.ElementIndex << ")\n";
+  }
+};
+
+class IssueFact : public Fact {
+  LoanID LID;
+  OriginID OID;
+
+public:
+  IssueFact(LoanID LID, OriginID OID, Point Pt)
+      : Fact(Kind::Issue, Pt), LID(LID), OID(OID) {}
+  LoanID getLoanID() const { return LID; }
+  OriginID getOriginID() const { return OID; }
+  static bool classof(const Fact *F) { return F->getKind() == Kind::Issue; }
+  void dump(llvm::raw_ostream &OS) const override {
+    OS << "Issue (LoanID: " << getLoanID() << ", OriginID: " << getOriginID()
+       << ")\n";
+  }
+};
+
+class ExpireFact : public Fact {
+  LoanID LID;
+
+public:
+  ExpireFact(LoanID LID, Point Pt) : Fact(Kind::Expire, Pt), LID(LID) {}
+  LoanID getLoanID() const { return LID; }
+  static bool classof(const Fact *F) { return F->getKind() == Kind::Expire; }
+  void dump(llvm::raw_ostream &OS) const override {
+    OS << "Expire (LoanID: " << getLoanID() << ")\n";
+  }
+};
+
+class AssignOriginFact : public Fact {
+  OriginID OIDDest;
+  OriginID OIDSrc;
+
+public:
+  AssignOriginFact(OriginID OIDDest, OriginID OIDSrc, Point Pt)
+      : Fact(Kind::AssignOrigin, Pt), OIDDest(OIDDest), OIDSrc(OIDSrc) {}
+  OriginID getDestOriginID() const { return OIDDest; }
+  OriginID getSrcOriginID() const { return OIDSrc; }
+  static bool classof(const Fact *F) {
+    return F->getKind() == Kind::AssignOrigin;
+  }
+  void dump(llvm::raw_ostream &OS) const override {
+    OS << "AssignOrigin (DestID: " << getDestOriginID()
+       << ", SrcID: " << getSrcOriginID() << ")\n";
+  }
+};
+
+class ReturnOfOriginFact : public Fact {
+  OriginID OID;
+
+public:
+  ReturnOfOriginFact(OriginID OID, Point Pt)
+      : Fact(Kind::ReturnOfOrigin, Pt), OID(OID) {}
+  OriginID getReturnedOriginID() const { return OID; }
+  static bool classof(const Fact *F) {
+    return F->getKind() == Kind::ReturnOfOrigin;
+  }
+  void dump(llvm::raw_ostream &OS) const override {
+    OS << "ReturnOfOrigin (OriginID: " << getReturnedOriginID() << ")\n";
+  }
+};
+
+class FactManager {
+public:
+  llvm::ArrayRef<Fact *> getFacts(const CFGBlock *B) const {
+    auto It = BlockToFactsMap.find(B);
+    if (It != BlockToFactsMap.end())
+      return It->second;
+    return {};
+  }
+
+  void addFact(const CFGBlock *B, Fact *NewFact) {
+    BlockToFactsMap[B].push_back(NewFact);
+  }
+
+  template <typename FactType, typename... Args>
+  FactType *createFact(Args &&...args) {
+    void *Mem = FactAllocator.Allocate<FactType>();
+    return new (Mem) FactType(std::forward<Args>(args)...);
+  }
+
+  void dump(const CFG &Cfg, AnalysisDeclContext &AC) const {
+    llvm::dbgs() << "==========================================\n";
+    llvm::dbgs() << "       Lifetime Analysis Facts:\n";
+    llvm::dbgs() << "==========================================\n";
+    if (const Decl *D = AC.getDecl()) {
+      if (const auto *ND = dyn_cast<NamedDecl>(D))
+        llvm::dbgs() << "Function: " << ND->getQualifiedNameAsString() << "\n";
+    }
+    // Print blocks in the order as they appear in code for a stable ordering.
+    ForwardDataflowWorklist worklist(Cfg, AC);
+    for (const CFGBlock *B : Cfg.const_nodes())
+      worklist.enqueueBlock(B);
+    while (const CFGBlock *B = worklist.dequeue()) {
+      llvm::dbgs() << "  Block B" << B->getBlockID() << ":\n";
+      auto It = BlockToFactsMap.find(B);
+      if (It != BlockToFactsMap.end()) {
+        for (const Fact *F : It->second) {
+          llvm::dbgs() << "    ";
+          F->dump(llvm::dbgs());
+        }
+      }
+      llvm::dbgs() << "  End of Block\n";
+    }
+  }
+
+private:
+  llvm::DenseMap<const clang::CFGBlock *, llvm::SmallVector<Fact *>>
+      BlockToFactsMap;
+  llvm::BumpPtrAllocator FactAllocator;
+};
+
+struct FactsContext {
+  FactManager Facts;
+  LoanManager Loans;
+  OriginManager Origins;
+};
+
+class FactGenerator : public ConstStmtVisitor<FactGenerator> {
+
+public:
+  FactGenerator(const CFG &Cfg, FactsContext &FactsCtx, AnalysisDeclContext &AC)
+      : FactsCtx(FactsCtx), Cfg(Cfg), AC(AC), CurrentBlock(nullptr) {}
+
+  void run() {
+    llvm::TimeTraceScope TimeProfile("Fact Generation");
+    // Iterate through the CFG blocks in pre-order to ensure that
+    // initializations and destructions are processed in the correct sequence.
+    // TODO: A pre-order traversal utility should be provided by Dataflow
+    // framework.
+    ForwardDataflowWorklist Worklist(Cfg, AC);
+    for (const CFGBlock *B : Cfg.const_nodes())
+      Worklist.enqueueBlock(B);
+    while (const CFGBlock *Block = Worklist.dequeue()) {
+      CurrentBlock = Block;
+      for (unsigned I = 0; I < Block->size(); ++I) {
+        const CFGElement &Element = Block->Elements[I];
+        CurrentPoint = Point(Block, I);
+        if (std::optional<CFGStmt> CS = Element.getAs<CFGStmt>())
+          Visit(CS->getStmt());
+        else if (std::optional<CFGAutomaticObjDtor> DtorOpt =
+                     Element.getAs<CFGAutomaticObjDtor>())
+          handleDestructor(*DtorOpt);
+      }
+    }
+  }
+
+  void VisitDeclStmt(const DeclStmt *DS) {
+    for (const Decl *D : DS->decls()) {
+      if (const auto *VD = dyn_cast<VarDecl>(D)) {
+        if (hasOrigin(VD->getType())) {
+          if (const Expr *InitExpr = VD->getInit()) {
+            OriginID DestOID = FactsCtx.Origins.getOrCreate(VD);
+            OriginID SrcOID = FactsCtx.Origins.getOrCreate(InitExpr);
+            FactsCtx.Facts.addFact(CurrentBlock,
+                                   FactsCtx.Facts.createFact<AssignOriginFact>(
+                                       DestOID, SrcOID, CurrentPoint));
+          }
+        }
+      }
+    }
+  }
+
+  void VisitImplicitCastExpr(const ImplicitCastExpr *ICE) {
+    if (!hasOrigin(ICE->getType()))
+      return;
+    // An ImplicitCastExpr node itself gets an origin, which flows from the
+    // origin of its sub-expression (after stripping its own parens/casts).
+    if (ICE->getCastKind() == CK_LValueToRValue) {
+      OriginID IceOID = FactsCtx.Origins.getOrCreate(ICE);
+      OriginID SubExprOID = FactsCtx.Origins.getOrCreate(ICE->getSubExpr());
+      FactsCtx.Facts.addFact(CurrentBlock,
+                             FactsCtx.Facts.createFact<AssignOriginFact>(
+                                 IceOID, SubExprOID, CurrentPoint));
+    }
+  }
+
+  void VisitUnaryOperator(const UnaryOperator *UO) {
+    if (UO->getOpcode() == UO_AddrOf) {
+      const Expr *SubExpr = UO->getSubExpr();
+      if (const auto *DRE = dyn_cast<DeclRefExpr>(SubExpr)) {
+        if (const auto *VD = dyn_cast<VarDecl>(DRE->getDecl())) {
+          // Check if it's a local variable.
+          if (VD->hasLocalStorage()) {
+            OriginID OID = FactsCtx.Origins.getOrCreate(UO);
+            Path AddrOfLocalVarPath(VD, Path::Kind::StackVariable);
+            LoanInfo &Loan = FactsCtx.Loans.addLoanInfo(AddrOfLocalVarPath,
+                                                        UO->getOperatorLoc());
+            FactsCtx.Facts.addFact(CurrentBlock,
+                                   FactsCtx.Facts.createFact<IssueFact>(
+                                       Loan.ID, OID, CurrentPoint));
+          }
+        }
+      }
+    }
+  }
+
+  void VisitReturnStmt(const ReturnStmt *RS) {
+    if (const Expr *RetExpr = RS->getRetValue()) {
+      if (hasOrigin(RetExpr->getType())) {
+        OriginID OID = FactsCtx.Origins.getOrCreate(RetExpr);
+        FactsCtx.Facts.addFact(
+            CurrentBlock,
+            FactsCtx.Facts.createFact<ReturnOfOriginFact>(OID, CurrentPoint));
+      }
+    }
+  }
+
+  void VisitBinaryOperator(const BinaryOperator *BO) {
+    if (BO->isAssignmentOp()) {
+      const Expr *LHSExpr = BO->getLHS();
+      const Expr *RHSExpr = BO->getRHS();
+
+      // We are interested in assignments like `ptr1 = ptr2` or `ptr = &var`
+      // LHS must be a pointer/reference type that can be an origin.
+      // RHS must also represent an origin (either another pointer/ref or an
+      // address-of).
+      if (const auto *DRE_LHS = dyn_cast<DeclRefExpr>(LHSExpr)) {
+        if (const auto *VD_LHS =
+                dyn_cast<ValueDecl>(DRE_LHS->getDecl()->getCanonicalDecl());
+            VD_LHS && hasOrigin(VD_LHS->getType())) {
+          OriginID DestOID = FactsCtx.Origins.getOrCreate(VD_LHS);
+          OriginID SrcOID = FactsCtx.Origins.getOrCreate(RHSExpr);
+          FactsCtx.Facts.addFact(CurrentBlock,
+                                 FactsCtx.Facts.createFact<AssignOriginFact>(
+                                     DestOID, SrcOID, CurrentPoint));
+        }
+      }
+    }
+  }
+
+private:
+  // Check if a type have an origin.
+  bool hasOrigin(QualType QT) { return QT->isPointerOrReferenceType(); }
+
+  void handleDestructor(const CFGAutomaticObjDtor &DtorOpt) {
+    /// TODO: Also handle trivial destructors (e.g., for `int`
+    // variables) which will never have a CFGAutomaticObjDtor node.
+    /// TODO: Handle loans to temporaries.
+    const VarDecl *DestructedVD = DtorOpt.getVarDecl();
+    if (!DestructedVD)
+      return;
+    // Iterate through all loans to see if any expire.
+    for (const LoanInfo &Loan : FactsCtx.Loans.getLoanInfos()) {
+      const Path &LoanPath = Loan.SourcePath;
+      // Check if the loan is for a stack variable and if that variable
+      // is the one being destructed.
+      if (LoanPath.PathKind == Path::Kind::StackVariable) {
+        if (LoanPath.D == DestructedVD) {
+          FactsCtx.Facts.addFact(
+              CurrentBlock,
+              FactsCtx.Facts.createFact<ExpireFact>(Loan.ID, CurrentPoint));
+        }
+      }
+    }
+  }
+
+  FactsContext &FactsCtx;
+  const CFG &Cfg;
+  AnalysisDeclContext &AC;
+  const CFGBlock *CurrentBlock;
+  Point CurrentPoint;
+};
+
+// ========================================================================= //
+//                              The Dataflow Lattice
+// ========================================================================= //
+
+// Using LLVM's immutable collections is efficient for dataflow analysis
+// as it avoids deep copies during state transitions.
+// TODO(opt): Consider using a bitset to represent the set of loans.
+using LoanSet = llvm::ImmutableSet<LoanID>;
+using OriginLoanMap = llvm::ImmutableMap<OriginID, LoanSet>;
+
+/// A context object to hold the factories for immutable collections, ensuring
+/// that all created states share the same underlying memory management.
+struct LifetimeFactory {
+  OriginLoanMap::Factory OriginMapFact;
+  LoanSet::Factory LoanSetFact;
+
+  LoanSet createLoanSet(LoanID LID) {
+    return LoanSetFact.add(LoanSetFact.getEmptySet(), LID);
+  }
+};
+
+/// LifetimeLattice represents the state of our analysis at a given program
+/// point. It is an immutable object, and all operations produce a new
+/// instance rather than modifying the existing one.
+struct LifetimeLattice {
+  /// The map from an origin to the set of loans it contains.
+  /// TODO(opt): To reduce the lattice size, propagate origins of declarations,
+  /// not expressions, because expressions are not visible across blocks.
+  OriginLoanMap Origins = OriginLoanMap(nullptr);
+
+  explicit LifetimeLattice(const OriginLoanMap &S) : Origins(S) {}
+  LifetimeLattice() = default;
+
+  bool operator==(const LifetimeLattice &Other) const {
+    return Origins == Other.Origins;
+  }
+  bool operator!=(const LifetimeLattice &Other) const {
+    return !(*this == Other);
+  }
+
+  LoanSet getLoans(OriginID OID, LifetimeFactory &Factory) const {
+    if (auto *Loans = Origins.lookup(OID))
+      return *Loans;
+    return Factory.LoanSetFact.getEmptySet();
+  }
+
+  /// Computes the union of two lattices by performing a key-wise merge of
+  /// their OriginLoanMaps.
+  LifetimeLattice merge(const LifetimeLattice &Other,
+                        LifetimeFactory &Factory) const {
+    /// Merge the smaller map into the larger one ensuring we iterate over the
+    /// smaller map.
+    if (Origins.getHeight() < Other.Origins.getHeight())
+      return Other.merge(*this, Factory);
+
+    OriginLoanMap MergedState = Origins;
+    // For each origin in the other map, union its loan set with ours.
+    for (const auto &Entry : Other.Origins) {
+      OriginID OID = Entry.first;
+      LoanSet OtherLoanSet = Entry.second;
+      MergedState = Factory.OriginMapFact.add(
+          MergedState, OID,
+          merge(getLoans(OID, Factory), OtherLoanSet, Factory));
+    }
+    return LifetimeLattice(MergedState);
+  }
+
+  LoanSet merge(LoanSet a, LoanSet b, LifetimeFactory &Factory) const {
+    /// Merge the smaller set into the larger one ensuring we iterate over the
+    /// smaller set.
+    if (a.getHeight() < b.getHeight())
+      std::swap(a, b);
+    LoanSet Result = a;
+    for (LoanID LID : b) {
+      /// TODO(opt): Profiling shows that this loop is a major performance
+      /// bottleneck. Investigate using a BitVector to represent the set of
+      /// loans for improved merge performance.
+      Result = Factory.LoanSetFact.add(Result, LID);
+    }
+    return Result;
+  }
+
+  void dump(llvm::raw_ostream &OS) const {
+    OS << "LifetimeLattice State:\n";
+    if (Origins.isEmpty())
+      OS << "  <empty>\n";
+    for (const auto &Entry : Origi...
[truncated]

@usx95 usx95 requested a review from Xazax-hun June 10, 2025 05:49
Copy link
Collaborator

@Xazax-hun Xazax-hun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just a very rudimentary first pass, will take a second look a bit later.

Copy link
Collaborator

@Xazax-hun Xazax-hun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I don't have any major comments on the design, I have mostly some nits inline.

return;
// An ImplicitCastExpr node itself gets an origin, which flows from the
// origin of its sub-expression (after stripping its own parens/casts).
if (ICE->getCastKind() == CK_LValueToRValue) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a test so I can see how this works?

int a;
int *p = &a;
int **pp = &p;
int *q = *pp; 

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching this. I missed to mention the plans to handle this.
It doesn't work right now because the analysis gets tripped up by both address-of &p and the reborrowing *pp.
My plan is to fix this by moving from a single origin per pointer type to a list of origins.

The idea is to build up the origin list based on the type:

&p

We start with p (type int*), which has its own origin list, [O_p]. Taking the address & then prepends a new origin that holds a loan on the variable p itself. The resulting origin list for the expression &p (type int**) is therefore [O_&p, O_p].

*pp

The origin list for the dereference *pp is simply the tail of pp's list, which is [O_p].

subtyping : e.g. assignment int **pp = &p;

This is checked using a slightly more complex subtyping rule for the origin lists. For an assignment where the LHS has origin list [O1, O2] and the RHS has [O3, O4], the subtyping enforces:

  • O3 is a subset of O1. (Standard subtyping for the outer pointer).
  • O2 is an alias of O4. A stricter, invariant relationship for the inner pointers. Polonius also handles these by creating two subset constraints: O2 ⊆ O4 and O4 ⊆ O2 essentially making them equivalent.

struct LifetimeLattice {
/// The map from an origin to the set of loans it contains.
/// TODO(opt): To reduce the lattice size, propagate origins of declarations,
/// not expressions, because expressions are not visible across blocks.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because expressions are not visible across blocks

This is not entirely true, e.g., f(cond ? *p : *q); Here, *p and *q are not declarations but they are visible across blocks. That being said, there are a limited number of ways this can happen, so we still can make the optimization suggested here. It is just a bit more elaborate.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might want to take a look at how this is handled in the dataflow framework. Martin Braenne worked this out with care -- it's not trivial.

Copy link
Collaborator

@Xazax-hun Xazax-hun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have one question though. I sort of expected this to require a liveness analysis. Is this something we will need down the line? Or is that not required for polonius?

@usx95
Copy link
Contributor Author

usx95 commented Jun 10, 2025

I have one question though. I sort of expected this to require a liveness analysis. Is this something we will need down the line? Or is that not required for polonius?

Good point. This is actually going to be a part of the error reporting. And yes, polonius does have a notion of live origins/loans.
I can think of primarily two different strategies for generating diagnostics.

  1. (preferred) Decide at the point where a loan expires: What are the origins that contain this expired loan and is still used later at some point. This would need a liveness analysis, i.e., it is an error to have a live origin which holds an expired loan.
  2. Decide at the point where an origin is used: It is an error, if there is a use of an origin which holds an expired loan. The current dataflow results are enough to power this but we would need to remember which expired loans are already reported so that we do not error on each use of an origin holding an expired loan.

I prefer approach 1 as this more directly works on the invalid memory as compared to origins carrying invalid loans to a potential use.
I have left this out from the current implementation but let me know if you would find it helpful to implement a warning (maybe return-stack-addr) in this patch.

@Xazax-hun
Copy link
Collaborator

implement a warning (maybe return-stack-addr) in this patch.

This patch is already big enough. But I think a small comment of how reporting will work down the line might be useful.

Copy link
Collaborator

@ymand ymand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, looks very good. I would recommend you consider splitting this into multiple smaller PRs, so reviews can focus on a specific aspect. E.g. the FactGenerator and the LifetimeDataflow seems to deserve focused reviews.

struct LifetimeLattice {
/// The map from an origin to the set of loans it contains.
/// TODO(opt): To reduce the lattice size, propagate origins of declarations,
/// not expressions, because expressions are not visible across blocks.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might want to take a look at how this is handled in the dataflow framework. Martin Braenne worked this out with care -- it's not trivial.

/// Orchestrates the analysis by iterating over the CFG using a worklist
/// algorithm. It computes a fixed point by propagating the LifetimeLattice
/// state through each block until the state no longer changes.
/// TODO: Maybe use the dataflow framework!
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, in the current form, where you use CFGBlocks to drive the computation, using the framework would make sense. But, if you're attached to the current algorithm (comparing at the before state rather than the after state), you'll need to implement it as an alternative (or modify the current one), as we discussed in our last meeting.

Copy link
Contributor Author

@usx95 usx95 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed few comments.

@usx95 usx95 changed the title Introduce intra-procedural lifetime analysis in Clang [LifetimeSafety] Introduce intra-procedural analysis in Clang Jul 6, 2025
@usx95 usx95 self-assigned this Jul 6, 2025
Copy link
Contributor

@vbvictor vbvictor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your work! Interested to see this feature in clang.
Left a couple of nit comments on code style.

usx95 and others added 3 commits July 7, 2025 10:31
Co-authored-by: Baranov Victor <bar.victor.2002@gmail.com>
Co-authored-by: Baranov Victor <bar.victor.2002@gmail.com>
@usx95 usx95 requested review from Xazax-hun, ymand, jvoung and vbvictor July 7, 2025 08:45
@usx95 usx95 force-pushed the dangling-references-latest branch from 7c6ca5d to 634ba4b Compare July 7, 2025 09:09
Copy link
Collaborator

@Xazax-hun Xazax-hun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some discussions open on this PR e.g., whether access paths need kinds. But I don't think those are blockers, this is exploratory work and it is expected that some design decisions might be revisited along the way when we learn something new. I am OK with the current status of this PR.

Something for future PRs: I think it would be nice to have some sort of coverage statistics. While initially you do not aim to cover all of C++, it would be good to know what portion of nodes are missing from the fact generation when we run this on some real world code. That can give us some progress indicators.

Copy link

github-actions bot commented Jul 7, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@usx95 usx95 force-pushed the dangling-references-latest branch from bdec502 to 5f672e7 Compare July 7, 2025 11:10
Copy link
Contributor Author

@usx95 usx95 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something for future PRs: I think it would be nice to have some sort of coverage statistics. While initially you do not aim to cover all of C++, it would be good to know what portion of nodes are missing from the fact generation when we run this on some real world code. That can give us some progress indicators.

Good idea. I was thinking along similar lines and have some ideas for initial metrics:

  • Use an RAV to visit all expressions in a function and count number of expressions which should have an origin (i.e., it is of pointer type) but does not have an origin associated with the expression.
  • Count origins that are created but are never associated with a Loan; this would be a signal that we don't understand the pointer's source.

I will add these in a follow-up patch.

The assertion will be enabled once we have enough C++ coverage
@usx95
Copy link
Contributor Author

usx95 commented Jul 9, 2025

In 1-2 days, I will go ahead and merge this if there are no more comments from other reviewers. cc: @ymand

Copy link
Contributor

@jvoung jvoung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some nits, but LG!


OriginID get(const ValueDecl &D) {
auto It = DeclToOriginID.find(&D);
// TODO: This should be an assert(It != ExprToOriginID.end()). The current
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: DeclToOriginID.end()

@@ -0,0 +1,191 @@
// RUN: %clang_cc1 -mllvm -debug-only=LifetimeFacts,LifetimeDataflow -Wexperimental-lifetime-safety %s 2>&1 | FileCheck %s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think the LifetimeDataflow was moved to a different PR (147295) ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:analysis clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants