Skip to content

[Clang] Diagnose forming references to nullptr #143667

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

cor3ntin
Copy link
Contributor

Per [decl.ref],

Because a null pointer value or a pointer past the end of an object
does not point to an object, a reference in a well-defined program cannot refer to such things.

Note this does not fixes the new bytecode interpreter.

Fixes #48665

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:bytecode Issues for the clang bytecode constexpr interpreter labels Jun 11, 2025
@cor3ntin
Copy link
Contributor Author

@tbaederr

@llvmbot
Copy link
Member

llvmbot commented Jun 11, 2025

@llvm/pr-subscribers-libcxx

@llvm/pr-subscribers-clang

Author: Corentin Jabot (cor3ntin)

Changes

Per [decl.ref],

> Because a null pointer value or a pointer past the end of an object
does not point to an object, a reference in a well-defined program cannot refer to such things.

Note this does not fixes the new bytecode interpreter.

Fixes #48665


Full diff: https://github.com/llvm/llvm-project/pull/143667.diff

4 Files Affected:

  • (modified) clang/include/clang/Basic/DiagnosticASTKinds.td (+5-4)
  • (modified) clang/lib/AST/ByteCode/State.h (+1)
  • (modified) clang/lib/AST/ExprConstant.cpp (+20-6)
  • (modified) clang/test/SemaCXX/constant-expression-cxx14.cpp (+22-1)
diff --git a/clang/include/clang/Basic/DiagnosticASTKinds.td b/clang/include/clang/Basic/DiagnosticASTKinds.td
index d2cd86d05d55a..41ecda1cad960 100644
--- a/clang/include/clang/Basic/DiagnosticASTKinds.td
+++ b/clang/include/clang/Basic/DiagnosticASTKinds.td
@@ -174,10 +174,11 @@ def note_constexpr_heap_alloc_limit_exceeded : Note<
 def note_constexpr_this : Note<
   "%select{|implicit }0use of 'this' pointer is only allowed within the "
   "evaluation of a call to a 'constexpr' member function">;
-def access_kind : TextSubstitution<
-  "%select{read of|read of|assignment to|increment of|decrement of|"
-  "member call on|dynamic_cast of|typeid applied to|construction of|"
-  "destruction of|read of}0">;
+def access_kind
+    : TextSubstitution<
+          "%select{read of|read of|assignment to|increment of|decrement of|"
+          "member call on|dynamic_cast of|typeid applied to|construction of|"
+          "destruction of|read of|read of}0">;
 def access_kind_subobject : TextSubstitution<
   "%select{read of|read of|assignment to|increment of|decrement of|"
   "member call on|dynamic_cast of|typeid applied to|"
diff --git a/clang/lib/AST/ByteCode/State.h b/clang/lib/AST/ByteCode/State.h
index 9a81fa6b7d220..649b58a4dd164 100644
--- a/clang/lib/AST/ByteCode/State.h
+++ b/clang/lib/AST/ByteCode/State.h
@@ -35,6 +35,7 @@ enum AccessKinds {
   AK_Construct,
   AK_Destroy,
   AK_IsWithinLifetime,
+  AK_CheckReferenceInitialization
 };
 
 /// The order of this enum is important for diagnostics.
diff --git a/clang/lib/AST/ExprConstant.cpp b/clang/lib/AST/ExprConstant.cpp
index fa4e10e84de05..c02bf973c2552 100644
--- a/clang/lib/AST/ExprConstant.cpp
+++ b/clang/lib/AST/ExprConstant.cpp
@@ -1529,7 +1529,7 @@ CallStackFrame::~CallStackFrame() {
 
 static bool isRead(AccessKinds AK) {
   return AK == AK_Read || AK == AK_ReadObjectRepresentation ||
-         AK == AK_IsWithinLifetime;
+         AK == AK_IsWithinLifetime || AK == AK_CheckReferenceInitialization;
 }
 
 static bool isModification(AccessKinds AK) {
@@ -1540,6 +1540,7 @@ static bool isModification(AccessKinds AK) {
   case AK_DynamicCast:
   case AK_TypeId:
   case AK_IsWithinLifetime:
+  case AK_CheckReferenceInitialization:
     return false;
   case AK_Assign:
   case AK_Increment:
@@ -1558,7 +1559,7 @@ static bool isAnyAccess(AccessKinds AK) {
 /// Is this an access per the C++ definition?
 static bool isFormalAccess(AccessKinds AK) {
   return isAnyAccess(AK) && AK != AK_Construct && AK != AK_Destroy &&
-         AK != AK_IsWithinLifetime;
+         AK != AK_IsWithinLifetime && AK != AK_CheckReferenceInitialization;
 }
 
 /// Is this kind of axcess valid on an indeterminate object value?
@@ -1571,6 +1572,7 @@ static bool isValidIndeterminateAccess(AccessKinds AK) {
     return false;
 
   case AK_IsWithinLifetime:
+  case AK_CheckReferenceInitialization:
   case AK_ReadObjectRepresentation:
   case AK_Assign:
   case AK_Construct:
@@ -4426,7 +4428,7 @@ static CompleteObject findCompleteObject(EvalInfo &Info, const Expr *E,
 
     // Unless we're looking at a local variable or argument in a constexpr call,
     // the variable we're reading must be const.
-    if (!Frame) {
+    if (!Frame && AK != clang::AK_CheckReferenceInitialization) {
       if (IsAccess && isa<ParmVarDecl>(VD)) {
         // Access of a parameter that's not associated with a frame isn't going
         // to work out, but we can leave it to evaluateVarDeclInit to provide a
@@ -4503,7 +4505,7 @@ static CompleteObject findCompleteObject(EvalInfo &Info, const Expr *E,
   } else {
     const Expr *Base = LVal.Base.dyn_cast<const Expr*>();
 
-    if (!Frame) {
+    if (!Frame && AK != clang::AK_CheckReferenceInitialization) {
       if (const MaterializeTemporaryExpr *MTE =
               dyn_cast_or_null<MaterializeTemporaryExpr>(Base)) {
         assert(MTE->getStorageDuration() == SD_Static &&
@@ -4557,7 +4559,7 @@ static CompleteObject findCompleteObject(EvalInfo &Info, const Expr *E,
         NoteLValueLocation(Info, LVal.Base);
         return CompleteObject();
       }
-    } else {
+    } else if (AK != clang::AK_CheckReferenceInitialization) {
       BaseVal = Frame->getTemporary(Base, LVal.Base.getVersion());
       assert(BaseVal && "missing value for temporary");
     }
@@ -5243,7 +5245,19 @@ static bool EvaluateVarDecl(EvalInfo &Info, const VarDecl *VD) {
   if (InitE->isValueDependent())
     return false;
 
-  if (!EvaluateInPlace(Val, Info, Result, InitE)) {
+  if (VD->getType()->isReferenceType() && InitE->isGLValue()) {
+    if (!EvaluateLValue(InitE, Result, Info))
+      return false;
+    CompleteObject Obj = findCompleteObject(
+        Info, InitE, AK_CheckReferenceInitialization, Result, InitE->getType());
+    if (Result.Designator.isOnePastTheEnd()) {
+      Info.FFDiag(InitE, diag::note_constexpr_access_past_end)
+          << AK_CheckReferenceInitialization;
+      return false;
+    }
+    Result.moveInto(Val);
+    return !!Obj;
+  } else if (!EvaluateInPlace(Val, Info, Result, InitE)) {
     // Wipe out any partially-computed value, to allow tracking that this
     // evaluation failed.
     Val = APValue();
diff --git a/clang/test/SemaCXX/constant-expression-cxx14.cpp b/clang/test/SemaCXX/constant-expression-cxx14.cpp
index e16a69df3830d..d8ebe92131ddc 100644
--- a/clang/test/SemaCXX/constant-expression-cxx14.cpp
+++ b/clang/test/SemaCXX/constant-expression-cxx14.cpp
@@ -250,7 +250,7 @@ namespace subobject {
 namespace lifetime {
   constexpr int &&id(int &&n) { return static_cast<int&&>(n); }
   constexpr int &&dead() { return id(0); } // expected-note {{temporary created here}}
-  constexpr int bad() { int &&n = dead(); n = 1; return n; } // expected-note {{assignment to temporary whose lifetime has ended}}
+  constexpr int bad() { int &&n = dead(); n = 1; return n; } // expected-note {{read of temporary whose lifetime has ended}}
   static_assert(bad(), ""); // expected-error {{constant expression}} expected-note {{in call}}
 }
 
@@ -1321,3 +1321,24 @@ constexpr bool check = different_in_loop();
   // expected-error@-1 {{}} expected-note@-1 {{in call}}
 
 }
+
+namespace GH48665 {
+constexpr bool foo(int *i) {
+    int &j = *i;
+    // expected-note@-1 {{read of dereferenced null pointer is not allowed in a constant expression}}
+    return true;
+}
+
+static_assert(foo(nullptr), ""); // expected-note {{in call to 'foo(nullptr)'}}
+// expected-error@-1 {{static assertion expression is not an integral constant expression}}
+
+int arr[3]; // expected-note 2{{declared here}}
+constexpr bool f() { // cxx14_20-error {{constexpr function never produces a constant expression}}
+  int &r  = arr[3]; // cxx14_20-note {{read of dereferenced one-past-the-end pointer is not allowed in a constant expression}} \
+                    // expected-warning {{array index 3 is past the end of the array}}\
+                    // expected-note {{initializer of 'arr' is unknown}}
+  return true;
+}
+static_assert(f(), ""); // expected-note {{in call to 'f()'}}
+// expected-error@-1 {{static assertion expression is not an integral constant expression}}
+}

: TextSubstitution<
"%select{read of|read of|assignment to|increment of|decrement of|"
"member call on|dynamic_cast of|typeid applied to|construction of|"
"destruction of|read of|read of}0">;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I like "read of". Maybe we should explicitly mention reference binding?

@@ -5243,7 +5245,19 @@ static bool EvaluateVarDecl(EvalInfo &Info, const VarDecl *VD) {
if (InitE->isValueDependent())
return false;

if (!EvaluateInPlace(Val, Info, Result, InitE)) {
if (VD->getType()->isReferenceType() && InitE->isGLValue()) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the isGLValue() check here redundant? If it wasn't a GLValue, we should have bailed out earlier.

Info, InitE, AK_CheckReferenceInitialization, Result, InitE->getType());
if (Result.Designator.isOnePastTheEnd()) {
Info.FFDiag(InitE, diag::note_constexpr_access_past_end)
<< AK_CheckReferenceInitialization;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this is the only place we're using AK_CheckReferenceInitialization? Should we be using it for other places we do reference binding? (aggregate initialization, constructors, calls, etc.).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed!

@@ -4503,7 +4505,7 @@ static CompleteObject findCompleteObject(EvalInfo &Info, const Expr *E,
} else {
const Expr *Base = LVal.Base.dyn_cast<const Expr*>();

if (!Frame) {
if (!Frame && AK != clang::AK_CheckReferenceInitialization) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check looks a little weird... what's it doing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goal is to bypass the check of liveness for global variables - as taking a reference to a non-constexpr global is fine until we read from it

Copy link
Collaborator

@shafik shafik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have the same questions as @efriedma-quic

@@ -35,6 +35,7 @@ enum AccessKinds {
AK_Construct,
AK_Destroy,
AK_IsWithinLifetime,
AK_CheckReferenceInitialization
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
AK_CheckReferenceInitialization
AK_ReferenceInitialization

I could also go for AK_ReadForReferenceInitialization. The "Check" is what the code you are adding is doing for a read for reference initialization or at least that is how I see it.

@@ -1321,3 +1321,24 @@ constexpr bool check = different_in_loop();
// expected-error@-1 {{}} expected-note@-1 {{in call}}

}

namespace GH48665 {
constexpr bool foo(int *i) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other tests: https://godbolt.org/z/o66Gr3fc8

struct A {
   int &r;
};

struct B {
  constexpr B(int *p) : r{*p} {
  }  
  int &r;
};

constexpr bool f(int *i) {
    int *p = new int;
    delete p;
    int &r = *p;

    
    A a{*i};
    B b(i);

    return true;
}

static_assert(f(nullptr), "");

Copy link
Collaborator

@erichkeane erichkeane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 comments. Else, I'm not really the best one to review th is part of the code, but everything looks appropriate.

@@ -4418,6 +4420,9 @@ static CompleteObject findCompleteObject(EvalInfo &Info, const Expr *E,
return CompleteObject();
}

// if(AK == clang::AK_ReferenceInitialization)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is commented out?

@@ -4503,7 +4510,7 @@ static CompleteObject findCompleteObject(EvalInfo &Info, const Expr *E,
} else {
const Expr *Base = LVal.Base.dyn_cast<const Expr*>();

if (!Frame) {
if (AK != clang::AK_ReferenceInitialization && !Frame) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move hte new check here to 4510?

@@ -4426,7 +4431,7 @@ static CompleteObject findCompleteObject(EvalInfo &Info, const Expr *E,

// Unless we're looking at a local variable or argument in a constexpr call,
// the variable we're reading must be const.
if (!Frame) {
if (AK != clang::AK_ReferenceInitialization && !Frame) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain why you are pairing these checks? I am missing the logic, maybe a comment should be added?

return false;
}
Result.moveInto(Val);
return !!Obj;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this line could use a comment explaining the semantics. I really like that you split this out into a function, it makes it much more readable. OTOH you could just comment the whole function on top explain the semantics of the return value.

@@ -1540,6 +1540,7 @@ static bool isModification(AccessKinds AK) {
case AK_DynamicCast:
case AK_TypeId:
case AK_IsWithinLifetime:
case AK_ReferenceInitialization:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is initialization not considered a modification but construction is?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, AK_ReferenceInitialization mostly checks the initialization is valid, it doesn't modify any object

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be worth leaving a comment there for the next person to come along and wonder about that.

@@ -5243,7 +5281,9 @@ static bool EvaluateVarDecl(EvalInfo &Info, const VarDecl *VD) {
if (InitE->isValueDependent())
return false;

if (!EvaluateInPlace(Val, Info, Result, InitE)) {
if (VD->getType()->isReferenceType()) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So all the rules are the same for lvalue and rvalue references?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, although I did interpret your question as a veiled request for more tests :):)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was trying to suss out whether I should ask for more tests or not. :-D

@cor3ntin cor3ntin requested a review from a team as a code owner June 25, 2025 12:01
@llvmbot llvmbot added the libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. label Jun 25, 2025
@frederick-vs-ja
Copy link
Contributor

Per CWG2823, would it be more meaningful to diagnose dereferencing null pointers?

@cor3ntin cor3ntin force-pushed the corentin/gh48665 branch from 2f9a122 to b54f67a Compare July 2, 2025 13:56
@cor3ntin cor3ntin requested a review from Endilll as a code owner July 2, 2025 13:56
@cor3ntin
Copy link
Contributor Author

cor3ntin commented Jul 2, 2025

Per CWG2823, would it be more meaningful to diagnose dereferencing null pointers?

Yes, done!

Copy link
Contributor

@Endilll Endilll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes to CWG1458 test are fine, because it's still unspecified behavior, so we're conforming either way.
Kudos for getting the expected directives (almost) right.

@@ -107,6 +107,8 @@ void f() {
constexpr int p = &*a;
// since-cxx11-error@-1 {{cannot initialize a variable of type 'const int' with an rvalue of type 'A *'}}
constexpr A *p2 = &*a;
// since-cxx11-error@-1 {{constexpr variable 'p2' must be initialized by a constant expression}} \
// since-cxx11-note@-1 {{read of dereferenced null pointer is not allowed in a constant expression}}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// since-cxx11-note@-1 {{read of dereferenced null pointer is not allowed in a constant expression}}
// since-cxx11-note@-2 {{read of dereferenced null pointer is not allowed in a constant expression}}

As a consequence, you need to remove the backslash on the previous line.

constexpr B &rb = (B&)*na; // both-error {{constant expression}} \
// both-note {{cannot access derived class of null pointer}}
// ref-note {{read of dereferenced null pointer}} \
// expected-note {{cannot access derived class of null pointer}}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of the new diagnostics seem like regressions.

@cor3ntin cor3ntin requested a review from AaronBallman July 7, 2025 15:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:bytecode Issues for the clang bytecode constexpr interpreter clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

forming reference to nullptr is not rejected in constexpr
9 participants