From 8a9e249f746a5b15e56d3c2123fe3e678b8bc63c Mon Sep 17 00:00:00 2001 From: Keno Fischer Date: Tue, 16 Apr 2024 06:38:00 +0000 Subject: [PATCH 1/3] Add devdocs on UB This adds some basic devdocs for undefined behavior, since there keep being complaints that it's not written down anywhere. The list of UB is just what I remembered off the top of my head, it's probably incomplete, since we haven't historically been particularly careful about this. This way, at least when people add new sources of UB, they'll have a centralized place to write that down. --- doc/make.jl | 1 + doc/src/devdocs/ub.md | 50 +++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 51 insertions(+) create mode 100644 doc/src/devdocs/ub.md diff --git a/doc/make.jl b/doc/make.jl index 15205d6c2f89d..965fda559f61d 100644 --- a/doc/make.jl +++ b/doc/make.jl @@ -202,6 +202,7 @@ DevDocs = [ "devdocs/init.md", "devdocs/ast.md", "devdocs/types.md", + "devdocs/ub.md", "devdocs/object.md", "devdocs/eval.md", "devdocs/callconv.md", diff --git a/doc/src/devdocs/ub.md b/doc/src/devdocs/ub.md new file mode 100644 index 0000000000000..8c0fb2194be09 --- /dev/null +++ b/doc/src/devdocs/ub.md @@ -0,0 +1,50 @@ +# Undefined Behavior + +In programming language design, it is prudent to separate the concepts of a language's semantics, and the behavior of a language's implementation. A language's semantics define the allowable set of *observable* behaviors (including defining what it means to be observable). A correct implementation will ensure is that the actual behavior of executing a program has observable behavior that is allowable according to the language semantics. This is often referred to as the **as-if** rule in other languages. + +To illustrate the distinction consider a statement like `print(Ref(1).x)`. The language semantics may specfiy that the observable behavior of this statement is that the value `1` is printined to stdout. However, whether or not the object `Ref` is actually allocated may not be semantically observable (even though it may be implicitly observable by looking at memory use, number of allocations, generated code, etc.). Because of this, the implementation is allowed to replace this statement with `print(1)`, which preserves all semantically observable behaviors. + +Additionally, the allowable behaviors for a given program are not unique. For example, the `@fastmath` macro gives wide semantic latitude for floating point math rearrangements and two subsequent invocation of the same operation, even on the same values is not semantically required to produce the same answer. The situation is similar for asynchronous operations, random number generation, et.c + +*Undefined Behavior* (UB) occurs when a julia program semantically perform an operation that is assumed to never happen. In such a situation, the language semantics do not constrain the behavior of the implementation, so any behavior of the program is allowable, including crashes, memory corruption, incorrect behavior, etc. As such, it is very important to avoid writing programs that semantically execute undefined behavior. + +Note that this explicitly applies to *semantically executed* undefined behavior. While julia's compiler is allowed to and does aggressively perform speculative execution of pure functions. Since the execution point is not semantically observable (though again indirectly observable through execution performance), this is allowable by the as-if rule. As such, speculative execution is inhibited unless the code in question is proven to be +free of undefined behavior. + +The presence of undefined behavior is modeled as part of julia's effect system using the `:noub` effect bit. See the documentation for `@assume_effects` for more information on querying the compiler's effect model or overriding it for specific situations (e.g. where a dynamic check precludes potential UB from every actually being reached). + +## List of sources of undefined behavior + +The following is a list of sources of undefined behavior, +though it should currently not be considered exhaustive: + +- Replacement of `const` values. Note that in interactive mode the compiler will issue a warning for this and some care is taken to mimize impact as a user convenience, but the behavior is not defined. +- Various modification of global state during precompile. Where possible, this is detected and yields and error, but detection is incomplete. +- Incorrect implementation of a `Core.OptimizedGenerics` interface [1] +- Any invocation of undefined behavior in FFI code (e.g. `ccall`, `llvmcall`) according to the semantics of the respective language +- Incorrect signature types in `ccall` or `cfunction`, even if those signatures happen to yield the correct ABI on a particular platform +- Incorrect use of annotations like `@inbounds`, `@assume_effects` in violation of their requirements [1] +- Retention of pointers to GC-tracked objects outside of a `@GC.preserve` region +- Memory modification of GC-tracked objects without appropriate write barriers from outside of julia (e.g. in native calls, debuggers, etc.) +- Violations of the memory model using `unsafe` operations (e.g. `unsafe_load` of an invalid pointer, pointer provenance violations, etc) +- Violations of TBAA guarantees (e.g. using `unsafe_wrap`) +- Mutation of data promised to be immutable (e.g. in `Base.StringVector`) +- Data races +- Modification of julia-internal mutable state (e.g. task schedulers, data types, etc.) + +[1] Incorrect use here may be UB, even if not semantically executed, please see the specific documentation of the feature. + +## Implementation-defined behavior +Some behavior is technically forbidden by the semantics of the language, but required in certain parts of the implementation and thus allowed as long as implementation-defined constraints are obeyed. Nevertheless, these constructs should be avoided when possible, as the implementation-defined constraints may not be stable across julia versions. + +- Construction of objects using `eval(Expr(:new))` rather than their constructors + +## Special cases explicitly NOT undefined behavior + +- As of Julia 1.11, access to undefined bits types is no longer undefined behavior. It is still allowed to return an arbitrary value of the bits type, but the value returned must be the same for every access and use thereof is not undefined behavior. In LLVM terminology, the value is `freeze undef`, not `undef`. + +- As of Julia 1.12, loops that do not make forward progress are no longer considered undefined behavior. + +- Common sources of undefined behavior in other system such as signed integer overflow are not undefined behavior. + +- Revival of objects inside finalizers is permitted though discouraged. From 81e6bfd156abbf5d88d4d08c8efb6605ce70dbfa Mon Sep 17 00:00:00 2001 From: Keno Fischer Date: Tue, 16 Apr 2024 04:46:31 -0400 Subject: [PATCH 2/3] Apply suggestions from code review Co-authored-by: Sukera <11753998+Seelengrab@users.noreply.github.com> --- doc/src/devdocs/ub.md | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/doc/src/devdocs/ub.md b/doc/src/devdocs/ub.md index 8c0fb2194be09..537d967c19c1f 100644 --- a/doc/src/devdocs/ub.md +++ b/doc/src/devdocs/ub.md @@ -1,10 +1,10 @@ # Undefined Behavior -In programming language design, it is prudent to separate the concepts of a language's semantics, and the behavior of a language's implementation. A language's semantics define the allowable set of *observable* behaviors (including defining what it means to be observable). A correct implementation will ensure is that the actual behavior of executing a program has observable behavior that is allowable according to the language semantics. This is often referred to as the **as-if** rule in other languages. +In programming language design, it is prudent to separate the concepts of a language's semantics, and the behavior of a language's implementation. A language's semantics define the allowable set of *observable* behaviors (including defining what it means to be observable). A correct implementation will ensure that the actual behavior of executing a program has observable behavior that is allowable according to the language semantics. This is often referred to as the **as-if** rule in other languages. -To illustrate the distinction consider a statement like `print(Ref(1).x)`. The language semantics may specfiy that the observable behavior of this statement is that the value `1` is printined to stdout. However, whether or not the object `Ref` is actually allocated may not be semantically observable (even though it may be implicitly observable by looking at memory use, number of allocations, generated code, etc.). Because of this, the implementation is allowed to replace this statement with `print(1)`, which preserves all semantically observable behaviors. +To illustrate the distinction, consider a statement like `print(Ref(1).x)`. The language semantics may specify that the observable behavior of this statement is that the value `1` is printed to `stdout`. However, whether or not the object `Ref` is actually allocated may not be semantically observable (even though it may be implicitly observable by looking at memory use, number of allocations, generated code, etc.). Because of this, the implementation is allowed to replace this statement with `print(1)`, which preserves all semantically observable behaviors. -Additionally, the allowable behaviors for a given program are not unique. For example, the `@fastmath` macro gives wide semantic latitude for floating point math rearrangements and two subsequent invocation of the same operation, even on the same values is not semantically required to produce the same answer. The situation is similar for asynchronous operations, random number generation, et.c +Additionally, the allowable behaviors for a given program are not unique. For example, the `@fastmath` macro gives wide semantic latitude for floating point math rearrangements and two subsequent invocation of the same operation inside of that macro, even on the same values, is not semantically required to produce the same answer. The situation is similar for asynchronous operations, random number generation, etc. *Undefined Behavior* (UB) occurs when a julia program semantically perform an operation that is assumed to never happen. In such a situation, the language semantics do not constrain the behavior of the implementation, so any behavior of the program is allowable, including crashes, memory corruption, incorrect behavior, etc. As such, it is very important to avoid writing programs that semantically execute undefined behavior. @@ -19,7 +19,7 @@ The following is a list of sources of undefined behavior, though it should currently not be considered exhaustive: - Replacement of `const` values. Note that in interactive mode the compiler will issue a warning for this and some care is taken to mimize impact as a user convenience, but the behavior is not defined. -- Various modification of global state during precompile. Where possible, this is detected and yields and error, but detection is incomplete. +- Various modification of global state during precompile. Where possible, this is detected and yields an error, but detection is incomplete. - Incorrect implementation of a `Core.OptimizedGenerics` interface [1] - Any invocation of undefined behavior in FFI code (e.g. `ccall`, `llvmcall`) according to the semantics of the respective language - Incorrect signature types in `ccall` or `cfunction`, even if those signatures happen to yield the correct ABI on a particular platform @@ -31,6 +31,8 @@ though it should currently not be considered exhaustive: - Mutation of data promised to be immutable (e.g. in `Base.StringVector`) - Data races - Modification of julia-internal mutable state (e.g. task schedulers, data types, etc.) +- A value other than `false` (`reinterpret(UInt8, b) == 0x00`) or `true` (`reinterpret(UInt8, b) == 0x01`) for a `Bool` `b`. +- Invoking undefined behavior via compiler intrinsics. [1] Incorrect use here may be UB, even if not semantically executed, please see the specific documentation of the feature. @@ -45,6 +47,6 @@ Some behavior is technically forbidden by the semantics of the language, but req - As of Julia 1.12, loops that do not make forward progress are no longer considered undefined behavior. -- Common sources of undefined behavior in other system such as signed integer overflow are not undefined behavior. +- Signed integer overflow is not undefined behavior. See also the manual section on [Overflow Behavior](https://docs.julialang.org/en/v1/manual/integers-and-floating-point-numbers/#Overflow-behavior). - Revival of objects inside finalizers is permitted though discouraged. From c2237a5fef0cb0d0bc3f89c3e9a491e9c899ced0 Mon Sep 17 00:00:00 2001 From: Max Horn Date: Sat, 25 May 2024 01:31:55 +0200 Subject: [PATCH 3/3] Apply suggestions from code review Co-authored-by: Shuhei Kadowaki <40514306+aviatesk@users.noreply.github.com> Co-authored-by: Sukera <11753998+Seelengrab@users.noreply.github.com> --- doc/src/devdocs/ub.md | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/doc/src/devdocs/ub.md b/doc/src/devdocs/ub.md index 537d967c19c1f..cb1202572970a 100644 --- a/doc/src/devdocs/ub.md +++ b/doc/src/devdocs/ub.md @@ -4,29 +4,29 @@ In programming language design, it is prudent to separate the concepts of a lang To illustrate the distinction, consider a statement like `print(Ref(1).x)`. The language semantics may specify that the observable behavior of this statement is that the value `1` is printed to `stdout`. However, whether or not the object `Ref` is actually allocated may not be semantically observable (even though it may be implicitly observable by looking at memory use, number of allocations, generated code, etc.). Because of this, the implementation is allowed to replace this statement with `print(1)`, which preserves all semantically observable behaviors. -Additionally, the allowable behaviors for a given program are not unique. For example, the `@fastmath` macro gives wide semantic latitude for floating point math rearrangements and two subsequent invocation of the same operation inside of that macro, even on the same values, is not semantically required to produce the same answer. The situation is similar for asynchronous operations, random number generation, etc. +Additionally, the allowable behaviors for a given program are not unique. For example, the `@fastmath` macro gives wide semantic latitude for floating point math rearrangements and two subsequent invocations of the same operation inside of that macro, even on the same values, are not semantically required to produce the same answer. The situation is similar for asynchronous operations, random number generation, etc. -*Undefined Behavior* (UB) occurs when a julia program semantically perform an operation that is assumed to never happen. In such a situation, the language semantics do not constrain the behavior of the implementation, so any behavior of the program is allowable, including crashes, memory corruption, incorrect behavior, etc. As such, it is very important to avoid writing programs that semantically execute undefined behavior. +*Undefined Behavior* (UB) occurs when a julia program semantically performs an operation that is assumed to never happen. In such a situation, the language semantics do not constrain the behavior of the implementation, so any behavior of the program is allowable, including crashes, memory corruption, incorrect behavior, etc. As such, it is very important to avoid writing programs that semantically execute undefined behavior. Note that this explicitly applies to *semantically executed* undefined behavior. While julia's compiler is allowed to and does aggressively perform speculative execution of pure functions. Since the execution point is not semantically observable (though again indirectly observable through execution performance), this is allowable by the as-if rule. As such, speculative execution is inhibited unless the code in question is proven to be free of undefined behavior. -The presence of undefined behavior is modeled as part of julia's effect system using the `:noub` effect bit. See the documentation for `@assume_effects` for more information on querying the compiler's effect model or overriding it for specific situations (e.g. where a dynamic check precludes potential UB from every actually being reached). +The presence of undefined behavior is modeled as part of julia's effect system using the `:noub` effect bit. See the documentation for [`Base.@assume_effects`](@ref) for more information on querying the compiler's effect model or overriding it for specific situations (e.g. where a dynamic check precludes potential UB from ever actually being reached). ## List of sources of undefined behavior The following is a list of sources of undefined behavior, though it should currently not be considered exhaustive: -- Replacement of `const` values. Note that in interactive mode the compiler will issue a warning for this and some care is taken to mimize impact as a user convenience, but the behavior is not defined. -- Various modification of global state during precompile. Where possible, this is detected and yields an error, but detection is incomplete. +- Replacement of `const` values. Note that in interactive mode the compiler will issue a warning for this and some care is taken to minimize the impact as a user convenience, but the behavior is not defined. +- Various modifications of global state during precompile. Where possible, this is detected and yields an error, but detection is incomplete. - Incorrect implementation of a `Core.OptimizedGenerics` interface [1] - Any invocation of undefined behavior in FFI code (e.g. `ccall`, `llvmcall`) according to the semantics of the respective language - Incorrect signature types in `ccall` or `cfunction`, even if those signatures happen to yield the correct ABI on a particular platform -- Incorrect use of annotations like `@inbounds`, `@assume_effects` in violation of their requirements [1] +- Incorrect use of annotations like `@inbounds`, `@assume_effects` in violation of their requirements [^1] - Retention of pointers to GC-tracked objects outside of a `@GC.preserve` region - Memory modification of GC-tracked objects without appropriate write barriers from outside of julia (e.g. in native calls, debuggers, etc.) -- Violations of the memory model using `unsafe` operations (e.g. `unsafe_load` of an invalid pointer, pointer provenance violations, etc) +- Violations of the memory model using `unsafe_` operations (e.g. `unsafe_load` of an invalid pointer, pointer provenance violations, etc) - Violations of TBAA guarantees (e.g. using `unsafe_wrap`) - Mutation of data promised to be immutable (e.g. in `Base.StringVector`) - Data races @@ -34,7 +34,7 @@ though it should currently not be considered exhaustive: - A value other than `false` (`reinterpret(UInt8, b) == 0x00`) or `true` (`reinterpret(UInt8, b) == 0x01`) for a `Bool` `b`. - Invoking undefined behavior via compiler intrinsics. -[1] Incorrect use here may be UB, even if not semantically executed, please see the specific documentation of the feature. +[^1] Incorrect use here may be UB, even if not semantically executed, please see the specific documentation of the feature. ## Implementation-defined behavior Some behavior is technically forbidden by the semantics of the language, but required in certain parts of the implementation and thus allowed as long as implementation-defined constraints are obeyed. Nevertheless, these constructs should be avoided when possible, as the implementation-defined constraints may not be stable across julia versions.