Skip to content

Conversation

byroot
Copy link

@byroot byroot commented Mar 10, 2025

Found the issue. There are other bootstrap tests still failing though.

@byroot
Copy link
Author

byroot commented Mar 10, 2025

The other issue was the cache clearing on const_set. Now btests pass.

@byroot
Copy link
Author

byroot commented Mar 10, 2025

I think all failures left are from YJIT/RJIT. There's a bit of busy work to do to update them.

byroot pushed a commit that referenced this pull request May 6, 2025
…uby#13231)

This change addresses the following ASAN error:

```
==36597==ERROR: AddressSanitizer: heap-use-after-free on address 0x512000396ba8 at pc 0x7fcad5cbad9f bp 0x7fff19739af0 sp 0x7fff19739ae8
  WRITE of size 8 at 0x512000396ba8 thread T0
  [643/756] 36600=optparse/test_summary
      #0 0x7fcad5cbad9e in free_fast_fallback_getaddrinfo_entry /home/runner/work/ruby-dev-builder/ruby-dev-builder/ext/socket/raddrinfo.c:3046:22
      #1 0x7fcad5c9fb48 in fast_fallback_inetsock_cleanup /home/runner/work/ruby-dev-builder/ruby-dev-builder/ext/socket/ipsocket.c:1179:17
      ruby#2 0x7fcadf3b611a in rb_ensure /home/runner/work/ruby-dev-builder/ruby-dev-builder/eval.c:1081:5
      ruby#3 0x7fcad5c9b44b in rsock_init_inetsock /home/runner/work/ruby-dev-builder/ruby-dev-builder/ext/socket/ipsocket.c:1289:20
      ruby#4 0x7fcad5ca22b8 in tcp_init /home/runner/work/ruby-dev-builder/ruby-dev-builder/ext/socket/tcpsocket.c:76:12
      ruby#5 0x7fcadf83ba70 in vm_call0_cfunc_with_frame /home/runner/work/ruby-dev-builder/ruby-dev-builder/./vm_eval.c:164:15
...
```

A `struct fast_fallback_getaddrinfo_shared` is shared between the main thread and two child threads.
This struct contains an array of `fast_fallback_getaddrinfo_entry`.

`fast_fallback_getaddrinfo_entry` and `fast_fallback_getaddrinfo_shared` were freed separately, and if `fast_fallback_getaddrinfo_shared` was freed first and then an attempt was made to free a `fast_fallback_getaddrinfo_entry`, a `heap-use-after-free` could occur.

This change avoids that possibility by separating the deallocation of the addrinfo memory held by `fast_fallback_getaddrinfo_entry` from the access and lifecycle of the `fast_fallback_getaddrinfo_entry` itself.
etiennebarrie pushed a commit that referenced this pull request May 23, 2025
[Bug #18119]

When we create classes, it pushes the class to the subclass list of the
superclass. This access needs to be synchronized because multiple Ractors
may be creating classes with the same superclass, which would cause race
conditions and cause the linked list to be corrupted.

For example, we can reproduce with this script crashing:

    workers = (0...8).map do
      Ractor.new do
        loop do
          100.times.map { Class.new }
          Ractor.yield nil
        end
      end
    end

    100.times { Ractor.select(*workers) }

With ASAN enabled, we can see that there are use-after-free errors:

    ==176013==ERROR: AddressSanitizer: heap-use-after-free on address 0x5030000974f0 at pc 0x62f9e56f892d bp 0x7a503f1ffd90 sp 0x7a503f1ffd88
    WRITE of size 8 at 0x5030000974f0 thread T4
        #0 0x62f9e56f892c in rb_class_remove_from_super_subclasses class.c:149:24
        #1 0x62f9e58c9dd2 in rb_gc_obj_free gc.c:1262:9
        ruby#2 0x62f9e58f6e19 in gc_sweep_plane gc/default/default.c:3450:21
        ruby#3 0x62f9e58f686a in gc_sweep_page gc/default/default.c:3535:13
        ruby#4 0x62f9e58f12b4 in gc_sweep_step gc/default/default.c:3810:9
        ruby#5 0x62f9e58ed2a7 in gc_sweep gc/default/default.c:4058:13
        ruby#6 0x62f9e58fac93 in gc_start gc/default/default.c:6402:13
        ruby#7 0x62f9e58e8b69 in heap_prepare gc/default/default.c:2032:13
        ruby#8 0x62f9e58e8b69 in heap_next_free_page gc/default/default.c:2255:9
        ruby#9 0x62f9e58e8b69 in newobj_cache_miss gc/default/default.c:2362:38
    ...
    0x5030000974f0 is located 16 bytes inside of 24-byte region [0x5030000974e0,0x5030000974f8)
    freed by thread T4 here:
        #0 0x62f9e562f28a in free (miniruby+0x1fd28a) (BuildId: 5ad6d9e7cec8318df6726ea5ce34d3c76d0d0233)
        #1 0x62f9e58ca2ab in rb_gc_impl_free gc/default/default.c:8102:9
        ruby#2 0x62f9e58ca2ab in ruby_sized_xfree gc.c:5029:13
        ruby#3 0x62f9e58ca2ab in ruby_xfree gc.c:5040:5
        ruby#4 0x62f9e56f88e6 in rb_class_remove_from_super_subclasses class.c:152:9
        ruby#5 0x62f9e58c9dd2 in rb_gc_obj_free gc.c:1262:9
        ruby#6 0x62f9e58f6e19 in gc_sweep_plane gc/default/default.c:3450:21
        ruby#7 0x62f9e58f686a in gc_sweep_page gc/default/default.c:3535:13
        ruby#8 0x62f9e58f12b4 in gc_sweep_step gc/default/default.c:3810:9
        ruby#9 0x62f9e58ed2a7 in gc_sweep gc/default/default.c:4058:13
    ...
    previously allocated by thread T5 here:
        #0 0x62f9e562f70d in calloc (miniruby+0x1fd70d) (BuildId: 5ad6d9e7cec8318df6726ea5ce34d3c76d0d0233)
        #1 0x62f9e58c8e1a in calloc1 gc/default/default.c:1472:12
        ruby#2 0x62f9e58c8e1a in rb_gc_impl_calloc gc/default/default.c:8138:5
        ruby#3 0x62f9e58c8e1a in ruby_xcalloc_body gc.c:4964:12
        ruby#4 0x62f9e58c8e1a in ruby_xcalloc gc.c:4958:34
        ruby#5 0x62f9e56f906e in push_subclass_entry_to_list class.c:88:13
        ruby#6 0x62f9e56f906e in rb_class_subclass_add class.c:111:38
        ruby#7 0x62f9e56f906e in RCLASS_SET_SUPER internal/class.h:257:9
        ruby#8 0x62f9e56fca7a in make_metaclass class.c:786:5
        ruby#9 0x62f9e59db982 in rb_class_initialize object.c:2101:5
etiennebarrie pushed a commit that referenced this pull request Jun 10, 2025
In commit d42b9ff, an optimization was introduced that can speed up
Regexp#match by 15% when it matches with strings of different encodings.
This optimization, however, does not work across ractors. To fix this,
we only use the optimization if no ractors have been started. In the
future, we could use atomics for the reference counting if we find it's
needed and if it's more performant.

The backtrace of the misbehaving native thread:

```
  * frame #0: 0x0000000189c94388 libsystem_kernel.dylib`__pthread_kill + 8
    frame #1: 0x0000000189ccd88c libsystem_pthread.dylib`pthread_kill + 296
    frame ruby#2: 0x0000000189bd6c60 libsystem_c.dylib`abort + 124
    frame ruby#3: 0x0000000189adb174 libsystem_malloc.dylib`malloc_vreport + 892
    frame ruby#4: 0x0000000189adec90 libsystem_malloc.dylib`malloc_report + 64
    frame ruby#5: 0x0000000189ae321c libsystem_malloc.dylib`___BUG_IN_CLIENT_OF_LIBMALLOC_POINTER_BEING_FREED_WAS_NOT_ALLOCATED + 32
    frame ruby#6: 0x00000001001c3be4 ruby`onig_free_body(reg=0x000000012d84b660) at regcomp.c:5663:5
    frame ruby#7: 0x00000001001ba828 ruby`rb_reg_prepare_re(re=4748462304, str=4748451168) at re.c:1680:13
    frame ruby#8: 0x00000001001bac58 ruby`rb_reg_onig_match(re=4748462304, str=4748451168, match=(ruby`reg_onig_search [inlined] rbimpl_RB_TYPE_P_fastpath at value_type.h:349:14
ruby`reg_onig_search [inlined] rbimpl_rstring_getmem at rstring.h:391:5
ruby`reg_onig_search at re.c:1781:5), args=0x000000013824b168, regs=0x000000013824b150) at re.c:1708:20
    frame ruby#9: 0x00000001001baefc ruby`rb_reg_search_set_match(re=4748462304, str=4748451168, pos=<unavailable>, reverse=0, set_backref_str=1, set_match=0x0000000000000000) at re.c:1809:27
    frame ruby#10: 0x00000001001bae80 ruby`rb_reg_search0(re=<unavailable>, str=<unavailable>, pos=<unavailable>, reverse=<unavailable>, set_backref_str=<unavailable>, match=<unavailable>) at re.c:1861:12 [artificial]
    frame ruby#11: 0x0000000100230b90 ruby`rb_pat_search0(pat=<unavailable>, str=<unavailable>, pos=<unavailable>, set_backref_str=<unavailable>, match=<unavailable>) at string.c:6619:16 [artificial]
    frame ruby#12: 0x00000001002287f4 ruby`rb_str_sub_bang [inlined] rb_pat_search(pat=4748462304, str=4748451168, pos=0, set_backref_str=1) at string.c:6626:12
    frame ruby#13: 0x00000001002287dc ruby`rb_str_sub_bang(argc=1, argv=0x00000001381280d0, str=4748451168) at string.c:6668:11
    frame ruby#14: 0x000000010022826c ruby`rb_str_sub
```

You can reproduce this by running:
```
RUBY_TESTOPTS="--name=/test_str_capitalize/" make test-all TESTS=test/ruby/test_m17n.comb
```

However, you need to run it with multiple ractors at once.

Co-authored-by: jhawthorn <john@hawthorn.email>
etiennebarrie pushed a commit that referenced this pull request Jun 30, 2025
`name` is used via `RSTRING_PTR` within rb_str_catf, which may allocate
and thus potentially trigger GC. Although `name` is still referenced
by a local variable, the compiler might optimize away the reference
before the GC sees it, especially under aggressive optimization or when
debugging tools like ASAN are used.

This patch adds an explicit `RB_GC_GUARD` to ensure `name` is kept alive
until after the last use.

While it's not certain this is the root cause of the following observed
use-after-poison ASAN error, I believe this fix is indeed needed and
hopefully a likely candidate for preventing the error.

```
==1960369==ERROR: AddressSanitizer: use-after-poison on address 0x7ec6a00f1d88 at pc 0x5fb5bcafcf2e bp 0x7ffcc1178cb0 sp 0x7ffcc1178470
READ of size 61 at 0x7ec6a00f1d88 thread T0
    #0 0x5fb5bcafcf2d in __asan_memcpy (/tmp/ruby/build/trunk_asan/ruby+0x204f2d) (BuildId: 6d92c84a27b87cfd253c38eeb552593f215ffb3d)
    #1 0x5fb5bcde1fa5 in memcpy /usr/include/x86_64-linux-gnu/bits/string_fortified.h:29:10
    ruby#2 0x5fb5bcde1fa5 in ruby_nonempty_memcpy /tmp/ruby/src/trunk_asan/include/ruby/internal/memory.h:758:16
    ruby#3 0x5fb5bcde1fa5 in ruby__sfvwrite /tmp/ruby/src/trunk_asan/sprintf.c:1083:9
    ruby#4 0x5fb5bcde1521 in BSD__sprint /tmp/ruby/src/trunk_asan/vsnprintf.c:318:8
    ruby#5 0x5fb5bcde0fbc in BSD_vfprintf /tmp/ruby/src/trunk_asan/vsnprintf.c:1215:3
    ruby#6 0x5fb5bcdde4b1 in ruby_vsprintf0 /tmp/ruby/src/trunk_asan/sprintf.c:1164:5
    ruby#7 0x5fb5bcddd648 in rb_str_vcatf /tmp/ruby/src/trunk_asan/sprintf.c:1234:5
    ruby#8 0x5fb5bcddd648 in rb_str_catf /tmp/ruby/src/trunk_asan/sprintf.c:1245:11
    ruby#9 0x5fb5bcf97c67 in location_format /tmp/ruby/src/trunk_asan/vm_backtrace.c:462:9
    ruby#10 0x5fb5bcf97c67 in location_to_str /tmp/ruby/src/trunk_asan/vm_backtrace.c:493:12
    ruby#11 0x5fb5bcf90a37 in location_to_str_dmyarg /tmp/ruby/src/trunk_asan/vm_backtrace.c:795:12
    ruby#12 0x5fb5bcf90a37 in backtrace_collect /tmp/ruby/src/trunk_asan/vm_backtrace.c:786:28
    ruby#13 0x5fb5bcf90a37 in backtrace_to_str_ary /tmp/ruby/src/trunk_asan/vm_backtrace.c:804:9
    ruby#14 0x5fb5bcf90a37 in rb_backtrace_to_str_ary /tmp/ruby/src/trunk_asan/vm_backtrace.c:816:9
    ruby#15 0x5fb5bd335b25 in exc_backtrace /tmp/ruby/src/trunk_asan/error.c:1904:15
    ruby#16 0x5fb5bd335b25 in rb_get_backtrace /tmp/ruby/src/trunk_asan/error.c:1924:16
```
https://ci.rvm.jp/results/trunk_asan@ruby-sp1/5810304
etiennebarrie pushed a commit that referenced this pull request Aug 1, 2025
This change addresses the following ASAN error:

```
==1973462==ERROR: AddressSanitizer: heap-use-after-free on address 0x5110002117dc at pc 0x749c307c8a65 bp 0x7ffc3af331d0 sp 0x7ffc3af331c8
READ of size 4 at 0x5110002117dc thread T0
    #0 0x749c307c8a64 in rb_getaddrinfo /tmp/ruby/src/trunk_asan/ext/socket/raddrinfo.c:564:14
    #1 0x749c307c8a64 in rsock_getaddrinfo /tmp/ruby/src/trunk_asan/ext/socket/raddrinfo.c:1008:21
    ruby#2 0x749c307cac48 in rsock_addrinfo /tmp/ruby/src/trunk_asan/ext/socket/raddrinfo.c:1049:12
    ruby#3 0x749c307b10ae in init_inetsock_internal /tmp/ruby/src/trunk_asan/ext/socket/ipsocket.c:62:23
    ruby#4 0x562c5b2e327e in rb_ensure /tmp/ruby/src/trunk_asan/eval.c:1080:18
    ruby#5 0x749c307aafd4 in rsock_init_inetsock /tmp/ruby/src/trunk_asan/ext/socket/ipsocket.c:1318:12
    ruby#6 0x749c307b3b78 in tcp_svr_init /tmp/ruby/src/trunk_asan/ext/socket/tcpserver.c:39:12
```

Fixed to avoid accessing memory that has already been freed after calling `free_getaddrinfo_arg`.
etiennebarrie pushed a commit that referenced this pull request Aug 1, 2025
This is notably faster: no need to hash indices.

Before:

```
plum% samply record ~/.rubies/ruby-zjit/bin/ruby --zjit benchmarks/getivar.rb
ruby 3.5.0dev (2025-07-10T14:40:49Z master 51252ef) +ZJIT dev +PRISM [arm64-darwin24]
itr:   time
 #1: 5311ms
 ruby#2:   49ms
 ruby#3:   49ms
 ruby#4:   48ms
```

After:

```
plum% samply record ~/.rubies/ruby-zjit/bin/ruby --zjit benchmarks/getivar.rb
ruby 3.5.0dev (2025-07-10T15:09:06Z mb-benchmark-compile 42ffd3c) +ZJIT dev +PRISM [arm64-darwin24]
itr:   time
 #1: 1332ms
 ruby#2:   49ms
 ruby#3:   48ms
 ruby#4:   48ms
```
etiennebarrie pushed a commit that referenced this pull request Aug 1, 2025
Previously, ZJIT miscompiled the following because of native SP
interference.

    def a(n1,n2,n3,n4,n5,n6,n7,n8) = [n8]
    a(0,0,0,0,0,0,0, :ok)

Commented problematic disassembly:

    ; call rb_ary_new_capa
    mov x0, #1
    mov x16, #0x1278
    movk x16, #0x4bc, lsl ruby#16
    movk x16, #1, lsl ruby#32
    blr x16
    ; call rb_ary_push
    mov x1, x0
    str x1, [sp, #-0x10]! ; c_push() from alloc_regs()
    mov x0, x1            ; arg0, the array
    ldur x1, [sp]         ; meant to be arg1=n8, but sp just moved!
    mov x16, #0x3968
    movk x16, #0x4bc, lsl ruby#16
    movk x16, #1, lsl ruby#32
    blr x16

Since the frame pointer stays constant in the body of the function,
static offsets based on it don't run the risk of being invalidated by SP
movements.

Pass the registers to preserve through Insn::FrameSetup. This allows ARM
to use STP and waste no gaps between EC, SP, and CFP.

x86 now preserves and restores RBP since we use it as the frame pointer.
Since all arches now have a frame pointer, remove offset based SP
movement in the epilogue and restore registers using the frame pointer.
etiennebarrie pushed a commit that referenced this pull request Aug 1, 2025
During Ruby's shutdown, we no longer need to check the fstr of the symbol
because we don't use the fstr anymore for freeing the symbol. This can also
fix the following ASAN error:

==2721247==ERROR: AddressSanitizer: use-after-poison on address 0x75fa90a627b8 at pc 0x64a7b06fb4bc bp 0x7ffdf95ba9b0 sp 0x7ffdf95ba9a8
READ of size 8 at 0x75fa90a627b8 thread T0
    #0 0x64a7b06fb4bb in RB_BUILTIN_TYPE include/ruby/internal/value_type.h:191:30
    #1 0x64a7b06fb4bb in rb_gc_shutdown_call_finalizer_p gc.c:357:18
    ruby#2 0x64a7b06fb4bb in rb_gc_impl_shutdown_call_finalizer gc/default/default.c:3045:21
    ruby#3 0x64a7b06fb4bb in rb_objspace_call_finalizer gc.c:1739:5
    ruby#4 0x64a7b06ca1b2 in rb_ec_finalize eval.c:165:5
    ruby#5 0x64a7b06ca1b2 in rb_ec_cleanup eval.c:256:5
    ruby#6 0x64a7b06c98a3 in ruby_cleanup eval.c:179:12
etiennebarrie pushed a commit that referenced this pull request Oct 1, 2025
rb_profile_frames() is used by profilers in a way such that it can run
on any instruction in the binary, and it crashed previously in the
following situation in `RUBY_DEBUG` builds:

```
* thread #1, queue = 'com.apple.main-thread', stop reason = step over
    frame #0: 0x00000001002827f0 miniruby`vm_make_env_each(ec=0x0000000101866b00, cfp=0x000000080c91bee8) at vm.c:992:74
   989              }
   990
   991              vm_make_env_each(ec, prev_cfp);
-> 992              VM_FORCE_WRITE_SPECIAL_CONST(&ep[VM_ENV_DATA_INDEX_SPECVAL], VM_GUARDED_PREV_EP(prev_cfp->ep));
   993          }
   994      }
   995      else {
(lldb) call rb_profile_frames(0, 100, $2, $3)
/Users/alan/ruby/vm_core.h:1448: Assertion Failed: VM_ENV_FLAGS:FIXNUM_P(flags)
ruby 3.5.0dev (2025-09-23T20:20:04Z master 06b7a70) +PRISM [arm64-darwin25]

-- Crash Report log information --------------------------------------------
   See Crash Report log file in one of the following locations:
     * ~/Library/Logs/DiagnosticReports
     * /Library/Logs/DiagnosticReports
   for more details.
Don't forget to include the above Crash Report log file in bug reports.

-- Control frame information -----------------------------------------------
c:0008 p:---- s:0029 e:000028 CFUNC  :lambda
/Users/alan/ruby/vm_core.h:1448: Assertion Failed: VM_ENV_FLAGS:FIXNUM_P(flags)
ruby 3.5.0dev (2025-09-23T20:20:04Z master 06b7a70) +PRISM [arm64-darwin25]

-- Crash Report log information --------------------------------------------
<snip>
```

There is a small window where the control frame is invalid and fails the
assert.

The double crash also shows that in `RUBY_DEBUG` builds, the crash reporter was
previously not resilient to corrupt frame state. In release builds, it
prints more info.

Add unchecked APIs for the crash reporter and profilers so they work
as well in `RUBY_DEBUG` builds as non-debug builds.
etiennebarrie pushed a commit that referenced this pull request Oct 13, 2025
We need to free the current_block_exits in parse_program when we're done
with it to prevent memory leaks. This fixes the following memory leak detected
when running Ruby using `RUBY_FREE_AT_EXIT=1 ruby -nc -e "break"`:

    Direct leak of 32 byte(s) in 1 object(s) allocated from:
        #0 0x5bd3c5bc66c8 in realloc (miniruby+0x616c8) (BuildId: ruby/prism@ba6a96e5a060)
        #1 0x5bd3c5f91fd9 in pm_node_list_grow prism/templates/src/node.c.erb:35:40
        ruby#2 0x5bd3c5f91e9d in pm_node_list_append prism/templates/src/node.c.erb:48:9
        ruby#3 0x5bd3c6001fa0 in parse_block_exit prism/prism.c:15788:17
        ruby#4 0x5bd3c5fee155 in parse_expression_prefix prism/prism.c:19221:50
        ruby#5 0x5bd3c5fe9970 in parse_expression prism/prism.c:22235:23
        ruby#6 0x5bd3c5fe0586 in parse_statements prism/prism.c:13976:27
        ruby#7 0x5bd3c5fd6792 in parse_program prism/prism.c:22508:40

ruby/prism@fdf9b8d24a
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant