Description
On riscv64 FreeBSD, I have been experiencing problems when running the system under high load. The system hangs, apparently with all threads in a livelock. The problem reproduces on a SiFive Unmatched board as well as under QEMU and RVVM.
Evaluating many hangs suggests a link to multithreaded applications that trigger lots of TLB shootdowns, specifically the Go toolchain. Stack traces from a half-stuck system, where some threads appear to be livelocked while others are still fine, look like this:
db> show active trace
Tracing command clock pid 2 tid 100029 td 0xffffffc0adc0a140 (CPU 0)
ipi_stop() at ipi_stop+0x2c
intr_ipi_dispatch() at intr_ipi_dispatch+0x50
sbi_ipi_intr() at sbi_ipi_intr+0x70
intr_event_handle() at intr_event_handle+0x88
intr_isrc_dispatch() at intr_isrc_dispatch+0x2c
intc_intr() at intc_intr+0x42
intr_irq_handler() at intr_irq_handler+0x54
do_trap_supervisor() at do_trap_supervisor+0x78
cpu_exception_handler_supervisor() at cpu_exception_handler_supervisor+0x74
--- interrupt 1
spinlock_exit() at spinlock_exit+0x3c
mi_switch() at mi_switch+0x17a
softclock_thread() at softclock_thread+0x76
fork_exit() at fork_exit+0x68
fork_trampoline() at fork_trampoline+0xa
Tracing command pagedaemon pid 8 tid 100115 td 0xffffffc0adc2bcc0 (CPU 6)
ipi_stop() at ipi_stop+0x2c
intr_ipi_dispatch() at intr_ipi_dispatch+0x50
sbi_ipi_intr() at sbi_ipi_intr+0x70
intr_event_handle() at intr_event_handle+0x88
intr_isrc_dispatch() at intr_isrc_dispatch+0x2c
intc_intr() at intc_intr+0x42
intr_irq_handler() at intr_irq_handler+0x54
do_trap_supervisor() at do_trap_supervisor+0x78
cpu_exception_handler_supervisor() at cpu_exception_handler_supervisor+0x74
--- interrupt 1
__rw_wlock_hard() at __rw_wlock_hard+0x442
_rw_wlock_cookie() at _rw_wlock_cookie+0x94
pmap_ts_referenced() at pmap_ts_referenced+0xa4
$x() at $x+0x62c
vm_pageout() at vm_pageout+0x1c2
fork_exit() at fork_exit+0x68
fork_trampoline() at fork_trampoline+0xa
Tracing command sh pid 76959 tid 120011 td 0xffffffc103152140 (CPU 5)
ipi_stop() at ipi_stop+0x2c
intr_ipi_dispatch() at intr_ipi_dispatch+0x50
sbi_ipi_intr() at sbi_ipi_intr+0x70
intr_event_handle() at intr_event_handle+0x88
intr_isrc_dispatch() at intr_isrc_dispatch+0x2c
intc_intr() at intc_intr+0x42
intr_irq_handler() at intr_irq_handler+0x54
do_trap_supervisor() at do_trap_supervisor+0x78
cpu_exception_handler_supervisor() at cpu_exception_handler_supervisor+0x74
--- interrupt 1
sbi_remote_fence_i() at sbi_remote_fence_i+0x22
pmap_enter_quick() at pmap_enter_quick+0x6e
vm_fault_prefault() at vm_fault_prefault+0x180
vm_fault() at vm_fault+0x164c
vm_fault_trap() at vm_fault_trap+0x4a
page_fault_handler() at page_fault_handler+0x1c4
do_trap_user() at do_trap_user+0xf0
cpu_exception_handler_user() at cpu_exception_handler_user+0x72
--- exception 12, tval = 0x12a763f87e
Tracing command sh pid 76969 tid 120402 td 0xffffffc1030735c0 (CPU 4)
ipi_stop() at ipi_stop+0x2c
intr_ipi_dispatch() at intr_ipi_dispatch+0x50
sbi_ipi_intr() at sbi_ipi_intr+0x70
intr_event_handle() at intr_event_handle+0x88
intr_isrc_dispatch() at intr_isrc_dispatch+0x2c
intc_intr() at intc_intr+0x42
intr_irq_handler() at intr_irq_handler+0x54
do_trap_supervisor() at do_trap_supervisor+0x78
cpu_exception_handler_supervisor() at cpu_exception_handler_supervisor+0x74
--- interrupt 1
sbi_remote_fence_i() at sbi_remote_fence_i+0x22
pmap_enter_quick() at pmap_enter_quick+0x6e
vm_fault_prefault() at vm_fault_prefault+0x180
vm_fault() at vm_fault+0x164c
vm_fault_trap() at vm_fault_trap+0x4a
page_fault_handler() at page_fault_handler+0x1c4
do_trap_user() at do_trap_user+0xf0
cpu_exception_handler_user() at cpu_exception_handler_user+0x72
--- exception 12, tval = 0x8337c0c0
Tracing command sh pid 76971 tid 120096 td 0xffffffc1030c75c0 (CPU 7)
ipi_stop() at ipi_stop+0x2c
intr_ipi_dispatch() at intr_ipi_dispatch+0x50
sbi_ipi_intr() at sbi_ipi_intr+0x70
intr_event_handle() at intr_event_handle+0x88
intr_isrc_dispatch() at intr_isrc_dispatch+0x2c
intc_intr() at intc_intr+0x42
intr_irq_handler() at intr_irq_handler+0x54
do_trap_supervisor() at do_trap_supervisor+0x78
cpu_exception_handler_supervisor() at cpu_exception_handler_supervisor+0x74
--- interrupt 1
sbi_remote_fence_i() at sbi_remote_fence_i+0x22
pmap_enter_object() at pmap_enter_object+0xda
vm_map_pmap_enter() at vm_map_pmap_enter+0x280
vm_map_insert1() at vm_map_insert1+0x438
vm_map_fixed() at vm_map_fixed+0x112
vm_mmap_object() at vm_mmap_object+0x130
vn_mmap() at vn_mmap+0xec
kern_mmap() at kern_mmap+0x46e
sys_mmap() at sys_mmap+0x38
do_trap_user() at do_trap_user+0x1e4
cpu_exception_handler_user() at cpu_exception_handler_user+0x72
--- syscall (477, FreeBSD ELF64, mmap)
Tracing command sh pid 76972 tid 120164 td 0xffffffc1030bfcc0 (CPU 1)
ipi_stop() at ipi_stop+0x2c
intr_ipi_dispatch() at intr_ipi_dispatch+0x50
sbi_ipi_intr() at sbi_ipi_intr+0x70
intr_event_handle() at intr_event_handle+0x88
intr_isrc_dispatch() at intr_isrc_dispatch+0x2c
intc_intr() at intc_intr+0x42
intr_irq_handler() at intr_irq_handler+0x54
do_trap_supervisor() at do_trap_supervisor+0x78
cpu_exception_handler_supervisor() at cpu_exception_handler_supervisor+0x74
--- interrupt 1
sbi_remote_fence_i() at sbi_remote_fence_i+0x22
pmap_enter_quick() at pmap_enter_quick+0x6e
vm_fault_prefault() at vm_fault_prefault+0x180
vm_fault() at vm_fault+0x164c
vm_fault_trap() at vm_fault_trap+0x4a
page_fault_handler() at page_fault_handler+0x1c4
do_trap_user() at do_trap_user+0xf0
cpu_exception_handler_user() at cpu_exception_handler_user+0x72
--- exception 12, tval = 0x1a0c22baf0
Tracing command sh pid 76973 tid 100429 td 0xffffffc1030c9840 (CPU 3)
kdb_alt_break_internal() at kdb_alt_break_internal+0x15c
kdb_alt_break() at kdb_alt_break+0xe
uart_intr_rxready() at uart_intr_rxready+0x7e
uart_intr() at uart_intr+0x104
intr_event_handle() at intr_event_handle+0x88
intr_isrc_dispatch() at intr_isrc_dispatch+0x2c
plic_intr() at plic_intr+0x80
intr_event_handle() at intr_event_handle+0x88
intr_isrc_dispatch() at intr_isrc_dispatch+0x2c
intc_intr() at intc_intr+0x42
intr_irq_handler() at intr_irq_handler+0x54
do_trap_supervisor() at do_trap_supervisor+0x78
cpu_exception_handler_supervisor() at cpu_exception_handler_supervisor+0x74
--- interrupt 9
__rw_wlock_hard() at __rw_wlock_hard+0x446
_rw_wlock_cookie() at _rw_wlock_cookie+0x94
pmap_enter() at pmap_enter+0x614
vm_fault() at vm_fault+0x133c
vm_fault_trap() at vm_fault_trap+0x4a
page_fault_handler() at page_fault_handler+0x1c4
do_trap_user() at do_trap_user+0xf0
cpu_exception_handler_user() at cpu_exception_handler_user+0x72
--- exception 15, tval = 0x80a34a28
Tracing command cc pid 76974 tid 118022 td 0xffffffc1030d7cc0 (CPU 2)
ipi_stop() at ipi_stop+0x2c
intr_ipi_dispatch() at intr_ipi_dispatch+0x50
sbi_ipi_intr() at sbi_ipi_intr+0x70
intr_event_handle() at intr_event_handle+0x88
intr_isrc_dispatch() at intr_isrc_dispatch+0x2c
intc_intr() at intc_intr+0x42
intr_irq_handler() at intr_irq_handler+0x54
do_trap_supervisor() at do_trap_supervisor+0x78
cpu_exception_handler_supervisor() at cpu_exception_handler_supervisor+0x74
--- interrupt 1
sbi_remote_fence_i() at sbi_remote_fence_i+0x22
pmap_enter_object() at pmap_enter_object+0xda
vm_map_pmap_enter() at vm_map_pmap_enter+0x280
vm_map_insert1() at vm_map_insert1+0x438
vm_map_fixed() at vm_map_fixed+0x112
elf64_map_insert() at elf64_map_insert+0x16e
elf64_load_sections() at elf64_load_sections+0x1ae
exec_elf64_imgact() at exec_elf64_imgact+0x75c
$x() at $x+0x47c
sys_execve() at sys_execve+0x52
do_trap_user() at do_trap_user+0x1e4
cpu_exception_handler_user() at cpu_exception_handler_user+0x72
--- syscall (59, FreeBSD ELF64, execve)
db> cont
^BFeb 18 23:23:53 freebsd syslogd: exiting on signal 15
KDB: enter: Break to debugger
timeout stopping cpus
[ thread pid 76965 tid 119128 ]
Stopped at kdb_alt_break_internal+0x15e: sd zero,-1164(s1)
db>
(A trace of a full hang cannot be obtained, as the kernel debugger cannot be entered while the system hangs.)
Project member @jrtc27 has suggested that this may be connected to an unhandled livelock possibility in OpenSBI's TLB request code:
if (ret == SBI_FIFO_UNCHANGED &&
    sbi_fifo_enqueue(tlb_fifo_r, data, false) < 0) {
    /**
     * For now, Busy loop until there is space in the fifo.
     * There may be case where target hart is also
     * enqueue in source hart's fifo. Both hart may busy
     * loop leading to a deadlock.
     * TODO: Introduce a wait/wakeup event mechanism to handle
     * this properly.
     */
    tlb_process_once(scratch);
    sbi_dprintf("hart%d: hart%d tlb fifo full\n", curr_hartid,
                sbi_hartindex_to_hartid(remote_hartindex));
    return SBI_IPI_UPDATE_RETRY;
}