Closed
Description
The bug is hard to reproduce, I saw it only could of times in tests like #515 (comment) :
[ 34.215054] BUG: unable to handle kernel paging request at ffffeb04003c0260
[ 34.218687] IP: tfw_cache_do_action+0x11a5/0x1700 [tempesta_fw]
[ 34.221514] PGD 0 P4D 0
[ 34.223853] Oops: 0000 [#1] SMP KASAN PTI
[ 34.226399] Modules linked in: tempesta_fw(O) tempesta_db(O) tempesta_tls(O) tempesta_lib(O) crct10dif_pclmul intel_rapl_perf ppdev bochs_drm ttm drm_kms_helper drm sg fb_sys_fops serio_raw syscopyarea parport_pc sysfillrect sysimgblt binfmt_misc parport button ip_tables x_tables ext4 crc16 mbcache jbd2 fscrypto sr_mod sd_mod cdrom ata_generic virtio_net ata_piix libata virtio_pci i2c_piix4 psmouse virtio_ring scsi_mod virtio
[ 34.245765] CPU: 3 PID: 490 Comm: nginx Tainted: G O 4.14.32-kdump+ #170
[ 34.250615] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-3.fc33 04/01/2014
[ 34.255464] task: ffff880033840100 task.stack: ffff880034540000
[ 34.259907] RIP: 0010:tfw_cache_do_action+0x11a5/0x1700 [tempesta_fw]
[ 34.264447] RSP: 0018:ffff88006d586e20 EFLAGS: 00010246
[ 34.268717] RAX: 0000000000000000 RBX: ffffc9000f009040 RCX: ffffffffc0745545
[ 34.273556] RDX: dffffc0000000000 RSI: dffffc0000000000 RDI: ffffeb04003c0260
[ 34.278581] RBP: ffff88003426b2f0 R08: fffff52001e0120a R09: fffff52001e01209
[ 34.283265] R10: ffffc9000f00904f R11: fffff52001e0120a R12: ffffeb04003c0240
[ 34.287909] R13: ffff88003426b000 R14: 00000000000068f1 R15: ffff880000174020
[ 34.292673] FS: 00007f305250cb80(0000) GS:ffff88006d580000(0000) knlGS:0000000000000000
[ 34.297668] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 34.302358] CR2: ffffeb04003c0260 CR3: 0000000047610006 CR4: 00000000003606e0
[ 34.308380] Call Trace:
[ 34.312485] <IRQ>
[ 34.316529] ? tfw_http_conn_repair+0x1430/0x1430 [tempesta_fw]
[ 34.321283] ? frang_http_uri_len.isra.8+0x120/0x120 [tempesta_fw]
[ 34.326218] ? __cache_add_node+0x180/0x180 [tempesta_fw]
[ 34.331059] ? tfw_http_sess_obtain+0x14b/0x9d0 [tempesta_fw]
[ 34.338083] ? __hash_calc+0x155/0x180 [tempesta_lib]
[ 34.343724] ? tfw_hash_str_len+0x1fa/0x250 [tempesta_fw]
[ 34.348525] ? tfw_sync_socket_exit+0x130/0x130 [tempesta_fw]
[ 34.353404] ? tfw_gfsm_move+0xa2/0x200 [tempesta_fw]
[ 34.358293] ? tfw_http_msg_process_generic+0xb7a/0x1350 [tempesta_fw]
[ 34.366521] ? __http_msg_hdr_val+0x196/0x200 [tempesta_fw]
[ 34.371454] ? tfw_http_req_key_calc+0x17e/0x1a0 [tempesta_fw]
[ 34.376401] ? tfw_http_hm_srv_send+0x370/0x370 [tempesta_fw]
[ 34.381235] ? tfw_capolicy_match+0xc2/0x150 [tempesta_fw]
[ 34.385895] ? tfw_cfgop_block_action+0x1b0/0x1b0 [tempesta_fw]
[ 34.390603] ? tfw_http_conn_repair+0x1430/0x1430 [tempesta_fw]
[ 34.398540] tfw_cache_process+0x140/0x470 [tempesta_fw]
[ 34.403089] ? tfw_cache_msg_cacheable+0x40/0x40 [tempesta_fw]
[ 34.407754] ? lock_acquire+0x220/0x220
[ 34.411901] ? lock_acquire+0x103/0x220
[ 34.415978] ? tfw_http_msg_process_generic+0xb7a/0x1350 [tempesta_fw]
[ 34.424830] tfw_http_msg_process_generic+0xca8/0x1350 [tempesta_fw]
[ 34.429618] ? tfw_http_resp_cache_cb+0x8d0/0x8d0 [tempesta_fw]
[ 34.434124] ? kasan_slab_free+0x87/0xc0
[ 34.438378] ? trace_hardirqs_on_caller+0x11/0x280
[ 34.442921] ? ttls_decrypt+0x59b/0x990 [tempesta_tls]
[ 34.447186] ? ttls_send_alert+0xf0/0xf0 [tempesta_tls]
[ 34.453563] ? is_bpf_text_address+0x73/0xd0
[ 34.458769] ? __split_pgfrag_del+0x8a7/0x9d0 [tempesta_fw]
[ 34.463120] tfw_http_msg_process+0xff/0x150 [tempesta_fw]
[ 34.467418] ? tfw_http_msg_process_generic+0x1350/0x1350 [tempesta_fw]
[ 34.471839] __gfsm_fsm_exec+0xa3/0x100 [tempesta_fw]
[ 34.475957] ? ss_skb_expand_head_tail+0x180/0x180 [tempesta_fw]
[ 34.479982] tfw_gfsm_move+0x1d7/0x200 [tempesta_fw]
[ 34.487370] tfw_tls_msg_process+0x56d/0x840 [tempesta_fw]
[ 34.491350] ? do_syscall_64+0xeb/0x240
[ 34.494901] ? entry_SYSCALL_64_after_hwframe+0x42/0xb7
[ 34.498729] ? tfw_tls_sni+0x410/0x410 [tempesta_fw]
[ 34.502437] ? tcp_rate_skb_sent+0xcc/0x110
[ 34.505969] ? tcp_ack_update_rtt.isra.31+0x4c0/0x4c0
[ 34.509638] ? bictcp_cong_avoid+0x38a/0x6b0
[ 34.516674] ? tcp_rate_gen+0x185/0x240
[ 34.520175] ? tcp_update_pacing_rate+0xc7/0xf0
[ 34.523878] ? __lock_acquire+0xf8/0x1fa0
[ 34.527418] ? tfw_tls_sni+0x410/0x410 [tempesta_fw]
[ 34.531011] __gfsm_fsm_exec+0xa3/0x100 [tempesta_fw]
[ 34.534722] tfw_connection_recv+0x7b/0xa0 [tempesta_fw]
[ 34.538340] ? tfw_connection_send+0x60/0x60 [tempesta_fw]
[ 34.546761] ss_tcp_process_data+0x3e5/0x730 [tempesta_fw]
[ 34.550557] ? ss_sock_create+0x570/0x570 [tempesta_fw]
[ 34.554142] ? mod_timer_pending+0x6d0/0x6d0
[ 34.557748] ss_tcp_data_ready+0x8e/0x100 [tempesta_fw]
[ 34.561287] tcp_rcv_established+0x7b7/0x970
[ 34.564712] ? tcp_data_queue+0x1890/0x1890
[ 34.568098] ? lock_acquire+0x103/0x220
[ 34.573589] ? lock_acquire+0x103/0x220
[ 34.578045] ? __lock_is_held+0x26/0xf0
[ 34.581280] tcp_v4_do_rcv+0x25a/0x340
[ 34.584488] tcp_v4_rcv+0x1224/0x1360
[ 34.587676] ip_local_deliver_finish+0x1a0/0x530
[ 34.591122] ip_local_deliver+0x106/0x2d0
[ 34.594302] ? ip_call_ra_chain+0x2a0/0x2a0
[ 34.597458] ? tcp_v4_early_demux+0x262/0x310
[ 34.600633] ? rcu_read_lock_held+0x22/0x70
[ 34.607652] ? ip_rcv_finish+0x3b6/0xa50
[ 34.610732] ? nf_hook_slow+0x9f/0xe0
[ 34.613731] ip_rcv+0x50f/0x860
[ 34.616691] ? ip_local_deliver+0x2d0/0x2d0
[ 34.619739] ? process_backlog+0x11e/0x340
[ 34.622942] ? inet_del_offload+0x30/0x30
[ 34.625913] ? debug_check_no_locks_freed+0x1a0/0x1a0
[ 34.629078] ? ip_local_deliver+0x2d0/0x2d0
[ 34.635530] __netif_receive_skb_core+0xf08/0x1610
[ 34.638674] ? enqueue_to_backlog+0x5d0/0x5d0
[ 34.641803] ? lock_acquire+0x103/0x220
[ 34.644856] ? lock_acquire+0x220/0x220
[ 34.647776] ? lock_acquire+0x103/0x220
[ 34.650589] ? process_backlog+0x11e/0x340
[ 34.653450] ? process_backlog+0x11e/0x340
[ 34.656204] ? process_backlog+0xac/0x340
[ 34.658900] process_backlog+0xac/0x340
[ 34.664909] net_rx_action+0x45c/0x6e0
[ 34.667746] ? napi_complete_done+0x170/0x170
[ 34.671120] ? __kernel_fpu_end_bh+0x66/0xe0
[ 34.674516] ? trace_softirqs_on+0x1c/0x1b0
[ 34.677197] __do_softirq+0x115/0x551
[ 34.679753] do_softirq_own_stack+0x2a/0x40
[ 34.682424] </IRQ>
[ 34.684761] do_softirq.part.19+0x39/0x60
[ 34.687377] ? ip_finish_output2+0x43d/0x980
[ 34.690191] __local_bh_enable_ip+0x7d/0x80
[ 34.696133] ip_finish_output2+0x466/0x980
[ 34.698832] ? ip_copy_metadata+0x310/0x310
[ 34.701447] ? inet_sendpage+0x8b/0x2e0
[ 34.703997] ? ipv4_mtu+0xb7/0x130
[ 34.706643] ? ip_output+0x11e/0x340
[ 34.709078] ip_output+0x11e/0x340
[ 34.711461] ? ip_mc_output+0x880/0x880
[ 34.713918] ? lock_acquire+0x103/0x220
[ 34.716344] ? lock_acquire+0x103/0x220
[ 34.718718] ? ip_build_and_send_pkt+0x3e0/0x3e0
[ 34.723500] ip_queue_xmit+0x395/0x940
[ 34.726965] tcp_transmit_skb+0xb0d/0x12b0
[ 34.729492] ? __tcp_select_window+0x2d0/0x2d0
[ 34.732029] ? __atime_needs_update+0x360/0x360
[ 34.735798] ? debug_lockdep_rcu_enabled+0x22/0x40
[ 34.738536] ? tcp_md5_do_lookup+0x2f/0x2b0
[ 34.741730] tcp_write_xmit+0x3a1/0x2270
[ 34.744251] ? lock_acquire+0xa8/0x220
[ 34.746886] __tcp_push_pending_frames+0x4d/0x100
[ 34.749523] do_tcp_setsockopt.isra.33+0xaf2/0xd70
[ 34.754323] ? tcp_disconnect+0x900/0x900
[ 34.757948] ? do_sendfile+0x38a/0x5d0
[ 34.760478] ? do_compat_pwritev64+0xa0/0xa0
[ 34.763040] ? __fget_light+0xaf/0xc0
[ 34.765731] SyS_setsockopt+0xf2/0x190
[ 34.768231] ? SyS_recv+0x10/0x10
[ 34.770647] ? __might_fault+0x15/0x20
[ 34.773238] ? SyS_sendfile64+0xb9/0x110
[ 34.775715] ? SyS_sendfile+0x120/0x120
[ 34.778127] ? SyS_recv+0x10/0x10
[ 34.780439] do_syscall_64+0xeb/0x240
[ 34.786025] entry_SYSCALL_64_after_hwframe+0x42/0xb7
[ 34.788720] RIP: 0033:0x7f3052608a6a
[ 34.791292] RSP: 002b:00007fff281ae4e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000036
[ 34.795989] RAX: ffffffffffffffda RBX: 00007f3050f9f130 RCX: 00007f3052608a6a
[ 34.799207] RDX: 0000000000000003 RSI: 0000000000000006 RDI: 000000000000000e
[ 34.802399] RBP: 0000559137ef5da8 R08: 0000000000000004 R09: 0000559137ef66a0
[ 34.805735] R10: 00007fff281ae4f4 R11: 0000000000000246 R12: 0000559137f2c570
[ 34.808987] R13: 00007f30515a02e0 R14: 0000559137ef5d50 R15: 0000000000000000
[ 34.815354] Code: 5f e2 8b 8c 24 88 00 00 00 48 8b 44 24 60 48 8d 7d 0c 89 48 48 e8 3c 59 5f e2 8b 44 24 28 49 8d 7c 24 20 89 45 0c e8 ab 59 5f e2 <49> 8b 44 24 20 48 8d 48 ff a8 01 4c 0f 45 e1 49 8d 7c 24 10 e8
[ 34.823191] RIP: tfw_cache_do_action+0x11a5/0x1700 [tempesta_fw] RSP: ffff88006d586e20
[ 34.828265] CR2: ffffeb04003c0260
addr2line
shows skb_fill_page_desc(it->skb, it->frag, page, off, sz)
from tfw_cache_add_body_page()
:
/root/linux-4.14.32-tfw/./include/linux/compiler.h:178
/root/linux-4.14.32-tfw/./include/linux/page-flags.h:147
/root/linux-4.14.32-tfw/./include/linux/skbuff.h:2001
/root/linux-4.14.32-tfw/./include/linux/skbuff.h:2023
/root/tempesta/tempesta_fw/cache.c:1890
/root/tempesta/tempesta_fw/cache.c:1954
/root/tempesta/tempesta_fw/cache.c:2168
/root/tempesta/tempesta_fw/cache.c:2227
/root/tempesta/tempesta_fw/cache.c:2271
The skbuff.h:2001 is page = compound_head(page);
from __skb_fill_page_desc()
, and the problem seems in page
dereferencing. The problem instruction address is at x9d545
offset in the disassembled source code:
frag->size = size;
9d528: 48 8d 7d 0c lea 0xc(%rbp),%rdi
frag->page_offset = off;
9d52c: 89 48 48 mov %ecx,0x48(%rax)
frag->size = size;
9d52f: e8 00 00 00 00 callq 9d534 <tfw_cache_do_action+0x1194>
9d530: R_X86_64_PLT32 __asan_store4_noabort-0x4
9d534: 8b 44 24 28 mov 0x28(%rsp),%eax
9d538: 49 8d 7c 24 20 lea 0x20(%r12),%rdi
9d53d: 89 45 0c mov %eax,0xc(%rbp)
9d540: e8 00 00 00 00 callq 9d545 <tfw_cache_do_action+0x11a5>
9d541: R_X86_64_PLT32 __asan_load8_noabort-0x4
9d545: 49 8b 44 24 20 mov 0x20(%r12),%rax
9d54a: 48 8d 48 ff lea -0x1(%rax),%rcx
9d54e: a8 01 test $0x1,%al
9d550: 4c 0f 45 e1 cmovne %rcx,%r12
9d554: 49 8d 7c 24 10 lea 0x10(%r12),%rdi
Note that at address 9d538
we did the same load from R12, successfully. crash
tool can not dereference the address from R12 as well.
The kernel was built with KASAN, but there were no any specific reports.