Skip to content

Security: Stop RocksDB threads calling unexpected code when zebrad exits #3133

@teor2345

Description

@teor2345

Description

When I terminate zebrad start using Ctrl-C (SIGINT) on Linux, I get pthread errors.

I can reliably generate these errors during startup and the initial peer connections. But they seem much less frequent after initialization.

This is a serious security risk, because calling pure virtual methods is unexpected code execution. So compilers could optimise it into something dangerous, or something that could be exploited. (That doesn't happen in this case - instead, it prints a warning. But there are no guarantees.)

Initial Analysis

We recently made a few changes to Zebra's startup and shutdown behaviour (#3057, #3071, #3078, #3091).

Some of these errors seem to come from C++ code ("pure virtual method called"), so it's probably a combination of:

PR #3091 also stopped zebrad deleting ephemeral state directories. We should fix that issue as part of this bug fix.

Suggested Design

Precautions:

Shut down in this order:

  1. wait for RocksDB to finish, by calling CancelAllBackgroundWork() when Zebra wants to exit: https://github.com/facebook/rocksdb/wiki/RocksDB-FAQ ,
  2. drop the database: https://github.com/facebook/rocksdb/wiki/Known-Issues ,
  3. delete the database files on disk, if it is ephemeral,
  4. exit Zebra

We can do this via a shutdown method in the state service and finalized state, which is called:

  • when the state is dropped,
  • when the state reaches the debug_stop_at_height, or
  • using a Shutdown request to the state, in the start and copy-state (both state services) commands.

After we've waited for shutdown to finish, it is safe to exit Zebra.

Other state requests could be processed while we are exiting Zebra:

  • return errors for any requests and readiness checks after shutdown is called

Rejected Alternatives

Rust's sys::at_exit handler has been removed:
rust-lang/rust#84115

There's no guarantee that libc::atexit runs before RocksDB's static destructors, just like there's no guarantee of the exit order between Rust and C:
https://stackoverflow.com/questions/35980148/why-does-an-atexit-handler-panic-when-it-accesses-stdout

Steps to Reproduce

I tried this:

  1. Run zebrad start
  2. Press Ctrl-C (SIGINT)
zebrad start &
kill -SIGINT $!

Expected Behaviour

I expected to see this happen:

zebrad terminates quickly, without errors.

Actual Behaviour

Instead, this happened:

zebrad terminates quickly, with errors like:

pthread lock: Invalid argument                                                                                                                                                                 
Aborted (core dumped)  
pure virtual method called
terminate called without an active exception
Aborted (core dumped)
pthread destroy mutex: Device or resource busy
Aborted (core dumped)

Zebra Logs

$ /home/dev/.cargo/bin/zebrad start                                                                                                                                         
Dec 03 07:04:11.132  INFO {zebrad="022808d" net="Main"}: zebrad::commands::start: Starting zebrad                                                                                              
Dec 03 07:04:11.132  INFO {zebrad="022808d" net="Main"}: zebrad::commands::start: config=ZebradConfig { consensus: Config { checkpoint_sync: false }, metrics: MetricsSection { endpoint_addr: 
None }, network: Config { listen_addr: 0.0.0.0:8233, network: Mainnet, initial_mainnet_peers: {"dnsseed.z.cash:8233", "dnsseed.str4d.xyz:8233", "mainnet.seeder.zfnd.org:8233", "mainnet.is.yol
o.money:8233"}, initial_testnet_peers: {"testnet.seeder.zfnd.org:18233", "testnet.is.yolo.money:18233", "dnsseed.testnet.z.cash:18233"}, peerset_initial_target_size: 50, crawl_new_peer_interv
al: 60s }, state: Config { cache_dir: "/home/dev/.cache/zebra", ephemeral: false, debug_stop_at_height: None }, tracing: TracingSection { use_color: true, filter: None, endpoint_addr: None, f
lamegraph: None, use_journald: false }, sync: SyncSection { max_concurrent_block_requests: 50, lookahead_limit: 2000 }, mempool: Config { debug_enable_at_height: None, tx_cost_limit: 80000000
, eviction_memory_time: 3600s } }                                                                                                                                                              
Dec 03 07:04:11.132  INFO {zebrad="022808d" net="Main"}: zebrad::commands::start: initializing node state                                                                                      
Dec 03 07:04:11.132  INFO {zebrad="022808d" net="Main"}: zebra_state::config: the open file limit is at or above the specified limit new_limit=1024 current_limit=1024 hard_rlimit=Some(524288)
Dec 03 07:04:11.144  INFO {zebrad="022808d" net="Main"}: zebra_state::service::finalized_state: Opened Zebra state cache at /home/dev/.cache/zebra/state/v11/mainnet                           
Dec 03 07:04:11.144  INFO {zebrad="022808d" net="Main"}: zebra_state::service::finalized_state: reading cached tip height                                                                      
Dec 03 07:04:11.145  INFO {zebrad="022808d" net="Main"}: zebra_state::service::finalized_state: loaded Zebra state cache tip=None                                                              
Dec 03 07:04:11.145  INFO {zebrad="022808d" net="Main"}: zebra_state::service: starting legacy chain check                                                                                     
Dec 03 07:04:11.145  INFO {zebrad="022808d" net="Main"}: zebra_state::service: no legacy chain found                                                                                           
Dec 03 07:04:11.145  INFO {zebrad="022808d" net="Main"}: zebrad::commands::start: initializing network                                                                                         
Dec 03 07:04:11.145  INFO {zebrad="022808d" net="Main"}:open_listener{addr=0.0.0.0:8233}: zebra_network::peer_set::initialize: Trying to open Zcash protocol endpoint at 0.0.0.0:8233...       
Dec 03 07:04:11.145  INFO {zebrad="022808d" net="Main"}:open_listener{addr=0.0.0.0:8233}: zebra_network::peer_set::initialize: Opened Zcash protocol endpoint at 0.0.0.0:8233        
Dec 03 07:04:11.145  INFO {zebrad="022808d" net="Main"}:add_initial_peers: zebra_network::config: resolved seed peer IP addresses seed="dnsseed.z.cash:8233" remote_ip_count=25
Dec 03 07:04:11.145  INFO {zebrad="022808d" net="Main"}:add_initial_peers: zebra_network::config: resolved seed peer IP addresses seed="mainnet.seeder.zfnd.org:8233" remote_ip_count=25
Dec 03 07:04:11.145  INFO {zebrad="022808d" net="Main"}:add_initial_peers: zebra_network::config: resolved seed peer IP addresses seed="mainnet.is.yolo.money:8233" remote_ip_count=50
^CDec 03 07:04:12.083  INFO {zebrad="022808d" net="Main"}:sig{kind=SignalKind(2) name="SIGINT"}: zebrad::signal: received SIGINT, starting shutdown
pthread destroy mutex: Device or resource busy
Aborted (core dumped)

Environment

Zebra Version

$ zebrad version
zebrad 1.0.0-beta.2

Operating System

$ uname -a
Linux ... 5.4.122 #1-NixOS SMP Wed May 26 10:05:21 UTC 2021 x86_64 GNU/Linux

Metadata

Metadata

Labels

C-securityCategory: Security issuesI-crashZebra crashes (without a panic)I-usabilityZebra is hard to understand or use

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions