Description
Branch: master
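I believe this might be due to #645. When a C* node is down and we call ccm node start, the hostid method is called on a node that is down. I saw that the ScyllaNode::hostid logic was adjusted so that a host id can be obtained from a Scylla node that is down. The hostid implementation in ccmlib/node.py, as the traceback below shows, still runs nodetool info, which fails when the node's JMX port is not reachable.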
Reproducer from cpp-rust-driver CI:
ccm> ccm create -n 3:0 -i 127.0.0. -v 3.11.19 -b cpp-driver_3-11-19_3-0
ccm> ccm updateconf --rt=10000 read_request_timeout_in_ms:10000 write_request_timeout_in_ms:10000 request_timeout_in_ms:10000 phi_convict_threshold:16 hinted_handoff_enabled:false enable_materialized_views:true dynamic_snitch_update_interval_in_ms:1000 native_transport_max_threads:1 concurrent_reads:2 concurrent_writes:2 concurrent_compactors:1 compaction_throughput_mb_per_sec:0 key_cache_size_in_mb:0 key_cache_save_period:0 memtable_flush_writers:1 max_hints_delivery_threads:1 cas_contention_timeout_in_ms:10000 file_cache_size_in_mb:0 rpc_min_threads:1 rpc_max_threads:1
ccm> ccm start --wait-other-notice --wait-for-binary-proto
ccm> ccm status
ccm> ccm node2 stop
ccm> ccm node3 stop
ccm> ccm node2 start --wait-other-notice --wait-for-binary-proto
unknown file: Failure
C++ exception with description "Traceback (most recent call last):
  File "/home/runner/.local/bin/ccm", line 74, in <module>
    cmd.run()
  File "/home/runner/.local/lib/python3.11/site-packages/ccmlib/cmds/node_cmds.py", line 196, in run
    self.node.start(not self.options.no_join_ring,
  File "/home/runner/.local/lib/python3.11/site-packages/ccmlib/node.py", line 675, in start
    node.watch_log_for_alive(self, from_mark=mark)
  File "/home/runner/.local/lib/python3.11/site-packages/ccmlib/node.py", line 534, in watch_log_for_alive
    tofind = [f"({node.address()}|{node.hostid()}).* now UP" for node in tofind]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/.local/lib/python3.11/site-packages/ccmlib/node.py", line 534, in <listcomp>
    tofind = [f"({node.address()}|{node.hostid()}).* now UP" for node in tofind]
                                   ^^^^^^^^^^^^^
  File "/home/runner/.local/lib/python3.11/site-packages/ccmlib/node.py", line 1442, in hostid
    info = self.nodetool('info', capture_output=True, timeout=timeout)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/.local/lib/python3.11/site-packages/ccmlib/node.py", line 880, in nodetool
    return self._do_run_nodetool(nodetool, capture_output, wait, timeout, verbose)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/runner/.local/lib/python3.11/site-packages/ccmlib/node.py", line 832, in _do_run_nodetool
    raise NodetoolError(" ".join(nodetool), exit_status, stdout, stderr)
ccmlib.node.ToolError: Subprocess /home/runner/.ccm/repository/3.11.19/bin/nodetool -h localhost -p 7200 -Dcom.sun.jndi.rmiURLParsing=legacy info exited with non-zero status; exit status: 1;
stderr: nodetool: Failed to connect to 'localhost:7200' - ConnectException: 'Connection refused (Connection refused)'.
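The traceback shows that the "now UP" pattern list is built eagerly, so a single failing hostid() call aborts the whole start. One possible mitigation, sketched below, is to build the pattern so that a failed host id lookup falls back to matching on the address alone. This is only an illustration: StubNode and alive_pattern are hypothetical names, not ccm code, and whether an address-only match is acceptable is an assumption, not a confirmed fix.

```python
import re


class StubNode:
    """Hypothetical stand-in for a ccm node; not the real ccmlib class."""

    def __init__(self, address, host_id=None):
        self._address = address
        self._host_id = host_id

    def address(self):
        return self._address

    def hostid(self):
        # Mimic the failure above: no host id is available while JMX is down.
        if self._host_id is None:
            raise RuntimeError("nodetool: Failed to connect")
        return self._host_id


def alive_pattern(node):
    """Build the '... now UP' log regex, tolerating a failing hostid() call."""
    # re.escape is added here so dots in the address match literally.
    candidates = [re.escape(node.address())]
    try:
        candidates.append(re.escape(node.hostid()))
    except Exception:
        # Assumption: matching on the address alone is an acceptable fallback.
        pass
    return f"({'|'.join(candidates)}).* now UP"
```

With this shape, a node whose host id cannot be read yet still yields a usable pattern instead of raising out of the list comprehension.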
The reproducer does not fail for me locally, which probably means I get lucky and the C* node is already listening on its JMX port by the time ccm calls hostid(). However, it reproduces reliably in cpp-rust-driver CI (GitHub Actions). See for example: https://github.com/scylladb/cpp-rust-driver/actions/runs/14170661751/job/39693671740?pr=231
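If the failure really is a race between node startup and the JMX port opening, one way to test that hypothesis (or to work around it) is to poll the port before asking for the host id. The helper below is a minimal sketch using only the standard library; wait_for_port is a hypothetical name, not a ccm function, and the default timeout and interval are arbitrary.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Poll until a TCP connection to (host, port) succeeds or timeout elapses.

    Returns True if the port accepted a connection, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For the node in the error message above, that would be something like wait_for_port("localhost", 7200) before calling hostid(). An open port does not guarantee that nodetool info will succeed, so this narrows the race window rather than eliminating it.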