Description
When running on Kubernetes, the testnet validator intermittently times out while connecting to the testnet WebSocket endpoint (wss://test.finney.opentensor.ai:443). The connection works locally, and it also works inside the pod when individual commands are run by hand, but it fails during validator initialization with a handshake timeout.
Error Details
TimeoutError: timed out while waiting for handshake response
Full stack trace:
Traceback (most recent call last):
  File "/app/scripts/run_validator.py", line 25, in <module>
    asyncio.run(main())
  File "/usr/local/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/usr/local/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/app/scripts/run_validator.py", line 13, in main
    validator = Validator()
  File "/app/neurons/validator.py", line 78, in __init__
    self.metagraph.sync_nodes()
  File "/usr/local/lib/python3.10/site-packages/fiber/chain/metagraph.py", line 66, in sync_nodes
    nodes = fetch_nodes.get_nodes_for_netuid(self.substrate, self.netuid)
  File "/usr/local/lib/python3.10/site-packages/fiber/chain/fetch_nodes.py", line 66, in get_nodes_for_netuid
    substrate = get_substrate(subtensor_address=substrate.url)
  File "/usr/local/lib/python3.10/site-packages/fiber/chain/interface.py", line 31, in get_substrate
    substrate = SubstrateInterface(
  File "/usr/local/lib/python3.10/site-packages/async_substrate_interface/sync_substrate.py", line 538, in __init__
    self.ws = self.connect(init=True)
  File "/usr/local/lib/python3.10/site-packages/async_substrate_interface/sync_substrate.py", line 624, in connect
    return connect(self.chain_endpoint, max_size=self.ws_max_size)
  File "/usr/local/lib/python3.10/site-packages/websockets/sync/client.py", line 378, in connect
    connection.handshake(
  File "/usr/local/lib/python3.10/site-packages/websockets/sync/client.py", line 94, in handshake
    raise TimeoutError("timed out while waiting for handshake response")
TimeoutError: timed out while waiting for handshake response
Current Behavior
- Connection fails intermittently (succeeded on 5th retry in the reported case)
- Works fine when running locally
- Works fine when executing Python scripts directly in the pod
- Only affects testnet validator on Kubernetes
- Mainnet validator appears unaffected
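Since the connection eventually succeeded after several attempts, one possible stopgap is to wrap the connection step in a bounded retry with exponential backoff. The sketch below is only an illustration, not code from the validator or from fiber: `connect_with_retry` and its parameters are hypothetical names, and the `connect` argument stands in for whatever callable actually opens the connection.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[], T],
    attempts: int = 6,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call connect() until it succeeds or the attempts are exhausted.

    Hypothetical helper: `connect` is any zero-argument callable that
    opens the connection and raises TimeoutError/OSError on failure.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except (TimeoutError, OSError) as exc:
            if attempt == attempts:
                # Out of attempts: surface the last error unchanged.
                raise
            # Exponential backoff (1s, 2s, 4s, ...) capped at max_delay,
            # scaled by random jitter so restarts do not retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            delay *= random.uniform(0.5, 1.0)
            print(f"attempt {attempt}/{attempts} failed ({exc!r}); "
                  f"retrying in {delay:.1f}s")
            sleep(delay)
    raise AssertionError("unreachable")
```

In the validator this might wrap the call that builds the substrate connection, for example `connect_with_retry(lambda: get_substrate(subtensor_address=url))`, assuming `get_substrate` accepts that keyword as the traceback suggests. A retry hides the symptom rather than explaining it, so it is best paired with the diagnosis of why the handshake is slow in the first place.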
Expected Behavior
The validator should establish a stable WebSocket connection to the testnet on the first attempt, without requiring multiple retries.
Environment
- Platform: Kubernetes
- Python: 3.10
- Endpoint: wss://test.finney.opentensor.ai:443
- Component: Testnet Validator
Reproduction Steps
- Deploy testnet validator to Kubernetes
- Observe logs during startup
- The connection intermittently fails with the handshake timeout error shown above
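To narrow down where the time goes when the failure reproduces, it may help to time each connection stage separately from inside the pod. The following is a rough diagnostic sketch using only the standard library. `timed` and `probe` are hypothetical helper names, and the probe covers only DNS resolution, TCP connect, and the TLS handshake. It stops before the WebSocket HTTP upgrade, so if all three stages come back fast, the delay is more likely in the upgrade response itself.

```python
import socket
import ssl
import time
from typing import Callable, Dict, Tuple, TypeVar

T = TypeVar("T")


def timed(fn: Callable[[], T]) -> Tuple[T, float]:
    """Run fn() and return (result, elapsed seconds)."""
    start = time.monotonic()
    result = fn()
    return result, time.monotonic() - start


def probe(host: str, port: int = 443, timeout: float = 10.0) -> Dict[str, float]:
    """Time DNS lookup, TCP connect, and TLS handshake for host:port.

    Hypothetical diagnostic helper; it does not perform the WebSocket
    upgrade, so it only rules the earlier stages in or out.
    """
    timings: Dict[str, float] = {}

    # Stage 1: DNS resolution (cluster DNS is a common source of delay).
    infos, timings["dns"] = timed(
        lambda: socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    )
    family, socktype, proto, _, sockaddr = infos[0]

    sock = socket.socket(family, socktype, proto)
    sock.settimeout(timeout)
    try:
        # Stage 2: TCP connect to the first resolved address.
        _, timings["tcp"] = timed(lambda: sock.connect(sockaddr))

        # Stage 3: TLS handshake with certificate verification.
        context = ssl.create_default_context()
        tls, timings["tls"] = timed(
            lambda: context.wrap_socket(sock, server_hostname=host)
        )
        tls.close()
    finally:
        sock.close()
    return timings
```

Running `print(probe("test.finney.opentensor.ai"))` a few times during validator startup, and again from an interactive shell in the same pod, would show whether one stage is consistently slower at startup than afterwards.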