Commit c0666aa

feat(fortuna): Multiple replica support (#2812)
* feat(fortuna): implement multiple replica support with sequence number modulo filtering

  Co-Authored-By: Tejas Badadare <tejas@dourolabs.xyz>

* feat(fortuna): add delayed processing for backup replicas
  - Add configurable time delay before backup replicas check request status
  - Backup replicas now wait backup_delay_seconds before attempting fulfillment
  - Add backup_delay_seconds field to ReplicaConfig with default of 30 seconds
  - Improves reliability by reducing race conditions between replicas

  Co-Authored-By: Tejas Badadare <tejas@dourolabs.xyz>

* docs(fortuna): add comprehensive multi-replica setup documentation
  - Add Multiple Replica Setup section to README.md with modulo assignment explanation
  - Add replica_config examples to config.sample.yaml for 2, 3, and 5 replica setups
  - Include deployment considerations, failover behavior, and wallet separation requirements
  - Add validation for backup_delay_seconds > 0 to prevent race conditions

  Co-Authored-By: Tejas Badadare <tejas@dourolabs.xyz>

* update sample config
* refactor(fortuna): streamline keeper configuration and improve event processing
  - Updated config.sample.yaml by removing unnecessary blank lines.
  - Changed backup_delay_seconds in README.md from 45 to 30 for consistency.
  - Refactored run_keeper_threads to accept KeeperConfig directly instead of private_key.
  - Enhanced run function to handle keeper configuration more effectively.
  - Added comments in process_event_with_backoff to clarify primary and backup replica logic.
* feat(fortuna): add provider arg to entropy load testing script
* fix(fortuna): improve config and replica logic
* fix(fortuna): names, config check
* bump version

---------

Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
1 parent e4f385f commit c0666aa

10 files changed: 182 additions and 11 deletions.

apps/fortuna/Cargo.lock

Lines changed: 1 addition & 1 deletion
Some generated files are not rendered by default.

apps/fortuna/Cargo.toml

Lines changed: 3 additions & 4 deletions
@@ -1,6 +1,6 @@
 [package]
 name = "fortuna"
-version = "7.6.5"
+version = "7.7.0"
 edition = "2021"
 
 [lib]
@@ -41,13 +41,12 @@ url = "2.5.0"
 chrono = { version = "0.4.38", features = [
   "clock",
   "std",
-  "serde"
+  "serde",
 ], default-features = false }
 backoff = { version = "0.4.0", features = ["futures", "tokio"] }
 thiserror = "1.0.61"
 futures-locks = "0.7.1"
-sqlx = { version = "0.8", features = [ "runtime-tokio", "sqlite", "chrono" ] }
-
+sqlx = { version = "0.8", features = ["runtime-tokio", "sqlite", "chrono"] }
 
 
 [dev-dependencies]

apps/fortuna/README.md

Lines changed: 59 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -17,6 +17,11 @@ a database to be available at build time. Create a `.env` file in the root of th
1717
DATABASE_URL="sqlite:fortuna.db?mode=rwc"
1818
```
1919

20+
Install sqlx for cargo with:
21+
```bash
22+
cargo install sqlx
23+
```
24+
2025
Next, you need to create the database and apply the schema migrations. You can do this by running:
2126

2227
```bash
@@ -40,6 +45,60 @@ Please add the changed files in the `.sqlx` folder to your git commit.
 The Fortuna binary has a command-line interface to perform useful operations on the contract, such as
 registering a new randomness provider, or drawing a random value. To see the available commands, simply run `cargo run`.
 
+## Multiple Replica Setup
+
+Fortuna supports running multiple replica instances for high availability and reliability. This prevents service interruption if one instance goes down and distributes the workload across multiple instances.
+
+### How Replica Assignment Works
+
+- Each replica is assigned a unique `replica_id` (0, 1, 2, etc.)
+- Requests are distributed using modulo assignment: `sequence_number % total_replicas`
+- Each replica primarily handles requests assigned to its ID
+- After a configurable delay, replicas will process requests from other replicas as backup (failover)
+
+### Example Configurations
+
+**Two Replica Setup (Blue/Green):**
+```yaml
+# Replica 0 (Blue) - handles even sequence numbers (0, 2, 4, ...)
+keeper:
+  replica_config:
+    replica_id: 0
+    total_replicas: 2
+    backup_delay_seconds: 30
+
+# Replica 1 (Green) - handles odd sequence numbers (1, 3, 5, ...)
+keeper:
+  replica_config:
+    replica_id: 1
+    total_replicas: 2
+    backup_delay_seconds: 30
+```
+
+**Three Replica Setup:**
+```yaml
+# Replica 0 - handles sequence numbers 0, 3, 6, 9, ...
+keeper:
+  replica_config:
+    replica_id: 0
+    total_replicas: 3
+    backup_delay_seconds: 30
+```
+
+### Deployment Considerations
+
+1. **Separate Wallets**: Each replica MUST use a different private key to avoid nonce conflicts
+2. **Backup Delay**: Set `backup_delay_seconds` long enough to allow primary replica to process requests, but short enough for acceptable failover time (recommended: 30-60 seconds)
+3. **Monitoring**: Monitor each replica's processing metrics to ensure proper load distribution
+4. **Gas Management**: Each replica needs sufficient ETH balance for gas fees
+
+### Failover Behavior
+
+- Primary replica processes requests immediately
+- Backup replicas wait for `backup_delay_seconds` before checking if request is still unfulfilled
+- If request is already fulfilled during the delay, backup replica skips processing
+- This prevents duplicate transactions and wasted gas while ensuring reliability
+
 ## Local Development
 
 To start an instance of the webserver for local testing, you first need to perform a few setup steps:
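
The modulo assignment described in the README hunk above can be illustrated with a small, std-only Rust sketch. The helper names (`assigned_replica`, `is_primary`, `primaries_for`) are hypothetical and do not appear in the commit; only the `sequence_number % total_replicas` rule is taken from the diff.

```rust
/// Hypothetical helper: the replica that is primary for a given request.
/// Mirrors the `sequence_number % total_replicas` rule from the README.
/// Panics if `total_replicas` is 0 (the commit's config check rejects that value).
pub fn assigned_replica(sequence_number: u64, total_replicas: u64) -> u64 {
    sequence_number % total_replicas
}

/// Hypothetical helper: true when `replica_id` is the primary for this request.
pub fn is_primary(sequence_number: u64, replica_id: u64, total_replicas: u64) -> bool {
    assigned_replica(sequence_number, total_replicas) == replica_id
}

/// Hypothetical helper: all sequence numbers in `0..upto` for which
/// `replica_id` is the primary.
pub fn primaries_for(replica_id: u64, total_replicas: u64, upto: u64) -> Vec<u64> {
    (0..upto)
        .filter(|&seq| is_primary(seq, replica_id, total_replicas))
        .collect()
}
```

Under these assumptions, `primaries_for(0, 3, 10)` yields the sequence numbers 0, 3, 6, 9, matching the three-replica comment in the README, and replica 1 of 2 is primary for the odd sequence numbers.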

apps/fortuna/config.sample.yaml

Lines changed: 16 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -86,3 +86,19 @@ keeper:
8686
value: 0xabcd
8787
# For production, you can store the private key in a file.
8888
# file: keeper-key.txt
89+
90+
# Multi-replica configuration
91+
# Optional: Multi-replica configuration for high availability and load distribution
92+
# Uncomment and configure for production deployments with multiple Fortuna instances
93+
# replica_config:
94+
# replica_id: 0 # Unique identifier for this replica (0, 1, 2, ...)
95+
# total_replicas: 2 # Total number of replica instances running
96+
# backup_delay_seconds: 30 # Seconds to wait before processing other replicas' requests
97+
#
98+
# Example configurations:
99+
#
100+
# Two-replica setup (Blue/Green):
101+
# - Replica 0: handles even sequence numbers (0, 2, 4, ...)
102+
# - Replica 1: handles odd sequence numbers (1, 3, 5, ...)
103+
#
104+
# IMPORTANT: Each replica must use a different private_key to avoid nonce conflicts!

apps/fortuna/src/command/run.rs

Lines changed: 9 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -3,7 +3,7 @@ use {
33
api::{self, ApiBlockChainState, BlockchainState, ChainId},
44
chain::ethereum::InstrumentedPythContract,
55
command::register_provider::CommitmentMetadata,
6-
config::{Commitment, Config, EthereumConfig, ProviderConfig, RunOptions},
6+
config::{Commitment, Config, EthereumConfig, ProviderConfig, ReplicaConfig, RunOptions},
77
eth_utils::traced_client::RpcMetrics,
88
history::History,
99
keeper::{self, keeper_metrics::KeeperMetrics},
@@ -94,10 +94,14 @@ pub async fn run(opts: &RunOptions) -> Result<()> {
9494

9595
let keeper_metrics: Arc<KeeperMetrics> =
9696
Arc::new(KeeperMetrics::new(metrics_registry.clone()).await);
97+
9798
let keeper_private_key_option = config.keeper.private_key.load()?;
9899
if keeper_private_key_option.is_none() {
99100
tracing::info!("Not starting keeper service: no keeper private key specified. Please add one to the config if you would like to run the keeper service.")
100101
}
102+
103+
let keeper_replica_config = config.keeper.replica_config.clone();
104+
101105
let chains: Arc<RwLock<HashMap<ChainId, ApiBlockChainState>>> = Arc::new(RwLock::new(
102106
config
103107
.chains
@@ -110,6 +114,7 @@ pub async fn run(opts: &RunOptions) -> Result<()> {
110114
keeper_metrics.add_chain(chain_id.clone(), config.provider.address);
111115
let keeper_metrics = keeper_metrics.clone();
112116
let keeper_private_key_option = keeper_private_key_option.clone();
117+
let keeper_replica_config = keeper_replica_config.clone();
113118
let chains = chains.clone();
114119
let secret_copy = secret.clone();
115120
let rpc_metrics = rpc_metrics.clone();
@@ -123,6 +128,7 @@ pub async fn run(opts: &RunOptions) -> Result<()> {
123128
chain_config.clone(),
124129
keeper_metrics.clone(),
125130
keeper_private_key_option.clone(),
131+
keeper_replica_config.clone(),
126132
chains.clone(),
127133
&secret_copy,
128134
history.clone(),
@@ -173,6 +179,7 @@ async fn setup_chain_and_run_keeper(
173179
chain_config: EthereumConfig,
174180
keeper_metrics: Arc<KeeperMetrics>,
175181
keeper_private_key_option: Option<String>,
182+
keeper_replica_config: Option<ReplicaConfig>,
176183
chains: Arc<RwLock<HashMap<ChainId, ApiBlockChainState>>>,
177184
secret_copy: &str,
178185
history: Arc<History>,
@@ -195,6 +202,7 @@ async fn setup_chain_and_run_keeper(
195202
if let Some(keeper_private_key) = keeper_private_key_option {
196203
keeper::run_keeper_threads(
197204
keeper_private_key,
205+
keeper_replica_config,
198206
chain_config,
199207
state,
200208
keeper_metrics.clone(),

apps/fortuna/src/config.rs

Lines changed: 32 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -94,6 +94,23 @@ impl Config {
9494
}
9595
}
9696

97+
if let Some(replica_config) = &config.keeper.replica_config {
98+
if replica_config.total_replicas == 0 {
99+
return Err(anyhow!("Keeper replica configuration is invalid. total_replicas must be greater than 0."));
100+
}
101+
if config.keeper.private_key.load()?.is_none() {
102+
return Err(anyhow!(
103+
"Keeper replica configuration requires a keeper private key to be specified."
104+
));
105+
}
106+
if replica_config.replica_id >= replica_config.total_replicas {
107+
return Err(anyhow!("Keeper replica configuration is invalid. replica_id must be less than total_replicas."));
108+
}
109+
if replica_config.backup_delay_seconds == 0 {
110+
return Err(anyhow!("Keeper replica configuration is invalid. backup_delay_seconds must be greater than 0 to prevent race conditions."));
111+
}
112+
}
113+
97114
Ok(config)
98115
}
99116

@@ -333,6 +350,18 @@ fn default_chain_sample_interval() -> u64 {
333350
1
334351
}
335352

353+
#[derive(Clone, Debug, serde::Serialize, serde::Deserialize)]
354+
pub struct ReplicaConfig {
355+
pub replica_id: u64,
356+
pub total_replicas: u64,
357+
#[serde(default = "default_backup_delay_seconds")]
358+
pub backup_delay_seconds: u64,
359+
}
360+
361+
fn default_backup_delay_seconds() -> u64 {
362+
30
363+
}
364+
336365
/// Configuration values for the keeper service that are shared across chains.
337366
#[derive(Clone, Debug, serde::Serialize, serde::Deserialize)]
338367
pub struct KeeperConfig {
@@ -342,6 +371,9 @@ pub struct KeeperConfig {
342371
/// This key *does not need to be a registered provider*. In particular, production deployments
343372
/// should ensure this is a different key in order to reduce the severity of security breaches.
344373
pub private_key: SecretString,
374+
375+
#[serde(default)]
376+
pub replica_config: Option<ReplicaConfig>,
345377
}
346378

347379
// A secret is a string that can be provided either as a literal in the config,
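
The four checks added to the config loader in the hunk above can be exercised in isolation. The following is a minimal std-only sketch, not the commit's code: it redefines a plain `ReplicaConfig` without serde, replaces `anyhow!` with `String` errors, and takes a hypothetical `has_private_key` flag in place of `config.keeper.private_key.load()?`.

```rust
/// Stand-in for the commit's `ReplicaConfig`, without the serde derives.
#[derive(Clone, Debug)]
pub struct ReplicaConfig {
    pub replica_id: u64,
    pub total_replicas: u64,
    pub backup_delay_seconds: u64,
}

/// Hypothetical standalone version of the validation in the diff.
/// The checks run in the same order as in the commit.
pub fn validate_replica_config(cfg: &ReplicaConfig, has_private_key: bool) -> Result<(), String> {
    if cfg.total_replicas == 0 {
        return Err("total_replicas must be greater than 0".to_string());
    }
    if !has_private_key {
        return Err("replica configuration requires a keeper private key".to_string());
    }
    if cfg.replica_id >= cfg.total_replicas {
        return Err("replica_id must be less than total_replicas".to_string());
    }
    if cfg.backup_delay_seconds == 0 {
        return Err("backup_delay_seconds must be greater than 0".to_string());
    }
    Ok(())
}
```

With this sketch, a config of `replica_id: 2, total_replicas: 2` is rejected because replica IDs are zero-based, while `replica_id: 1, total_replicas: 2, backup_delay_seconds: 30` passes when a private key is present.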

apps/fortuna/src/keeper.rs

Lines changed: 5 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -2,7 +2,7 @@ use {
22
crate::{
33
api::{BlockchainState, ChainId},
44
chain::ethereum::{InstrumentedPythContract, InstrumentedSignablePythContract},
5-
config::EthereumConfig,
5+
config::{EthereumConfig, ReplicaConfig},
66
eth_utils::traced_client::RpcMetrics,
77
history::History,
88
keeper::{
@@ -56,7 +56,8 @@ pub enum RequestState {
5656
/// handle any events for the new blocks.
5757
#[tracing::instrument(name = "keeper", skip_all, fields(chain_id = chain_state.id))]
5858
pub async fn run_keeper_threads(
59-
private_key: String,
59+
keeper_private_key: String,
60+
keeper_replica_config: Option<ReplicaConfig>,
6061
chain_eth_config: EthereumConfig,
6162
chain_state: BlockchainState,
6263
metrics: Arc<KeeperMetrics>,
@@ -69,7 +70,7 @@ pub async fn run_keeper_threads(
6970

7071
let contract = Arc::new(InstrumentedSignablePythContract::from_config(
7172
&chain_eth_config,
72-
&private_key,
73+
&keeper_private_key,
7374
chain_state.id.clone(),
7475
rpc_metrics.clone(),
7576
chain_state.network_id,
@@ -85,6 +86,7 @@ pub async fn run_keeper_threads(
8586
contract: contract.clone(),
8687
gas_limit,
8788
escalation_policy: chain_eth_config.escalation_policy.to_policy(),
89+
replica_config: keeper_replica_config,
8890
metrics: metrics.clone(),
8991
fulfilled_requests_cache,
9092
history,

apps/fortuna/src/keeper/block.rs

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2,6 +2,7 @@ use {
22
crate::{
33
api::BlockchainState,
44
chain::{ethereum::InstrumentedSignablePythContract, reader::BlockNumber},
5+
config::ReplicaConfig,
56
eth_utils::utils::EscalationPolicy,
67
history::History,
78
keeper::{
@@ -45,6 +46,7 @@ pub struct ProcessParams {
4546
pub gas_limit: U256,
4647
pub escalation_policy: EscalationPolicy,
4748
pub chain_state: BlockchainState,
49+
pub replica_config: Option<ReplicaConfig>,
4850
pub metrics: Arc<KeeperMetrics>,
4951
pub history: Arc<History>,
5052
pub fulfilled_requests_cache: Arc<RwLock<HashSet<u64>>>,

apps/fortuna/src/keeper/process_event.rs

Lines changed: 49 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -35,6 +35,55 @@ pub async fn process_event_with_backoff(
3535
return Ok(());
3636
}
3737

38+
// If replica config is present, we're running with multiple instances.
39+
// The incoming request is assigned by modulo operation on the sequence number
40+
// and the total number of replicas. If our replica_id is the primary for this sequence number,
41+
// we process the request directly. If our replica_id is a backup, we wait for the delay and
42+
// then check if the request is still open. If it is, we process it as a backup replica.
43+
if let Some(replica_config) = &process_param.replica_config {
44+
let assigned_replica = event.sequence_number % replica_config.total_replicas;
45+
let is_primary_replica = assigned_replica == replica_config.replica_id;
46+
47+
if is_primary_replica {
48+
tracing::debug!("Processing request as primary replica");
49+
} else {
50+
tracing::debug!("Processing request as backup replica");
51+
52+
tracing::info!("Waiting before processing as backup replica");
53+
tokio::time::sleep(tokio::time::Duration::from_secs(
54+
replica_config.backup_delay_seconds,
55+
))
56+
.await;
57+
58+
// Check if the request is still open after the delay.
59+
// If it is, we will process it as a backup replica.
60+
match chain_state
61+
.contract
62+
.get_request(event.provider_address, event.sequence_number)
63+
.await
64+
{
65+
Ok(Some(_)) => {
66+
tracing::info!(
67+
delay_seconds = replica_config.backup_delay_seconds,
68+
"Request still open after delay, processing as backup replica"
69+
);
70+
}
71+
Ok(None) => {
72+
tracing::debug!(
73+
"Request already fulfilled by primary replica during delay, skipping"
74+
);
75+
return Ok(());
76+
}
77+
Err(e) => {
78+
tracing::warn!(
79+
error = ?e,
80+
"Error checking request status after delay, processing as backup replica"
81+
);
82+
}
83+
}
84+
}
85+
}
86+
3887
let account_label = AccountLabel {
3988
chain_id: chain_state.id.clone(),
4089
address: chain_state.provider_address.to_string(),
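
The three-way `match` on the `get_request` result in the hunk above encodes the backup replica's decision after the delay: proceed if the request is still open, skip if it has been fulfilled, and proceed anyway if the lookup fails. A minimal synchronous sketch of that decision follows; the function name `should_process_as_backup` is hypothetical, and the generic `Result<Option<T>, E>` stands in for whatever `get_request` actually returns.

```rust
/// Hypothetical pure function capturing the post-delay decision from the diff.
/// `lookup` models the result of querying the contract for the request:
/// - `Ok(Some(_))`: request still open, so the backup replica processes it
/// - `Ok(None)`: request already fulfilled, so the backup replica skips it
/// - `Err(_)`: status unknown; the diff proceeds with processing in this case
pub fn should_process_as_backup<T, E>(lookup: &Result<Option<T>, E>) -> bool {
    match lookup {
        Ok(Some(_)) => true,
        Ok(None) => false,
        Err(_) => true,
    }
}
```

Treating a failed lookup as "still open" favors fulfilling the request over saving gas: a transient RPC error on the backup then cannot cause a request to be dropped when the primary is also down.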

contract_manager/scripts/load_test_entropy.ts

Lines changed: 6 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -9,7 +9,7 @@ const parser = yargs(hideBin(process.argv))
99
.usage(
1010
"Load tests the entropy contract using the EntropyTester contract with many requests in a single transaction\n" +
1111
"it does not monitor whether the callbacks are actually submitted or not.\n" +
12-
"Usage: $0 --private-key <private-key> --chain <chain-id> --tester-address <tester-address>",
12+
"Usage: $0 --private-key <private-key> --chain <chain-id> --tester-address <tester-address> --provider-address <provider-address>",
1313
)
1414
.options({
1515
chain: {
@@ -22,6 +22,10 @@ const parser = yargs(hideBin(process.argv))
       demandOption: true,
       desc: "Address of the EntropyTester contract",
     },
+    provider: {
+      type: "string",
+      desc: "Address of the entropy provider to use for requests (defaults to default provider)",
+    },
     "success-count": {
       type: "number",
       default: 100,
@@ -66,7 +70,7 @@ async function main() {
   const privateKey = toPrivateKey(argv.privateKey);
   const chain = DefaultStore.getChainOrThrow(argv.chain, EvmChain);
   const contract = findEntropyContract(chain);
-  const provider = await contract.getDefaultProvider();
+  const provider = argv.provider || (await contract.getDefaultProvider());
   const fee = await contract.getFee(provider);
   const web3 = contract.chain.getWeb3();
   const testerContract = new web3.eth.Contract(ABI, argv.testerAddress);
