Skip to content

Conversation

akichidis
Copy link
Contributor

@akichidis akichidis commented Sep 11, 2024

Description

Looking on the connectivity metrics for the subscribed peers I believe it's possible to have some race conditions when nodes quickly connect/disconnect. As the metric is a gauge the last who sets the value "wins", so it's possible that we might have nodes connecting again while the earlier connection has not been dropped yet - which consequently once dropped it will make the peer appear in metrics as disconnected. The PR is refactoring a bit that part and only sets the peer as disconnected when there is no other pending connection.

Test plan

CI


Release notes

Check each box that your changes affect. If none of the boxes relate to your changes, release notes aren't required.

For each box you select, include information after the relevant heading that describes the impact of your changes that a user might notice and any actions they must take to implement updates.

  • Protocol:
  • Nodes (Validators and Full nodes):
  • Indexer:
  • JSON-RPC:
  • GraphQL:
  • CLI:
  • Rust SDK:
  • REST API:

… 0 only when there is no other subscription for the authority
Copy link

vercel bot commented Sep 11, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
sui-docs ✅ Ready (Inspect) Visit Preview 💬 Add feedback Sep 13, 2024 1:52pm
3 Skipped Deployments
Name Status Preview Comments Updated (UTC)
multisig-toolkit ⬜️ Ignored (Inspect) Visit Preview Sep 13, 2024 1:52pm
sui-kiosk ⬜️ Ignored (Inspect) Visit Preview Sep 13, 2024 1:52pm
sui-typescript-docs ⬜️ Ignored (Inspect) Visit Preview Sep 13, 2024 1:52pm

.set(1);
// Failure can only be due to core shutdown.
let _ = subscription_counter.increment();
let _ = subscription_counter.increment(peer);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check the error and panic if it is not shutdown?

.set(0);
// Failure can only be due to core shutdown.
let _ = self.subscription_counter.decrement();
let _ = self.subscription_counter.decrement(self.peer);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same, check the error and panic if it is not shutdown?

@akichidis akichidis enabled auto-merge (squash) September 13, 2024 13:49
@akichidis akichidis merged commit 71ac187 into main Sep 13, 2024
48 checks passed
@akichidis akichidis deleted the akichidis/subscription-metrics-atomic branch September 13, 2024 14:13
suiwombat pushed a commit that referenced this pull request Sep 16, 2024
## Description 

Looking on the connectivity metrics for the subscribed peers I believe
it's possible to have some race conditions when nodes quickly
connect/disconnect. As the metric is a gauge the last who sets the value
"wins", so it's possible that we might have nodes connecting again while
the earlier connection has not been dropped yet - which consequently
once dropped it will make the peer appear in metrics as disconnected.
The PR is refactoring a bit that part and only sets the peer as
disconnected when there is no other pending connection.

## Test plan 

CI

---

## Release notes

Check each box that your changes affect. If none of the boxes relate to
your changes, release notes aren't required.

For each box you select, include information after the relevant heading
that describes the impact of your changes that a user might notice and
any actions they must take to implement updates.

- [ ] Protocol: 
- [ ] Nodes (Validators and Full nodes): 
- [ ] Indexer: 
- [ ] JSON-RPC: 
- [ ] GraphQL: 
- [ ] CLI: 
- [ ] Rust SDK:
- [ ] REST API:
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants