Skip to content

Conversation

jordanjennings-mysten
Copy link
Contributor

@jordanjennings-mysten jordanjennings-mysten commented Sep 6, 2024

Description

The CLI and SDK rely on the QuorumDriverError type to handle errors from the Quorum driver. This PR improves the clarity of errors for the QuorumDriverError type and its usage in RPCs.

  • a published transaction which has a conflict now provides the conflicting transaction
  • invalid user signature simplified
  • better formatting for object double used
  • rely on display when possible

Test plan

Augment the current RPC error tests surrounding From<QuorumDriverError> trait implementations to ensure that the error messages are clear and informative.

Release notes

  • Protocol:
  • Nodes (Validators and Full nodes):
  • Indexer:
  • JSON-RPC:
  • GraphQL:
  • CLI:
  • Rust SDK:
  • REST API:

Copy link

vercel bot commented Sep 6, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
sui-docs ✅ Ready (Inspect) Visit Preview 💬 Add feedback Sep 13, 2024 0:29am
3 Skipped Deployments
Name Status Preview Comments Updated (UTC)
multisig-toolkit ⬜️ Ignored (Inspect) Visit Preview Sep 13, 2024 0:29am
sui-kiosk ⬜️ Ignored (Inspect) Visit Preview Sep 13, 2024 0:29am
sui-typescript-docs ⬜️ Ignored (Inspect) Visit Preview Sep 13, 2024 0:29am

@jordanjennings-mysten jordanjennings-mysten changed the title improve clarity of errors for QuorumDriverError type and it's usage in RPCs Improve clarity of errors for QuorumDriverError type and it's usage in RPCs Sep 6, 2024
@jordanjennings-mysten jordanjennings-mysten force-pushed the improve-quorum-driver-error-rpc branch 2 times, most recently from f8fb346 to 7c4c86c Compare September 6, 2024 19:26
@amnn amnn requested review from a team September 9, 2024 13:34
Copy link
Contributor

@amnn amnn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a couple of suggestions for the object-double used message, but everything else here is looking good modulo nits! If you'd like to split the PR up so the other changes can go ahead, feel free, and I'll stamp, otherwise, I don't think it will take too long to iterate on that last piece.

Comment on lines 101 to 104
#[error("Object {object_id} with version {object_version} is not available for consumption, current version: {current_version}")]
ObjectVersionUnavailableForConsumption {
provided_obj_ref: ObjectRef,
object_id: ObjectID,
object_version: SequenceNumber,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be useful to keep the digest around -- that way you can quickly check whether the contents of the object has changed.

    #[error("Object {} with version {} (digest {}) is not available for consumption, current version: {current_version}", .provided_obj_ref.0, provided_obj_ref.1, provided_obj_ref.2)]
    ObjectVersionUnavailableForConsumption {
        provided_obj_ref: ObjectRef,

Copy link
Contributor Author

@jordanjennings-mysten jordanjennings-mysten Sep 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pulled up the digest into the error type in my today, continuing on the path I'm on.

The options I considered:

  • pull the tuple elements up a level, which is shown here
  • impl display, requires changing from a type definition to a struct definition, possible but requires maybe twice as much code to change all the places where the type is instantiated
  • change the error formatting inside of the impl From, very messy because its nested 2 levels deep within the QuorumDriverError
  • Could just keep the debug error but doesn't look very great.

expected_code.assert_eq(&error_object.code().to_string());
let expected_message =
expect!["Transaction execution failed due to issues with transaction inputs, please review the errors and try again: Could not find the referenced object 0x0000000000000000000000000000000000000000000000000000000000000000 at version None."];
expect!["Transaction execution failed due to issues with transaction inputs, please review the errors and try again:\nError 1: Could not find the referenced object 0x0000000000000000000000000000000000000000000000000000000000000000 at version None"];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit (optional): I would tend to prefer bullet points, rather than enumerating the errors, (less line noise, and the order of the errors are unimportant here), but not strongly opposed to this format.

Suggested change
expect!["Transaction execution failed due to issues with transaction inputs, please review the errors and try again:\nError 1: Could not find the referenced object 0x0000000000000000000000000000000000000000000000000000000000000000 at version None"];
expect!["Transaction execution failed due to issues with transaction inputs, please review the errors and try again:\n - Could not find the referenced object 0x0000000000000000000000000000000000000000000000000000000000000000 at version None"];

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, less line noise can be better so I've changed it over to dashes.

let expected_code = expect!["-32002"];
expected_code.assert_eq(&error_object.code().to_string());
let expected_message = expect!["Failed to sign transaction by a quorum of validators because of locked objects. Retried a conflicting transaction Some(TransactionDigest(11111111111111111111111111111111)), success: Some(true)"];
let expected_message = expect!["Failed to sign transaction by a quorum of validators because of locked objects. Retried a conflicting transaction 8qbHbw2BbbTHBW1sbeqakYXVKRQM8Ne7pLK7m6CVfeR, success: true. Conflicting Transactions: [4vJ9JU1bJJE96FWSJKvHsmmFADCg4gpZQff4P3bkLKi]"];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Do you know whether usually the conflicting_txes map includes the transaction that was retried or not? It would be good to get the test to look like the "real thing" as much as possible.
  • One useful piece of information that's missing here is the weight of each conflicting transaction (how much of the stake by weight voted for each transaction -- stake_unit in the code). Can we order conflicting transactions in decreasing order of stake weight and include their stake weight as a percentage in the message? Stake units are given in basis points. Having this information will help identify whether the object is equivocated, or whether it is used for something else.
  • We can incorporate the success: true into the message a bit more -- do you know whether it's possible to extract more information about the conflicting transaction's execution status (if it also errored out, can we provide its error?)
  • Let's add some test cases for other errors in this same family (object double used):
    • Tried a conflicting transaction and it failed.
    • Multiple conflicting transactions.
    • Could not retry a conflicting transaction (does this imply that the object is equivocated? If so, can we make that more obvious in the error message?).

Copy link
Contributor Author

@jordanjennings-mysten jordanjennings-mysten Sep 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know whether usually the conflicting_txes map includes the transaction that was retried or not? It would be good to get the test to look like the "real thing" as much as possible.

I just tried it and yeah this does appear to be the case.. A little confused by this because the way I set this up is by making two transactions that should hash differently (one transfer 1 coin the other moves 2 coins) so how are these digests the same? My understanding was that the conflicting transaction is different from the retried. I guess I'll have to dig some more, unsure how to format this error given this output.

JsonRpcError: Failed to sign transaction by a quorum of validators because of locked objects. Retried a conflicting transaction CoQQ96bjA5UVKhG7sFqjAPmCgcTf7kRY9W9jWHnsasNk, success: true. Conflicting Transactions: [CoQQ96bjA5UVKhG7sFqjAPmCgcTf7kRY9W9jWHnsasNk]

Order conflicting transactions in decreasing order of stake

I'll implement this

do you know whether it's possible to extract more information about the conflicting transaction's execution status

That might require a good amount of pluming since each usage of this error might not have that information available when the error is created. I'll take a look

@jordanjennings-mysten
Copy link
Contributor Author

jordanjennings-mysten commented Sep 9, 2024

Updates I've made:

  • keep the digest in the error type ObjectVersionUnavailableForConsumption
  • sort by stake amount for QuorumDriverError::ObjectsDoubleUsed
  • update the tests to reflect how QuorumDriverError::ObjectsDoubleUsed behaves where conflicting txs has the same tx as retried tx
  • bullet points on the error list

if you find that object double used could use some more work still I'll pull it out.

do you know whether it's possible to extract more information about the conflicting transaction's execution status

It doesn't seem like there is just digests, authority name, object ref, stake unit

This is the function which creates the error in src/quorum_driver/mod.rs, there doesn't appear to be other instances where the options are filled.

    async fn process_conflicting_tx(
        &self,
        tx_digest: TransactionDigest,
        conflicting_tx_digest: TransactionDigest,
        conflicting_tx_digests: BTreeMap<
            TransactionDigest,
            (Vec<(AuthorityName, ObjectRef)>, StakeUnit),
        >,
        client_addr: Option<SocketAddr>,
    ) -> Result<ProcessTransactionResult, Option<QuorumDriverError>> {

The only alternative I see is to query for more info but what if that fails? Getting into a double error would add a bit of complication.

@jordanjennings-mysten
Copy link
Contributor Author

from our conversation today around ObjectsDoubleUsed:

  • now informs the user when equivocation is detected
  • use a single option for ObjectsDoubleUsed.retried_tx_status via a tuple since retried_tx and retried_tx_success will always be set together.
  • gives stake counts in ObjectsDoubleUsed error output

Copy link
Contributor

@amnn amnn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, thanks @jordanjennings-mysten ! I added a final comment on wording for the object double use one but all the data is there now, and it's more a matter of formatting it, so accepting to unblock!

Comment on lines 165 to 172
.sorted_by(|(_, (_, a)), (_, (_, b))| b.cmp(a))
.map(|(digest, (_, stake))| format!(
"{{digest: {}, stake: {}}}",
digest,
stake
))
.join(", "),
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: the representation of stake in basis points may be a little bit tough to understand, can we format it as an actual percentage, to two decimal points, e.g. 100.00% instead of 10000? I also don't think we need to make each conflicting transaction look like a JSON blob (because the error data field does have exactly this JSON blob), so we can make it a little easier for people to read.

Also, yesterday we talked about an error structure where we went from the highest level (least granular) information to the lowest level (most granular) information, and I still think that's worth doing, something like:

Failed to sign transaction by a quorum of validators because one or more of its objects is {reason}. Retried transaction {retry_digest} ({retry_status}) because it was able to gather the necessary votes. Other transactions locking these objects:

  • {} (stake {}.{}%)
  • {} (stake {}.{}%)
  • ...

Where:

  • reason is either "reserved for another transaction", or "equivocated until the next epoch".
  • retry_status is either "succeeded" or "failed".

That way we dole out more detail as people get further and further through the message (or conversely, someone who only wants to get the gist only needs to read a small part of the error).

Comment on lines 181 to 185
let weights = conflicting_txes
.values()
.map(|(_, stake)| *stake)
.collect::<Vec<_>>();
let remaining = TOTAL_VOTING_POWER - weights.iter().sum::<u64>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit (optional -- because this is just a pet peeve of mine): I have an irrational hatred of rocket fishes, and I have no idea why, but I seek to avoid them at every turn. I won't hold it against you if you don't though.

Suggested change
let weights = conflicting_txes
.values()
.map(|(_, stake)| *stake)
.collect::<Vec<_>>();
let remaining = TOTAL_VOTING_POWER - weights.iter().sum::<u64>();
let weights: Vec<_> = conflicting_txes
.values()
.map(|(_, stake)| *stake)
.collect();
let remaining = TOTAL_VOTING_POWER - weights.iter().sum();

@jordanjennings-mysten jordanjennings-mysten merged commit cd6dfa4 into MystenLabs:main Sep 13, 2024
44 checks passed
suiwombat pushed a commit that referenced this pull request Sep 16, 2024
…n RPCs (#19258)

## Description 
The CLI and SDK rely on the QuorumDriverError type to handle errors from
the Quorum driver. This PR improves the clarity of errors for the
QuorumDriverError type and its usage in RPCs.
- a published transaction which has a conflict now provides the
conflicting transaction
- invalid user signature simplified 
- better formatting for object double used
- rely on display when possible

## Test plan 
Augment the current RPC error tests surrounding
`From<QuorumDriverError>` trait implementations to ensure that the error
messages are clear and informative.

## Release notes
- [ ] Protocol: 
- [ ] Nodes (Validators and Full nodes): 
- [ ] Indexer: 
- [ ] JSON-RPC:
- [ ] GraphQL: 
- [ ] CLI: 
- [ ] Rust SDK:
- [ ] REST API:
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants