-
Notifications
You must be signed in to change notification settings - Fork 65
fix: fix batch ordering issue #160
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: fix batch ordering issue #160
Conversation
@marshallyale thanks for identifying and working on this. Please rebase your branch since I just merged #159 which changes this projects MSRV and CI checks. Also you have a few clippy errors that need to be fixed. We'll need to give time to get it reviewed, so be patient with getting this merged. 🙂 |
8c98446
to
2d3f4a5
Compare
No worries, I should have just rebased and pushed this. I'm working off my forked repo currently so there's no rush on getting any changes synced. Let me know if there's anything else I need to change! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this fix!
I've reviewed the issue and proposed changes, it seems it's a library problem as it's not management and matching context between requests and responses, as the response for a batch call may be out of ordered (as per JSON-RPC spec).
Using an exclusive channel for each request in a batch, solves the problem, however I'm trying to think about any shortcomings of that approach.
I'm taking another look at this one this week, and I think it's best to come up with two |
I think the unique channels approach in this PR is a valid solution. Perhaps an alternative would be to change |
@marshallyale did you see the comment above? If you don't have time to look at it now we may open a new PR to try out his suggestion. |
Yes, apologies for the delay. I will write some tests and look into the ChannelMessage::Response this weekend sometime. |
2d3f4a5
to
294dab4
Compare
I've added a single test that tests batch_block_header and asserts that the timestamp matches up to the expected times. This is looped 10 times, although it would fail within 5 times normally. I had to use a different client (exs.dyshek.org:50001) since it appears that electrum.blockstream.info always returns results in order, which is not true of other servers. I looked at trying to implement tests comparing request ids to response ids, but there isn't a great way to do this without doing a bit of code refactor and adding additional loops to the code. Both batch_call and raw_call don't return the json response ids, so we would have to break the actual json responses into a separate function that returns the raw request-response matchup, and then iterate over this to get the actual result in its current format. Additionally, I looked into changing ChannelMessage::Response from Value to (req_id, value). This kind of gets back to the original issue with each request sharing the same channel for receiving, as the receiver will receive the response but would then have to check that the response id is equal to the request id and if not, would have to put this message back in the channel until the correct request id gets it. It could definitely be done, but would require a bit more reworking than the current separate channels approach. I think that this could probably be done in a more elegant/faster way, but would require more of an overhaul of the send/receive logic in the code. Maybe this is just a function of me being very new to the code, but I personally think the recv function and _reader_thread code is confusing which makes it hard to come up with alternatives without a major code refactor. Finally, it looks like the tests failed because actions/cache@v2 is deprecated. Want me to update to v3 (or v4)? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approach ACK
If you want to push an update to |
I added the changes and updated the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK b6230be
Fixes issue bitcoindevkit#75 Raw client waiting map was using the same channel for every request/response. When items were put back into the channel inside of _reader_thread the waiting receiver in recv would just take the next response without validating it on request id request. This fixes this by using unique channels for each request response inside of the waiting map.
b6230be
to
a3488f4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reACK a3488f4
Thanks again @marshallyale
Fixes issue #75
This is my first open source pull request so I apologize for any formatting issues. Additionally, I don't know the repository as well as others so there may be a better way to implement the fix.
I believe I found the root cause of this. I added a pull request to fix, but I'm going to copy/paste what I believe is causing the error.
The main issue in the code is inside raw_client.rs inside the
recv
method implementation (snippet below):rust-electrum-client/src/raw_client.rs
Lines 671 to 685 in 805ea0a
When this is first called, the
self._reader_thread
will run. Inside theself._reader_thread
, if the request id matches the response id, everything works fine. However, if the request id does not match the response id, we run the following code:rust-electrum-client/src/raw_client.rs
Lines 602 to 612 in 805ea0a
The channel that the response is sent back into is not unique, but rather all the channels share the same sender.clone() and receiver. The only validation that is done is to check that the request id is still being searched for inside
self.waiting_map
. This means that the receiver channel receives whatever the next response is into the channel without any validation that it matches the request id which happens herematch receiver.recv()?
.This is fixed by implementing unique channels for every request id. This fix can be verified with the code @johnzweng used to show the issue
If you run this with the initial code, it will error out after 1-10 cycles normally. However, after the fix this runs indefinitely.