fix: fix batch ordering issue #160


Merged

Conversation

marshallyale
Contributor

@marshallyale marshallyale commented Jan 22, 2025

Fixes issue #75
This is my first open source pull request so I apologize for any formatting issues. Additionally, I don't know the repository as well as others so there may be a better way to implement the fix.

I believe I found the root cause of this. I opened this pull request with a fix, but I'll also copy/paste what I believe is causing the error.

The main issue in the code is inside raw_client.rs inside the recv method implementation (snippet below):

fn recv(
    &self,
    receiver: &Receiver<ChannelMessage>,
    req_id: usize,
) -> Result<serde_json::Value, Error> {
    loop {
        // Try to take the lock on the reader. If we manage to do so, we'll become the reader
        // thread until we get our response
        match self._reader_thread(Some(req_id)) {
            Ok(response) => break Ok(response),
            Err(Error::CouldntLockReader) => {
                match receiver.recv()? {
                    // Received our response, returning it
                    ChannelMessage::Response(received) => break Ok(received),
                    ChannelMessage::WakeUp => {
                        // ...
When recv is first called, _reader_thread runs. Inside _reader_thread, if the request id matches the response id, everything works fine. However, if the request id does not match the response id, we run the following code:

Some(resp_id) => {
    // We have an id, but it's not our response. Notify the thread and
    // move on
    trace!("Reader thread received response for {}", resp_id);
    if let Some(sender) = self.waiting_map.lock()?.remove(&resp_id) {
        sender.send(ChannelMessage::Response(resp))?;
    } else {
        warn!("Missing listener for {}", resp_id);
    }
}

The channel that the response is sent back into is not unique: every request shares the same sender.clone() and receiver. The only validation performed is a check that the request id is still present in self.waiting_map. As a result, the `match receiver.recv()?` above hands the caller whatever response arrives next in the channel, without any check that it matches the caller's request id.

This is fixed by giving every request id its own channel. The fix can be verified with the code @johnzweng used to demonstrate the issue:

use bitcoin::block::Header;
use electrum_client::{Client, ElectrumApi};
use std::process;

fn main() {
    let url = "tcp://exs.dyshek.org:50001";
    let client = Client::new(url).unwrap();

    loop {
        let heights: Vec<u32> = vec![1, 4, 8, 12, 222, 6666];
        let headers: Vec<Header> = client.batch_block_header(heights).unwrap();

        if headers.get(0).unwrap().time != 1231469665 {
            // time of block 1
            println!("error 0");
            process::exit(1);
        }
        if headers.get(1).unwrap().time != 1231470988 {
            // time of block 4
            println!("error 1");
            process::exit(1);
        }
        if headers.get(2).unwrap().time != 1231472743 {
            // time of block 8
            println!("error 2");
            process::exit(1);
        }
        if headers.get(3).unwrap().time != 1231474888 {
            // time of block 12
            println!("error 3");
            process::exit(1);
        }
        if headers.get(4).unwrap().time != 1231770653 {
            // time of block 222
            println!("error 4");
            process::exit(1);
        }
        if headers.get(5).unwrap().time != 1236456633 {
            // time of block 6666
            println!("error 5");
            process::exit(1);
        }
        println!("==========================================\n")
    }
}

If you run this against the original code, it will normally error out within 1-10 cycles. After the fix, it runs indefinitely.
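The unique-channel approach can be sketched with a simplified, self-contained model (hypothetical types: String stands in for serde_json::Value, and WaitingMap is an illustrative stand-in, not the actual raw_client.rs code):

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::Mutex;

// Simplified model of the fix: each pending request id owns its own
// Sender, so the reader thread can only ever deliver a response to
// the caller that registered that id.
struct WaitingMap {
    inner: Mutex<HashMap<usize, Sender<String>>>,
}

impl WaitingMap {
    fn new() -> Self {
        WaitingMap {
            inner: Mutex::new(HashMap::new()),
        }
    }

    // Register a request id and return the unique Receiver for it.
    fn register(&self, req_id: usize) -> Receiver<String> {
        let (tx, rx) = channel();
        self.inner.lock().unwrap().insert(req_id, tx);
        rx
    }

    // Reader side: route a response to the channel registered for its id.
    fn deliver(&self, resp_id: usize, resp: String) {
        if let Some(sender) = self.inner.lock().unwrap().remove(&resp_id) {
            sender.send(resp).unwrap();
        } else {
            eprintln!("Missing listener for {}", resp_id);
        }
    }
}

fn main() {
    let map = WaitingMap::new();
    let rx1 = map.register(1);
    let rx2 = map.register(2);

    // Responses arrive out of order, as JSON-RPC allows...
    map.deliver(2, "resp-2".to_string());
    map.deliver(1, "resp-1".to_string());

    // ...but each caller still receives exactly its own response.
    assert_eq!(rx1.recv().unwrap(), "resp-1");
    assert_eq!(rx2.recv().unwrap(), "resp-2");
    println!("ok");
}
```

Because each Receiver is handed back only to the caller that registered the id, mixed-up deliveries like the one in the bug report become impossible by construction.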

@notmandatory
Member

@marshallyale thanks for identifying and working on this. Please rebase your branch since I just merged #159, which changes this project's MSRV and CI checks. Also, you have a few clippy errors that need to be fixed. We'll need some time to get it reviewed, so please be patient with getting this merged. 🙂

@notmandatory notmandatory added the bug Something isn't working label Jan 23, 2025
@marshallyale marshallyale force-pushed the fix_transaction_ordering branch from 8c98446 to 2d3f4a5 Compare January 24, 2025 20:02
@marshallyale
Contributor Author

No worries, I should have just rebased and pushed this. I'm working off my forked repo currently so there's no rush on getting any changes synced. Let me know if there's anything else I need to change!

@oleonardolima oleonardolima self-requested a review January 27, 2025 14:33
Contributor

@oleonardolima oleonardolima left a comment


Thanks for working on this fix!

I've reviewed the issue and the proposed changes. It does look like a library problem: the client isn't managing and matching context between requests and responses, and the responses to a batch call may arrive out of order (as the JSON-RPC spec allows).

Using an exclusive channel for each request in a batch solves the problem; however, I'm trying to think through any shortcomings of that approach.

@oleonardolima
Contributor

I'm taking another look at this one this week, and I think it's best to come up with two #[test]s: (i) checking/asserting that the request_ids are correct within a batch request, and (ii) that single requests and batch requests don't mix up request_ids either. Having a test running on CI/locally makes it easier to iterate on this solution and other ones.

@ValuedMammal
Contributor

I think the unique channels approach in this PR is a valid solution. Perhaps an alternative would be to change ChannelMessage::Response to carry the response id in a tuple of (usize, Value) and use this when inserting into answers during batch_call. Since answers is an ordered map, we'll still return the responses in the correct order.
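That alternative could be sketched like this (a simplified, hypothetical model: String stands in for serde_json::Value, and collect_batch is an illustrative helper, not a function in this crate):

```rust
use std::collections::BTreeMap;

// The reader tags each response with its request id; batch_call then
// inserts into an ordered map keyed by id, so iterating the map yields
// responses in request order regardless of arrival order.
fn collect_batch(tagged: Vec<(usize, String)>) -> Vec<String> {
    let mut answers: BTreeMap<usize, String> = BTreeMap::new();
    for (id, value) in tagged {
        answers.insert(id, value);
    }
    answers.into_values().collect()
}

fn main() {
    // Responses arrive out of order...
    let arrived = vec![
        (3, "header-3".to_string()),
        (1, "header-1".to_string()),
        (2, "header-2".to_string()),
    ];
    // ...but the ordered map restores request order.
    assert_eq!(
        collect_batch(arrived),
        vec!["header-1", "header-2", "header-3"]
    );
    println!("ok");
}
```

The key property used here is that BTreeMap iterates its entries in ascending key order, so inserting by response id is enough to recover the original request order.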

@notmandatory
Member

@marshallyale did you see the comment above? If you don't have time to look at it now we may open a new PR to try out his suggestion.

@marshallyale
Contributor Author

Yes, apologies for the delay. I will write some tests and look into the ChannelMessage::Response this weekend sometime.

@marshallyale marshallyale force-pushed the fix_transaction_ordering branch from 2d3f4a5 to 294dab4 Compare March 15, 2025 18:17
@marshallyale
Contributor Author

I've added a single test that calls batch_block_header and asserts that the timestamps match the expected times. This is looped 10 times, although it would normally fail within 5 iterations. I had to use a different server (exs.dyshek.org:50001) since it appears that electrum.blockstream.info always returns results in order, which is not true of other servers.

I looked at implementing tests that compare request ids to response ids, but there isn't a great way to do this without some code refactoring and additional loops. Neither batch_call nor raw_call returns the JSON response ids, so we would have to split the raw request/response matching out into a separate function and then iterate over it to produce the result in its current format.

Additionally, I looked into changing ChannelMessage::Response from Value to (req_id, value). This runs into the original issue of every request sharing the same receiving channel: the receiver would get a response, have to check whether its id matches its own request id, and if not, put the message back into the channel until the correct request picks it up. It could definitely be done, but it would require more reworking than the current separate-channels approach.

I think this could probably be done in a more elegant/faster way, but that would require more of an overhaul of the send/receive logic. Maybe this is just a function of me being very new to the code, but I personally find the recv and _reader_thread code confusing, which makes it hard to come up with alternatives without a major refactor.

Finally, it looks like the tests failed because actions/cache@v2 is deprecated. Want me to update to v3 (or v4)?

ValuedMammal added a commit to ValuedMammal/rust-electrum-client that referenced this pull request Mar 16, 2025
Contributor

@ValuedMammal ValuedMammal left a comment


Approach ACK

@ValuedMammal
Contributor

Finally, it looks like the tests failed because actions/cache@v2 is deprecated. Want me to update to v3 (or v4)?

If you want to push an update to actions/cache that would be fine, or else we can do it in another PR (#167).

@marshallyale
Contributor Author

I added the changes and updated actions/cache to v4. Hopefully that's easier than multiple PRs.

Contributor

@ValuedMammal ValuedMammal left a comment


ACK b6230be

marshallyale and others added 3 commits March 28, 2025 12:34
Fixes issue bitcoindevkit#75
The raw client's waiting map was using the same
channel for every request/response. When items
were put back into the channel inside of
_reader_thread, the waiting receiver in recv
would just take the next response without
validating it against the request id.
This fixes that by using unique channels for
each request/response pair in the waiting map.
@ValuedMammal ValuedMammal force-pushed the fix_transaction_ordering branch from b6230be to a3488f4 Compare March 28, 2025 16:39
Contributor

@ValuedMammal ValuedMammal left a comment


reACK a3488f4

Thanks again @marshallyale

@ValuedMammal ValuedMammal merged commit 862280a into bitcoindevkit:master Mar 28, 2025
4 checks passed