If geth is on the same system as the Node.js script, you may explore IpcProvider to check whether the bottleneck is due to communication over a network socket. IPC simply uses the OS API for communication between two processes, so any bottleneck caused by network communication would be avoided. If the bottleneck is due to disk reads, you'd have to explore a faster SSD option (e.g. NVMe).
Since your system tops out around 80% CPU and 50% memory rather than saturating either, the bottleneck could be due to disk. But to be really sure you could briefly experiment with a higher-capacity instance.
I can't think of a faster way, but I'm curious: if your system has 8 cores, does sending 10K concurrent requests vs 5K concurrent requests make any difference? At most they would be resolved in parallel 8 at a time, so you may not really need to fire that many requests at once, since most of them just sit in waiting mode anyway. Maybe you can push 1K requests and, once only 500 of them are left, push 500 more, something like that. Though I'm not very sure about this.
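The "push 1K and refill" idea above amounts to a bounded concurrency pool. A minimal sketch — the `mapWithLimit` helper is made up for illustration — keeps at most `limit` requests in flight and starts a new one as soon as any finishes, which is a bit smoother than refilling in blocks:

```javascript
// Bounded concurrency: run fn over items with at most `limit` calls
// in flight at any moment. Results come back in input order.
async function mapWithLimit(items, limit, fn) {
  const results = new Array(items.length);
  let next = 0; // index of the next item to claim (safe: JS is single-threaded)
  async function worker() {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i]);
    }
  }
  // Spawn up to `limit` workers that drain the shared index.
  await Promise.all(
    Array.from({ length: Math.min(limit, items.length) }, worker)
  );
  return results;
}
```

Usage would be something like `mapWithLimit(hashes, 1000, (h) => provider.getTransactionReceipt(h))` — one place to tune how hard geth gets hit.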
Hi there! Any help with the following would be much appreciated.
Context
I've got a large txt file (30GB+) containing transaction hashes. I have a Node.js script that reads this file, queries a full geth node for transaction receipts, and saves these receipts in a CSV file on disk.
Problem
Geth can handle 10K `getTransactionReceipt` requests in a `Promise.all` fine. But geth hangs/crashes when I increase this amount to anything higher. At 20K it can process for a bit before it crashes. Anything higher than 20K, it just hangs and crashes immediately. Geth returns an error saying `SERVER_ERROR missing response` or some sort of `socket hang up` error. The error log also contains a massive array of RPC requests containing the individual tx hashes.
Configuration
I'm running geth on an `m5.2xlarge` (8 cores and 32GB memory) instance. The CPU utilisation hovers in the 70-80% range and memory hovers in the 30-50% range.
Pseudocode
My script is basically like this
What I've tried
- `JsonBatchRpcProvider`, but it doesn't seem to make a difference. I also read that it's not actually faster because the RPC calls have to be serialised. The documentation is not clear as to whether this only applies to state-changing transaction calls or also to simple queries that just fetch data.
- An alternative to `Promise.all`. With this approach I send the `getTransactionReceipt` calls serially without waiting for confirmation. Prior to making each call I increment a buffer count, and on a successful response I decrement the count. I add an arbitrary wait of a couple of seconds once the buffer count exceeds 10K. I didn't find this approach to be any faster, and geth would most likely still crash when I increased the count beyond 10K. FYI I wasn't too rigorous in this implementation.
Questions
Thank you