Bug: Restarting during StateSync creates bad state #1516

@outerlook

Description

@vinarmani

Just did a brand new install on a fresh Ubuntu instance.

...

When I run through the instructions in the docs (https://docs.truf.network/node-operator-guide), the node stalls after downloading the snapshot:

Jun 02 11:44:59 truf-dev-node kwild[7511]: 2025-06-02 11:44:59.061 [INF] STATESYNC: Received snapshot chunk {height=40000 index=49 provider=12D3KooWAfCxVs6Q9KWsABjMiL4Wayq4fAvJFUrGDAK1xrdYZ3vu}
Jun 02 11:44:59 truf-dev-node kwild[7511]: 2025-06-02 11:44:59.197 [INF] STATESYNC: Fresh chunk download completed using range protocol {chunk=50 bytes_written=15995904 copy_duration=2.062116484s}
Jun 02 11:44:59 truf-dev-node kwild[7511]: 2025-06-02 11:44:59.197 [INF] STATESYNC: Chunk download successful {chunk=50 provider=12D3KooWAfCxVs6Q9KWsABjMiL4Wayq4fAvJFUrGDAK1xrdYZ3vu attempt=1}
Jun 02 11:44:59 truf-dev-node kwild[7511]: 2025-06-02 11:44:59.197 [INF] STATESYNC: Received snapshot chunk {height=40000 index=50 provider=12D3KooWAfCxVs6Q9KWsABjMiL4Wayq4fAvJFUrGDAK1xrdYZ3vu}
Jun 02 11:44:59 truf-dev-node kwild[7511]: 2025-06-02 11:44:59.197 [INF] STATESYNC: All chunks downloaded successfully {total_chunks=52}
Jun 02 11:44:59 truf-dev-node kwild[7511]: 2025-06-02 11:44:59.199 [INF] STATESYNC: Restore DB:  {command=/usr/bin/psql --username kwild --host 127.0.0.1 --port 5432 --dbname kwild --no-password}

...

It looks like restarting the node while it appears to hang leaves it in a bad state and throws it into a restart loop:

Jun 02 13:01:28 truf-dev-node kwild[12704]: 2025-06-02 13:01:28.487 [INF] KWILD: Closing blockstore
Jun 02 13:01:28 truf-dev-node kwild[12704]: 2025-06-02 13:01:28.497 [INF] KWILD: Server is now shut down.
Jun 02 13:01:28 truf-dev-node kwild[12704]: node stopped with error: error catching up: app height 40000 is greater than the store height 0 (did you forget to reset postgres?)
Jun 02 13:01:28 truf-dev-node systemd[1]: kwild.service: Deactivated successfully.
Jun 02 13:01:28 truf-dev-node systemd[1]: kwild.service: Consumed 1.482s CPU time.
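
One way to read the error above: the interrupted state sync left the application state in Postgres at the snapshot height while the block store was still empty, so a startup consistency check fails on every restart. The sketch below only illustrates that kind of check. `checkHeights` is a hypothetical name, not kwild's actual code, and the heights are taken from the log line.

```go
package main

import "fmt"

// checkHeights is a hypothetical stand-in for the startup check implied by
// the error message: the application state must not be ahead of the block store.
func checkHeights(appHeight, storeHeight int64) error {
	if appHeight > storeHeight {
		return fmt.Errorf("app height %d is greater than the store height %d (did you forget to reset postgres?)",
			appHeight, storeHeight)
	}
	return nil
}

func main() {
	// Heights from the log: app state at 40000, block store at 0.
	// Assumption: the app height comes from the partially applied snapshot.
	if err := checkHeights(40000, 0); err != nil {
		fmt.Println("node stopped with error: error catching up:", err)
	}
}
```

Because nothing changes between restarts, a check like this fails the same way each time, which would explain the loop under systemd.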

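We need some UI element (for example, a periodic progress log line) to indicate that work is still taking place during that period, i.e. the database restore that follows the snapshot download. Otherwise, operators will assume the node is hung, and the default reaction will be to restart it.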

context: https://github.com/trufnetwork/truf-network/issues/814#issuecomment-2930348895
