Skip to content

Forcing rollback after Out Of Memory kill causes Panic with nil pointer #719

Open
@carry2web

Description

@carry2web

We are facing a couple of times already very high memory consumption (128Gb) and OOM kills on backend.
The Rollback fails and causes Panic exit on nil pointer see logs below.

How to repair the /db as a full resync takes 4-5 days.
Any help appreciated.

Is there proper garbage collection done for Go? Any advice on environment setting for that?

badger 2025/03/11 12:15:37 INFO: Block cache metrics: hit: 6 miss: 859463 keys-added: 146367 keys-updated: 0 keys-evicted: 76523 cost-added: 569568571 cost-evicted: 301133431 sets-dropped: 0 sets-rejected: 709775 gets-dropped: 0 gets-kept: 857088 gets-total: 859469 hit-ratio: 0.00
I0311 12:17:25.728528 1 config.go:227] Logging to directory /db/v-00000
I0311 12:17:25.728671 1 config.go:228] Running node in MAINNET mode
I0311 12:17:25.728677 1 config.go:229] Data Directory: /db/v-00000
I0311 12:17:25.728684 1 config.go:240] PoS Validator: ON
I0311 12:17:25.728687 1 config.go:244] HyperSync: ON
I0311 12:17:25.728690 1 config.go:250] ForceChecksum: OFF - This could allow a peer to trick you into downloading bad hypersync state. Be sure you're connecting to a trustworthy sync peer.
I0311 12:17:25.728694 1 config.go:256] SnapshotBlockHeightPeriod: 600000
I0311 12:17:25.728701 1 config.go:260] ArchivalMode: ON
I0311 12:17:25.728704 1 config.go:263] SyncType: blocksync
I0311 12:17:25.728708 1 config.go:270] Connect IPs: [deso-seed-4.io:17000]
I0311 12:17:25.728712 1 config.go:293] Max Inbound Peers: 125
I0311 12:17:25.728716 1 config.go:294] Protocol listening on port 17000
I0311 12:17:25.728720 1 config.go:300] Rate Limit Feerate: 0
I0311 12:17:25.728723 1 config.go:301] Min Feerate: 1000
badger 2025/03/11 12:17:26 INFO: All 3320 tables opened in 386ms
badger 2025/03/11 12:17:26 INFO: Discard stats nextEmptySlot: 0
badger 2025/03/11 12:17:26 INFO: Set nextTxnTs to 672795186
badger 2025/03/11 12:17:26 INFO: Deleting empty file: /db/v-00000/badgerdb/000566.vlog
E0311 12:17:26.308269 1 snapshot.go:448] NewSnapshot: Node didn't shut down properly last time. Entering a recovery mode. The node will roll back to last snapshot epoch block height (13200000) and hash (e2f8cee35bcae255443c6c294af965240d8fb19e308c7ece7dcb465a61beb815), then restart.
I0311 12:22:59.231991 1 config.go:227] Logging to directory /db/v-00000
I0311 12:22:59.232119 1 config.go:228] Running node in MAINNET mode
I0311 12:22:59.232125 1 config.go:229] Data Directory: /db/v-00000
I0311 12:22:59.232132 1 config.go:240] PoS Validator: ON
I0311 12:22:59.232135 1 config.go:244] HyperSync: ON
I0311 12:22:59.232138 1 config.go:250] ForceChecksum: OFF - This could allow a peer to trick you into downloading bad hypersync state. Be sure you're connecting to a trustworthy sync peer.
I0311 12:22:59.232143 1 config.go:256] SnapshotBlockHeightPeriod: 600000
I0311 12:22:59.232148 1 config.go:260] ArchivalMode: ON
I0311 12:22:59.232152 1 config.go:263] SyncType: blocksync
I0311 12:22:59.232156 1 config.go:270] Connect IPs: [deso-seed-4.io:17000]
I0311 12:22:59.232161 1 config.go:293] Max Inbound Peers: 125
I0311 12:22:59.232164 1 config.go:294] Protocol listening on port 17000
I0311 12:22:59.232167 1 config.go:300] Rate Limit Feerate: 0
I0311 12:22:59.232171 1 config.go:301] Min Feerate: 1000
badger 2025/03/11 12:22:59 INFO: All 3320 tables opened in 387ms
badger 2025/03/11 12:22:59 INFO: Discard stats nextEmptySlot: 0
badger 2025/03/11 12:22:59 INFO: Set nextTxnTs to 672795186
badger 2025/03/11 12:22:59 INFO: Deleting empty file: /db/v-00000/badgerdb/000567.vlog
E0311 12:22:59.775777 1 snapshot.go:448] NewSnapshot: Node didn't shut down properly last time. Entering a recovery mode. The node will roll back to last snapshot epoch block height (13200000) and hash (e2f8cee35bcae255443c6c294af965240d8fb19e308c7ece7dcb465a61beb815), then restart.
badger 2025/03/11 12:23:59 INFO: Block cache metrics: hit: 5 miss: 878509 keys-added: 146436 keys-updated: 1 keys-evicted: 76591 cost-added: 569837756 cost-evicted: 301402693 sets-dropped: 0 sets-rejected: 728752 gets-dropped: 0 gets-kept: 875968 gets-total: 878515 hit-ratio: 0.00
E0311 12:25:22.716425 1 miner.go:410] BigintToHash: Bigint is nil
E0311 12:25:22.716444 1 miner.go:410] BigintToHash: Bigint is nil
I0311 12:25:22.716913 1 legacy_mempool.go:2609] Calling StartReadOnlyUtxoViewRegenerator...
E0311 12:25:22.722465 1 server.go:712] NewServer: Forcing a rollback to the last snapshot epoch because node was not closed properly last time
I0311 12:25:22.722496 1 node.go:381] Node.listenToNodeMessages: Stopping node
I0311 12:25:22.722517 1 node.go:317] Node is shutting down. This might take a minute. Please don't close the node now or else you might corrupt the state.
I0311 12:25:22.722525 1 node.go:321] Node.Stop: Stopping server...
I0311 12:25:22.722532 1 server.go:3188] Server.Stop: Gracefully shutting down Server
I0311 12:25:22.722537 1 server.go:3191] Server.Stop: Flushing logs for all peers
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1676827]

goroutine 53 [running]:
github.com/deso-protocol/core/lib.(*Server).Stop(0x0)
/deso/src/core/lib/server.go:3192 +0x107
github.com/deso-protocol/core/cmd.(*Node).Stop(0xc000a33680)
/deso/src/core/cmd/node.go:322 +0x15c
github.com/deso-protocol/core/cmd.(*Node).listenToNodeMessages(0xc000a33680, {0x0, 0x0, 0x0})
/deso/src/core/cmd/node.go:382 +0x12c
created by github.com/deso-protocol/core/cmd.(*Node).Start
/deso/src/core/cmd/node.go:79 +0x365

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions