-
-
Notifications
You must be signed in to change notification settings - Fork 68
Heartbeat process causes disconnection for bots with large guild membership #415
Description
Describe the bug
When connecting to Discord with a large guild membership the heartbeat process will continually reconnect due to not receiving heartbeat ACKs. This appears to be caused by Discord prioritizing GUILD_CREATE
events which are sent as a synchronization process after the READY
event.
Expected behavior
Disgord should not trigger a reconnect attempt during guild synchronization even if heartbeat ACKs are delayed.
Error messages
This appears as normal heartbeat timeout errors after the initial connection setup is performed and the READY
event has been received.
INFO[0000] Connecting to discord Gateway
DEBU[0000] shard 0 intents: { Guilds | GuildMembers | GuildBans | GuildEmojis | GuildIntegrations | GuildWebhooks | GuildInvites | GuildVoiceStates | GuildMessages | GuildMessageReactions | GuildMessageTyping }
DEBU[0000] shard 0 rejects events: [PRESENCE_UPDATE GUILD_MEMBER_UPDATE GUILD_MEMBER_REMOVE PRESENCES_REPLACE]
DEBU[0000] shard 1 intents: { Guilds | GuildMembers | GuildBans | GuildEmojis | GuildIntegrations | GuildWebhooks | GuildInvites | GuildVoiceStates | GuildMessages | GuildMessageReactions | GuildMessageTyping }
DEBU[0000] shard 1 rejects events: [PRESENCE_UPDATE GUILD_MEMBER_UPDATE GUILD_MEMBER_REMOVE PRESENCES_REPLACE]
DEBU[0000] shard 2 intents: { Guilds | GuildMembers | GuildBans | GuildEmojis | GuildIntegrations | GuildWebhooks | GuildInvites | GuildVoiceStates | GuildMessages | GuildMessageReactions | GuildMessageTyping }
DEBU[0000] shard 2 rejects events: [PRESENCE_UPDATE GUILD_MEMBER_UPDATE GUILD_MEMBER_REMOVE PRESENCES_REPLACE]
DEBU[0000] shard 3 intents: { Guilds | GuildMembers | GuildBans | GuildEmojis | GuildIntegrations | GuildWebhooks | GuildInvites | GuildVoiceStates | GuildMessages | GuildMessageReactions | GuildMessageTyping }
DEBU[0000] shard 3 rejects events: [PRESENCE_UPDATE GUILD_MEMBER_UPDATE GUILD_MEMBER_REMOVE PRESENCES_REPLACE]
DEBU[0000] shard 4 intents: { Guilds | GuildMembers | GuildBans | GuildEmojis | GuildIntegrations | GuildWebhooks | GuildInvites | GuildVoiceStates | GuildMessages | GuildMessageReactions | GuildMessageTyping }
DEBU[0000] shard 4 rejects events: [PRESENCE_UPDATE GUILD_MEMBER_UPDATE GUILD_MEMBER_REMOVE PRESENCES_REPLACE]
DEBU[0000] shard 5 intents: { Guilds | GuildMembers | GuildBans | GuildEmojis | GuildIntegrations | GuildWebhooks | GuildInvites | GuildVoiceStates | GuildMessages | GuildMessageReactions | GuildMessageTyping }
DEBU[0000] shard 5 rejects events: [PRESENCE_UPDATE GUILD_MEMBER_UPDATE GUILD_MEMBER_REMOVE PRESENCES_REPLACE]
DEBU[0000] [ws-e,s:1,shard:3]trying to connect
DEBU[0000] [shardSync]shard3is waiting to identify
DEBU[0000] [ws-e,s:2,shard:3]waiting to send identify/resume
DEBU[0000] [ws-e,s:4,shard:3]Ready to receive operation codes...
DEBU[0000] [ws-e,s:3,shard:3]starting emitter
DEBU[0000] [ws-e,s:5,shard:3]starting receiver
DEBU[0000] [ws-e,s:6,shard:3]sendIdentityPacket is acquiring once channel
DEBU[0000] [ws-e,s:7,shard:3]writing to once channel0xc00007eb40
DEBU[0000] [ws-e,s:9,shard:3]finished writing to once channel0xc00007eb40
DEBU[0000] [ws-e,s:8,shard:3]heartbeat ACK ok
DEBU[0000] [ws-e,s:11,shard:3]sent heartbeat
DEBU[0000] [ws-e,s:10,shard:3]sent identify/resume
DEBU[0000] [shardSync]shard3waited and finished execution after171.316543ms
DEBU[0000] [ws-e,s:12,shard:3]establishing connection succeeded
DEBU[0000] [ws-e,s:1,shard:4]trying to connect
...
INFO[0031] Connected
DEBU[0041] [ws-e,s:13,shard:3]heartbeat ACK ok
DEBU[0041] [ws-e,s:14,shard:3]sent heartbeat
DEBU[0047] [ws-e,s:13,shard:4]heartbeat ACK ok
DEBU[0047] [ws-e,s:14,shard:4]sent heartbeat
DEBU[0053] [ws-e,s:13,shard:5]heartbeat ACK ok
DEBU[0053] [ws-e,s:14,shard:5]sent heartbeat
DEBU[0059] [ws-e,s:13,shard:0]heartbeat ACK ok
DEBU[0059] [ws-e,s:14,shard:0]sent heartbeat
DEBU[0067] [ws-e,s:13,shard:1]heartbeat ACK ok
DEBU[0067] [ws-e,s:14,shard:1]sent heartbeat
DEBU[0073] [ws-e,s:13,shard:2]heartbeat ACK ok
DEBU[0073] [ws-e,s:14,shard:2]sent heartbeat
INFO[0083] [ws-e,s:15,shard:3]heartbeat ACK was not received, forcing reconnect
DEBU[0083] [ws-e,s:16,shard:3]stopping pulse
DEBU[0083] [ws-e,s:17,shard:3]is reconnecting
DEBU[0083] [ws-e,s:18,shard:3]closing operations handler
DEBU[0083] [ws-e,s:19,shard:3]closing emitter
INFO[0083] [ws-e,s:20,shard:3]disconnected
DEBU[0083] [ws-e,s:21,shard:3]reconnecting failed: failed to close WebSocket: failed to write control frame opClose: failed to write frame: WebSocket closed: read timed out: context canceled
DEBU[0083] [ws-e,s:22,shard:3]trying to connect
DEBU[0083] [ws-e,s:23,shard:3]closing receiver
...
Desktop (please complete the following information):
- Golang version: v16.6.6
- Using Go modules? yes
- Disgord version? 0.27.3
- Connected to the gateway before using REST methods? yes
Additional context
I did some testing with event flow and found that Discord will block heartbeat ACKs during the guild synchronization phase of a new connection which leads to the timeout issues described above.
What appears to be happening is that a stream of GUILD_CREATE
events will be received after the READY
event. During this time a heartbeat request will be sent but no ACK will be received which triggers the reconnect logic. If the reconnect logic is disabled, multiple heartbeat ACKs are received after the guild synchronization is complete indicating that they are delayed.
I tested some possible solutions but didn't find any of them satisfactory:
-
Set
lastHeartbeatAck
for everyGUILD_CREATE
event. This resolves the issue by keeping the ACK timestamp fresh as events are received during guild synchronization but is philosophically wrong. -
Set a flag that will ignore missed heartbeats after the
READY
event until the first ACK is received. This would effectively turn off the reconnect logic until after guild synchronization is complete but offers no connection tracking during this process. -
Use a timer and set of flags to detect the guild synchronization process and only turn on heartbeats after it is complete. This is a more complex solution and technically still suffers from the drawback in solution 2.
One other issue is that you cannot turn heartbeats off completely or you will get read errors from the receiver which also trigger a reconnect. I haven't had the time to trace it down, but I'm assuming that Discord is dropping the websocket connection if no traffic is sent periodically. This means that even if heartbeat ACKs are ignored, heartbeat requests still need to be sent to prevent this error from occurring.
So I'm kind of at a loss as to what a proper solution should be. I'd be happy to post a PR for this if something reasonable could be suggested. I thought for now the first step would be to report it and go from there.