-
Notifications
You must be signed in to change notification settings - Fork 731
Description
Bug Description
Client will disconnect/reconnect forever in a loop, if the server restarts and a QOS=1 message using topic aliasing wasn't ACKed yet.
The issue appears to occur because upon reconnection, the client will resend all qos > 0 messages that were in flight at the time of disconnection.
However - if one of those messages was using topic aliasing - and set the topic to ""
, then when it is sent to the server (which has restarted and doesn't know the session of the client) the server will close the connection for reason "Protocol Error" (mosquitto). This is because the message had an alias index, but no topic (and the server's state won't know what to do with the message).
The server will now disconnect the client.
The client will reconnect, and the same issue occurs.
NOTE: Probably related to this: #855
But I don't think this scenario is covered by that discussion
Reproduction
Obviously timing this perfectly with a server restart isn't super easy. But recreating the state it ends up in IS.
So here is a quick example script that demonstrates the issue with the current form of the logic.
I reproduced the behaviour using test.mosquitto.org.
import paho.mqtt.client as mqtt
def main():
def on_connect(client, userdata, flags, reason_code, properties):
print(f"Connected with result code {reason_code}")
def on_disconnect(client, userdata, flags, reason_code, properties):
print(f"Disconnected with result code {reason_code}")
def on_message(client, userdata, msg):
print(msg.topic + " " + str(msg.payload), msg.properties)
def on_log(client, userdata, level, buf):
print("PAHO-MQTT-LOG:", level, buf)
mqttc = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2, protocol=mqtt.MQTTv5)
mqttc.on_connect = on_connect
mqttc.on_message = on_message
mqttc.on_disconnect = on_disconnect
mqttc.on_log = on_log
# Publish a message using an alias - that was previously registered in an earlier session.
# If the server has forgotten the client (i.e. a reboot) and we've just reconnected then this message should
# cause an error on the server because it doesn't know what topic the topicAlias corresponds to.
alias_property = mqtt.Properties(mqtt.PacketTypes.PUBLISH)
alias_property.TopicAlias = 1
mqttc.publish(topic="", payload="hi_aliased", qos=1, properties=alias_property)
mqttc.connect("test.mosquitto.org", 1883, 60, clean_start=True)
mqttc.loop_forever()
# The trouble starts with the error handling of this case.
# The client will be stuck sending the same broken message, and keep reconnecting.
if __name__ == "__main__":
main()
Environment
- Python version: 3.9.0
- Library version: paho-mqtt==2.1.0
- Operating system (including version): Windows 11
- MQTT server (name, version, configuration, hosting details): mosquitto (test.mosquitto.org, and local mosquitto server using docker).
Logs
Reproduction script logs:
PAHO-MQTT-LOG: LogLevel.MQTT_LOG_DEBUG Sending CONNECT (u1, p1, wr0, wq0, wf0, c1, k60) client_id=b'' properties=None
PAHO-MQTT-LOG: LogLevel.MQTT_LOG_DEBUG Received CONNACK (0, Success) properties=[AssignedClientIdentifier : auto-CDF56F33-D918-6599-9E6E-9BF26E59C7DF, ReceiveMaximum : 4096, TopicAliasMaximum : 10]
Connected with result code Success
PAHO-MQTT-LOG: LogLevel.MQTT_LOG_DEBUG Sending PUBLISH (d0, q1, r0, m1), 'b''', properties=[TopicAlias : 1], ... (10 bytes)
PAHO-MQTT-LOG: LogLevel.MQTT_LOG_DEBUG Received DISCONNECT None None
Disconnected with result code Normal disconnection
PAHO-MQTT-LOG: LogLevel.MQTT_LOG_DEBUG Sending CONNECT (u1, p1, wr0, wq0, wf0, c0, k60) client_id=b'' properties=None
PAHO-MQTT-LOG: LogLevel.MQTT_LOG_DEBUG Received CONNACK (0, Success) properties=[AssignedClientIdentifier : auto-08C8E0B5-A2BC-4530-1BC0-A0C1C4A4123B, ReceiveMaximum : 4096, TopicAliasMaximum : 10]
Connected with result code Success
PAHO-MQTT-LOG: LogLevel.MQTT_LOG_DEBUG Sending PUBLISH (d1, q1, r0, m1), 'b''', properties=[TopicAlias : 1], ... (10 bytes)
PAHO-MQTT-LOG: LogLevel.MQTT_LOG_DEBUG Received DISCONNECT None None
Disconnected with result code Normal disconnection
PAHO-MQTT-LOG: LogLevel.MQTT_LOG_DEBUG Sending CONNECT (u1, p1, wr0, wq0, wf0, c0, k60) client_id=b'' properties=None
PAHO-MQTT-LOG: LogLevel.MQTT_LOG_DEBUG Received CONNACK (0, Success) properties=[AssignedClientIdentifier : auto-714B2B0C-53E0-0370-C2EA-CBB2EF3626F3, ReceiveMaximum : 4096, TopicAliasMaximum : 10]
Connected with result code Success
PAHO-MQTT-LOG: LogLevel.MQTT_LOG_DEBUG Sending PUBLISH (d1, q1, r0, m1), 'b''', properties=[TopicAlias : 1], ... (10 bytes)
PAHO-MQTT-LOG: LogLevel.MQTT_LOG_DEBUG Received DISCONNECT None None
Disconnected with result code Normal disconnection
Broker logs:
mqtt5 | 1749642878: New connection from 172.18.0.1:44450 on port 1883.
mqtt5 | 1749642878: New client connected from 172.18.0.1:44450 as auto-CDF56F33-D918-6599-9E6E-9BF26E59C7DF (p5, c1, k60, u'').
mqtt5 | 1749642878: Client auto-CDF56F33-D918-6599-9E6E-9BF26E59C7DF disconnected due to protocol error.
mqtt5 | 1749642879: New connection from 172.18.0.1:44462 on port 1883.
mqtt5 | 1749642879: New client connected from 172.18.0.1:44462 as auto-08C8E0B5-A2BC-4530-1BC0-A0C1C4A4123B (p5, c0, k60, u'').
mqtt5 | 1749642879: Client auto-08C8E0B5-A2BC-4530-1BC0-A0C1C4A4123B disconnected due to protocol error.
mqtt5 | 1749642880: New connection from 172.18.0.1:44466 on port 1883.
mqtt5 | 1749642880: New client connected from 172.18.0.1:44466 as auto-714B2B0C-53E0-0370-C2EA-CBB2EF3626F3 (p5, c0, k60, u'').
mqtt5 | 1749642880: Client auto-714B2B0C-53E0-0370-C2EA-CBB2EF3626F3 disconnected due to protocol error.
Possible solutions?
So the client gets stuck sending a broken message that the server will never understand.
Some possible solution ideas I had...
- Detect the topic alias being used when a session couldn't be restored and skip the message?
- Allow a setting (for users of topic aliasing) that drops all pending messages on connect?
(Or have an API we can call duringon_disconnect
.mqttc.clear_messages()
) - Implement topic aliasing inside the client library so it can know the appropriate serialization (according to connection state).
This would mean the user of the library ALWAYS supplies the topic name, and the library would keep track of whether to send the topic name or not.
For now, I am adding the following hacky code to my on_connect
callback:
(which is ugly in that it needs to access internal members of the object and may end up causing more bugs).
It does allow my clients to reliably reconnect and recover from server restarts though.
def on_connect(client, userdata, flags, reason_code, properties):
with client._out_message_mutex:
client._out_messages.clear()