Skip to content

Infinite disconnect/reconnect loops because reconnection resends qos>0 messages even when they're using topic aliasing #893

@inbilla

Description

@inbilla

Bug Description

Client will disconnect/reconnect forever in a loop, if the server restarts and a QOS=1 message using topic aliasing wasn't ACKed yet.

The issue appears to occur because upon reconnection, the client will resend all qos > 0 messages that were in flight at the time of disconnection.
However - if one of those messages was using topic aliasing - and set the topic to "", then when it is sent to the server (which has restarted and doesn't know the session of the client) the server will close the connection for reason "Protocol Error" (mosquitto). This is because the message had an alias index, but no topic (and the server's state won't know what to do with the message).
The server will now disconnect the client.
The client will reconnect, and the same issue occurs.

NOTE: Probably related to this: #855
But I don't think this scenario is covered by that discussion

Reproduction

Obviously timing this perfectly with a server restart isn't super easy. But recreating the state it ends up in IS.
So here is a quick example script that demonstrates the issue with the current form of the logic.
I reproduced the behaviour using test.mosquitto.org.

import paho.mqtt.client as mqtt


def main():

    def on_connect(client, userdata, flags, reason_code, properties):
        print(f"Connected with result code {reason_code}")

    def on_disconnect(client, userdata, flags, reason_code, properties):
        print(f"Disconnected with result code {reason_code}")

    def on_message(client, userdata, msg):
        print(msg.topic + " " + str(msg.payload), msg.properties)

     def on_log(client, userdata, level, buf):
        print("PAHO-MQTT-LOG:", level, buf)

    mqttc = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2, protocol=mqtt.MQTTv5)
    mqttc.on_connect = on_connect
    mqttc.on_message = on_message
    mqttc.on_disconnect = on_disconnect
    mqttc.on_log = on_log

    # Publish a message using an alias - that was previously registered in an earlier session.
    # If the server has forgotten the client (i.e. a reboot) and we've just reconnected then this message should
    # cause an error on the server because it doesn't know what topic the topicAlias corresponds to.
    alias_property = mqtt.Properties(mqtt.PacketTypes.PUBLISH)
    alias_property.TopicAlias = 1
    mqttc.publish(topic="", payload="hi_aliased", qos=1, properties=alias_property)

    mqttc.connect("test.mosquitto.org", 1883, 60, clean_start=True)
    mqttc.loop_forever()

    # The trouble starts with the error handling of this case.
    # The client will be stuck sending the same broken message, and keep reconnecting.
    
if __name__ == "__main__":
    main()

Environment

  • Python version: 3.9.0
  • Library version: paho-mqtt==2.1.0
  • Operating system (including version): Windows 11
  • MQTT server (name, version, configuration, hosting details): mosquitto (test.mosquitto.org, and local mosquitto server using docker).

Logs

Reproduction script logs:

PAHO-MQTT-LOG: LogLevel.MQTT_LOG_DEBUG Sending CONNECT (u1, p1, wr0, wq0, wf0, c1, k60) client_id=b'' properties=None
PAHO-MQTT-LOG: LogLevel.MQTT_LOG_DEBUG Received CONNACK (0, Success) properties=[AssignedClientIdentifier : auto-CDF56F33-D918-6599-9E6E-9BF26E59C7DF, ReceiveMaximum : 4096, TopicAliasMaximum : 10]
Connected with result code Success
PAHO-MQTT-LOG: LogLevel.MQTT_LOG_DEBUG Sending PUBLISH (d0, q1, r0, m1), 'b''', properties=[TopicAlias : 1], ... (10 bytes)
PAHO-MQTT-LOG: LogLevel.MQTT_LOG_DEBUG Received DISCONNECT None None
Disconnected with result code Normal disconnection
PAHO-MQTT-LOG: LogLevel.MQTT_LOG_DEBUG Sending CONNECT (u1, p1, wr0, wq0, wf0, c0, k60) client_id=b'' properties=None
PAHO-MQTT-LOG: LogLevel.MQTT_LOG_DEBUG Received CONNACK (0, Success) properties=[AssignedClientIdentifier : auto-08C8E0B5-A2BC-4530-1BC0-A0C1C4A4123B, ReceiveMaximum : 4096, TopicAliasMaximum : 10]
Connected with result code Success
PAHO-MQTT-LOG: LogLevel.MQTT_LOG_DEBUG Sending PUBLISH (d1, q1, r0, m1), 'b''', properties=[TopicAlias : 1], ... (10 bytes)
PAHO-MQTT-LOG: LogLevel.MQTT_LOG_DEBUG Received DISCONNECT None None
Disconnected with result code Normal disconnection
PAHO-MQTT-LOG: LogLevel.MQTT_LOG_DEBUG Sending CONNECT (u1, p1, wr0, wq0, wf0, c0, k60) client_id=b'' properties=None
PAHO-MQTT-LOG: LogLevel.MQTT_LOG_DEBUG Received CONNACK (0, Success) properties=[AssignedClientIdentifier : auto-714B2B0C-53E0-0370-C2EA-CBB2EF3626F3, ReceiveMaximum : 4096, TopicAliasMaximum : 10]
Connected with result code Success
PAHO-MQTT-LOG: LogLevel.MQTT_LOG_DEBUG Sending PUBLISH (d1, q1, r0, m1), 'b''', properties=[TopicAlias : 1], ... (10 bytes)
PAHO-MQTT-LOG: LogLevel.MQTT_LOG_DEBUG Received DISCONNECT None None
Disconnected with result code Normal disconnection

Broker logs:

mqtt5     | 1749642878: New connection from 172.18.0.1:44450 on port 1883.
mqtt5     | 1749642878: New client connected from 172.18.0.1:44450 as auto-CDF56F33-D918-6599-9E6E-9BF26E59C7DF (p5, c1, k60, u'').
mqtt5     | 1749642878: Client auto-CDF56F33-D918-6599-9E6E-9BF26E59C7DF disconnected due to protocol error.
mqtt5     | 1749642879: New connection from 172.18.0.1:44462 on port 1883.                                                                                                                                                                                                                                            
mqtt5     | 1749642879: New client connected from 172.18.0.1:44462 as auto-08C8E0B5-A2BC-4530-1BC0-A0C1C4A4123B (p5, c0, k60, u'').
mqtt5     | 1749642879: Client auto-08C8E0B5-A2BC-4530-1BC0-A0C1C4A4123B disconnected due to protocol error.                                                                                                                                                                                                          
mqtt5     | 1749642880: New connection from 172.18.0.1:44466 on port 1883.                                                                                                                                                                                                                                            
mqtt5     | 1749642880: New client connected from 172.18.0.1:44466 as auto-714B2B0C-53E0-0370-C2EA-CBB2EF3626F3 (p5, c0, k60, u'').
mqtt5     | 1749642880: Client auto-714B2B0C-53E0-0370-C2EA-CBB2EF3626F3 disconnected due to protocol error.   

Possible solutions?

So the client gets stuck sending a broken message that the server will never understand.
Some possible solution ideas I had...

  1. Detect the topic alias being used when a session couldn't be restored and skip the message?
  2. Allow a setting (for users of topic aliasing) that drops all pending messages on connect?
    (Or have an API we can call during on_disconnect. mqttc.clear_messages())
  3. Implement topic aliasing inside the client library so it can know the appropriate serialization (according to connection state).
    This would mean the user of the library ALWAYS supplies the topic name, and the library would keep track of whether to send the topic name or not.

For now, I am adding the following hacky code to my on_connect callback:
(which is ugly in that it needs to access internal members of the object and may end up causing more bugs).
It does allow my clients to reliably reconnect and recover from server restarts though.

    def on_connect(client, userdata, flags, reason_code, properties):
        with client._out_message_mutex:
            client._out_messages.clear()

Metadata

Metadata

Assignees

No one assigned

    Labels

    Status: AvailableNo one has claimed responsibility for resolving this issue.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions