NodeId class - hash collisions with integer (UInteger) node identifiers #1461

peterfranciscook · 2025-05-15T19:23:20Z

Describe the bug
I noticed some unpredictable behavior in my app while using NodeId as Map keys. I'm not sure whether this is supported or encouraged, but I didn't see anything in the docs indicating otherwise. At a glance, many of Milo's internals (e.g. DataTypeTree) seem to use NodeId as a hashtable key. I tracked the behavior down to the hashCode method of the NodeId class: its computation makes hash collisions very likely for a server with several active namespaces and a large number of nodes. In my project, there were 38 hash collisions among 1433 monitored/relevant nodes across 3 namespaces (a 2.6% collision rate).

Expected behavior
I expect NodeId to be reliably hashable: perhaps not perfectly unique, but at least "unique enough" across the range of namespaces and nodes on my server that two distinct NodeIds rarely resolve to the same hash.

Logs and Packet Captures
I wrote this small test to measure hash collisions over what I figured to be a feasible range of namespaces (0-7) and node identifiers (0-32767), and compared NodeId.hashCode() against a few Guava hash functions as alternatives:

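    import com.google.common.collect.ArrayListMultimap;
    import com.google.common.collect.Multimap;
    import com.google.common.hash.HashFunction;
    import com.google.common.hash.Hashing;
    import java.util.Collection;
    import java.util.function.Function;
    import org.eclipse.milo.opcua.stack.core.types.builtin.NodeId;
    import org.eclipse.milo.opcua.stack.core.types.builtin.unsigned.UInteger;
    // Logger: a static logging facade with {} placeholders (assumed, e.g. org.tinylog.Logger)

    void checkMiloNodeIdHashes() {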
        checkNodeIdHashes(NodeId::hashCode, "NodeId::hashCode");
        checkNodeIdHashes((nodeId) -> hashNodeId(nodeId, Hashing.goodFastHash(32)), "Hashing.goodFastHash(32)");
        checkNodeIdHashes((nodeId) -> hashNodeId(nodeId, Hashing.murmur3_32_fixed()), "Hashing.murmur3_32_fixed()");
        checkNodeIdHashes((nodeId) -> hashNodeId(nodeId, Hashing.murmur3_128()), "Hashing.murmur3_128()");
        checkNodeIdHashes((nodeId) -> hashNodeId(nodeId, Hashing.sha256()), "Hashing.sha256()");
        checkNodeIdHashes((nodeId) -> hashNodeId(nodeId, Hashing.crc32()), "Hashing.crc32()");
    }

    void checkNodeIdHashes(Function<NodeId, Integer> hashMethod, String method) {
        Multimap<Integer, NodeId> seen = ArrayListMultimap.create();

        // hash node ids for namespaces 0-7 & node identifiers 0-32767
        int nNamespace = 8;
        int nIndex = 32768;
        for (int ns = 0; ns < nNamespace; ns++) {
            for (int nid = 0; nid < nIndex; nid++) {
                NodeId nodeId = new NodeId(ns, nid);
                int hash = hashMethod.apply(nodeId);
                seen.put(hash, nodeId);
            }
        }

        Logger.debug("hash method: {}", method);
        checkCollisions(seen, nNamespace * nIndex);

        seen = null; // much RAM, garbage collect
    }

    int hashNodeId(NodeId nodeId, HashFunction hashFunction) {
        return hashFunction.newHasher()
            .putInt(nodeId.getNamespaceIndex().intValue())
            .putInt(((UInteger) nodeId.getIdentifier()).intValue())
            .hash()
            .asInt();
    }

    void checkCollisions(Multimap<Integer, NodeId> seen, int nNode) {
        int collisions = 0;
        int maxCollisions = 0;
        for (var entry : seen.asMap().entrySet()) {
            int key = entry.getKey();
            Collection<NodeId> values = entry.getValue();
            int nCollisionForKey = values.size();
            if (nCollisionForKey > 1) {
                Logger.trace("Hash Collision: {} <- {}", key, values);
                collisions += nCollisionForKey;
                maxCollisions = Math.max(maxCollisions, nCollisionForKey);
            }
        }
        displayCollisionStats(collisions, nNode, maxCollisions);
    }

    void displayCollisionStats(int collisions, int nNode, int maxCollisions) {
        float collisionPercent = (float) collisions / nNode * 100;
        Logger.debug("Total collisions: {}", collisions);
        Logger.debug("Collision percentage: {}%", collisionPercent);
        Logger.debug("Max collisions: {}\n", maxCollisions);
    }

Output:

    DEBUG: hash method: NodeId::hashCode
    DEBUG: Total collisions: 262082
    DEBUG: Collision percentage: 99.97635%
    DEBUG: Max collisions: 8

    DEBUG: hash method: Hashing.goodFastHash(32)
    DEBUG: Total collisions: 0
    DEBUG: Collision percentage: 0.0%
    DEBUG: Max collisions: 0

    DEBUG: hash method: Hashing.murmur3_32_fixed()
    DEBUG: Total collisions: 12
    DEBUG: Collision percentage: 0.0045776367%
    DEBUG: Max collisions: 2

    DEBUG: hash method: Hashing.murmur3_128()
    DEBUG: Total collisions: 14
    DEBUG: Collision percentage: 0.005340576%
    DEBUG: Max collisions: 2

    DEBUG: hash method: Hashing.sha256()
    DEBUG: Total collisions: 22
    DEBUG: Collision percentage: 0.008392334%
    DEBUG: Max collisions: 2

    DEBUG: hash method: Hashing.crc32()
    DEBUG: Total collisions: 0
    DEBUG: Collision percentage: 0.0%
    DEBUG: Max collisions: 0
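The NodeId::hashCode row can be reproduced without Milo, assuming the hash of a numeric NodeId has the shape that Objects.hash(namespaceIndex, identifier) produces for small ints, namely 961 + 31 * namespaceIndex + identifier. Under that assumption a hash value is shared by every namespace ns whose identifier h - 31 * ns still falls in 0..32767, so only the 31 lowest and 31 highest hash values are unique: 62 of the 262,144 NodeIds, leaving 262,082 in colliding buckets. The class HashCollisionCount and its methods are hypothetical, and collisions are counted the same way as in checkCollisions().

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/**
 * Milo-free sketch. Assumption: a numeric NodeId hashes like
 * Objects.hash(namespaceIndex, identifier).
 */
public class HashCollisionCount {

    // Stand-in for NodeId.hashCode(): 961 + 31 * ns + id for Integer arguments
    static int objectsHash(int ns, int id) {
        return Objects.hash(ns, id);
    }

    /** Returns {totalCollisions, maxCollisions}, counted like checkCollisions(). */
    static int[] countCollisions(int nNamespace, int nIndex) {
        Map<Integer, Integer> bucketSizes = new HashMap<>();
        for (int ns = 0; ns < nNamespace; ns++) {
            for (int id = 0; id < nIndex; id++) {
                bucketSizes.merge(objectsHash(ns, id), 1, Integer::sum);
            }
        }
        int total = 0;
        int max = 0;
        for (int size : bucketSizes.values()) {
            if (size > 1) {            // every key in a shared bucket counts as a collision
                total += size;
                max = Math.max(max, size);
            }
        }
        return new int[] {total, max};
    }

    public static void main(String[] args) {
        int[] stats = countCollisions(8, 32768);
        System.out.println("Total collisions: " + stats[0]); // 262082
        System.out.println("Max collisions: " + stats[1]);   // 8
    }
}
```

The maximum of 8 appears because any hash value in the middle of the range is reachable from all 8 namespaces.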

Additional context
artifacts = ["org.eclipse.milo:sdk-client:0.6.15", "org.eclipse.milo:stack-core:0.6.15",]

kevinherron · 2025-05-15T19:52:39Z

Maintainer

What was the weird behavior these hash collisions led to?

kevinherron · 2025-05-16T15:38:05Z

Maintainer

I've moved this into the Discussions area for now because I'm not convinced there's any actual issue/bug here, nor that there should be any expectation that NodeId hashes any differently (better or worse) than any other Java class or record. The hash code is calculated no differently than if it were delegated to Objects.hash or if NodeId were a record class instead.
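To make the Objects.hash comparison concrete: for two small int components, Objects.hash(a, b) evaluates to 31 * (31 + a) + b = 961 + 31 * a + b, so moving up one namespace while moving the identifier down by 31 yields the same hash. The sketch uses plain ints as a stand-in for a NodeId's namespace index and numeric identifier (an assumption; it does not call Milo), and the class name ObjectsHashShape is hypothetical.

```java
import java.util.Objects;

public class ObjectsHashShape {

    // Stand-in for a numeric NodeId hash: plain ints, not Milo types (assumption)
    static int hash(int namespaceIndex, int identifier) {
        return Objects.hash(namespaceIndex, identifier);
    }

    public static void main(String[] args) {
        // 961 + 31 * 0 + 31 == 961 + 31 * 1 + 0
        System.out.println(hash(0, 31) + " " + hash(1, 0)); // 992 992
        // Namespace 7 with identifier 0 collides with namespace 0 with identifier 7 * 31
        System.out.println(hash(7, 0) == hash(0, 217));     // true
    }
}
```

Collisions like these are expected of any 31-based combining hash; they only matter when the hash is treated as an identity.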

peterfranciscook · 2025-05-19T14:19:15Z

Author

@kevinherron To answer your question: the non-uniqueness of NodeId hash codes caused no problems in Milo itself, only in my own code that uses Milo. My problems boiled down to treating NodeId.hashCode() as a unique identifier for the underlying NodeId.

I noticed that the hashCode values of NodeIds (in the range of namespaces and identifiers mentioned above) were limited to about 5 bytes, and I wanted to encode some extra OPC UA tree information into a UaMonitoredItem's MonitoringParameters so that the ValueConsumer receiving it could route it to the correct class. (For the tree below, the extra information would be the index [0], [1], ...)

ServerInterfaces
├── Foo
│   ├── [0]
│   ├── [1]
│   ├── [2]
│   ├── ...
│   └── [n-1]

So, to create the clientHandle parameter for a node's MonitoringParameters, I stuffed the index into the upper byte and used the lower three bytes of the hashCode.

    UInteger clientHandle = Unsigned.uint((serverIdxOfUdt << 24) | (nodeId.hashCode() & 0x00FFFFFF));
    // or (nameSpace << 24) | (serverIdxOfUdt << 16) | (nodeIdentifier & 0x0000FFFF)
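The commented-out layout is lossless only when each field fits its slot: namespace index below 256, UDT index below 256, and identifier below 65,536. The hashCode-based variant is lossy by construction, because hashCode itself is not unique. A minimal sketch of the packed layout with bounds checks and decoding follows; ClientHandlePacking and its methods are hypothetical names, and the int result would still need Unsigned.uint(...) before it is used as a Milo client handle.

```java
/** Hypothetical helper: packs (namespace, udtIndex, identifier) into one 32-bit handle. */
public final class ClientHandlePacking {

    static int pack(int namespace, int udtIndex, int identifier) {
        // Reject values that would be truncated; truncation is what makes decoding ambiguous
        if (namespace < 0 || namespace > 0xFF) throw new IllegalArgumentException("namespace out of range");
        if (udtIndex < 0 || udtIndex > 0xFF) throw new IllegalArgumentException("udtIndex out of range");
        if (identifier < 0 || identifier > 0xFFFF) throw new IllegalArgumentException("identifier out of range");
        return (namespace << 24) | (udtIndex << 16) | identifier;
    }

    // Unsigned shift so a namespace >= 128 (sign bit set) decodes correctly
    static int namespace(int handle)  { return handle >>> 24; }
    static int udtIndex(int handle)   { return (handle >>> 16) & 0xFF; }
    static int identifier(int handle) { return handle & 0xFFFF; }

    public static void main(String[] args) {
        int handle = pack(3, 2, 1042);
        System.out.println(namespace(handle) + " " + udtIndex(handle) + " " + identifier(handle)); // 3 2 1042
    }
}
```

Failing fast on out-of-range values is the design choice here: a silently truncated field reproduces the same misrouting the hashCode variant caused.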

The hashCode (or [index, namespace, identifier]) that my ValueConsumer implementation recovered from this clientHandle did not always map to a unique NodeId (non-unique hash codes and all), so I would route the decoded ExtensionObject to the wrong class. The biggest lessons from this blunder:

- The NodeId is already a member of UaMonitoredItem (which is passed to ValueConsumer::onValueArrived), so much of this work was pointless 🤦
- Despite hashCode collisions, you can still use NodeId as a hashtable key; collisions just make Java's Map implementations perform extra equality checks, which (slightly) reduces lookup performance.
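The second lesson follows from how HashMap works: hashCode() only selects a bucket, and equals() decides which entry matches. A minimal sketch with a hypothetical Key record standing in for NodeId, whose hashCode() is pinned to Objects.hash(ns, id), shows two colliding keys coexisting in one map (records require Java 16+).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class CollidingKeysDemo {

    // Hypothetical NodeId stand-in; the record generates equals() over (ns, id)
    record Key(int ns, int id) {
        @Override
        public int hashCode() {
            return Objects.hash(ns, id); // (0, 31) and (1, 0) both hash to 992
        }
    }

    static Map<Key, String> buildRoutes() {
        Map<Key, String> routes = new HashMap<>();
        routes.put(new Key(0, 31), "handler for ns=0;i=31");
        routes.put(new Key(1, 0), "handler for ns=1;i=0");
        return routes;
    }

    public static void main(String[] args) {
        Map<Key, String> routes = buildRoutes();
        System.out.println(new Key(0, 31).hashCode() == new Key(1, 0).hashCode()); // true
        System.out.println(routes.size());             // 2
        System.out.println(routes.get(new Key(1, 0))); // handler for ns=1;i=0
    }
}
```

What breaks is not the map but any code that uses the hash value itself as an identity, as in the clientHandle scheme.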

I should have dug slightly deeper before submitting an issue; my apologies for that.
