NodeId class - hash collisions with integer (UInteger) node identifiers #1461
Replies: 3 comments
-
What was the weird behavior these hash collisions led to? |
Beta Was this translation helpful? Give feedback.
-
I've moved this into the Discussions area for now because I'm not convinced there's any actual issue/bug here, nor that there should be any expectation that NodeId hashes any different/better/worse than any other Java class or record. The hash code is calculated no differently than if it were delegated to |
Beta Was this translation helpful? Give feedback.
-
@kevinherron The answer to your question is that non uniqueness of NodeId hashCodes created no problems with Milo itself, only code of mine using Milo. My problems boiled down to treating NodeId.hashCode() as a unique identifier for the underlying NodeId. Noticed that hashCode (in the range of namespaces and identifiers mentioned above) for NodeIds were limited to about 5 bytes, and wanted to encode some extra OpcUa tree information into a UaMonitoredItem's MonitoringParameters so the ValueConsumer that received it could route it to the correct class. (For the tree below, the extra information would be the index [0], [1], ...)
So to create the clientHandle parameter for a node's MonitoringParameters, I stuffed the index into the upper bytes, and used the lower bytes of the hashCode.
The hashCode (or [index, namespace, identifier]) backed out of this clientHandle by my ValueConsumer implementation did not always map to a unique NodeId (non-unique hashcode and all) and I'd route the decoded ExtensionObject to the wrong class. Biggest things learned from this blunder:
I should have dug slightly deeper before submitting an issue, my apologies for that. |
Beta Was this translation helpful? Give feedback.
Uh oh!
There was an error while loading. Please reload this page.
-
Describe the bug
I noticed some unpredictable behavior in my app while using
NodeId
as Map keys. I'm not sure if this is actually supported/encouraged, but didn't see anything in the docs to indicate otherwise. At a glance, many of Milo's internals (e.g.DataTypeTree
) seem to useNodeId
as a hashtable key. I tracked it down to thehashCode
method of theNodeId
class - the hashCode computation yields a very high likelihood of hash collisions for a server with several active namespaces & number of nodes. For my particular project, there were 38 hash collisions out of 1433 monitored/relevant nodes across 3 namespaces (2.6% collision rate).Expected behavior
I expect NodeId to be reliably hashable, perhaps not completely uniquely, but at least "unique-enough" across the range of namespaces and nodes for my server, so that 2 NodeIds don't resolve to the same hash.
Logs and Packet Captures
I wrote this small test to see hash collisions for what I figured to be a feasible range of namespaces (8) and node identifiers (32767) and compared it to a couple alternatives to Java's .hashCode()
outputs:
DEBUG: hash method: NodeId::hashCode
DEBUG: Total collisions: 262082
DEBUG: Collision percentage: 99.97635%
DEBUG: Max collisions: 8
DEBUG: hash method: Hashing.goodFastHash(32)
DEBUG: Total collisions: 0
DEBUG: Collision percentage: 0.0%
DEBUG: Max collisions: 0
DEBUG: hash method: Hashing.murmur3_32_fixed()
DEBUG: Total collisions: 12
DEBUG: Collision percentage: 0.0045776367%
DEBUG: Max collisions: 2
DEBUG: hash method: Hashing.murmur3_128()
DEBUG: Total collisions: 14
DEBUG: Collision percentage: 0.005340576%
DEBUG: Max collisions: 2
DEBUG: hash method: Hashing.sha256()
DEBUG: Total collisions: 22
DEBUG: Collision percentage: 0.008392334%
DEBUG: Max collisions: 2
DEBUG: hash method: Hashing.crc32()
DEBUG: Total collisions: 0
DEBUG: Collision percentage: 0.0%
DEBUG: Max collisions: 0
Additional context
artifacts = ["org.eclipse.milo:sdk-client:0.6.15", "org.eclipse.milo:stack-core:0.6.15",]
Beta Was this translation helpful? Give feedback.
All reactions