-
Notifications
You must be signed in to change notification settings - Fork 16
Open
Description
I understand using xor for combining hashes is useful pedologically, but it has problems and shouldn't be used in production code.
Problems:
- XOR is commutative and associative, so it is a poor choice for ordered aggregates (list, tuples, strings, etc)
- Duplicate cancelation is severe, since h(x) ^ h(x) = 0. In your vector example, all diagonal vectors will hash to 0.
- Security: attackers can create collections with identical combined hashes, which would make all dictionary lookups become linear scans.
- Small changes should flip many output bits in a way that looks random.
One solution is a Boost lineage custom combiner, though it also isn't resistant to attacks:
class MyObj:
def __init__(self, a, b, c):
self.a, self.b, self.c = a, b, c
def __hash__(self):
seed = 0
for v in (self.a, self.b, self.c):
h = hash(v)
seed ^= h + 0x9E3779B97F4A7C15 + ((seed << 6) & ((1 << 64) - 1)) + (seed >> 2)
seed &= (1 << 64) - 1 # keep bounded like a machine word
# Convert to Python int in signed-ish way (optional)
if seed >= (1 << 63):
seed -= (1 << 64)
return seed
Or just put a warning on the code.
Adam Smith
Metadata
Metadata
Assignees
Labels
No labels