Skip to content

chapter 12: xor-ing hash codes is discouraged and a potential security flaw #13

@adamwadesmith

Description

@adamwadesmith

I understand using xor for combining hashes is useful pedologically, but it has problems and shouldn't be used in production code.

Problems:

  1. XOR is commutative and associative, so it is a poor choice for ordered aggregates (list, tuples, strings, etc)
  2. Duplicate cancelation is severe, since h(x) ^ h(x) = 0. In your vector example, all diagonal vectors will hash to 0.
  3. Security: attackers can create collections with identical combined hashes, which would make all dictionary lookups become linear scans.
  4. Small changes should flip many output bits in a way that looks random.

One solution is a Boost lineage custom combiner, though it also isn't resistant to attacks:

class MyObj:
    def __init__(self, a, b, c):
        self.a, self.b, self.c = a, b, c

    def __hash__(self):
        seed = 0
        for v in (self.a, self.b, self.c):
            h = hash(v)
            seed ^= h + 0x9E3779B97F4A7C15 + ((seed << 6) & ((1 << 64) - 1)) + (seed >> 2)
            seed &= (1 << 64) - 1  # keep bounded like a machine word
        # Convert to Python int in signed-ish way (optional)
        if seed >= (1 << 63):
            seed -= (1 << 64)
        return seed

Or just put a warning on the code.

Adam Smith

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions