2.11 The __hash__ Special Method

Here's the definition of __hash__ from the Python reference manual.

Called for the key object for dictionary operations, and by the built-in function hash(). Should return a 32-bit integer usable as a hash value for dictionary operations. The only required property is that objects which compare equal have the same hash value; it is advised to somehow mix together (e.g., using exclusive or) the hash values for the components of the object that also play a part in comparison of objects. If a class does not define a __cmp__() method it should not define a __hash__() operation either; if it defines __cmp__() or __eq__() but not __hash__(), its instances will not be usable as dictionary keys. If a class defines mutable objects and implements a __cmp__() or __eq__() method, it should not implement __hash__(), since the dictionary implementation requires that a key's hash value is immutable (if the object's hash value changes, it will be in the wrong hash bucket).

Notice that ``The only required property is that objects which compare equal have the same hash value.'' This is an assumption made by the Python language, which in Sage we simply cannot make (!), and violating it has consequences. Fortunately, the consequences are pretty clearly defined and reasonably easy to understand, so if you know about them they don't cause you trouble. I think the following example illustrates them pretty well:

    sage: v = [Mod(2,7)]
    sage: 9 in v
    True
    sage: v = set([Mod(2,7)])
    sage: 9 in v
    False
    sage: 2 in v
    True
    sage: w = {Mod(2,7):'a'}
    sage: w[2]
    'a'
    sage: w[9]
    Traceback (most recent call last):
    ...
    KeyError: 9

Here's another example,

    sage: R = RealField(10000)
    sage: a = R(1) + R(10)^-100
    sage: a == RDF(1)  # because the a gets coerced down to RDF
    True
but hash(a) should not equal hash(1).

Unfortunately, in Sage we simply can't require:

          (*) "a == b ==> hash(a) == hash(b)"
because serious mathematics is simply too complicated for this rule. For example, the equalities z == Mod(z, 2) and z == Mod(z, 3) would force hash() to be constant on the integers.

The only way we could ``fix'' this problem for good would be to abandon using the "==" operator for "Sage equality", and implement Sage equality as a new method attached to each object. Then we could follow Python rules for "==" and our rules for everything else, and all Sage code would become completely unreadable (and for that matter unwritable). So we just have to live with it.

So what is done in Sage is to attempt to satisfy (*) when it is reasonably easy to do so, but use judgment and not go overboard. For example,

    sage: hash(Mod(2,7))
    2
I.e., we get 2. That's better than some weird random hash that also involves the moduli, but it's of course not right from the Python point of view, since 9 == Mod(2,7).

The goal is to make a hash function that is fast but within reason respects any obvious, natural inclusions and coercions.

See About this document... for information on suggesting changes.