Hash Function in Python 3.3: Why Different Results are Returned Between Sessions
In Python 3.3 and later, the built-in hash() function returns different hash values for the same string in different sessions. This is intentional: Python salts string hashes with a random hash seed as a security measure.
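The effect can be observed by asking two freshly started interpreters for the hash of the same string. The following is a minimal sketch; the helper name hash_in_new_session is made up for illustration, and it assumes the current interpreter can be relaunched through sys.executable.

```python
import os
import subprocess
import sys


def hash_in_new_session(text):
    """Return hash(text) as computed by a freshly started interpreter."""
    # Drop any inherited PYTHONHASHSEED so the child uses the default random seed.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONHASHSEED"}
    code = "import sys; print(hash(sys.argv[1]))"
    out = subprocess.check_output([sys.executable, "-c", code, text], env=env)
    return int(out.decode().strip())


first = hash_in_new_session("example")
second = hash_in_new_session("example")
print(first, second)  # two sessions: the values will almost certainly differ

# Within a single process the value stays constant.
print(hash("example") == hash("example"))
```

Within one process the hash is stable, so dictionaries and sets keep working as usual; only values compared across sessions are affected.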
The random hash seed prevents attackers from tar-pitting an application, that is, slowing it down by sending keys deliberately chosen to collide in a hash table. Because a random offset is mixed into each hash, attackers cannot predict which keys will collide.
To control this behavior, set the PYTHONHASHSEED environment variable before starting Python. A fixed positive integer makes hash values reproducible across sessions, while 0 disables hash randomization entirely.
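As a rough illustration, the sketch below launches child interpreters with PYTHONHASHSEED set and checks that the same seed yields the same hash. The helper name hash_with_seed and the seed value 42 are arbitrary choices for this example. The variable has to be in the environment before the interpreter starts; changing os.environ inside a running process does not alter its hashes.

```python
import os
import subprocess
import sys


def hash_with_seed(text, seed):
    """Return hash(text) from a child interpreter started with PYTHONHASHSEED=seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    code = "import sys; print(hash(sys.argv[1]))"
    out = subprocess.check_output([sys.executable, "-c", code, text], env=env)
    return int(out.decode().strip())


# The same fixed seed gives the same value in every session.
print(hash_with_seed("example", 42) == hash_with_seed("example", 42))

# Seed 0 turns randomization off, which is also repeatable.
print(hash_with_seed("example", 0) == hash_with_seed("example", 0))
```

Even with a fixed seed, the values are only comparable on the same Python build, so this is better suited to debugging and reproducible test runs than to storing hashes long term.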
Prior to Python 3.3, hash randomization was disabled by default; from Python 3.3 onward it is enabled by default. Because hash values influence iteration order, the change is visible in the ordering of sets and, in Python 3.5 and earlier, of dictionaries as well.
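Furthermore, the documentation for object.__hash__() notes a special behavior: by default, the hash values of str and bytes objects are "salted" with an unpredictable random value. They remain constant within an individual Python process, but they are not predictable between repeated invocations of Python.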
It is important to note that hash values affect the iteration order of sets (and of dicts in Python 3.5 and earlier). Python does not guarantee this ordering, and it may vary between builds and versions, so code should never rely on it.
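When a predictable order is needed, one common option is to sort the elements explicitly instead of relying on iteration order. A small sketch with made-up sample data:

```python
names = {"delta", "alpha", "charlie", "bravo"}

# Iteration order of a set of strings depends on their hash values,
# so this line may print a different order in each session.
print(list(names))

# Sorting gives an order that does not depend on the hash seed.
stable = sorted(names)
print(stable)  # ['alpha', 'bravo', 'charlie', 'delta']
```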
For hash values that stay the same across sessions, machines, and Python versions, consider the hashlib module, which provides cryptographic hash functions whose output depends only on the input bytes. pybloom also takes this approach for stability.
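One way to get such a stable value is sketched below. The function name stable_hash, the choice of SHA-256, and the truncation to 64 bits are assumptions made for this example, not something prescribed by hashlib or taken from pybloom.

```python
import hashlib


def stable_hash(text):
    """Return a 64-bit integer derived from the SHA-256 digest of text."""
    # Encode explicitly so the result does not depend on platform defaults.
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    # Keep the first 8 bytes; use the full digest if more bits are needed.
    return int.from_bytes(digest[:8], "big")


value = stable_hash("example")
print(value)  # the same integer in every session
```

A cryptographic hash is slower than the built-in hash(), so this fits cases where the value is stored or shared between processes rather than hot lookup paths.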
Unfortunately, the random seed is mixed into the hash computation rather than applied as a single value added at the end, so you cannot simply record the offset and subtract it later to recover stable hashes. On the plus side, this also means attackers cannot easily determine the seed through timing attacks.