Hashing
HashSet and HashMap are two widely-used types. The default hashing
algorithm is not specified, but at the time of writing the default is an
algorithm called SipHash 1-3. This algorithm is high quality—it provides high
protection against collisions—but is relatively slow, particularly for short keys
such as integers.
If profiling shows that hashing is hot, and HashDoS attacks are not a concern for your application, the use of hash tables with faster hash algorithms can provide large speed wins.
rustc-hashprovidesFxHashSetandFxHashMaptypes that are drop-in replacements forHashSetandHashMap. Its hashing algorithm is low-quality but very fast, especially for integer keys, and has been found to out-perform all other hash algorithms within rustc. (fxhashis an older, less well maintained implementation of the same algorithm and types.)fnvprovidesFnvHashSetandFnvHashMaptypes. Its hashing algorithm is higher quality thanrustc-hash's but a little slower.ahashprovidesAHashSetandAHashMap. Its hashing algorithm can take advantage of AES instruction support that is available on some processors.
If hashing performance is important in your program, it is worth trying more than one of these alternatives. For example, the following results were seen in rustc.
- The switch from
fnvtofxhashgave speedups of up to 6%. - An attempt to switch from
fxhashtoahashresulted in slowdowns of 1-4%. - An attempt to switch from
fxhashback to the default hasher resulted in slowdowns ranging from 4-84%!
If you decide to universally use one of the alternatives, such as
FxHashSet/FxHashMap, it is easy to accidentally use HashSet/HashMap in
some places. You can use Clippy to avoid this problem.
Some types don't need hashing. For example, you might have a newtype that wraps
an integer and the integer values are random, or close to random. For such a
type, the distribution of the hashed values won't be that different to the
distribution of the values themselves. In this case the nohash_hasher crate
can be useful.
Hash function design is a complex topic and is beyond the scope of this book.
The ahash documentation has a good discussion.