So you have decided that identity alone is not enough to tell your objects apart, and you wrote a nice equals implementation? Great, but now you also have to implement the hashCode method.
Let's see why, and how to do it right.
## Equality and hash codes

Equality makes sense from a general perspective; hash codes are much more technical. If we were being a little hard on them, we could say that they are just an implementation detail to improve performance.

Most data structures use the equals method to determine whether they contain an element, for example:

List<String> list = Arrays.asList("a", "b", "c");
boolean contains = list.contains("b");
Comparing the requested instance with every contained element is wasteful, though, so a whole class of data structures takes a shortcut: they first reduce the number of potentially equal instances and then compare only those.

This shortcut is the hash code. Such data structures are often named after the technique and can be recognized by the Hash in their name, with HashMap the most famous representative.

They usually work like this:

* When an element is added, its hash code is used to compute an index into an internal array (the so-called bucket).
* If other, unequal elements have the same hash code, they end up in the same bucket and are bundled together, for example by adding them to a list.
* When an instance is given to contains, its hash code is used to compute the bucket, and the instance is compared only with the elements found there.

Therefore, implementing contains requires very few, ideally no, equals comparisons.

Like equals, hashCode is defined on Object.
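The bucket mechanism described above can be sketched with a toy set. This is only an illustration under simplifying assumptions: the class name ToyHashSet, the fixed bucket count, and the equalsCalls counter are made up for this example, null elements are not supported, and the real java.util.HashSet works differently in many details.

```java
import java.util.ArrayList;
import java.util.List;

// Toy hash set for illustration only; not how java.util.HashSet is written.
class ToyHashSet<E> {

    private final List<List<E>> buckets = new ArrayList<>();

    // Counts the equals calls made so far, to make the shortcut visible.
    int equalsCalls = 0;

    ToyHashSet(int bucketCount) {
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    // The hash code decides which bucket an element belongs to.
    private List<E> bucketFor(Object element) {
        int index = Math.floorMod(element.hashCode(), buckets.size());
        return buckets.get(index);
    }

    void add(E element) {
        if (!contains(element)) {
            bucketFor(element).add(element);
        }
    }

    boolean contains(Object element) {
        // Only the elements in the matching bucket are compared with equals.
        for (E candidate : bucketFor(element)) {
            equalsCalls++;
            if (candidate.equals(element)) {
                return true;
            }
        }
        return false;
    }
}

class ToyHashSetDemo {
    public static void main(String[] args) {
        ToyHashSet<String> set = new ToyHashSet<>(16);
        set.add("a");
        set.add("b");
        set.add("c");
        System.out.println(set.contains("b")); // true
        System.out.println(set.contains("z")); // false
        System.out.println(set.equalsCalls);   // 1
    }
}
```

With 16 buckets the three strings land in different buckets, so the whole run needs a single equals call: the one that confirms "b".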
## Thoughts on hashing

If hashCode is used as a shortcut to determine equality, then there is really only one thing we should care about: equal objects should have the same hash code.

This is also why, if we override the equals method, we must create a matching hashCode implementation. Otherwise, objects that are equal according to our implementation may not have the same hash code, because they would still use the default implementation inherited from Object.
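A small sketch shows what goes wrong when equals is overridden alone. The class NameOnly is hypothetical and exists only for this example. Because the inherited hash codes are not specified, the last two results are only what one would usually observe, not a guarantee.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical class that overrides equals but NOT hashCode.
class NameOnly {
    private final String name;

    NameOnly(String name) {
        this.name = name;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof NameOnly)) return false;
        return Objects.equals(name, ((NameOnly) other).name);
    }
    // hashCode is inherited from Object and knows nothing about name.
}

class NameOnlyDemo {
    public static void main(String[] args) {
        NameOnly first = new NameOnly("Jane");
        NameOnly second = new NameOnly("Jane");

        // The two instances are equal according to our equals method.
        System.out.println(first.equals(second));

        // Usually false: the inherited hash codes are typically different.
        System.out.println(first.hashCode() == second.hashCode());

        // Usually false as well, although an equal element was added.
        Set<NameOnly> set = new HashSet<>();
        set.add(first);
        System.out.println(set.contains(second));
    }
}
```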
## The hashCode contract

Quoting the official documentation, the general contract of hashCode is:

* Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
* If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
* It is not required that if two objects are unequal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.

The first bullet mirrors the consistency property of equals, and the second is the requirement we came up with above. The third states an important detail that we will discuss later.

## Implementing hashCode

The following is a very simple implementation of Person.hashCode:
@Override
public int hashCode() {
    return Objects.hash(firstName, lastName);
}
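The hash code of a person is computed by computing the hash codes of the relevant fields and combining them. Both steps are left to the hash utility method of Objects.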
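For context, here is a minimal sketch of how the whole class might look. Only the two field names and the hashCode body come from the snippet above; the constructor, the equals method, and the demo class are assumptions made for this example.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

final class Person {
    private final String firstName;
    private final String lastName;

    // Assumed constructor; the article does not show one.
    Person(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }

    // Assumed equals: compares exactly the fields that hashCode uses.
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Person)) return false;
        Person that = (Person) other;
        return Objects.equals(firstName, that.firstName)
                && Objects.equals(lastName, that.lastName);
    }

    @Override
    public int hashCode() {
        return Objects.hash(firstName, lastName);
    }
}

class PersonDemo {
    public static void main(String[] args) {
        Set<Person> people = new HashSet<>();
        people.add(new Person("Jane", "Doe"));

        // A different but equal instance is found via its hash code.
        System.out.println(people.contains(new Person("Jane", "Doe"))); // true
        System.out.println(people.contains(new Person("John", "Doe"))); // false
    }
}
```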
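### Selecting fields

But which fields are relevant? The requirements help us answer this question: if equal objects must have the same hash code, then the hash code computation should not include any field that is not used for equality checks. (Otherwise, two objects that differ only in those fields would be equal but have different hash codes.)

So the set of fields used for hashing should be a subset of the fields used for equality. By default both use the same fields, but there are some details to consider.

### Consistency

First, there is the consistency requirement. It should be interpreted rather strictly. While it allows the hash code to change if some fields change (which is unavoidable for mutable classes), hashing data structures are not prepared for this scenario.

As we have seen above, the hash code is used to determine an element's bucket. But if the hash-relevant fields change, the hash code is not recomputed and the internal array is not updated.

This means that a later query with an equal object, or even with the very same instance, fails: the data structure computes the current hash code, which differs from the one used when the instance was stored, and looks in the wrong bucket.

Conclusion: better not use mutable fields to compute the hash code!

### Performance

Hash codes may end up being computed about as often as equals is called. This can well happen in performance-critical parts of the code, so it makes sense to think about performance. And compared with equals, there is a little more room for optimization.

Unless very sophisticated algorithms are used or very many fields are involved, the arithmetic cost of combining their hash codes is as negligible as it is unavoidable. But it should be considered whether all fields need to be included in the computation. Collections in particular should be viewed with suspicion. Lists and sets, for example, compute the hash code of each of their elements. Whether calling them is necessary should be decided case by case.

If performance is critical, using Objects.hash might not be the best choice either, because it requires creating an array for its varargs.

But the general rule about optimization applies: do not do it prematurely. Use a common hash code algorithm, perhaps leave out the collections, and optimize only after profiling has shown potential for improvement.

### Collisions

Going all in on performance, what about this implementation?

@Override
public int hashCode() {
    return 0;
}

It is fast, that is for sure. Equal objects have the same hash code. And no mutable fields are involved!

But remember what we said about buckets? This way all instances end up in the same one! This typically results in a linked list holding all the elements, which is terrible for performance. Each call to contains, for example, triggers a linear scan of the whole list.

So what we want is as few elements in the same bucket as possible! An algorithm that returns wildly varying hash codes, even for very similar objects, is a good start.

How to get there partly depends on the selected fields: the more details we include in the computation, the more likely the hash codes are to differ. Note that this is the complete opposite of our thoughts on performance. So, interestingly enough, using too many or too few fields can both result in bad performance.

The other part of preventing collisions is the algorithm that actually computes the hash.

### Computing the hash

The easiest way to compute a field's hash code is to call hashCode on it. Combining them can be done manually. A common algorithm is to start with some arbitrary number and to repeatedly multiply it by another number (often a small prime) before adding a field's hash code:

int prime = 31;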
int result = 1;
result = prime * result + ((firstName == null) ? 0 : firstName.hashCode());
result = prime * result + ((lastName == null) ? 0 : lastName.hashCode());
return result;
This might result in overflows, but that is not particularly problematic because they do not cause exceptions in Java.
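As a sketch, the steps above can be wrapped in a static helper and compared with Objects.hash. The class name ManualHash and the sample names are assumptions for this example. Objects.hash is documented to behave like Arrays.hashCode, which uses the same multiply-by-31 scheme starting from 1, so both approaches should agree.

```java
import java.util.Objects;

class ManualHash {

    // The manual algorithm from the text, wrapped in a helper method.
    static int hash(String firstName, String lastName) {
        int prime = 31;
        int result = 1;
        result = prime * result + ((firstName == null) ? 0 : firstName.hashCode());
        result = prime * result + ((lastName == null) ? 0 : lastName.hashCode());
        return result;
    }

    public static void main(String[] args) {
        // Same result as the utility method, but without a varargs array.
        System.out.println(hash("Jane", "Doe") == Objects.hash("Jane", "Doe")); // true

        // int arithmetic wraps around silently instead of throwing.
        System.out.println(Integer.MAX_VALUE + 1 == Integer.MIN_VALUE); // true
    }
}
```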
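Note that even a very good hashing algorithm may produce unusually frequent collisions if the input data has specific patterns. As a simple example, assume we computed the hash of points by adding their x and y coordinates. When we deal with points on the line f(x) = -x, x + y == 0 holds for all of them, so there will be a massive number of collisions.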
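The effect can be measured with a small experiment. The class PointHashing and its method names are made up for this sketch. It counts how many distinct hash codes the points (x, -x) receive under the additive hash and under the common multiply-and-add scheme (here via Objects.hash).

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class PointHashing {

    // Naive hash from the text: simply add the coordinates.
    static int additiveHash(int x, int y) {
        return x + y;
    }

    // Common multiply-and-add scheme, delegated to Objects.hash.
    static int combinedHash(int x, int y) {
        return Objects.hash(x, y);
    }

    // Counts distinct hash codes of the points (x, -x) for 0 <= x < count.
    static int distinctHashesOnLine(int count, boolean additive) {
        Set<Integer> hashes = new HashSet<>();
        for (int x = 0; x < count; x++) {
            int y = -x;
            hashes.add(additive ? additiveHash(x, y) : combinedHash(x, y));
        }
        return hashes.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctHashesOnLine(100, true));  // 1
        System.out.println(distinctHashesOnLine(100, false)); // 100
    }
}
```

With the additive hash all 100 points share the hash code 0, while the combined hash yields 961 + 30x and therefore a different value for each of them.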
But again: use a common algorithm, and change it only when profiling shows that something is not right.
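## Summary

We have seen that computing hash codes is something like compressing equality to an integer value: equal objects must have the same hash code and, for performance reasons, it is best if as few unequal objects as possible share the same hash code.

This means that hashCode must always be overridden if equals is.

When implementing hashCode:

* Use the same fields that are used in equals (or a subset of them).
* Better not include mutable fields.
* Consider not calling hashCode on collections.
* Use a common hashing algorithm unless specific patterns in the input data work against it.

Remember that hashCode is about performance, so do not waste too much energy on it unless profiling indicates the necessity.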
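The advice against mutable fields can be made concrete with a short sketch. The class MutableId is hypothetical. The result shown for the second lookup is what OpenJDK's HashSet does; the hashCode contract itself does not define what happens in this situation.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical class whose hash code depends on a mutable field.
class MutableId {
    int id;

    MutableId(int id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof MutableId && ((MutableId) other).id == id;
    }

    @Override
    public int hashCode() {
        return Integer.hashCode(id);
    }
}

class MutableIdDemo {
    public static void main(String[] args) {
        Set<MutableId> set = new HashSet<>();
        MutableId key = new MutableId(1);
        set.add(key);
        System.out.println(set.contains(key)); // true

        // The hash code changes, but the set still files the element
        // under the hash code it had when it was added.
        key.id = 2;
        System.out.println(set.contains(key)); // false (OpenJDK behavior)
        System.out.println(set.size());        // 1
    }
}
```

The element is still stored, yet even the very same instance can no longer be found.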