-------------------------------- -------------------------------------------------- -------------------------------------------------- -------------------------------------

## In the previous three blog posts, LZ explained (HashMap, HashSet, HashTable), in which LZ continued to explain their put and get method, calculating the hashCode of the key should be the most important and essential part among these two methods, so below LZ unveils the "mystery" of hashCode.

## The role of hashCode

To understand the inner principles of a method, we first You need to understand what it does, which is what this method does. When explaining arrays (Java Improvement Chapter (18)------Arrays), we mentioned that arrays are the most efficient data structure in Java, but the "highest" has a prerequisite. First we need to know the location of the queried data. Second: If we perform an iterative search, the amount of data must be small. For large amounts of data, collections are generally recommended.

There are two types of collections in Java, one is List and the other is Set. The difference between them is that the elements in the List collection are ordered. , and can be repeated, while the elements in the Set collection are unordered and non-repeatable. List is easy to handle, but for Set, how do we ensure that the elements are not repeated? Check whether equals() is equal by iterating. It is acceptable when the amount of data is small, but the efficiency can be imagined when our data amount is large (of course we can use algorithms to optimize). For example, if we insert 1000 data into HashSet, do we really have to iterate 1000 times and call the equals() method 1000 times? hashCode provides the solution. How to achieve it? Let’s first look at the source code (Object) of hashCode.

public native int hashCode();

Copy after login

It is a local method, and its implementation is related to the local machine. Here we consider it for the time being. What is returned is the physical location where the object is stored (actually not, it is written here for ease of understanding). When we add an element to a collection, the collection will first call the hashCode method, so that the location where it is stored can be directly located. If there are no other elements there, it will be saved directly. If there is already an element there, call the equals method to match whether the two elements are the same. If they are the same, they will not be stored. If they are different, they will be hashed to other locations (for details, please refer to (Java Improvement) ()-----HashMap )). In this way, when we store a large number of elements, we can greatly reduce the number of calls to the equals() method, which greatly improves efficiency.

So the role of

hashCode in the above is to find the region (finding the location of an object in the collection) ). HashCode can divide the collection into several areas. Each object can calculate their hash code. The hash codes can be grouped. Each group corresponds to a certain storage area. According to the hash code of an object, the object stored can be determined. area, which greatly reduces the number of query matching elements and improves query efficiency. ##

The importance of hashCode to an object

## Is it important? Not important, for List collections and arrays, it is a burden, but for HashMap, HashSet, and HashTable, it becomes extremely important. Therefore, you must pay attention to hashCode when using HashMap, HashSet, and HashTable. For an object, its hashCode process is the implementation of a simple Hash algorithm, and its implementation process plays a very important role in realizing the object's access process.

LZ mentioned two data structures, HashMap and HashTable. Although they have several differences, their implementation principles are the same. Here I use HashTable takes an example to illustrate the importance of hashCode for an object.

一个对象势必会存在若干个属性，如何选择属性来进行散列考验着一个人的设计能力。如果我们将所有属性进行散列，这必定会是一个糟糕的设计，因为对象的hashCode方法无时无刻不是在被调用，如果太多的属性参与散列，那么需要的操作数时间将会大大增加，这将严重影响程序的性能。但是如果较少属相参与散列，散列的多样性会削弱，会产生大量的散列“冲突”，除了不能够很好的利用空间外，在某种程度也会影响对象的查询效率。其实这两者是一个矛盾体，散列的多样性会带来性能的降低。

那么如何对对象的hashCode进行设计，LZ也没有经验。从网上查到了这样一种解决方案：设置一个缓存标识来缓存当前的散列码，只有当参与散列的对象改变时才会重新计算，否则调用缓存的hashCode，这样就可以从很大程度上提高性能。

在HashTable计算某个对象在table[]数组中的索引位置，其代码如下：

int index = (hash & 0x7FFFFFFF) % tab.length;

Copy after login

为什么要&0x7FFFFFFF？因为某些对象的hashCode可能会为负值，与0x7FFFFFFF进行与运算可以确保index为一个正数。通过这步我可以直接定位某个对象的位置，所以从理论上来说我们是完全可以利用hashCode直接定位对象的散列表中的位置，但是为什么会存在一个key-value的键值对，利用key的hashCode来存入数据而不是直接存放value呢？这就关系HashTable性能问题的最重要的问题:Hash冲突！

我们知道冲突的产生是由于不同的对象产生了相同的散列码，假如我们设计对象的散列码可以确保99.999999999%的不重复，但是有一种绝对且几乎不可能遇到的冲突你是绝对避免不了的。我们知道hashcode返回的是int，它的值只可能在int范围内。如果我们存放的数据超过了int的范围呢？这样就必定会产生两个相同的index，这时在index位置处会存储两个对象，我们就可以利用key本身来进行判断。所以具有相索引的对象，在该index位置处存在多个对象，我们必须依靠key的hashCode和key本身来进行区分。

hashCode与equals

在Java中hashCode的实现总是伴随着equals，他们是紧密配合的，你要是自己设计了其中一个，就要设计另外一个。当然在多数情况下，这两个方法是不用我们考虑的，直接使用默认方法就可以帮助我们解决很多问题。但是在有些情况，我们必须要自己动手来实现它，才能确保程序更好的运作。

对于equals，我们必须遵循如下规则：

对称性：如果x.equals(y)返回是“true”，那么y.equals(x)也应该返回是“true”。

反射性：x.equals(x)必须返回是“true”。

类推性：如果x.equals(y)返回是“true”，而且y.equals(z)返回是“true”，那么z.equals(x)也应该返回是“true”。

一致性：如果x.equals(y)返回是“true”，只要x和y内容一直不变，不管你重复x.equals(y)多少次，返回都是“true”。

任何情况下，x.equals(null)，永远返回是“false”；x.equals(和x不同类型的对象)永远返回是“false”。

对于hashCode，我们应该遵循如下规则：

1. 在一个应用程序执行期间，如果一个对象的equals方法做比较所用到的信息没有被修改的话，则对该对象调用hashCode方法多次，它必须始终如一地返回同一个整数。

2. 如果两个对象根据equals(Object o)方法是相等的，则调用这两个对象中任一对象的hashCode方法必须产生相同的整数结果。

3. 如果两个对象根据equals(Object o)方法是不相等的，则调用这两个对象中任一个对象的hashCode方法，不要求产生不同的整数结果。但如果能不同，则可能提高散列表的性能。

至于两者之间的关联关系，我们只需要记住如下即可：

如果x.equals(y)返回“true”，那么x和y的hashCode()必须相等。

如果x.equals(y)返回“false”，那么x和y的hashCode()有可能相等，也有可能不等。

理清了上面的关系我们就知道他们两者是如何配合起来工作的。先看下图：

整个处理流程是：

1、判断两个对象的hashcode是否相等，若不等，则认为两个对象不等，完毕，若相等，则比较equals。

2、若两个对象的equals不等，则可以认为两个对象不等，否则认为他们相等。

实例：

public class Person {
    private int age;
    private int sex;    //0：男，1：女
    private String name;
    
    private final int PRIME = 37;
    
    Person(int age ,int sex ,String name){
        this.age = age;
        this.sex = sex;
        this.name = name;
    }

    /** 省略getter、setter方法 **/
    
    @Override
    public int hashCode() {
        System.out.println("调用hashCode方法...........");

        int hashResult = 1;
        hashResult = (hashResult + Integer.valueOf(age).hashCode() + Integer.valueOf(sex).hashCode()) * PRIME;
        hashResult = PRIME * hashResult + ((name == null) ? 0 : name.hashCode()); 
        System.out.println("name:"+name +" hashCode:" + hashResult);
        
        return hashResult;
    }

    /**
     * 重写hashCode()
     */
    public boolean equals(Object obj) {
        System.out.println("调用equals方法...........");
        
        if(obj == null){
            return false;
        }
        if(obj.getClass() != this.getClass()){
            return false;
        }
        if(this == obj){
            return true;
        }

        Person person = (Person) obj;
        
        if(getAge() != person.getAge() || getSex()!= person.getSex()){
            return false;
        }
        
        if(getName() != null){
            if(!getName().equals(person.getName())){
                return false;
            }
        }
        else if(person != null){
            return false;
        }
        return true;
    }
}

Copy after login

该Bean为一个标准的Java Bean，重新实现了hashCode方法和equals方法。

public class Main extends JPanel {

    public static void main(String[] args) {
        Set<Person> set = new HashSet<Person>();
        
        Person p1 = new Person(11, 1, "张三");
        Person p2 = new Person(12, 1, "李四");
        Person p3 = new Person(11, 1, "张三");
        Person p4 = new Person(11, 1, "李四");
        
        //只验证p1、p3
        System.out.println("p1 == p3? :" + (p1 == p3));
        System.out.println("p1.equals(p3)?:"+p1.equals(p3));
        System.out.println("-----------------------分割线--------------------------");
        set.add(p1);
        set.add(p2);
        set.add(p3);
        set.add(p4);
        System.out.println("set.size()="+set.size());
    }
}

Copy after login

运行结果如下：