The Red-Black Tree Question in HashMap

mac, 2024-06-18

Background

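If you know a little about HashMap, you already know that the red-black tree was added in JDK 8 as an optimization. In JDK 7, HashMap is a hash table built from an array plus linked lists; in JDK 8 it is an array plus linked lists plus red-black trees. There are many articles online about HashMap internals, but a lot of them describe what happens to the red-black tree during a resize unclearly or inaccurately. This article works through the source code to settle the question. The JDK version used here is 1.8.0_162.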

Prerequisites

Before reading the source, we need a few basics.

The equals() and hashCode() methods

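For these two methods there is one rule to remember: whenever you override equals(), you must also override hashCode(), because many JDK APIs rely on both. As the JDK documentation advises, two objects for which equals() returns true should have the same hashCode. There is no further requirement: two objects for which equals() returns false may have the same hashCode or different ones.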

The hash table data structure

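The first thing to understand is the hash table itself. As mentioned above, HashMap is a hash table implemented with an array plus linked lists plus red-black trees. The array lets us locate an element's slot in O(1) time. The slot is computed by taking the key's hash value modulo the array length. Two keys can have the same hash value, which is a hash collision; and even when the hash values differ, the modulo step can still map them to the same slot. So the keys stored under one index may have different hash values, and in practice they usually do. Collisions are resolved with a linked list or a red-black tree: when several elements collide, they all map to the same array slot and are connected to each other through the list or the tree.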
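As a small illustration of the slot computation described above, the sketch below uses the same masking expression that appears later in putVal. The class name `IndexDemo` and the helper `indexFor` are made up for this example and are not part of the JDK; the raw integers stand in for already-computed hash values.

```java
class IndexDemo {
    // Bucket index for a table whose length n is a power of two.
    static int indexFor(int hash, int n) {
        return (n - 1) & hash;
    }

    public static void main(String[] args) {
        int n = 16;
        // Two different hash values that end up in the same slot.
        System.out.println(indexFor(5, n));
        System.out.println(indexFor(21, n));
        // For a power-of-two n the mask agrees with a non-negative modulo.
        System.out.println(indexFor(-7, n) == Math.floorMod(-7, n));
    }
}
```

With n = 16, the hashes 5 and 21 differ but share slot 5, which is exactly the "different hash, same index" case.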

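Why should two objects for which equals() returns true also have the same hashCode? Suppose one object has been added to a HashMap in the normal way, using its hashCode. Now take a second object that is "the same" as the first, meaning equals() returns true. We would expect to be able to fetch the stored value using this second object. But if the two "equal" objects have different hashCodes, then get(Object o) maps the second object to a different array slot, and the lookup returns null.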
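The failure mode described above can be reproduced with a minimal sketch. `BrokenKey` and `GoodKey` are hypothetical classes written for this example. `BrokenKey` falls back to the identity-based `Object.hashCode()`, so two equal instances almost always hash differently; that is very likely but not strictly guaranteed, which is why no output is claimed for that lookup.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class KeyContractDemo {
    // Hypothetical key that overrides equals() but NOT hashCode().
    static final class BrokenKey {
        final String id;
        BrokenKey(String id) { this.id = id; }
        @Override public boolean equals(Object o) {
            return o instanceof BrokenKey && ((BrokenKey) o).id.equals(id);
        }
    }

    // The same key with a hashCode() that is consistent with equals().
    static final class GoodKey {
        final String id;
        GoodKey(String id) { this.id = id; }
        @Override public boolean equals(Object o) {
            return o instanceof GoodKey && ((GoodKey) o).id.equals(id);
        }
        @Override public int hashCode() { return Objects.hashCode(id); }
    }

    static String lookupGood() {
        Map<GoodKey, String> m = new HashMap<>();
        m.put(new GoodKey("a"), "value");
        return m.get(new GoodKey("a"));   // found: equal keys, equal hashes
    }

    static String lookupBroken() {
        Map<BrokenKey, String> m = new HashMap<>();
        m.put(new BrokenKey("a"), "value");
        // Equal by equals(), but the identity hashes almost always differ,
        // so this lookup almost always misses.
        return m.get(new BrokenKey("a"));
    }

    public static void main(String[] args) {
        System.out.println(lookupGood());
        System.out.println(lookupBroken());
    }
}
```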

Source code walkthrough

First, here are the constants and internal data structures of HashMap that matter for this discussion:

/**
 * The load factor used when none specified in constructor.
 * Default load factor.
 */
static final float DEFAULT_LOAD_FACTOR = 0.75f;

/**
 * The bin count threshold for using a tree rather than list for a
 * bin.  Bins are converted to trees when adding an element to a
 * bin with at least this many nodes. The value must be greater
 * than 2 and should be at least 8 to mesh with assumptions in
 * tree removal about conversion back to plain bins upon
 * shrinkage.
 * Threshold for converting a list to a red-black tree. A bin needs at
 * least 8 nodes before conversion is even possible; reaching 8 does
 * not guarantee conversion.
 */
static final int TREEIFY_THRESHOLD = 8;

/**
 * The bin count threshold for untreeifying a (split) bin during a
 * resize operation. Should be less than TREEIFY_THRESHOLD, and at
 * most 6 to mesh with shrinkage detection under removal.
 * Threshold for converting a tree bin back to a list when the bin is
 * split during a resize.
 */
static final int UNTREEIFY_THRESHOLD = 6;

/**
 * The smallest table capacity for which bins may be treeified.
 * (Otherwise the table is resized if too many nodes in a bin.)
 * Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts
 * between resizing and treeification thresholds.
 * Minimum table length required before a bin may be treeified.
 */
static final int MIN_TREEIFY_CAPACITY = 64;

/**
 * Basic hash bin node, used for most entries.  (See below for
 * TreeNode subclass, and in LinkedHashMap for its Entry subclass.)
 * In JDK 8 this class wraps the elements of the linked lists in the
 * hash table. Every key-value pair added to a HashMap starts out as a Node.
 */
static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;     // hash of the key
    final K key;        // key
    V value;            // value
    Node<K,V> next;     // next Node in the list

    Node(int hash, K key, V value, Node<K,V> next) {
        this.hash = hash;
        this.key = key;
        this.value = value;
        this.next = next;
    }

    public final K getKey()        { return key; }
    public final V getValue()      { return value; }
    public final String toString() { return key + "=" + value; }

    public final int hashCode() {
        return Objects.hashCode(key) ^ Objects.hashCode(value);
    }

    public final V setValue(V newValue) {
        V oldValue = value;
        value = newValue;
        return oldValue;
    }

    public final boolean equals(Object o) {
        if (o == this)
            return true;
        if (o instanceof Map.Entry) {
            Map.Entry<?,?> e = (Map.Entry<?,?>)o;
            if (Objects.equals(key, e.getKey()) &&
                Objects.equals(value, e.getValue()))
                return true;
        }
        return false;
    }
}

/**
 * Entry for Tree bins. Extends LinkedHashMap.Entry (which in turn
 * extends Node) so can be used as extension of either regular or
 * linked node.
 * Red-black tree node. When collisions cause a list to be converted to
 * a tree, each element is wrapped in this structure.
 */
static final class TreeNode<K,V> extends LinkedHashMap.Entry<K,V> {
    TreeNode<K,V> parent;  // parent node in the red-black tree
    TreeNode<K,V> left;    // left child
    TreeNode<K,V> right;   // right child
    TreeNode<K,V> prev;    // previous node: the bin keeps a linked list as well as the tree
    boolean red;           // true if this is a red node

    TreeNode(int hash, K key, V val, Node<K,V> next) {
        super(hash, key, val, next);
    }

    /**
     * Tree version of putVal.
     * Adds a new node.
     */
    final TreeNode<K,V> putTreeVal(HashMap<K,V> map, Node<K,V>[] tab,
                                   int h, K k, V v) {
        Class<?> kc = null;
        boolean searched = false;
        // Insertion starts from the root, so find the root first.
        TreeNode<K,V> root = (parent != null) ? root() : this;
        // Walk down from the root to find the insertion point.
        for (TreeNode<K,V> p = root;;) {
            int dir, ph; K pk;
            // dir selects the left or right subtree.
            // Compare hashes first: a smaller hash goes to the left.
            if ((ph = p.hash) > h)
                dir = -1;
            else if (ph < h)
                dir = 1;
            // Same hash: if the keys are equal, return the node and let
            // the caller decide whether to overwrite the old value.
            else if ((pk = p.key) == k || (k != null && k.equals(pk)))
                return p;
            // Otherwise try to order the keys via Comparable. If that is
            // not possible or compares as equal, fall back to the default ordering.
            else if ((kc == null &&
                      (kc = comparableClassFor(k)) == null) ||
                     (dir = compareComparables(kc, k, pk)) == 0) {
                // Search both subtrees once for the key; return it if present.
                if (!searched) {
                    TreeNode<K,V> q, ch;
                    searched = true;
                    if (((ch = p.left) != null &&
                         (q = ch.find(h, k, kc)) != null) ||
                        ((ch = p.right) != null &&
                         (q = ch.find(h, k, kc)) != null))
                        return q;
                }
                // Default tie-breaking order.
                dir = tieBreakOrder(k, pk);
            }

            TreeNode<K,V> xp = p;
            // Descend left or right according to dir. Keep looping while
            // the child is non-null; insert once an empty position is reached.
            if ((p = (dir <= 0) ? p.left : p.right) == null) {
                Node<K,V> xpn = xp.next;
                TreeNode<K,V> x = map.newTreeNode(h, k, v, xpn);
                if (dir <= 0)
                    xp.left = x;
                else
                    xp.right = x;
                xp.next = x;
                x.parent = x.prev = xp;
                if (xpn != null)
                    ((TreeNode<K,V>)xpn).prev = x;
                // Rebalance the red-black tree.
                moveRootToFront(tab, balanceInsertion(root, x));
                return null;
            }
        }
    }

    // The class has many more methods; they are omitted here.
    // ......
}
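The ordering rules inside putTreeVal can be condensed into a standalone sketch. `TreeOrderSketch.direction` is a made-up helper, not JDK code, and it simplifies two things: it skips the early return for an already-present key, and it accepts any `Comparable` of the same class, whereas the real `comparableClassFor` is stricter. The last step follows the idea of `tieBreakOrder`: class name first, then identity hash.

```java
class TreeOrderSketch {
    // Negative result: descend left. Positive result: descend right.
    // h/k describe the key being inserted, ph/pk the node being visited.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static int direction(int h, Object k, int ph, Object pk) {
        if (ph > h) return -1;   // new hash is smaller: go left
        if (ph < h) return 1;    // new hash is larger: go right
        // Equal hashes: try Comparable when both keys share one class.
        if (k instanceof Comparable && pk != null && k.getClass() == pk.getClass()) {
            int c = ((Comparable) k).compareTo(pk);
            if (c != 0) return c < 0 ? -1 : 1;
        }
        // Last resort: class name, then identity hash code.
        int d = 0;
        if (k != null && pk != null)
            d = k.getClass().getName().compareTo(pk.getClass().getName());
        if (d == 0)
            d = System.identityHashCode(k) <= System.identityHashCode(pk) ? -1 : 1;
        return d < 0 ? -1 : 1;
    }

    public static void main(String[] args) {
        // "Aa" and "BB" have the same String.hashCode(), so Comparable decides.
        System.out.println(direction("Aa".hashCode(), "Aa", "BB".hashCode(), "BB"));
    }
}
```

Because the final tie-break depends on identity hash codes, two runs of the same program may build differently shaped trees for keys that cannot be compared; lookups still work because `find` searches both subtrees in that case.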

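With the constants and data structures in place, we can analyze the put method. When several key-value pairs are added whose keys are not equal but whose hashCode values are identical, hash collisions occur and a linked list forms; as more such pairs are added, the list is eventually converted to a red-black tree. Walking through put shows exactly when the conversion happens: is a list longer than 8 always converted to a red-black tree?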

/**
 * Delegates to putVal.
 */
public V put(K key, V value) {
    return putVal(hash(key), key, value, false, true);
}

/**
 * Implements Map.put and related methods
 *
 * @param hash hash for key
 * @param key the key
 * @param value the value to put
 * @param onlyIfAbsent if true, don't change existing value
 *        (the pair is only added when the key is not already present)
 * @param evict if false, the table is in creation mode.
 * @return previous value, or null if none
 */
final V putVal(int hash, K key, V value, boolean onlyIfAbsent,
               boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;
    // The table is null until the first insertion, so allocate it here.
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;
    // AND the hash with (length - 1) to get the index. This is equivalent
    // to a modulo but faster, which is why the table length is always a
    // power of two. If the slot is empty, create a Node and store it there.
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);
    else {
        // The slot is occupied: a hash collision.
        Node<K,V> e; K k;
        // Compare hash, then == or equals, against the head of the list
        // (possibly a single node) or the root of the tree. If they match,
        // the key is a duplicate and is handled below via onlyIfAbsent.
        if (p.hash == hash &&
            ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;
        // The key differs from the head: handle list nodes (Node) and
        // tree nodes (TreeNode) separately.
        else if (p instanceof TreeNode)
            // Tree bin: insert with putTreeVal, described above.
            e = ((TreeNode<K,V>)p).putTreeVal(this, tab, hash, key, value);
        else {
            // List bin: walk the list from head to tail.
            for (int binCount = 0; ; ++binCount) {
                // Reached the end without finding the key:
                // append a new Node at the tail.
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);
                    // This is the key point! binCount counts the nodes visited
                    // before the append, so binCount >= TREEIFY_THRESHOLD - 1
                    // means the bin already held at least 8 nodes. In that case
                    // treeifyBin is called; it is analyzed below.
                    if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
                        treeifyBin(tab, hash);
                    break;
                }
                // Check whether the key is already present in the list.
                if (e.hash == hash &&
                    ((k = e.key) == key || (key != null && key.equals(k))))
                    break;
                p = e;
            }
        }
        // e != null means the key already exists.
        if (e != null) { // existing mapping for key
            V oldValue = e.value;
            // Replace the value unless onlyIfAbsent forbids it.
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            afterNodeAccess(e);
            return oldValue;
        }
    }
    ++modCount;
    if (++size > threshold)
        // Grow the table.
        resize();
    afterNodeInsertion(evict);
    return null;
}

Within putVal, the method to focus on is treeifyBin. It is the method that converts a linked list into a red-black tree, so let's see what it actually does.

/**
 * Replaces all linked nodes in bin at index for given hash unless
 * table is too small, in which case resizes instead.
 */
final void treeifyBin(Node<K,V>[] tab, int hash) {
    int n, index; Node<K,V> e;
    // This is the key point: when the table is null or shorter than
    // MIN_TREEIFY_CAPACITY, resize() is called and nothing else happens.
    // Does the resize convert the list to a tree? See the analysis of
    // resize() below.
    if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY)
        resize();
    // Locate the head node of the bin. It should not be null: if the slot
    // were empty, putVal would have stored a plain Node there directly.
    else if ((e = tab[index = (n - 1) & hash]) != null) {
        TreeNode<K,V> hd = null, tl = null;
        // Walk the list, wrap each Node in a TreeNode, and link the
        // TreeNodes through prev/next into a new list.
        do {
            TreeNode<K,V> p = replacementTreeNode(e, null);
            if (tl == null)
                hd = p;
            else {
                p.prev = tl;
                tl.next = p;
            }
            tl = p;
        } while ((e = e.next) != null);
        // Store the head TreeNode of the new list in the table slot.
        if ((tab[index] = hd) != null)
            // treeify() builds the actual tree structure.
            hd.treeify(tab);
    }
}

This method shows that when the table is shorter than MIN_TREEIFY_CAPACITY (64 by default), only a resize is performed. So let's look at what resize() does.

/**
 * Initializes or doubles table size.  If null, allocates in
 * accord with initial capacity target held in field threshold.
 * Otherwise, because we are using power-of-two expansion, the
 * elements from each bin must either stay at same index, or move
 * with a power of two offset in the new table.
 *
 * This method either initializes the table or doubles its size.
 * The initial size comes from the initial capacity.
 * Because the table doubles, an element either stays at its old index
 * or moves by a power-of-two offset. This is analyzed in detail below.
 *
 * @return the table
 */
final Node<K,V>[] resize() {
    Node<K,V>[] oldTab = table;
    int oldCap = (oldTab == null) ? 0 : oldTab.length;
    int oldThr = threshold;
    int newCap, newThr = 0;
    if (oldCap > 0) {
        if (oldCap >= MAXIMUM_CAPACITY) {
            threshold = Integer.MAX_VALUE;
            return oldTab;
        }
        // The new capacity is twice the old capacity.
        else if ((newCap = oldCap << 1) < MAXIMUM_CAPACITY &&
                 oldCap >= DEFAULT_INITIAL_CAPACITY)
            newThr = oldThr << 1; // double threshold
    }
    else if (oldThr > 0) // initial capacity was placed in threshold
        newCap = oldThr;
    else {               // zero initial threshold signifies using defaults
        newCap = DEFAULT_INITIAL_CAPACITY;
        newThr = (int)(DEFAULT_LOAD_FACTOR * DEFAULT_INITIAL_CAPACITY);
    }
    if (newThr == 0) {
        float ft = (float)newCap * loadFactor;
        newThr = (newCap < MAXIMUM_CAPACITY && ft < (float)MAXIMUM_CAPACITY ?
                  (int)ft : Integer.MAX_VALUE);
    }
    threshold = newThr;
    // Allocate the new Node array.
    @SuppressWarnings({"rawtypes","unchecked"})
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
    table = newTab;
    if (oldTab != null) {
        // Walk the old table and rehash its contents into the new one.
        for (int j = 0; j < oldCap; ++j) {
            Node<K,V> e;
            if ((e = oldTab[j]) != null) {
                oldTab[j] = null;
                // A single node (no list or tree): just recompute its index.
                if (e.next == null)
                    newTab[e.hash & (newCap - 1)] = e;
                // Tree bin: the hashes may differ, so the nodes are split
                // by hash and distributed over the two possible indexes.
                else if (e instanceof TreeNode)
                    ((TreeNode<K,V>)e).split(this, newTab, j, oldCap);
                // List bin: positions are also recomputed from the hash, but
                // with a trick instead of hash & (newCap - 1).
                else { // preserve order
                    Node<K,V> loHead = null, loTail = null;
                    Node<K,V> hiHead = null, hiTail = null;
                    Node<K,V> next;
                    do {
                        next = e.next;
                        // The trick: AND the hash with the old capacity.
                        // Note that this is oldCap, not newCap - 1 and not
                        // oldCap - 1. See the explanation below.
                        if ((e.hash & oldCap) == 0) {
                            if (loTail == null)
                                loHead = e;
                            else
                                loTail.next = e;
                            loTail = e;
                        }
                        else {
                            if (hiTail == null)
                                hiHead = e;
                            else
                                hiTail.next = e;
                            hiTail = e;
                        }
                    } while ((e = next) != null);
                    if (loTail != null) {
                        loTail.next = null;
                        newTab[j] = loHead;
                    }
                    if (hiTail != null) {
                        hiTail.next = null;
                        newTab[j + oldCap] = hiHead;
                    }
                }
            }
        }
    }
    return newTab;
}

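Now let's look closely at why, when resize recomputes indexes for a linked list, the test is e.hash & oldCap. It goes back to the sentence in the method comment: because the table doubles, an element either stays at its old index or moves by a power-of-two offset. Why is that? In the accompanying figure, (a) shows two keys being hashed before the resize and (b) shows the same two keys being hashed after it. The result shows that key2 lands at a different position after recomputation, shifted from its old position by a power-of-two offset equal to the old capacity. The reason is that after doubling, capacity - 1 gains one more 1 bit at the high end (the red part in the figure), so an element either keeps its position or moves by that power-of-two offset. If the hash bit that lines up with the new high bit is 1, the recomputed position changes; if it is 0, the position stays the same. So deciding whether an element moves only requires checking that one bit of the hash. What do we check it with? The old capacity. Capacities are always powers of two, so the single 1 bit in the old capacity sits at exactly the same position as the highest 1 bit of the new capacity - 1. ANDing the hash with the old capacity therefore just tests whether that one bit of the hash is 1 or 0: 0 means the index is unchanged, 1 means the element moves to index + oldCap.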
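The bit argument can be checked with a few lines of arithmetic. In this sketch, `ResizeSplitDemo` and `newIndex` are illustrative names only; the method derives the new index from the old index plus the single bit test, and the loop compares that against the direct computation `hash & (newCap - 1)`.

```java
class ResizeSplitDemo {
    // Index in the doubled table, derived from the old index and one hash bit.
    static int newIndex(int hash, int oldCap) {
        int oldIndex = hash & (oldCap - 1);
        return (hash & oldCap) == 0 ? oldIndex : oldIndex + oldCap;
    }

    public static void main(String[] args) {
        int oldCap = 16;
        int newCap = oldCap << 1;
        // 5 = 00101 keeps index 5; 21 = 10101 moves from 5 to 5 + 16 = 21.
        System.out.println(newIndex(5, oldCap));
        System.out.println(newIndex(21, oldCap));
        // The shortcut agrees with the full mask for every hash tried here.
        boolean same = true;
        for (int hash = -1000; hash <= 1000; hash++) {
            same &= newIndex(hash, oldCap) == (hash & (newCap - 1));
        }
        System.out.println(same);
    }
}
```

The shortcut lets resize split one list into a "lo" list and a "hi" list in a single pass while preserving the relative order of the nodes.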

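Back to the main question of converting a list into a red-black tree: the analysis of the resize method shows that a resize does not convert the linked list into a red-black tree.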

Summary

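When a key-value pair is added, put calls putVal internally. When a node is appended to a list that already holds TREEIFY_THRESHOLD nodes (8 by default), treeifyBin is triggered. That method checks the table length: if it is below MIN_TREEIFY_CAPACITY (64 by default), only a resize is triggered and the list is not converted to a red-black tree.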
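To make the summary concrete, here is a small model of the decision, not the JDK code itself. It assumes a default-sized table (capacity 16) and distinct keys that all share one hash value, so every key stays in the same bin across resizes. It ignores load-factor resizes, which do not occur for the small key counts traced here. `TreeifyTraceSketch` and `trace` are names invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

class TreeifyTraceSketch {
    static final int TREEIFY_THRESHOLD = 8;
    static final int MIN_TREEIFY_CAPACITY = 64;

    // Returns, for each of `keys` inserts into one shared bin, what happens.
    static List<String> trace(int keys) {
        List<String> out = new ArrayList<>();
        int capacity = 16;     // assumed default initial capacity
        int binSize = 0;       // nodes currently in the shared bin
        boolean tree = false;  // whether the bin has been treeified
        for (int i = 1; i <= keys; i++) {
            int before = binSize++;
            String action = tree ? "insert into tree" : "append";
            // Same condition as binCount >= TREEIFY_THRESHOLD - 1 in putVal.
            if (!tree && before >= TREEIFY_THRESHOLD) {
                if (capacity < MIN_TREEIFY_CAPACITY) {
                    capacity <<= 1;   // treeifyBin only resizes
                    action = "append, resize to " + capacity;
                } else {
                    tree = true;      // table is large enough: treeify
                    action = "append, treeify";
                }
            }
            out.add("key " + i + ": " + action);
        }
        return out;
    }

    public static void main(String[] args) {
        trace(11).forEach(System.out::println);
    }
}
```

Under these assumptions the 9th and 10th keys only double the table (16 to 32, then 32 to 64), and the bin becomes a tree when the 11th key is added.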
