What is the difference between Mysql InnoDB B+ tree index and hash index? Why does MongoDB use B-tree?

Date: 2024-04-05 23:33:28

Original: What is the difference between Mysql InnoDB B+ tree index and hash index? Why does MongoDB use B-tree? | by Mina Ayoub | Medium

The most important difference between a B-tree and a B+ tree is where the data lives: a B+ tree stores data only in its leaf nodes and uses all other nodes purely for indexing, while a B-tree keeps a data field in every node.

B+ tree

The B+ tree is a balanced search tree (not a binary tree) designed for disks and other secondary storage. In a B+ tree, all records are stored, in key order, in leaf nodes that sit on the same level, and the leaf nodes are linked to one another by pointers.

In InnoDB, B+ tree indexes are divided into the clustered index and secondary indexes. What the two have in common is that the underlying B+ tree is balanced and the leaf nodes hold all of the index entries. The difference is whether a leaf node stores an entire row: a clustered index leaf does, while a secondary index leaf stores the indexed columns together with the primary key of the row.

The B+ tree has the following characteristics:

  • Each node can have many children, for two reasons. One is to reduce the height of the tree. The other is to split the key range into more intervals: the more intervals per node, the faster a lookup narrows down to its target.
  • A node no longer stores just one key; it can store many keys.
  • Non-leaf nodes store only keys, and leaf nodes store keys and data.
  • Adjacent leaf nodes are linked to each other by pointers, so sequential queries perform well.
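
The effect of fan-out on height can be checked with a little arithmetic. The sketch below assumes an idealized tree in which every node is full, so a tree of height h with fan-out f can address f^h keys; the function name and the sample numbers are illustrative only and do not come from any database.

```python
def tree_height(num_keys: int, fanout: int) -> int:
    """Smallest height h such that fanout ** h >= num_keys.

    Idealized model: every node is completely full.
    """
    height = 1
    capacity = fanout
    while capacity < num_keys:
        capacity *= fanout   # one more level multiplies the reach by fanout
        height += 1
    return height


if __name__ == "__main__":
    for fanout in (2, 100, 1000):
        print(f"fanout={fanout:5d} height={tree_height(10_000_000, fanout)}")
```

Under this model, ten million keys need 24 levels at a fan-out of 2, but only 4 levels at a fan-out of 100 and 3 at a fan-out of 1000. Since each level typically costs one page read, a wider node means fewer reads per lookup.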

In plain terms

  • The non-leaf nodes of the B+ tree store only keys, which take very little space, so each level of nodes can index a much wider range of data. In other words, each I/O operation narrows the search over more data.
  • The leaf nodes are linked to their neighbors, which fits the read-ahead behavior of the disk. For example, suppose one leaf node stores 50 and 55 and has a pointer to the next leaf node, which stores 60 and 62. When we read the data for 50 and 55 from disk, read-ahead also brings in the data for 60 and 62, provided the two leaves are stored next to each other. That is a sequential read rather than another disk seek, so it is faster.
  • Range queries are supported and are very efficient. Because each node can index a larger and more precise range, a single disk I/O on a B+ tree brings in more index information than on a B-tree, and I/O efficiency is higher.

Range queries are efficient because the data is stored in the leaf layer and each leaf has a pointer to the next one, so a range query only needs to locate the first key and then walk along the leaf layer, without traversing the whole tree.
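
A range scan over the leaf layer can be sketched as follows. This is a minimal, read-only B+ tree that is bulk-loaded from sorted data, not InnoDB's implementation; the class names, the `order` parameter, and the sample keys (which reuse 50, 55, 60 and 62 from the example above) are assumptions made for illustration.

```python
from bisect import bisect_right


class Leaf:
    def __init__(self, keys, values):
        self.keys = keys        # sorted keys
        self.values = values    # row data lives only in leaves
        self.next = None        # pointer to the right sibling leaf


class Internal:
    def __init__(self, keys, children):
        self.keys = keys          # separator keys only, no row data
        self.children = children  # len(children) == len(keys) + 1


def build(items, order=2):
    """Bulk-load a tree from a non-empty list of (key, value) sorted by key."""
    leaves = []
    for i in range(0, len(items), order):
        chunk = items[i:i + order]
        leaves.append(Leaf([k for k, _ in chunk], [v for _, v in chunk]))
    for left, right in zip(leaves, leaves[1:]):
        left.next = right                       # chain the leaf layer
    level = [(leaf.keys[0], leaf) for leaf in leaves]  # (smallest key, node)
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), order):
            group = level[i:i + order]
            node = Internal([k for k, _ in group[1:]], [n for _, n in group])
            parents.append((group[0][0], node))
        level = parents
    return level[0][1]


def find_leaf(node, key):
    """Descend from the root to the leaf whose key range covers key."""
    while isinstance(node, Internal):
        node = node.children[bisect_right(node.keys, key)]
    return node


def range_query(root, low, high):
    """Yield (key, value) with low <= key <= high by walking the leaf chain."""
    leaf = find_leaf(root, low)
    while leaf is not None:
        for k, v in zip(leaf.keys, leaf.values):
            if k > high:
                return
            if k >= low:
                yield k, v
        leaf = leaf.next


root = build([(k, f"row{k}") for k in (10, 20, 50, 55, 60, 62, 70, 80)])
print(list(range_query(root, 50, 62)))
```

The tree is descended once, to find the leaf holding 50; everything after that follows `next` pointers, which is why the cost of a range scan grows with the number of leaves touched rather than with the size of the tree.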

Principle of locality and disk read-ahead

Because disk access is far slower than memory access, disk I/O should be minimized to improve efficiency. Disks usually do not read strictly on demand; they read ahead each time. After the disk reads the requested data, it also reads a certain length of the data that follows it into memory. The theoretical basis for doing so is the well-known principle of locality in computer science:

When a piece of data is used, the data near it is usually used soon afterwards, and the data a program needs while running tends to be concentrated in one area.
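
The benefit of read-ahead for sequential access can be shown with a toy model. The sketch below is not a model of any real disk or operating system: it assumes a hypothetical cache in which every miss loads the requested page plus the pages that immediately follow it, and it counts how many separate disk requests a sequential scan needs.

```python
def disk_requests(num_pages: int, read_ahead: int) -> int:
    """Count disk requests needed to scan pages 0..num_pages-1 in order.

    Each miss loads the requested page and the next read_ahead - 1 pages
    into an in-memory cache (a deliberately simplified model).
    """
    cache = set()
    requests = 0
    for page in range(num_pages):
        if page not in cache:
            requests += 1
            cache.update(range(page, page + read_ahead))
    return requests


if __name__ == "__main__":
    print(disk_requests(100, 1))   # no read-ahead
    print(disk_requests(100, 8))   # 8 pages per request
```

In this model, scanning 100 pages takes 100 requests without read-ahead and 13 with an 8-page read-ahead. The saving only appears when the pages being read next really are adjacent, which is the locality assumption stated above.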

B-tree

In "B-tree", the B is usually read as "balanced": a B-tree is a multi-way self-balancing search tree. It is similar to an ordinary balanced binary tree; the difference is that a B-tree allows each node to have more than two child nodes.

B-tree has the following characteristics:

  • Keys are distributed across the whole tree, in internal nodes as well as in leaves.
  • Each key appears in exactly one node.
  • A search may end at a non-leaf node.
  • A lookup over the full key set performs close to binary search.
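
The point that a search may end at a non-leaf node can be illustrated with a small lookup routine. This is a sketch over a hand-built three-node tree; `BTreeNode`, `search`, and the sample keys are hypothetical names chosen for the example, and insertion and rebalancing are left out.

```python
from bisect import bisect_left


class BTreeNode:
    def __init__(self, keys, values, children=None):
        self.keys = keys                 # sorted keys held by this node
        self.values = values             # data stored next to each key
        self.children = children or []   # empty list for a leaf


def search(node, key, visited=1):
    """Return (value, nodes_visited); value is None if the key is absent."""
    i = bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return node.values[i], visited   # may stop at an internal node
    if not node.children:
        return None, visited             # reached a leaf without a match
    return search(node.children[i], key, visited + 1)


left = BTreeNode([20, 30], ["row20", "row30"])
right = BTreeNode([60, 70], ["row60", "row70"])
root = BTreeNode([50], ["row50"], [left, right])

print(search(root, 50))   # found in the root
print(search(root, 60))   # found one level down
print(search(root, 40))   # not present
```

A key held by the root is found after visiting one node, while other keys cost one visit per level, so lookup cost in a B-tree depends on where the key happens to sit.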

The difference between B-tree and B+ tree

  • The internal nodes of the B+ tree do not store data. All data is stored in the leaf nodes, so every lookup has to descend to a leaf and the query time complexity is a stable O(log n).
  • The B-tree query time complexity is not fixed. It depends on where the key sits in the tree, and is O(1) in the best case, when the key is in the root.
  • The B+ tree leaf nodes are linked to their neighbors, which makes interval access much easier and supports range queries.
  • In a B-tree, each node holds keys and data together and the nodes are not chained, so a range query cannot be answered by a simple sequential scan; it needs an in-order traversal that moves up and down the tree.
  • The B+ tree is better suited to external storage (data on disk). Since the internal nodes have no data fields, each node can index a larger and more precise range.

Why does MongoDB use B-tree?

Recall that the internal nodes of the B+ tree do not store data: all data is stored in the leaf nodes, so the query time complexity is a stable O(log n). The B-tree query time complexity is not fixed; it depends on where the key sits in the tree, and is O(1) in the best case.

As noted above, keeping disk I/O to a minimum is an effective way to improve performance. MongoDB is an aggregate-oriented database, and the B-tree happens to keep the key and data fields together in one node.

As for why MongoDB uses a B-tree instead of a B+ tree, consider its design. It is not a traditional relational database but a NoSQL store that keeps data in a JSON-like format, built for high performance, high availability, and easy scaling. First, it drops the relational model, so its need for the B+ tree advantages described above is not as strong. Second, because MySQL uses a B+ tree, the data is in the leaf nodes and every query has to reach a leaf node, whereas MongoDB uses a B-tree, in which every node has a data field, so a lookup can return as soon as it finds the key in the index. By this reasoning, the average single-key query visits fewer nodes in MongoDB than in MySQL.

Hash index

Simply put, a hash index applies a hash algorithm to the key value to produce a hash value. A lookup does not need to descend level by level from the root node to a leaf node as in a B+ tree: a single hash computation locates the corresponding position immediately, which is very fast.
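
The idea can be sketched as a table of buckets addressed by the hash of the key. This is a minimal illustration using Python's built-in `hash`, not the structure any particular database uses; `HashIndex`, `num_buckets`, and the row ids are names and values assumed for the example.

```python
class HashIndex:
    """Toy hash index: key -> list of row ids, with chained buckets."""

    def __init__(self, num_buckets=8):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        # One hash computation picks the bucket; no tree descent is needed.
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, row_id):
        self._bucket(key).append((key, row_id))

    def lookup(self, key):
        # Entries that collided into the same bucket (or share the same key)
        # have to be scanned one by one.
        return [rid for k, rid in self._bucket(key) if k == key]


index = HashIndex()
index.insert("alice", 1)
index.insert("bob", 2)
index.insert("alice", 3)      # duplicate key: lands in the same chain

print(index.lookup("alice"))
print(index.lookup("carol"))
```

A lookup touches a single bucket, but the bucket is a chain: the more keys collide or repeat, the longer the scan inside it.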

The difference between B+ tree index and hash index

  • For an equality query, the hash index has a clear advantage, because a single hash computation finds the corresponding key value. The premise is that the key value is unique. If it is not, the lookup first finds the position of the key and then scans along the linked list until it finds the matching data.
  • For a range query, the hash index is useless: key values that were originally ordered may become discontinuous after hashing, so the index cannot be used to complete a range query.
  • For the same reason, a hash index cannot be used for sorting, or for prefix fuzzy queries such as like 'xxx%' (a prefix query of this kind is essentially a range query).
  • Hash indexes also do not support the leftmost-prefix matching rule for multi-column composite indexes.
  • The lookup efficiency of a B+ tree index is fairly even, and does not fluctuate as much as that of a B-tree. With a large number of repeated key values, the efficiency of a hash index is extremely low because of the hash collision problem.
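
Why prefix matching and leftmost-prefix matching depend on ordering can be seen with a sorted list standing in for the ordered leaf layer of a B+ tree. This is a sketch using Python's `bisect` module; the function names and sample data are made up for the example.

```python
from bisect import bisect_left


def prefix_scan(sorted_keys, prefix):
    """Emulate like 'prefix%' on an ordered index as a range scan."""
    start = bisect_left(sorted_keys, prefix)      # first key >= prefix
    end = start
    while end < len(sorted_keys) and sorted_keys[end].startswith(prefix):
        end += 1
    return sorted_keys[start:end]


def leftmost_scan(sorted_tuples, first):
    """Use only the first column of a composite (col1, col2) index."""
    start = bisect_left(sorted_tuples, (first,))  # (first,) sorts before (first, x)
    end = start
    while end < len(sorted_tuples) and sorted_tuples[end][0] == first:
        end += 1
    return sorted_tuples[start:end]


names = sorted(["adam", "alice", "alina", "bob", "carol"])
composite = sorted([("smith", "ann"), ("smith", "bob"),
                    ("jones", "cat"), ("young", "dan")])

print(prefix_scan(names, "ali"))
print(leftmost_scan(composite, "smith"))
```

Both scans work because matching entries sit next to each other in key order. A hash index places keys according to their hash values, so matching keys are scattered across buckets and the only option is to examine every entry.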

From: https://www.cnblogs.com/lihan829/p/18116940
