<!-- doc/src/sgml/hash.sgml -->

<chapter id="hash-index">
<title>Hash Indexes</title>

 <indexterm>
  <primary>index</primary>
  <secondary>Hash</secondary>
 </indexterm>

<sect1 id="hash-intro">
 <title>Overview</title>

 <para>
  <productname>PostgreSQL</productname>
  includes an implementation of persistent on-disk hash indexes,
  which are fully crash recoverable. Any data type can be indexed by a
  hash index, including data types that do not have a well-defined linear
  ordering. Hash indexes store only the hash value of the data being
  indexed, thus there are no restrictions on the size of the data column
  being indexed.
 </para>
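
 <para>
  For example, a hash index is created with the <literal>USING hash</literal>
  clause of <command>CREATE INDEX</command>.  The table and column names in the
  sketch below are hypothetical, chosen only to illustrate the syntax:
<programlisting>
-- "links" and "url" are illustrative names, not taken from this manual
CREATE TABLE links (id bigint PRIMARY KEY, url text NOT NULL);

CREATE INDEX links_url_hash_idx ON links USING hash (url);
</programlisting>
 </para>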

 <para>
  Hash indexes support only single-column indexes and do not allow
  uniqueness checking.
 </para>

 <para>
  Hash indexes support only the <literal>=</literal> operator,
  so WHERE clauses that specify range operations will not be able to take
  advantage of hash indexes.
 </para>
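
 <para>
  Continuing the hypothetical <structname>links</structname> example, an
  equality condition can use the hash index, while a range condition cannot:
<programlisting>
-- can use links_url_hash_idx
SELECT * FROM links WHERE url = 'https://www.postgresql.org/';

-- cannot use the hash index: this is a range operation
SELECT * FROM links WHERE url BETWEEN 'https://a' AND 'https://b';
</programlisting>
 </para>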

 <para>
  Each hash index tuple stores just the 4-byte hash value, not the actual
  column value. As a result, hash indexes may be much smaller than B-trees
  when indexing longer data items such as UUIDs, URLs, etc. The absence of
  the column value also makes all hash index scans lossy. Hash indexes may
  take part in bitmap index scans and backward scans.
 </para>
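
 <para>
  The size difference can be checked for a particular data set by building both
  index types on the same column and comparing their on-disk sizes; this is
  only a sketch, again using the hypothetical names from above:
<programlisting>
CREATE INDEX links_url_btree_idx ON links USING btree (url);

SELECT pg_size_pretty(pg_relation_size('links_url_hash_idx'))  AS hash_size,
       pg_size_pretty(pg_relation_size('links_url_btree_idx')) AS btree_size;
</programlisting>
  The actual numbers depend entirely on the data being indexed.
 </para>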

 <para>
  Hash indexes are best optimized for SELECT and UPDATE-heavy workloads
  that use equality scans on larger tables. In a B-tree index, searches must
  descend through the tree until the leaf page is found. In tables with
  millions of rows, this descent can increase access time to data. The
  equivalent of a leaf page in a hash index is referred to as a bucket page. In
  contrast, a hash index allows accessing the bucket pages directly,
  thereby potentially reducing index access time in larger tables. This
  reduction in "logical I/O" becomes even more pronounced on indexes/data
  larger than shared_buffers/RAM.
 </para>

 <para>
  Hash indexes have been designed to cope with uneven distributions of
  hash values. Direct access to the bucket pages works well if the hash
  values are evenly distributed. When inserts mean that the bucket page
  becomes full, additional overflow pages are chained to that specific
  bucket page, locally expanding the storage for index tuples that match
  that hash value. When scanning a hash bucket during queries, we need to
  scan through all of the overflow pages. Thus an unbalanced hash index
  might actually be worse than a B-tree in terms of number of block
  accesses required, for some data.
 </para>
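
 <para>
  Whether a particular hash index has accumulated long overflow chains can be
  checked with the <function>pgstathashindex</function> function from the
  <filename>pgstattuple</filename> module; the index name below is the
  hypothetical one used earlier:
<programlisting>
CREATE EXTENSION pgstattuple;

SELECT bucket_pages, overflow_pages, bitmap_pages, live_items, free_percent
FROM pgstathashindex('links_url_hash_idx');
</programlisting>
  A large <structfield>overflow_pages</structfield> count relative to
  <structfield>bucket_pages</structfield> suggests the kind of skew described
  above.
 </para>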

 <para>
  As a result of the overflow cases, we can say that hash indexes are
  most suitable for unique, nearly unique data or data with a low number
  of rows per hash bucket.
  One possible way to avoid problems is to exclude highly non-unique
  values from the index using a partial index condition, but this may
  not be suitable in many cases.
 </para>
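
 <para>
  As a sketch of that technique, suppose a hypothetical
  <structname>orders</structname> table with a <structfield>status</structfield>
  column in which one value is far more common than all others; the common
  value can simply be left out of the index:
<programlisting>
-- "orders", "status" and 'completed' are illustrative names and values
CREATE INDEX orders_status_hash_idx ON orders USING hash (status)
    WHERE status &lt;&gt; 'completed';
</programlisting>
  Queries can use such a partial index only when their own
  <literal>WHERE</literal> clause is known to exclude the omitted value.
 </para>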

 <para>
  Like B-Trees, hash indexes perform simple index tuple deletion. This
  is a deferred maintenance operation that deletes index tuples that are
  known to be safe to delete (those whose item identifier's LP_DEAD bit
  is already set). If an insert finds no space is available on a page we
  try to avoid creating a new overflow page by attempting to remove dead
  index tuples. Removal cannot occur if the page is pinned at that time.
  Deletion of dead index pointers also occurs during VACUUM.
 </para>

 <para>
  If it can, VACUUM will also try to squeeze the index tuples onto as
  few overflow pages as possible, minimizing the overflow chain. If an
  overflow page becomes empty, overflow pages can be recycled for reuse
  in other buckets, though we never return them to the operating system.
  There is currently no provision to shrink a hash index, other than by
  rebuilding it with REINDEX.
  There is no provision for reducing the number of buckets, either.
 </para>
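
 <para>
  For example, a bloated hash index (hypothetical name again) can be rebuilt
  from scratch with:
<programlisting>
REINDEX INDEX links_url_hash_idx;

-- or, where REINDEX CONCURRENTLY is available, without blocking writers:
REINDEX INDEX CONCURRENTLY links_url_hash_idx;
</programlisting>
 </para>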

 <para>
  Hash indexes may expand the number of bucket pages as the number of
  rows indexed grows. The hash key-to-bucket-number mapping is chosen so that
  the index can be incrementally expanded. When a new bucket is to be added to
  the index, exactly one existing bucket will need to be "split", with some of
  its tuples being transferred to the new bucket according to the updated
  key-to-bucket-number mapping.
 </para>

 <para>
  The expansion occurs in the foreground, which could increase execution
  time for user inserts. Thus, hash indexes may not be suitable for tables
  with rapidly increasing number of rows.
 </para>

</sect1>

<sect1 id="hash-implementation">
 <title>Implementation</title>

 <para>
  There are four kinds of pages in a hash index: the meta page (page zero),
  which contains statically allocated control information; primary bucket
  pages; overflow pages; and bitmap pages, which keep track of overflow
  pages that have been freed and are available for re-use. For addressing
  purposes, bitmap pages are regarded as a subset of the overflow pages.
 </para>
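
 <para>
  The page kinds can be examined with the <filename>pageinspect</filename>
  module.  As a sketch (the index name is the hypothetical one used earlier),
  the following reports the type of each of the first few pages of a hash
  index:
<programlisting>
CREATE EXTENSION pageinspect;

SELECT blkno,
       hash_page_type(get_raw_page('links_url_hash_idx', blkno)) AS page_type
FROM generate_series(0, 3) AS blkno;
</programlisting>
  Block zero is reported as the meta page; the remaining blocks are reported as
  bucket, overflow, bitmap, or unused pages.
 </para>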

 <para>
  Both scanning the index and inserting tuples require locating the bucket
  where a given tuple ought to be located. To do this, we need the bucket
  count, highmask, and lowmask from the metapage; however, it's undesirable
  for performance reasons to have to have a lock and pin on the metapage
  for every such operation. Instead, we retain a cached copy of the
  metapage in each backend's relcache entry. This will produce the correct bucket
  mapping as long as the target bucket hasn't been split since the last
  cache refresh.
 </para>
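
 <para>
  The metapage fields mentioned here can be viewed directly with
  <function>hash_metapage_info</function> from
  <filename>pageinspect</filename>, again using the hypothetical index name:
<programlisting>
SELECT ntuples, maxbucket, highmask, lowmask
FROM hash_metapage_info(get_raw_page('links_url_hash_idx', 0));
</programlisting>
 </para>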

 <para>
  Primary bucket pages and overflow pages are allocated independently since
  any given index might need more or fewer overflow pages relative to its
  number of buckets. The hash code uses an interesting set of addressing
  rules to support a variable number of overflow pages while not having to
  move primary bucket pages around after they are created.
 </para>

 <para>
  Each row in the table indexed is represented by a single index tuple in
  the hash index. Hash index tuples are stored in bucket pages, and if
  they exist, overflow pages. We speed up searches by keeping the index entries
  in any one index page sorted by hash code, thus allowing binary search to be
  used within an index page. Note however that there is *no* assumption about
  the relative ordering of hash codes across different index pages of a bucket.
 </para>
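
 <para>
  The stored hash codes of the tuples on one bucket or overflow page can be
  listed with <function>hash_page_items</function> from
  <filename>pageinspect</filename>; as a sketch, for block 1 of the
  hypothetical index used above:
<programlisting>
SELECT itemoffset, ctid, data AS hash_code
FROM hash_page_items(get_raw_page('links_url_hash_idx', 1))
LIMIT 10;
</programlisting>
  Within a single page the <structfield>data</structfield> values appear in
  hash code order, matching the sorting described above.
 </para>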

 <para>
  The bucket splitting algorithms to expand the hash index are too complex to
  be worthy of mention here, though are described in more detail in
  <filename>src/backend/access/hash/README</filename>.
  The split algorithm is crash safe and can be restarted if not completed
  successfully.
 </para>

</sect1>