@@ -786,7 +786,7 @@ ____________________________________________________________________________-->
786
786
<secondary>multivariate</secondary>
787
787
</indexterm>
788
788
789
- <sect2>
789
+ <sect2 id="functional-dependencies" >
790
790
<!--==========================orignal english content==========================
791
791
<title>Functional Dependencies</title>
792
792
____________________________________________________________________________-->
@@ -949,7 +949,7 @@ EXPLAIN (ANALYZE, TIMING OFF) SELECT * FROM t WHERE a = 1 AND b = 1;
949
949
</para>
950
950
</sect2>
951
951
952
- <sect2>
952
+ <sect2 id="multivariate-ndistinct-counts" >
953
953
<!--==========================orignal english content==========================
954
954
<title>Multivariate N-Distinct Counts</title>
955
955
____________________________________________________________________________-->
@@ -1034,6 +1034,216 @@ EXPLAIN (ANALYZE, TIMING OFF) SELECT COUNT(*) FROM t GROUP BY a, b;
1034
1034
</para>
1035
1035
1036
1036
</sect2>
1037
+
1038
+ <sect2 id="mcv-lists">
1039
+ <!--==========================orignal english content==========================
1040
+ <title>MCV Lists</title>
1041
+ ____________________________________________________________________________-->
1042
+ <title>MCV Lists</title>
1043
+
1044
+ <!--==========================orignal english content==========================
1045
+ <para>
1046
+ As explained in <xref linkend="functional-dependencies"/>, functional
1047
+ dependencies are very cheap and efficient type of statistics, but their
1048
+ main limitation is their global nature (only tracking dependencies at
1049
+ the column level, not between individual column values).
1050
+ </para>
1051
+ ____________________________________________________________________________-->
1052
+ <para>
1053
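+ As explained in <xref linkend="functional-dependencies"/>, functional dependencies are a very cheap and efficient type of statistics, but their main limitation is their global nature: they track dependencies only at the column level, not between individual column values.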
1054
+ </para>
1055
+
1056
+ <!--==========================orignal english content==========================
1057
+ <para>
1058
+ This section introduces multivariate variant of <acronym>MCV</acronym>
1059
+ (most-common values) lists, a straightforward extension of the per-column
1060
+ statistics described in <xref linkend="row-estimation-examples"/>. These
1061
+ statistics address the limitation by storing individual values, but it is
1062
+ naturally more expensive, both in terms of building the statistics in
1063
+ <command>ANALYZE</command>, storage and planning time.
1064
+ </para>
1065
+ ____________________________________________________________________________-->
1066
+ <para>
1067
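+ This section introduces the multivariate variant of <acronym>MCV</acronym> (most-common values) lists, a straightforward extension of the per-column statistics described in <xref linkend="row-estimation-examples"/>.
1068
+ These statistics address the limitation by storing individual values, but they are naturally more expensive in terms of building the statistics in <command>ANALYZE</command>, storage, and planning time.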
1069
+ </para>
1070
+
1071
+ <!--==========================orignal english content==========================
1072
+ <para>
1073
+ Let's look at the query from <xref linkend="functional-dependencies"/>
1074
+ again, but this time with a <acronym>MCV</acronym> list created on the
1075
+ same set of columns (be sure to drop the functional dependencies, to
1076
+ make sure the planner uses the newly created statistics).
1077
+
1078
+ <programlisting>
1079
+ DROP STATISTICS stts;
1080
+ CREATE STATISTICS stts2 (mcv) ON a, b FROM t;
1081
+ ANALYZE t;
1082
+ EXPLAIN (ANALYZE, TIMING OFF) SELECT * FROM t WHERE a = 1 AND b = 1;
1083
+ QUERY PLAN
1084
+ -−-−-−-−-−-−-−-−-−-−-−-−-−-−-−-−-−-−-−-−-−-−-−-−-−-−-−-−-−-−-−-−-−-−-−-−-−-−-−-
1085
+ Seq Scan on t (cost=0.00..195.00 rows=100 width=8) (actual rows=100 loops=1)
1086
+ Filter: ((a = 1) AND (b = 1))
1087
+ Rows Removed by Filter: 9900
1088
+ </programlisting>
1089
+
1090
+ The estimate is as accurate as with the functional dependencies, mostly
1091
+ thanks to the table being fairly small and having a simple distribution
1092
+ with a low number of distinct values. Before looking at the second query,
1093
+ which was not handled by functional dependencies particularly well,
1094
+ let's inspect the <acronym>MCV</acronym> list a bit.
1095
+ </para>
1096
+ ____________________________________________________________________________-->
1097
+ <para>
1098
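+ Let's look at the query from <xref linkend="functional-dependencies"/> again, but this time with an <acronym>MCV</acronym> list created on the same set of columns (be sure to drop the functional dependencies, so that the planner uses the newly created statistics).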
1099
+
1100
+ <programlisting>
1101
+ DROP STATISTICS stts;
1102
+ CREATE STATISTICS stts2 (mcv) ON a, b FROM t;
1103
+ ANALYZE t;
1104
+ EXPLAIN (ANALYZE, TIMING OFF) SELECT * FROM t WHERE a = 1 AND b = 1;
1105
+ QUERY PLAN
1106
+ -------------------------------------------------------------------------------
1107
+ Seq Scan on t (cost=0.00..195.00 rows=100 width=8) (actual rows=100 loops=1)
1108
+ Filter: ((a = 1) AND (b = 1))
1109
+ Rows Removed by Filter: 9900
1110
+ </programlisting>
1111
+
1117
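+ The estimate is as accurate as with the functional dependencies, mostly thanks to the table being fairly small and having a simple distribution with a low number of distinct values.
1118
+ Before looking at the second query, which was not handled particularly well by functional dependencies, let's inspect the <acronym>MCV</acronym> list a bit.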
1119
+ </para>
1120
+
1121
+ <!--==========================orignal english content==========================
1122
+ <para>
1123
+ Inspecting the <acronym>MCV</acronym> list is possible using
1124
+ <function>pg_mcv_list_items</function> set-returning function.
1125
+
1126
+ <programlisting>
1127
+ SELECT m.* FROM pg_statistic_ext join pg_statistic_ext_data on (oid = stxoid),
1128
+ pg_mcv_list_items(stxdmcv) m WHERE stxname = 'stts2';
1129
+ index | values | nulls | frequency | base_frequency
1130
+ -−-−-−-+-−-−-−-−-−+-−-−-−-+-−-−-−-−-−-+-−-−-−-−-−-−-−-−
1131
+ 0 | {0, 0} | {f,f} | 0.01 | 0.0001
1132
+ 1 | {1, 1} | {f,f} | 0.01 | 0.0001
1133
+ ...
1134
+ 49 | {49, 49} | {f,f} | 0.01 | 0.0001
1135
+ 50 | {50, 50} | {f,f} | 0.01 | 0.0001
1136
+ ...
1137
+ 97 | {97, 97} | {f,f} | 0.01 | 0.0001
1138
+ 98 | {98, 98} | {f,f} | 0.01 | 0.0001
1139
+ 99 | {99, 99} | {f,f} | 0.01 | 0.0001
1140
+ (100 rows)
1141
+ </programlisting>
1142
+
1143
+ This confirms there are 100 distinct combinations in the two columns, and
1144
+ all of them are about equally likely (1% frequency for each one). The
1145
+ base frequency is the frequency computed from per-column statistics, as if
1146
+ there were no multi-column statistics. Had there been any null values in
1147
+ either of the columns, this would be identified in the
1148
+ <structfield>nulls</structfield> column.
1149
+ </para>
1150
+ ____________________________________________________________________________-->
1151
+ <para>
1152
+ The <acronym>MCV</acronym> list can be inspected using the <function>pg_mcv_list_items</function> set-returning function.
1153
+
1154
+ <programlisting>
1155
+ SELECT m.* FROM pg_statistic_ext join pg_statistic_ext_data on (oid = stxoid),
1156
+ pg_mcv_list_items(stxdmcv) m WHERE stxname = 'stts2';
1157
+ index | values | nulls | frequency | base_frequency
1158
+ -------+----------+-------+-----------+----------------
1159
+ 0 | {0, 0} | {f,f} | 0.01 | 0.0001
1160
+ 1 | {1, 1} | {f,f} | 0.01 | 0.0001
1161
+ ...
1162
+ 49 | {49, 49} | {f,f} | 0.01 | 0.0001
1163
+ 50 | {50, 50} | {f,f} | 0.01 | 0.0001
1164
+ ...
1165
+ 97 | {97, 97} | {f,f} | 0.01 | 0.0001
1166
+ 98 | {98, 98} | {f,f} | 0.01 | 0.0001
1167
+ 99 | {99, 99} | {f,f} | 0.01 | 0.0001
1168
+ (100 rows)
1169
+ </programlisting>
1170
+
1171
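+ This confirms that there are 100 distinct combinations in the two columns, and all of them are about equally likely (1% frequency for each one).
1172
+ The base frequency is the frequency computed from per-column statistics, as if there were no multi-column statistics. Had there been any null values in either of the columns, this would be identified in the <structfield>nulls</structfield> column.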
1173
+ </para>
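As a rough illustration of the base frequency, the sketch below multiplies per-column frequencies for a single item, which is what an independence assumption between the columns would give. The per-column frequencies of 0.01 are assumed from the example table, where each column has 100 equally likely values. The helper name `base_frequency` is hypothetical and is not part of PostgreSQL.

```python
from math import prod

def base_frequency(per_column_frequencies):
    """Frequency of a value combination if the columns were independent."""
    return prod(per_column_frequencies)

# Assumed per-column frequencies for a = 1 and b = 1 in the example table.
freq_a, freq_b = 0.01, 0.01
base = base_frequency([freq_a, freq_b])

# Frequency stored for the item {1, 1} in the MCV list shown above.
mcv_frequency = 0.01
ratio = mcv_frequency / base
```

Under these assumptions the product comes out at about 0.0001, in line with the `base_frequency` column, while the stored frequency of 0.01 is roughly a hundred times larger. That gap is the correlation the per-column statistics cannot see.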
1174
+
1175
+ <!--==========================orignal english content==========================
1176
+ <para>
1177
+ When estimating the selectivity, the planner applies all the conditions
1178
+ on items in the <acronym>MCV</acronym> list, and then sums the frequencies
1179
+ of the matching ones. See <function>mcv_clauselist_selectivity</function>
1180
+ in <filename>src/backend/statistics/mcv.c</filename> for details.
1181
+ </para>
1182
+ ____________________________________________________________________________-->
1183
+ <para>
1184
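+ When estimating the selectivity, the planner applies all the conditions to the items in the <acronym>MCV</acronym> list, and then sums the frequencies of the matching ones.
1185
+ See <function>mcv_clauselist_selectivity</function> in <filename>src/backend/statistics/mcv.c</filename> for details.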
1186
+ </para>
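The idea of matching items and summing frequencies can be sketched in a few lines of Python. This is only a simplified model, not the logic of `mcv_clauselist_selectivity`: it ignores, among other things, the part of the table not covered by the list. The names `mcv_items` and `mcv_selectivity`, and the in-memory layout of the list, are hypothetical and chosen for illustration.

```python
import operator

# Hypothetical in-memory copy of the MCV list shown above:
# 100 items of the form (i, i), each with frequency 0.01.
mcv_items = [((i, i), 0.01) for i in range(100)]

def mcv_selectivity(items, clauses):
    """Sum the frequencies of the MCV items that satisfy every clause.

    Each clause is a tuple (column_index, comparison, constant).
    """
    total = 0.0
    for values, frequency in items:
        if all(compare(values[col], const) for col, compare, const in clauses):
            total += frequency
    return total

total_rows = 10000  # size of the example table

# WHERE a = 1 AND b = 1: exactly one item matches.
sel_equal = mcv_selectivity(mcv_items, [(0, operator.eq, 1), (1, operator.eq, 1)])

# WHERE a <= 49 AND b > 49: no item matches, because a equals b in every item.
sel_range = mcv_selectivity(mcv_items, [(0, operator.le, 49), (1, operator.gt, 49)])

rows_equal = round(sel_equal * total_rows)
rows_range = round(sel_range * total_rows)
```

With this model the equality query yields a selectivity of 0.01, or 100 of the 10000 rows, which agrees with the `rows=100` estimate in the plan. A condition that no item satisfies yields a selectivity of zero. The plans in this section report `rows=1` in such cases, which is consistent with the planner not reporting estimates below one row.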
1187
+
1188
+ <!--==========================orignal english content==========================
1189
+ <para>
1190
+ Compared to functional dependencies, <acronym>MCV</acronym> lists have two
1191
+ major advantages. Firstly, the list stores actual values, making it possible
1192
+ to decide which combinations are compatible.
1193
+
1194
+ <programlisting>
1195
+ EXPLAIN (ANALYZE, TIMING OFF) SELECT * FROM t WHERE a = 1 AND b = 10;
1196
+ QUERY PLAN
1197
+ -−-−-−-−-−-−-−-−-−-−-−-−-−-−-−-−-−-−-−-−-−-−-−-−-−-−-−-−-−-−-−-−-−-−-−-−-−-
1198
+ Seq Scan on t (cost=0.00..195.00 rows=1 width=8) (actual rows=0 loops=1)
1199
+ Filter: ((a = 1) AND (b = 10))
1200
+ Rows Removed by Filter: 10000
1201
+ </programlisting>
1202
+
1203
+ Secondly, <acronym>MCV</acronym> lists handle a wider range of clause types,
1204
+ not just equality clauses like functional dependencies. For example,
1205
+ consider the following range query for the same table:
1206
+
1207
+ <programlisting>
1208
+ EXPLAIN (ANALYZE, TIMING OFF) SELECT * FROM t WHERE a <= 49 AND b > 49;
1209
+ QUERY PLAN
1210
+ -−-−-−-−-−-−-−-−-−-−-−-−-−-−-−-−-−-−-−-−-−-−-−-−-−-−-−-−-−-−-−-−-−-−-−-−-−-
1211
+ Seq Scan on t (cost=0.00..195.00 rows=1 width=8) (actual rows=0 loops=1)
1212
+ Filter: ((a <= 49) AND (b > 49))
1213
+ Rows Removed by Filter: 10000
1214
+ </programlisting>
1215
+
1216
+ </para>
1217
+ ____________________________________________________________________________-->
1218
+ <para>
1219
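+ Compared to functional dependencies, <acronym>MCV</acronym> lists have two major advantages.
1220
+ First, the list stores actual values, making it possible to decide which combinations are compatible.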
1221
+
1222
+ <programlisting>
1223
+ EXPLAIN (ANALYZE, TIMING OFF) SELECT * FROM t WHERE a = 1 AND b = 10;
1224
+ QUERY PLAN
1225
+ ---------------------------------------------------------------------------
1226
+ Seq Scan on t (cost=0.00..195.00 rows=1 width=8) (actual rows=0 loops=1)
1227
+ Filter: ((a = 1) AND (b = 10))
1228
+ Rows Removed by Filter: 10000
1229
+ </programlisting>
1230
+
1231
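+ Second, <acronym>MCV</acronym> lists handle a wider range of clause types, not just equality clauses like functional dependencies.
1232
+ For example, consider the following range query for the same table: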
1233
+
1234
+ <programlisting>
1235
+ EXPLAIN (ANALYZE, TIMING OFF) SELECT * FROM t WHERE a &lt;= 49 AND b &gt; 49;
1236
+ QUERY PLAN
1237
+ ---------------------------------------------------------------------------
1238
+ Seq Scan on t (cost=0.00..195.00 rows=1 width=8) (actual rows=0 loops=1)
1239
+ Filter: ((a &lt;= 49) AND (b &gt; 49))
1240
+ Rows Removed by Filter: 10000
1241
+ </programlisting>
1242
+
1243
+ </para>
1244
+
1245
+ </sect2>
1246
+
1037
1247
</sect1>
1038
1248
1039
1249
<sect1 id="planner-stats-security">