<!-- doc/src/sgml/unaccent.sgml -->

<sect1 id="unaccent" xreflabel="unaccent">
+ <!--==========================orignal english content==========================
+ <title>unaccent</title>
+ ____________________________________________________________________________-->
<title>unaccent</title>

+ <!--==========================orignal english content==========================
+ <indexterm zone="unaccent">
+ <primary>unaccent</primary>
+ </indexterm>
+ ____________________________________________________________________________-->
<indexterm zone="unaccent">
<primary>unaccent</primary>
</indexterm>

+ <!--==========================orignal english content==========================
+ <para>
+ <filename>unaccent</> is a text search dictionary that removes accents
+ (diacritic signs) from lexemes.
+ It's a filtering dictionary, which means its output is
+ always passed to the next dictionary (if any), unlike the normal
+ behavior of dictionaries. This allows accent-insensitive processing
+ for full text search.
+ </para>
+ ____________________________________________________________________________-->
<para>
<filename>unaccent</> is a text search dictionary that removes accents (diacritic signs) from lexemes. It is a filtering dictionary, which means its output is always passed to the next dictionary (if any), unlike the normal behavior of dictionaries. This allows accent-insensitive processing for full text search.
</para>

+ <!--==========================orignal english content==========================
+ <para>
+ The current implementation of <filename>unaccent</> cannot be used as a
+ normalizing dictionary for the <filename>thesaurus</filename> dictionary.
+ </para>
+ ____________________________________________________________________________-->
<para>
The current implementation of <filename>unaccent</> cannot be used as a normalizing dictionary for the <filename>thesaurus</filename> dictionary.
</para>

<sect2>
+ <!--==========================orignal english content==========================
+ <title>Configuration</title>
+ ____________________________________________________________________________-->
<title>Configuration</title>

+ <!--==========================orignal english content==========================
+ <para>
+ An <literal>unaccent</> dictionary accepts the following options:
+ </para>
+ ____________________________________________________________________________-->
<para>
An <literal>unaccent</> dictionary accepts the following options:
</para>
<itemizedlist>
<listitem>
+ <!--==========================orignal english content==========================
+ <para>
+ <literal>RULES</> is the base name of the file containing the list of
+ translation rules. This file must be stored in
+ <filename>$SHAREDIR/tsearch_data/</> (where <literal>$SHAREDIR</> means
+ the <productname>PostgreSQL</> installation's shared-data directory).
+ Its name must end in <literal>.rules</> (which is not to be included in
+ the <literal>RULES</> parameter).
+ </para>
+ ____________________________________________________________________________-->
<para>
<literal>RULES</> is the base name of the file containing the list of translation rules. This file must be stored in <filename>$SHAREDIR/tsearch_data/</> (where <literal>$SHAREDIR</> means the <productname>PostgreSQL</> installation's shared-data directory). Its name must end in <literal>.rules</> (which is not to be included in the <literal>RULES</> parameter).
</para>
</listitem>
</itemizedlist>
+ <!--==========================orignal english content==========================
+ <para>
+ The rules file has the following format:
+ </para>
+ ____________________________________________________________________________-->
<para>
The rules file has the following format:
</para>
<itemizedlist>
<listitem>
+ <!--==========================orignal english content==========================
+ <para>
+ Each line represents one translation rule, consisting of a character with
+ accent followed by a character without accent. The first is translated
+ into the second. For example,
+ <programlisting>
+ À A
+ Á A
+ Â A
+ Ã A
+ Ä A
+ Å A
+ Æ AE
+ </programlisting>
+ The two characters must be separated by whitespace, and any leading or
+ trailing whitespace on a line is ignored.
+ </para>
+ ____________________________________________________________________________-->
<para>
Each line represents one translation rule: a character with an accent followed by a character without one. The first is translated into the second. For example:
<programlisting>
à A
Ä A
Å A
- Æ A
+ Æ AE
</programlisting>
+ The two characters must be separated by whitespace, and any leading or trailing whitespace on a line is ignored.
+ </para>
+ </listitem>
+
+ <listitem>
+ <!--==========================orignal english content==========================
+ <para>
+ Alternatively, if only one character is given on a line, instances of
+ that character are deleted; this is useful in languages where accents
+ are represented by separate characters.
+ </para>
+ ____________________________________________________________________________-->
+ <para>
+ Alternatively, if only one character is given on a line, instances of that character are deleted;
+ this is useful in languages where accents are represented by separate characters.
+ </para>
+ </listitem>
+
+ <listitem>
+ <!--==========================orignal english content==========================
+ <para>
+ Actually, each <quote>character</> can be any string not containing
+ whitespace, so <filename>unaccent</> dictionaries could be used for
+ other sorts of substring substitutions besides diacritic removal.
+ </para>
+ ____________________________________________________________________________-->
+ <para>
+ Actually, each <quote>character</> can be any string not containing whitespace, so
+ <filename>unaccent</> dictionaries can also be used for other kinds of substring substitution besides diacritic removal.
+ </para>
+ </listitem>
+
+ <listitem>
+ <!--==========================orignal english content==========================
+ <para>
+ As with other <productname>PostgreSQL</> text search configuration files,
+ the rules file must be stored in UTF-8 encoding. The data is
+ automatically translated into the current database's encoding when
+ loaded. Any lines containing untranslatable characters are silently
+ ignored, so that rules files can contain rules that are not applicable in
+ the current encoding.
+ </para>
+ ____________________________________________________________________________-->
+ <para>
+ As with other <productname>PostgreSQL</> text search configuration files,
+ the rules file must be stored in UTF-8 encoding. The data is automatically translated into the current database's encoding when loaded.
+ Any lines containing untranslatable characters are silently ignored, so a rules file can contain rules that are not applicable in the current encoding.
</para>
</listitem>
</itemizedlist>
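<para>
As an illustration only, a small custom rules file could mix single-character translations with multi-character substitutions of the kind described above. The file name <filename>my_rules.rules</> and the particular entries are assumptions chosen for this sketch; they are not taken from the shipped <filename>unaccent.rules</>:
<programlisting>
é        e
Æ        AE
ß        ss
</programlisting>
Stored in UTF-8 as <filename>$SHAREDIR/tsearch_data/my_rules.rules</>, such a file would be selected with <literal>RULES='my_rules'</>.
</para>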

+ <!--==========================orignal english content==========================
+ <para>
+ A more complete example, which is directly useful for most European
+ languages, can be found in <filename>unaccent.rules</>, which is installed
+ in <filename>$SHAREDIR/tsearch_data/</> when the <filename>unaccent</>
+ module is installed. This rules file translates characters with accents
+ to the same characters without accents, and it also expands ligatures
+ into the equivalent series of simple characters (for example, Æ to
+ AE).
+ </para>
+ ____________________________________________________________________________-->
<para>
A more complete example, which is directly useful for most European languages, can be found in <filename>unaccent.rules</>, which is installed in <filename>$SHAREDIR/tsearch_data/</> when the <filename>unaccent</> module is installed.
</para>
</sect2>

<sect2>
+ <!--==========================orignal english content==========================
+ <title>Usage</title>
+ ____________________________________________________________________________-->
<title>Usage</title>

+ <!--==========================orignal english content==========================
+ <para>
+ Installing the <literal>unaccent</> extension creates a text
+ search template <literal>unaccent</> and a dictionary <literal>unaccent</>
+ based on it. The <literal>unaccent</> dictionary has the default
+ parameter setting <literal>RULES='unaccent'</>, which makes it immediately
+ usable with the standard <filename>unaccent.rules</> file.
+ If you wish, you can alter the parameter, for example
+
+ <programlisting>
+ mydb=# ALTER TEXT SEARCH DICTIONARY unaccent (RULES='my_rules');
+ </programlisting>
+
+ or create new dictionaries based on the template.
+ </para>
+ ____________________________________________________________________________-->
<para>
Installing the <literal>unaccent</> extension creates a text search template <literal>unaccent</> and a dictionary <literal>unaccent</> based on it. The <literal>unaccent</> dictionary has the default parameter setting <literal>RULES='unaccent'</>, which makes it use the standard <filename>unaccent.rules</> file. If you wish, you can alter the parameter, for example

@@ -66,6 +208,18 @@ mydb=# ALTER TEXT SEARCH DICTIONARY unaccent (RULES='my_rules');
or create new dictionaries based on the template.
</para>
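<para>
As a sketch of the second option, a separate dictionary could be created from the template rather than altering the default one. The names <literal>my_unaccent</> and <literal>my_rules</> are placeholders, and a file <filename>my_rules.rules</> is assumed to exist in <filename>$SHAREDIR/tsearch_data/</>:
<programlisting>
mydb=# CREATE TEXT SEARCH DICTIONARY my_unaccent (TEMPLATE = unaccent, RULES = 'my_rules');
mydb=# SELECT ts_lexize('my_unaccent', 'Hôtel');
</programlisting>
The result of the second command depends on the contents of the assumed rules file, so no output is shown here.
</para>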

+ <!--==========================orignal english content==========================
+ <para>
+ To test the dictionary, you can try:
+ <programlisting>
+ mydb=# select ts_lexize('unaccent','Hôtel');
+ ts_lexize
+ -----------
+ {Hotel}
+ (1 row)
+ </programlisting>
+ </para>
+ ____________________________________________________________________________-->
<para>
To test the dictionary, you can try:
<programlisting>
@@ -77,6 +231,35 @@ mydb=# select ts_lexize('unaccent','Hôtel');
</programlisting>
</para>

+ <!--==========================orignal english content==========================
+ <para>
+ Here is an example showing how to insert the
+ <filename>unaccent</> dictionary into a text search configuration:
+ <programlisting>
+ mydb=# CREATE TEXT SEARCH CONFIGURATION fr ( COPY = french );
+ mydb=# ALTER TEXT SEARCH CONFIGURATION fr
+ ALTER MAPPING FOR hword, hword_part, word
+ WITH unaccent, french_stem;
+ mydb=# select to_tsvector('fr','Hôtels de la Mer');
+ to_tsvector
+ -------------------
+ 'hotel':1 'mer':4
+ (1 row)
+
+ mydb=# select to_tsvector('fr','Hôtel de la Mer') @@ to_tsquery('fr','Hotels');
+ ?column?
+ ----------
+ t
+ (1 row)
+
+ mydb=# select ts_headline('fr','Hôtel de la Mer',to_tsquery('fr','Hotels'));
+ ts_headline
+ ------------------------
+ <b>Hôtel</b> de la Mer
+ (1 row)
+ </programlisting>
+ </para>
+ ____________________________________________________________________________-->
<para>
Here is an example showing how to insert the <filename>unaccent</> dictionary into a text search configuration:
<programlisting>
@@ -106,20 +289,61 @@ mydb=# select ts_headline('fr','Hôtel de la Mer',to_tsquery('fr','Hotels')
</sect2>

<sect2>
+ <!--==========================orignal english content==========================
+ <title>Functions</title>
+ ____________________________________________________________________________-->
<title>Functions</title>

+ <!--==========================orignal english content==========================
+ <para>
+ The <function>unaccent()</> function removes accents (diacritic signs) from
+ a given string. Basically, it's a wrapper around
+ <filename>unaccent</>-type dictionaries, but it can be used outside normal
+ text search contexts.
+ </para>
+ ____________________________________________________________________________-->
<para>
The <function>unaccent()</> function removes accents (diacritic signs) from a given string. Basically, it's a wrapper around <filename>unaccent</>-type dictionaries, but it can be used outside normal text search contexts.
</para>

+ <!--==========================orignal english content==========================
+ <indexterm>
+ <primary>unaccent</primary>
+ </indexterm>
+ ____________________________________________________________________________-->
<indexterm>
<primary>unaccent</primary>
</indexterm>

+ <!--==========================orignal english content==========================
<synopsis>
unaccent(<optional><replaceable class="PARAMETER">dictionary</replaceable>, </optional> <replaceable class="PARAMETER">string</replaceable>) returns <type>text</type>
</synopsis>
+ ____________________________________________________________________________-->
+ <synopsis>
+ unaccent(<optional><replaceable class="PARAMETER">dictionary</replaceable>, </optional> <replaceable class="PARAMETER">string</replaceable>) returns <type>text</type>
+ </synopsis>
+
+ <!--==========================orignal english content==========================
+ <para>
+ If the <replaceable class="PARAMETER">dictionary</replaceable> argument is
+ omitted, <literal>unaccent</> is assumed.
+ </para>
+ ____________________________________________________________________________-->
+ <para>
+ If the <replaceable class="PARAMETER">dictionary</replaceable> argument is omitted,
+ <literal>unaccent</> is assumed.
+ </para>

+ <!--==========================orignal english content==========================
+ <para>
+ For example:
+ <programlisting>
+ SELECT unaccent('unaccent', 'Hôtel');
+ SELECT unaccent('Hôtel');
+ </programlisting>
+ </para>
+ ____________________________________________________________________________-->
<para>
For example:
<programlisting>