Commit 793e7c0

Merge pull request opencv#18019 from pemmanuelviel:pev--multiple-kmeans-trees
* Possibility to set more than one tree for the hierarchical KMeans (default is still 1 tree). This particularly improves NN retrieval results with binary vectors, allowing better quality compared to LSH for similar processing time when speed is the criterion.
* Add explanations of FLANN's hierarchical KMeans for binary data.
1 parent 3b337a1 commit 793e7c0

2 files changed: 179 additions and 48 deletions

modules/flann/include/opencv2/flann.hpp

Lines changed: 30 additions & 2 deletions
@@ -191,8 +191,28 @@ class GenericIndex
 KDTreeIndexParams( int trees = 4 );
 };
 @endcode
+- **HierarchicalClusteringIndexParams** When passing an object of this type the index constructed
+will be a hierarchical tree of clusters, dividing each set of points into n clusters whose centers
+are picked among the points without further refinement of their position.
+This algorithm fits both floating, integer and binary vectors. :
+@code
+struct HierarchicalClusteringIndexParams : public IndexParams
+{
+HierarchicalClusteringIndexParams(
+int branching = 32,
+flann_centers_init_t centers_init = CENTERS_RANDOM,
+int trees = 4,
+int leaf_size = 100);
+
+};
+@endcode
 - **KMeansIndexParams** When passing an object of this type the index constructed will be a
-hierarchical k-means tree. :
+hierarchical k-means tree (one tree by default), dividing each set of points into n clusters
+whose barycenters are refined iteratively.
+Note that this algorithm has been extended to the support of binary vectors as an alternative
+to LSH when knn search speed is the criterium. It will also outperform LSH when processing
+directly (i.e. without the use of MCA/PCA) datasets whose points share mostly the same values
+for most of the dimensions. It is recommended to set more than one tree with binary data. :
 @code
 struct KMeansIndexParams : public IndexParams
 {
@@ -201,6 +221,13 @@ class GenericIndex
 int iterations = 11,
 flann_centers_init_t centers_init = CENTERS_RANDOM,
 float cb_index = 0.2 );
+
+KMeansIndexParams(
+int branching,
+int iterations,
+flann_centers_init_t centers_init,
+float cb_index,
+int trees );
 };
 @endcode
 - **CompositeIndexParams** When using a parameters object of this type the index created
@@ -219,7 +246,8 @@ class GenericIndex
 - **LshIndexParams** When using a parameters object of this type the index created uses
 multi-probe LSH (by Multi-Probe LSH: Efficient Indexing for High-Dimensional Similarity Search
 by Qin Lv, William Josephson, Zhe Wang, Moses Charikar, Kai Li., Proceedings of the 33rd
-International Conference on Very Large Data Bases (VLDB). Vienna, Austria. September 2007) :
+International Conference on Very Large Data Bases (VLDB). Vienna, Austria. September 2007).
+This algorithm is designed for binary vectors. :
 @code
 struct LshIndexParams : public IndexParams
 {
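
The documentation added in this diff describes a hierarchical tree of clusters whose centers are picked among the points without refinement, and recommends more than one tree for binary data. The idea can be sketched in plain C++ as below. This is a minimal illustration under stated assumptions, not FLANN's or OpenCV's implementation: every name (`Code`, `Node`, `build`, `buildForest`, `nearest`) is hypothetical, descriptors are assumed to be 64 bits wide, distance is Hamming distance, and the search visits a single leaf per tree, which is a simplification.

```cpp
#include <algorithm>
#include <bitset>
#include <cassert>
#include <cstddef>
#include <limits>
#include <memory>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// All names below are hypothetical; this is not FLANN's code.
using Code = std::bitset<64>;  // assumed descriptor width: 64 bits

inline std::size_t hamming(const Code& a, const Code& b) { return (a ^ b).count(); }

struct Node {
    std::vector<std::size_t> points;   // point indices, filled for leaves only
    std::vector<std::size_t> centers;  // centers[c] is the center of children[c]
    std::vector<std::unique_ptr<Node>> children;
};

// Position of the center closest to q; ties go to the lowest position.
inline std::size_t closestCenter(const std::vector<Code>& data,
                                 const std::vector<std::size_t>& centers, const Code& q) {
    std::size_t best = 0;
    for (std::size_t c = 1; c < centers.size(); ++c)
        if (hamming(q, data[centers[c]]) < hamming(q, data[centers[best]])) best = c;
    return best;
}

// Recursively split `ids` into `branching` clusters. Centers are sampled
// among the points themselves and are never refined afterwards.
std::unique_ptr<Node> build(const std::vector<Code>& data, std::vector<std::size_t> ids,
                            std::size_t branching, std::size_t leaf_size, std::mt19937& rng) {
    assert(branching >= 2);
    auto node = std::make_unique<Node>();
    if (ids.size() <= leaf_size || ids.size() < branching) {
        node->points = std::move(ids);
        return node;
    }
    std::shuffle(ids.begin(), ids.end(), rng);  // random choice of centers
    std::vector<std::size_t> centers(ids.begin(),
                                     ids.begin() + static_cast<std::ptrdiff_t>(branching));
    std::vector<std::vector<std::size_t>> groups(branching);
    for (std::size_t id : ids)
        groups[closestCenter(data, centers, data[id])].push_back(id);
    for (const auto& g : groups)
        if (g.size() == ids.size()) {  // one cluster took every point: stop splitting
            node->points = std::move(ids);
            return node;
        }
    node->centers = std::move(centers);
    for (auto& g : groups)
        node->children.push_back(build(data, std::move(g), branching, leaf_size, rng));
    return node;
}

using Forest = std::vector<std::unique_ptr<Node>>;

// Several trees over the same data; they differ only in the random centers.
Forest buildForest(const std::vector<Code>& data, std::size_t trees, std::size_t branching,
                   std::size_t leaf_size, unsigned seed) {
    std::mt19937 rng(seed);
    std::vector<std::size_t> ids(data.size());
    std::iota(ids.begin(), ids.end(), std::size_t{0});
    Forest forest;
    for (std::size_t t = 0; t < trees; ++t)
        forest.push_back(build(data, ids, branching, leaf_size, rng));
    return forest;
}

// Greedy descent to one leaf per tree, keeping the best candidate overall.
// Returns data.size() if no candidate was found (e.g. empty data set).
std::size_t nearest(const Forest& forest, const std::vector<Code>& data, const Code& q) {
    std::size_t best_id = data.size();
    std::size_t best_dist = std::numeric_limits<std::size_t>::max();
    for (const auto& root : forest) {
        const Node* cur = root.get();
        while (!cur->children.empty())
            cur = cur->children[closestCenter(data, cur->centers, q)].get();
        for (std::size_t id : cur->points) {
            const std::size_t d = hamming(q, data[id]);
            if (d < best_dist) { best_dist = d; best_id = id; }
        }
    }
    return best_id;
}
```

A call such as `nearest(buildForest(data, 4, 32, 100, seed), data, query)` mirrors the `trees`, `branching` and `leaf_size` parameters named in the diff. Because each tree draws different random centers, a neighbor missed in one tree's leaf can still be found in another's, which is the motivation the commit message gives for allowing several trees.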
