Evaluate the performance of four cross-modal retrieval algorithms by mean average precision (mAP) on three datasets; a minimal sketch of the shared-subspace pipeline they have in common follows the list.
- CCA (Canonical Correlation Analysis)
- PLS (Partial Least Squares)
- BLM (Bilinear Model)
- GMMFA (Generalized Multiview Marginal Fisher Analysis)
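All four methods learn projections that map both modalities into a common latent space, where retrieval reduces to nearest-neighbor search. Below is a minimal sketch using scikit-learn's CCA; the array shapes, feature contents, and component count are illustrative assumptions, not the exact experimental settings.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

# Illustrative stand-ins for paired image/text training features
# (e.g. GIST vs. word-frequency vectors); shapes are assumptions.
rng = np.random.default_rng(0)
X_img = rng.standard_normal((2808, 512))
X_txt = rng.standard_normal((2808, 200))

# Learn a shared latent space from paired samples, then project
# both modalities into it. PLS/BLM/GMMFA plug in the same way.
cca = CCA(n_components=10)
cca.fit(X_img, X_txt)
Z_img, Z_txt = cca.transform(X_img, X_txt)
```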
PASCAL VOC 2007: this experiment uses the preprocessed dataset provided by [1].
Wikipedia: http://www.svcl.ucsd.edu/projects/crossmodal/
NUS-WIDE: https://lms.comp.nus.edu.sg/wp-content/uploads/2019/research/nuswide/NUS-WIDE.html
PASCAL VOC 2007: train on the 2808 samples that contain only one object. The features are GIST (visual) and word-frequency vectors (tags), as described in [1].
Wikipedia: train on 2173 samples. Images are represented by 128-dimensional vector-quantized SIFT features, and text by 10-dimensional latent Dirichlet allocation topic vectors.
NUS-WIDE: train on 10000 samples (limited by machine capacity). The features are 225-dimensional block-wise color moments (visual) and 1000-dimensional word-frequency vectors (text).
For PASCAL VOC 2007, I evaluate the models on the 2841 test samples that contain only one object.
For Wikipedia, I evaluate the models on 693 test samples.
For NUS-WIDE, I evaluate the models on 5000 test samples.
Image-to-text: retrieve related text with an image query from the test set. Return an ordered list in which each element is the index of a retrieved text in the test set.
Text-to-image: retrieve related images with a text query from the test set. Return an ordered list in which each element is the index of a retrieved image in the test set.
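A hedged sketch of how such an ordered index list can be produced, assuming cosine similarity in the learned subspace (the actual similarity measure is not specified above); the test projections here are random stand-ins for the outputs of CCA/PLS/BLM/GMMFA:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def rank_gallery(query_emb, gallery_emb):
    """For each query, return gallery indices sorted from most to
    least similar; this is the ordered list described above."""
    sims = cosine_similarity(query_emb, gallery_emb)  # (n_queries, n_gallery)
    return np.argsort(-sims, axis=1)

# Z_img_test / Z_txt_test stand for test-set projections into the
# shared subspace; random values here just to keep this runnable.
rng = np.random.default_rng(0)
Z_img_test = rng.standard_normal((693, 10))
Z_txt_test = rng.standard_normal((693, 10))

i2t = rank_gallery(Z_img_test, Z_txt_test)  # image query -> text indices
t2i = rank_gallery(Z_txt_test, Z_img_test)  # text query  -> image indices
```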
In evaluation, a retrieved item counts as relevant if its object class matches the ground-truth class of the query.
mAP is computed following the convention used in recommender-system research; see [2].
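A minimal sketch of that mAP computation under the class-match relevance rule above; the function names are mine, and the rankings are assumed to come from the retrieval step:

```python
import numpy as np

def average_precision(ranked_labels, query_label):
    """AP for one query: precision is accumulated at each rank where the
    retrieved item's class matches the query's ground-truth class."""
    hits, prec_sum = 0, 0.0
    for k, label in enumerate(ranked_labels, start=1):
        if label == query_label:
            hits += 1
            prec_sum += hits / k
    return prec_sum / hits if hits else 0.0

def mean_average_precision(rankings, gallery_labels, query_labels):
    """mAP over all queries; `rankings` holds each query's ordered
    gallery indices, and `gallery_labels` must be a NumPy array."""
    return float(np.mean([
        average_precision(gallery_labels[r], q)
        for r, q in zip(rankings, query_labels)
    ]))
```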
[1] S. J. Hwang and K. Grauman, "Accounting for the Relative Importance of Objects in Image Retrieval," BMVC 2010.
[2] https://zhuanlan.zhihu.com/p/74429856
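In the tables below, the "+PCA" rows denote additionally reducing the features with PCA before fitting each model. A minimal sketch of that preprocessing, assuming each modality is reduced independently with scikit-learn; the component count is an illustrative choice, as the exact setting is not recorded here:

```python
import numpy as np
from sklearn.decomposition import PCA

# Assumption: each modality is reduced independently before fitting
# CCA/PLS/BLM/GMMFA; the 64-component target is illustrative only.
rng = np.random.default_rng(0)
X_img = rng.standard_normal((2808, 512))   # stand-in visual features
X_txt = rng.standard_normal((2808, 200))   # stand-in text features

X_img_pca = PCA(n_components=64).fit_transform(X_img)
X_txt_pca = PCA(n_components=64).fit_transform(X_txt)
```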
| Method | Image-to-text (mAP) | Text-to-image (mAP) |
| --- | --- | --- |
| CCA | 0.1962 | 0.1754 |
| PLS | 0.2266 | 0.1879 |
| BLM | 0.2419 | 0.2085 |
| GMMFA | 0.2424 | 0.2089 |
| CCA+PCA | 0.2252 | 0.1958 |
| PLS+PCA | 0.2450 | 0.2015 |
| BLM+PCA | 0.2450 | 0.2045 |
| GMMFA+PCA | 0.2465 | 0.2050 |
| Method | Image-to-text (mAP) | Text-to-image (mAP) |
| --- | --- | --- |
| CCA | 0.2435 | 0.1978 |
| PLS | 0.2075 | 0.1654 |
| BLM | 0.2589 | 0.2008 |
| GMMFA | 0.2481 | 0.1997 |
| CCA+PCA | 0.2649 | 0.2162 |
| PLS+PCA | 0.2477 | 0.2047 |
| BLM+PCA | 0.2607 | 0.2101 |
| GMMFA+PCA | 0.2471 | 0.2006 |
| Method | Image-to-text (mAP) | Text-to-image (mAP) |
| --- | --- | --- |
| CCA | 0.2316 | 0.2372 |
| PLS | 0.2194 | 0.2246 |
| BLM | 0.2519 | 0.2510 |
| GMMFA | 0.2503 | 0.2440 |
| CCA+PCA | 0.2261 | 0.2391 |
| PLS+PCA | 0.2331 | 0.2363 |
| BLM+PCA | 0.2470 | 0.2510 |
| GMMFA+PCA | 0.2507 | 0.2442 |
| Method | Image-to-text (mAP) | Text-to-image (mAP) |
| --- | --- | --- |
| CCA | 0.3552 | 0.3382 |
| PLS | 0.6611 | 0.6986 |
| BLM | 0.6355 | 0.6381 |
| GMMFA | 0.6374 | 0.6403 |
| Method | Image-to-text (mAP) | Text-to-image (mAP) |
| --- | --- | --- |
| CCA | 0.3126 | 0.2814 |
| PLS | 0.3879 | 0.3505 |
| BLM | 0.3840 | 0.3650 |
| GMMFA | 0.3950 | 0.3570 |
| Method | Image-to-text (mAP) | Text-to-image (mAP) |
| --- | --- | --- |
| CCA | 0.2831 | 0.2826 |
| PLS | 0.4231 | 0.4110 |
| BLM | 0.3949 | 0.4110 |
| GMMFA | 0.4074 | 0.4095 |