This project focuses on information retrieval and re-ranking methods for retrieving the most relevant entries from a corpus of more than 200,000 documents.
Traditional IR models such as BM25, TF-IDF, and Query Likelihood Models with Laplace, Lidstone, and Dirichlet smoothing are implemented.
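To make the traditional scoring functions concrete, here is a minimal sketch of BM25 and of query likelihood scoring with Laplace, Lidstone, and Dirichlet smoothing over pre-tokenized text. It is an illustration under stated assumptions, not the project's implementation: the function names, the toy corpus, the default parameters (`k1=1.2`, `b=0.75`, `eps=0.1`, `mu=2000`), and the Lucene-style `+1` inside the BM25 IDF are all choices made here. TF-IDF is omitted for brevity.

```python
import math
from collections import Counter
from typing import Dict, List


def bm25(query: List[str], doc: List[str], doc_freq: Dict[str, int],
         n_docs: int, avg_doc_len: float, k1: float = 1.2, b: float = 0.75) -> float:
    """BM25 score of one tokenized document for a tokenized query."""
    tf = Counter(doc)
    norm = k1 * (1 - b + b * len(doc) / avg_doc_len)  # document-length normalization
    score = 0.0
    for term in query:
        f = tf.get(term, 0)
        if f == 0:
            continue
        n_t = doc_freq.get(term, 0)
        # Assumed IDF variant: +1 inside the log (Lucene-style) keeps it non-negative
        idf = math.log((n_docs - n_t + 0.5) / (n_t + 0.5) + 1.0)
        score += idf * f * (k1 + 1) / (f + norm)
    return score


def query_likelihood(query: List[str], doc: List[str], collection_tf: Dict[str, int],
                     collection_len: int, vocab_size: int, smoothing: str = "dirichlet",
                     eps: float = 0.1, mu: float = 2000.0) -> float:
    """log P(query | doc) under a smoothed unigram document language model."""
    tf = Counter(doc)
    dl = len(doc)
    log_p = 0.0
    for term in query:
        f = tf.get(term, 0)
        if smoothing == "laplace":      # add-one
            p = (f + 1) / (dl + vocab_size)
        elif smoothing == "lidstone":   # add-epsilon
            p = (f + eps) / (dl + eps * vocab_size)
        elif smoothing == "dirichlet":  # interpolate with the collection model
            p = (f + mu * collection_tf.get(term, 0) / collection_len) / (dl + mu)
        else:
            raise ValueError(f"unknown smoothing: {smoothing}")
        if p == 0.0:
            return float("-inf")  # only reachable for Dirichlet with a term absent from the collection
        log_p += math.log(p)
    return log_p


# Hypothetical toy corpus, only to show the expected inputs
docs = {
    "d1": "the cat sat on the mat".split(),
    "d2": "dogs chase cats".split(),
    "d3": "the dog sat".split(),
}
doc_freq = Counter(t for terms in docs.values() for t in set(terms))
collection_tf = Counter(t for terms in docs.values() for t in terms)
collection_len = sum(collection_tf.values())
avg_len = collection_len / len(docs)
query = ["cat", "sat"]

bm25_ranking = sorted(
    docs,
    key=lambda d: bm25(query, docs[d], doc_freq, len(docs), avg_len),
    reverse=True,
)
print(bm25_ranking)  # ['d1', 'd3', 'd2']
```

Dirichlet smoothing borrows probability mass from the whole collection, so its `mu` usually matters far more than `eps` does for Lidstone; on a real corpus it would need tuning rather than the placeholder default used here.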
To further improve re-ranking performance, machine learning models, including a logistic regression classifier, LambdaMART, and a recurrent neural network (RNN) with an LSTM architecture, are also developed and evaluated.
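As an illustration of the simplest of these learned re-rankers, the sketch below trains a pointwise logistic regression classifier by full-batch gradient descent and then re-orders candidate passages by their predicted relevance probability. It is an assumption-laden sketch, not the project's code: the two features per query-passage pair (imagined here as something like a BM25 score and an embedding similarity), the learning rate, the epoch count, and the names `train_logistic` and `rerank` are all hypothetical. LambdaMART and the LSTM model are not shown.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                   lr: float = 0.1, epochs: int = 500) -> Tuple[List[float], float]:
    """Fit weights and bias by full-batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss w.r.t. the logit is (prediction - label)
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def rerank(candidates, w, b):
    """Sort (passage_id, features) pairs by predicted relevance probability, highest first."""
    scored = [(pid, sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)) for pid, x in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical training pairs: 2-d features per (query, passage), label 1 = relevant
X = [[0.9, 0.8], [0.8, 0.9], [0.1, 0.2], [0.2, 0.1]]
y = [1, 1, 0, 0]
w, b = train_logistic(X, y)
ranked = rerank([("p1", [0.1, 0.1]), ("p2", [0.9, 0.9]), ("p3", [0.5, 0.5])], w, b)
print([pid for pid, _ in ranked])  # ['p2', 'p3', 'p1']
```

Because this model scores each passage independently, it is a pointwise approach; listwise methods such as LambdaMART instead optimize the ordering of the whole candidate list directly.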
Data for the IR models part can be downloaded here:
https://www.icloud.com/iclouddrive/093oIZ2ZycRSmv6Oy-Gy2rmjQ#ir-model-data
Data for the passage re-ranking part can be downloaded here:
https://www.icloud.com/iclouddrive/0b7cu3NkGtgfd9N4LFA0PbmLg#passage-reanking-data