Query Classification for Dynamic Sparse-Dense Fusion

As the body of knowledge on the internet keeps growing, finding the most relevant information becomes harder, because large collections of documents and texts are difficult to navigate. Information Retrieval (IR) is the process of retrieving documents from a collection to satisfy a user's information need.

In recent years, researchers have combined sparse and dense representations to leverage both exact term matching and semantic meaning. Hybrid approaches typically combine the scores or representations of sparse and dense models, for example by weighted interpolation. It has also been proposed that a hybrid model can be used to track the connections made by dense models, revealing possible biases and allowing us to improve the semantic links.
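
As an illustration of weighted interpolation, here is a minimal sketch rather than the code used in this repository. It assumes each retriever returns a mapping from document ID to raw score, min-max normalizes each mapping so the two scales are comparable, and mixes them with a weight `alpha`. The names `min_max_normalize`, `fuse_scores`, and `alpha`, and the example scores, are hypothetical.

```python
def min_max_normalize(scores):
    """Rescale a {doc_id: score} mapping to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal, so no document stands out.
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def fuse_scores(sparse, dense, alpha=0.5):
    """Weighted interpolation: alpha * dense + (1 - alpha) * sparse.

    Documents missing from one retriever's results contribute 0.0 for
    that retriever (an assumption; other defaults are possible).
    """
    sparse_n = min_max_normalize(sparse)
    dense_n = min_max_normalize(dense)
    docs = set(sparse_n) | set(dense_n)
    fused = {
        doc: alpha * dense_n.get(doc, 0.0) + (1 - alpha) * sparse_n.get(doc, 0.0)
        for doc in docs
    }
    # Highest fused score first.
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


# Made-up scores for illustration only.
sparse_scores = {"d1": 12.0, "d2": 7.5, "d3": 3.0}   # e.g. BM25-style scores
dense_scores = {"d2": 0.82, "d3": 0.80, "d4": 0.40}  # e.g. cosine similarities
ranking = fuse_scores(sparse_scores, dense_scores, alpha=0.5)
```

With `alpha=0.5` both retrievers count equally; a dynamic fusion scheme would instead choose `alpha` per query.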

We analyze how different IR techniques, specifically sparse search, dense search, and their hybridization, respond to various categories of queries, and evaluate which performs best for each category. Using content analysis, we identify query categories such as keywords versus sentences and questions versus descriptions. Then, using data-analysis tools such as NumPy, scikit-learn, and pandas, we perform a high-level evaluation and comparison of different sparse, dense, and hybrid models, including BM25, SPLADE, and BERT.
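
To make the query categories concrete, the sketch below labels a query along the two axes named above using simple surface heuristics. It is only an illustration: the word list, the four-token threshold, and the function name `categorize_query` are assumptions, not the content-analysis procedure used in this project.

```python
# Hypothetical list of words that commonly open a question.
QUESTION_WORDS = {
    "who", "what", "when", "where", "why", "how", "which",
    "is", "are", "does", "do", "can",
}


def categorize_query(query, sentence_min_tokens=4):
    """Label a query as keyword/sentence and question/description.

    The rules are rough heuristics chosen for illustration:
    - "sentence" if the query has at least `sentence_min_tokens` tokens,
      otherwise "keyword";
    - "question" if it ends with "?" or starts with a question word,
      otherwise "description".
    """
    tokens = query.lower().split()
    starts_like_question = bool(tokens) and tokens[0] in QUESTION_WORDS
    is_question = query.strip().endswith("?") or starts_like_question
    return {
        "form": "sentence" if len(tokens) >= sentence_min_tokens else "keyword",
        "intent": "question" if is_question else "description",
    }


short_query = categorize_query("bm25 tuning")
long_query = categorize_query("how does splade expand query terms?")
```

Labels like these could then be used to group queries before comparing sparse, dense, and hybrid retrieval quality within each group.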


hannahz0/ERSP-QCSDF
