Query Reformulation by Keyword Selection

A query reformulation model, trained with reinforcement learning, that selects the query keywords that yield higher precision when retrieving relevant documents.
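
As a rough illustration of the idea (not the repository's actual implementation), keyword selection can be viewed as applying a binary keep/drop mask to the query terms and rewarding masks that improve retrieval precision. The helper names `reformulate`, `precision`, and `precision_reward`, and the choice of reward as the precision gain over the original query, are assumptions made for this sketch.

```python
from typing import List, Sequence, Set


def reformulate(terms: Sequence[str], mask: Sequence[int]) -> List[str]:
    """Keep only the terms whose mask entry is truthy (hypothetical helper)."""
    if len(terms) != len(mask):
        raise ValueError("terms and mask must have the same length")
    return [term for term, keep in zip(terms, mask) if keep]


def precision(retrieved: Sequence[str], relevant: Set[str]) -> float:
    """Fraction of retrieved document ids that are relevant."""
    if not retrieved:
        return 0.0
    return sum(1 for doc_id in retrieved if doc_id in relevant) / len(retrieved)


def precision_reward(
    original_hits: Sequence[str],
    reformulated_hits: Sequence[str],
    relevant: Set[str],
) -> float:
    """Assumed reward: precision gain of the reformulated query over the original."""
    return precision(reformulated_hits, relevant) - precision(original_hits, relevant)
```

In a reinforcement learning setup, the mask would be sampled from the model's per-term output, and a reward of this kind would drive the policy update; the search engine call that produces the hit lists is omitted here.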

Dataset

Jeopardy! TV Show

https://www.kaggle.com/tunguz/200000-jeopardy-questions

TREC - Complex Answer Retrieval (TREC-CAR)

http://trec-car.cs.unh.edu/

Files

index_preprocess.py : Index query-title-documents
article.py : Wikipedia article and QTA indexer classes
query.py : Query class and query manager
search_engine :
- search.py : Search engine class
- rank_bm25.py : BM25 implementation
model :
- embedding.py : Word embedding class
- evaluate.py : Precision/Recall/NDCG evaluation
- preprocess.py : Preprocess for neural network
- query_reformulation.py : Query reformulation model
- train.py : Train model
- util.py : Utilities for batching data, recreating queries, and computing rewards
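
For orientation, the three metrics named for evaluate.py can be sketched as below for binary relevance. This is a minimal sketch and may differ from what evaluate.py does; the function names, the `k` cutoff parameter, and the binary-gain form of NDCG are assumptions.

```python
import math
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Share of the top-k results that are relevant."""
    top = ranked[:k]
    if not top:
        return 0.0
    return sum(1 for doc_id in top if doc_id in relevant) / len(top)


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Share of all relevant documents found in the top-k results."""
    if not relevant:
        return 0.0
    return sum(1 for doc_id in ranked[:k] if doc_id in relevant) / len(relevant)


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """NDCG with binary gains: DCG of the ranking divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(rank + 2)  # rank is 0-based, so position 1 gets log2(2)
        for rank, doc_id in enumerate(ranked[:k])
        if doc_id in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0
```

Here `ranked` is the list of document ids returned by the search engine in rank order, and `relevant` is the set of ground-truth ids for the query.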

Indexed Data

Indexed articles and queries
Word embedding matrix and word tokenizer
Search engine

https://drive.google.com/open?id=1xoquzwTFES00TFWYKkQ6KLhm7wlTGJtu

Trained Models

CNN, LSTM, BiLSTM and retrained CNN models

https://drive.google.com/open?id=1CT1HGvBhXMiTLeeZ6J6isxghHMylYBeM

Usage

Index Dataset
- Set the paths in index_preprocess.py and run it with 'initial_run' set to true
Train Model
- Set search engine path in search.py
- Set dataset path in train.py
- Set the output path of the model in train.py, select the model network (CNN, LSTM, or BiLSTM), and run
Evaluate
- Set search engine path in search.py
- Set path of the trained model in evaluate.py
- Set dataset path in evaluate.py and run

You can start from any step, provided you have the files that step requires.

Hacettepe-University-CMP681-2020-Spring/ir-project-ir-term-project-omer-sahin