A query reformulation model, trained with reinforcement learning, that selects the query keywords that give higher precision when fetching relevant documents.
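As a rough illustration of the idea, the sketch below keeps a subset of the query terms according to a binary mask and scores the reformulated query by the recall of the documents it retrieves. Everything in it is an assumption made for illustration: `recreate_query`, `recall_reward` and `toy_search` are hypothetical names, not the functions in this repository, the toy term-overlap search only stands in for the real search engine, and recall is just one possible reward.

```python
from typing import Dict, List, Sequence, Set


def recreate_query(terms: Sequence[str], keep: Sequence[int]) -> List[str]:
    """Keep only the terms whose mask entry is 1 (hypothetical helper)."""
    if len(terms) != len(keep):
        raise ValueError("terms and keep must have the same length")
    return [t for t, k in zip(terms, keep) if k == 1]


def recall_reward(retrieved: Sequence[str], relevant: Set[str]) -> float:
    """Fraction of the relevant documents that were retrieved."""
    if not relevant:
        return 0.0
    return len(set(retrieved) & relevant) / len(relevant)


def toy_search(query: Sequence[str], corpus: Dict[str, str], top_k: int = 2) -> List[str]:
    """Rank documents by how many distinct query terms they contain.

    This is only a stand-in for a real ranking function such as BM25.
    """
    scores = {d: len(set(query) & set(text.split())) for d, text in corpus.items()}
    ranked = sorted(scores, key=lambda d: (-scores[d], d))  # ties broken by id
    return [d for d in ranked[:top_k] if scores[d] > 0]


corpus = {
    "d1": "capital of france is paris",
    "d2": "what is the show tonight",
    "d3": "paris hosts the louvre",
}
relevant = {"d1", "d3"}
terms = ["what", "is", "the", "paris"]

# Original query: the common words pull in the irrelevant document d2.
print(recall_reward(toy_search(terms, corpus), relevant))  # 0.5

# Reformulated query: the mask keeps only "paris".
short = recreate_query(terms, [0, 0, 0, 1])
print(recall_reward(toy_search(short, corpus), relevant))  # 1.0
```

In a reinforcement learning setup, a reward of this kind would be the signal used to update the model that produces the mask.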
Jeopardy! TV Show
https://www.kaggle.com/tunguz/200000-jeopardy-questions
TREC - Complex Answer Retrieval (TREC-CAR)
- index_preprocess.py : Index query-title-documents
- article.py : Wikipedia article and QTA indexer classes
- query.py : Query class and query manager
-
- search.py : Search engine class
- rank_bm25.py : BM25 implementation
- model :
- embedding.py : Word embedding class
- evaluate.py : Precision/Recall/NDCG evaluation
- preprocess.py : Preprocess for neural network
- query_reformulation.py : Query reformulation model
- train.py : Train model
- util.py : Utils such as get batch data, recreate query, reward
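For readers unfamiliar with the ranking function named in the file list, here is a minimal Okapi BM25 sketch. It is not the code in rank_bm25.py: the class name `TinyBM25`, the defaults `k1=1.5` and `b=0.75`, and the non-negative IDF variant are all assumptions chosen for illustration.

```python
import math
from collections import Counter
from typing import List, Sequence


class TinyBM25:
    """Illustrative Okapi BM25 scorer over a tokenized corpus."""

    def __init__(self, corpus: List[List[str]], k1: float = 1.5, b: float = 0.75):
        if not corpus:
            raise ValueError("corpus must contain at least one document")
        self.k1, self.b = k1, b
        self.term_freqs = [Counter(doc) for doc in corpus]
        self.doc_lens = [len(doc) for doc in corpus]
        self.avgdl = sum(self.doc_lens) / len(corpus)
        n = len(corpus)
        # Document frequency: number of documents containing each term.
        df = Counter(term for doc in corpus for term in set(doc))
        # IDF variant with log(1 + ...), which never goes negative.
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query: Sequence[str], index: int) -> float:
        """BM25 score of document `index` for the given query terms."""
        freqs, dl = self.term_freqs[index], self.doc_lens[index]
        total = 0.0
        for term in query:
            tf = freqs.get(term, 0)
            if tf == 0:
                continue  # terms absent from the document contribute nothing
            norm = tf + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[term] * tf * (self.k1 + 1) / norm
        return total

    def top_n(self, query: Sequence[str], n: int = 3) -> List[int]:
        """Indices of the n best-scoring documents with a non-zero score."""
        scored = [(self.score(query, i), i) for i in range(len(self.term_freqs))]
        ranked = sorted(scored, key=lambda s: (-s[0], s[1]))
        return [i for sc, i in ranked[:n] if sc > 0]


docs = [
    ["paris", "is", "the", "capital"],
    ["jeopardy", "is", "a", "tv", "show"],
    ["the", "louvre", "is", "in", "paris"],
]
bm25 = TinyBM25(docs)
print(bm25.top_n(["jeopardy"]))  # [1]
```

Length normalization (the `b` term) is why a short document scores higher than a long one when both contain a query term the same number of times.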
- Indexed articles and queries
- Word embedding matrix and word tokenizer
- Search engine
https://drive.google.com/open?id=1xoquzwTFES00TFWYKkQ6KLhm7wlTGJtu
- CNN, LSTM, BiLSTM and retrained CNN models
https://drive.google.com/open?id=1CT1HGvBhXMiTLeeZ6J6isxghHMylYBeM
-
Index Dataset
- Change the paths in index_preprocess.py and run it with 'initial_run' set to true
-
Train Model
-
Evaluate
- Set search engine path in search.py
- Set path of the trained model in evaluate.py
- Set dataset path in evaluate.py and run
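The three metrics reported by the evaluation step can be sketched as follows. This is an illustration under stated assumptions, not the code in evaluate.py: relevance is treated as binary, document ids in the ranking are assumed to be unique, and the function names `precision_at_k`, `recall_at_k` and `ndcg_at_k` are hypothetical.

```python
import math
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Share of the top-k results that are relevant."""
    if k <= 0:
        return 0.0
    return sum(1 for d in ranked[:k] if d in relevant) / k


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Share of all relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    return sum(1 for d in ranked[:k] if d in relevant) / len(relevant)


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Normalized discounted cumulative gain with binary relevance."""
    # Each relevant hit at 0-based rank i contributes 1 / log2(i + 2).
    dcg = sum(1.0 / math.log2(i + 2) for i, d in enumerate(ranked[:k]) if d in relevant)
    # The ideal ranking places all relevant documents first.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


ranked = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2", "d9"}
print(precision_at_k(ranked, relevant, 2))  # 0.5
print(recall_at_k(ranked, relevant, 4))
print(ndcg_at_k(ranked, relevant, 4))
```

Precision and recall ignore where in the top-k a relevant document appears, while NDCG rewards rankings that place relevant documents earlier.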
You can start from any step, provided you have the files that step requires.