This repository contains a pipeline that chains question generation and question answering, using spaCy, UniLMv1, and BERT.
The recommended way to run the code is on Linux (Ubuntu 18.04).
- Python version > 3.5 installed
- conda installed
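
The Python requirement can be checked before installing anything. This is a minimal sketch, assuming "> 3.5" means Python 3.6 or newer; the helper name `python_is_supported` is hypothetical and not part of this repository.

```python
import sys

# Minimum version assumed from the "Python version > 3.5" requirement above.
MIN_VERSION = (3, 6)


def python_is_supported(version_info=sys.version_info):
    """Return True if the given interpreter version is at least MIN_VERSION."""
    return tuple(version_info[:2]) >= MIN_VERSION


if __name__ == "__main__":
    print("Python", sys.version.split()[0], "supported:", python_is_supported())
```
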
In a shell:
. ~/.bashrc
apt-get update
apt-get install -y vim wget ssh
pip install --user tensorboardX six numpy tqdm path.py pandas scikit-learn lmdb pyarrow py-lz4framed methodtools py-rouge pyrouge nltk
python -c "import nltk; nltk.download('punkt')"
conda install pytorch torchvision cpuonly -c pytorch
Install the repo as a package:
mkdir -p ~/code && cd ~/code
git clone https://github.com/Pdesmarc/Pipeline-QG-QA.git
cd ~/code/Pipeline-QG-QA/src
pip install --user --editable .
Download a fine-tuned UniLM QG checkpoint from here (the Google Drive folder belongs to Microsoft).
Then, assuming the file was downloaded to ~/Download:
mkdir -p ~/code/Pipeline-QG-QA/MODEL/
mv ~/Download/qg_model.bin ~/code/Pipeline-QG-QA/MODEL/
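
Before running the pipeline, it may help to confirm that the checkpoint ended up where the commands above put it. This is a sketch only: `checkpoint_ready` is a hypothetical helper, not part of this repository, and the default path assumes the repository was cloned into ~/code as described above.

```python
from pathlib import Path

# Location used by the mv command above; adjust if you cloned elsewhere.
DEFAULT_CHECKPOINT = (
    Path.home() / "code" / "Pipeline-QG-QA" / "MODEL" / "qg_model.bin"
)


def checkpoint_ready(path=DEFAULT_CHECKPOINT):
    """Return True if the checkpoint exists and is a non-empty regular file."""
    path = Path(path)
    return path.is_file() and path.stat().st_size > 0


if __name__ == "__main__":
    print(DEFAULT_CHECKPOINT, "ready:", checkpoint_ready())
```

An empty or missing file usually points to an interrupted download or a different download directory.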
pip install spacy
python -m spacy download en_core_web_sm
pip install transformers
cd ~/code/Pipeline-QG-QA
./first_scenario.sh argument1
# argument1 = /PATH/TO/YOUR/FILE/NAME_OF_THE_FILE.txt
# example: ./first_scenario.sh ~/code/Pipeline-QG-QA/example/texte_brut.txt
The output is written to a file named resultat_final_scenario1.txt.
Intermediate files are written to script/tmp/. Each file is the hand-off between two scripts:
- result_Text_Answer.txt : intermediate file between the spaCy script and the UniLM script
- questions_generated.txt : intermediate file between the UniLM script and the BERT script
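
When a run fails partway through, looking at which intermediate files exist can show which step stopped. The sketch below assumes it is run from the directory that contains script/tmp/ and that the files are plain text; `count_lines` is a hypothetical helper, not part of this repository.

```python
from pathlib import Path

# File names come from the list above; the values describe which two
# scripts each file sits between.
INTERMEDIATE_FILES = {
    "result_Text_Answer.txt": "spaCy script -> UniLM script",
    "questions_generated.txt": "UniLM script -> BERT script",
}


def count_lines(tmp_dir="script/tmp"):
    """Map each intermediate file name to its line count, or None if missing."""
    tmp = Path(tmp_dir)
    counts = {}
    for name in INTERMEDIATE_FILES:
        path = tmp / name
        if path.is_file():
            with path.open(encoding="utf-8", errors="replace") as handle:
                counts[name] = sum(1 for _ in handle)
        else:
            counts[name] = None
    return counts


if __name__ == "__main__":
    for name, count in count_lines().items():
        status = "missing" if count is None else f"{count} lines"
        print(f"{name} ({INTERMEDIATE_FILES[name]}): {status}")
```

A missing result_Text_Answer.txt suggests the spaCy step did not complete; a missing questions_generated.txt suggests the UniLM step did not.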