Pipeline-QG-QA

A pipeline that chains question generation (QG) and question answering (QA), built on spaCy, UniLMv1, and BERT.

Environment

Linux

The recommended way to run the code is on Linux (Ubuntu 18.04).

Requirements

Python version > 3.5 installed
conda installed

To do

In a shell:

. .bashrc
apt-get update
apt-get install -y vim wget ssh

pip install --user tensorboardX six numpy tqdm path.py pandas scikit-learn lmdb pyarrow py-lz4framed methodtools py-rouge pyrouge nltk
python -c "import nltk; nltk.download('punkt')"
conda install pytorch torchvision cpuonly -c pytorch

Install the repo as a package:

mkdir -p ~/code && cd ~/code
git clone https://github.com/Pdesmarc/Pipeline-QG-QA.git
cd ~/code/Pipeline-QG-QA/src
pip install --user --editable .

Set up

UniLM QG Model

Download a fine-tuned UniLM QG checkpoint from here (the Google Drive folder is owned by Microsoft).

Then, assuming the file was downloaded to ~/Download:

mkdir -p ~/code/Pipeline-QG-QA/MODEL/
mv ~/Download/qg_model.bin ~/code/Pipeline-QG-QA/MODEL/

spaCy

pip install spacy
python -m spacy download en_core_web_sm
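To confirm that the model is installed, the following minimal sketch lists candidate answer spans in a paragraph. It assumes that named entities are used as answers, which this README does not state; `candidate_answers` is a hypothetical helper, not part of the repository.

```python
from typing import List


def candidate_answers(paragraph: str, nlp=None) -> List[str]:
    """Return the distinct named entities of `paragraph`, in order of appearance."""
    if nlp is None:
        # Imported lazily so the helper can also be called with a stub.
        import spacy

        nlp = spacy.load("en_core_web_sm")
    doc = nlp(paragraph)
    answers: List[str] = []
    for ent in doc.ents:
        # Assumption: one candidate per distinct entity text.
        if ent.text not in answers:
            answers.append(ent.text)
    return answers
```

Calling `candidate_answers("Marie Curie was born in Warsaw.")` loads en_core_web_sm and returns whatever entities that model detects in the sentence.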

BERT fine-tuned QA

pip install transformers
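A quick way to check the installation is to run one question through the transformers question-answering pipeline. This is only a sketch: `answer_question` is a hypothetical helper, and with no model name given transformers loads its own default QA checkpoint, which may differ from the fine-tuned model this repository uses.

```python
from typing import Callable, Optional


def answer_question(question: str, context: str, qa: Optional[Callable] = None) -> str:
    """Return the answer span that a QA model extracts from `context`."""
    if qa is None:
        # Imported lazily so the helper can also be called with a stub.
        from transformers import pipeline

        # Assumption: the library's default question-answering checkpoint.
        qa = pipeline("question-answering")
    # For a single input the pipeline returns a dict with the keys
    # "score", "start", "end" and "answer".
    result = qa(question=question, context=context)
    return result["answer"]
```

The first call without a `qa` argument downloads the default checkpoint, so it needs network access.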

How to run it

⚠️ Your input file must be a .txt file in which each line is one paragraph. Each paragraph (line) must contain fewer than 512 tokens. See the bert_tokenizer requirement for more information. An example file, texte_brut.txt, is in the example folder.
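The length constraint can be checked before launching the pipeline. The sketch below is not part of the repository: `find_long_paragraphs` is a hypothetical helper, and its default whitespace tokenizer is only a rough proxy, since BERT's WordPiece tokenizer usually yields more tokens than there are words.

```python
from typing import Callable, List, Tuple

MAX_TOKENS = 512  # limit stated in the warning above


def find_long_paragraphs(
    path: str,
    tokenize: Callable[[str], List[str]] = str.split,
    max_tokens: int = MAX_TOKENS,
) -> List[Tuple[int, int]]:
    """Return (line_number, token_count) for every line at or over the limit."""
    too_long = []
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            count = len(tokenize(line.strip()))
            # "Fewer than 512 tokens" means a count of 512 is already too long.
            if count >= max_tokens:
                too_long.append((line_number, count))
    return too_long
```

For a count closer to what the QA model sees, pass the `tokenize` method of a tokenizer loaded with `BertTokenizer.from_pretrained(...)` as the `tokenize` argument; which checkpoint name to use depends on the model the pipeline loads.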

cd ~/code/Pipeline-QG-QA
./scenario.sh argument1
# argument1 = /PATH/TO/YOUR/FILE/NAME_OF_THE_FILE.txt
# example: ./scenario.sh ~/code/Pipeline-QG-QA/example/texte_brut.txt

The output is written to a file named resultat_final_scenario1.txt.

Intermediate files are written to script/tmp/. Each one is the hand-off between two consecutive scripts:

result_Text_Answer.txt: passed from the spaCy script to the UniLM script
questions_generated.txt: passed from the UniLM script to the BERT script

