CommonSense Reason Generation

This repository addresses the task of generating reasons that explain why a statement goes against common sense.

For example, given the input "He eats the submarine.", the model should return something along the lines of "Submarines are not edible.".

We follow the challenge posed in SemEval-2020 Task 4 (Commonsense Validation and Explanation), Subtask C.

Quickly Try Our Repo Out

The Python notebook JUSTers/colab_quick_start.ipynb gives a quick look at the main features of this repository.

Installation

git clone https://github.com/FredericOdermatt/NLP_commonsense

Inside NLP_commonsense, initialize the submodules KaLM (our fork) and fairseq:

cd NLP_commonsense
git submodule update --init

We use conda to keep environments clean. Install Miniconda from the official website:

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
chmod +x Miniconda3-latest-Linux-x86_64.sh
./Miniconda3-latest-Linux-x86_64.sh

Restart the shell after conda init so that the conda command becomes available.

Create an environment for the project

conda create -n nlp_env python=3.7.4
conda activate nlp_env

Install torch 1.4.0

conda install pytorch==1.4.0 torchvision==0.5.0 -c pytorch

Install fairseq, which is a submodule of the cloned git repository, in editable mode:

pip install -e fairseq

Install other requirements

pip install -r requirements.txt
conda install --file requirements_conda.txt

To download the NLTK data packages 'punkt' and 'wordnet' (needed for the METEOR score), run the provided script:

chmod +x setup_nltk.sh
./setup_nltk.sh

JUSTers

Training on Google Colab - JUSTers

Some Google Colab GPUs provide 16 GB or even 25 GB of GPU memory. To train with large batch sizes, we provide a Python notebook for Google Colab.

Open JUSTers/colab_quick_start.ipynb directly in Google Colab. After training on Colab, you can download the trained model using rsync, as described in the notebook.

Training on Leonhard - JUSTers

Before running the training script on a GPU, execute it once on a CPU. This downloads all pretrained models needed for the scoring methods, which might take several minutes. This only has to be done once; afterwards the GPU can be used.

The following training script uses only the data without evidence. Usage: ./train.sh OUT_DIR_NAME 16 5 5

bsub -o test.out -R "rusage[mem=12000,ngpus_excl_p=1]" -J train_Justers -W 4:00 ./train.sh ${SCRATCH}/JUSTers/first_try 16 5 5

Training Ad-JUSTers

The following training script uses the data including evidence from Wiktionary. During training, the input sentences currently have the format "additional evidence <|evidence|> false-statement <|continue|> training target". Usage: ./train_with_evidence.sh OUT_DIR_NAME 16 5 5

bsub -o test.out -R "rusage[mem=12000,ngpus_excl_p=1]" -J train_Justers -W 4:00 ./train_with_evidence.sh PATH_TO_MODEL_FOLDER 16 5 5

$1 output directory
$2 batch_size (JUSTers: 64; reduced here because of memory limits on the cluster)
$3 per_gpu_train_batch_size (JUSTers: 5; also limited by memory on the cluster)
$4 num_train_epochs (JUSTers: 5)

To also include evidence from Urban Dictionary, change the commented section in the file finetune_envidence.py.
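The Ad-JUSTers input format described above can be illustrated with a short Python sketch. The helpers build_training_text and split_training_text are hypothetical and not part of this repository, the single space around <|evidence|> and <|continue|> is an assumption, and the evidence string in the usage example is made up rather than an actual Wiktionary entry.

```python
# Hypothetical helpers (not taken from this repository) illustrating the
# Ad-JUSTers training format:
#   "additional evidence <|evidence|> false-statement <|continue|> training target"
# Assumption: each special token is surrounded by exactly one space.
EVIDENCE_TOKEN = "<|evidence|>"
CONTINUE_TOKEN = "<|continue|>"


def build_training_text(evidence, statement, target):
    """Concatenate evidence, false statement and target reason into one training string."""
    return f"{evidence} {EVIDENCE_TOKEN} {statement} {CONTINUE_TOKEN} {target}"


def split_training_text(text):
    """Recover (evidence, statement, target) from a string built by build_training_text."""
    evidence, rest = text.split(f" {EVIDENCE_TOKEN} ", 1)
    statement, target = rest.split(f" {CONTINUE_TOKEN} ", 1)
    return evidence, statement, target


if __name__ == "__main__":
    # Made-up evidence string, used only for illustration.
    text = build_training_text(
        "submarine: a vessel that travels under water",
        "He eats the submarine.",
        "Submarines are not edible.",
    )
    print(text)
    print(split_training_text(text))
```

Running it prints the single training line "submarine: a vessel that travels under water <|evidence|> He eats the submarine. <|continue|> Submarines are not edible." followed by the recovered three parts. Keeping the special tokens as named constants makes it easy to check that training and generation use identical separators.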

Generate Explanations - JUSTers

bsub -o test_gen.out -R "rusage[ngpus_excl_p=1,mem=12000]" -J JUSTers_generate -W 4:00 ./generate.sh PATH_TO_MODEL_FOLDER 5 1 0.9

$1 path to the folder containing model.bin etc.
$2 k for top-k sampling (JUSTers: 50)
$3 temperature (JUSTers: 1)
$4 p for top-p (nucleus) sampling (JUSTers: 0.9)
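The three sampling arguments of generate.sh (k, temperature, p) can be illustrated with a pure-Python sketch of the standard technique: temperature scaling, then top-k filtering, then top-p (nucleus) filtering, then sampling. This is not the code behind generate.sh; the function names, the order of the filters and the convention that k <= 0 disables top-k are assumptions.

```python
import math
import random


def sampling_distribution(logits, k, temperature, p):
    """Return {token_index: probability} after temperature, top-k and top-p filtering.

    Sketch only; assumes the order temperature -> top-k -> top-p and treats
    k <= 0 as "no top-k filter".
    """
    # Temperature: divide the logits, then apply a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-k: keep the k most probable tokens, sorted by decreasing probability.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if k > 0:
        order = order[:k]
    topk_mass = sum(probs[i] for i in order)

    # Top-p: keep the smallest prefix of the renormalised top-k distribution
    # whose cumulative probability reaches p.
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i] / topk_mass
        if cumulative >= p:
            break

    # Renormalise over the surviving tokens.
    z = sum(probs[i] for i in kept)
    return {i: probs[i] / z for i in kept}


def sample_token(logits, k=50, temperature=1.0, p=0.9, rng=random):
    """Draw one token index; defaults mirror the JUSTers values (k=50, temperature=1, p=0.9)."""
    dist = sampling_distribution(logits, k, temperature, p)
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]


if __name__ == "__main__":
    toy_logits = [2.0, 1.0, 0.0, -1.0]  # toy values for a 4-token vocabulary
    print(sampling_distribution(toy_logits, k=2, temperature=1.0, p=0.9))
    print(sample_token(toy_logits, k=2, rng=random.Random(0)))
```

Lower temperatures sharpen the distribution before filtering, so with a small temperature even p=0.9 can leave a single candidate; a smaller k (such as the 5 used in the example command) caps the candidate set regardless of p.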

KaLM

Training - KaLM

bsub -o test.out -R "rusage[mem=8164,ngpus_excl_p=1]" -J first_test -W 4:00 <<< "NLP_commonsense/train_kalm.sh"

-o: name of the output file (should end in .out)
-R: resource requirements (memory and GPUs)
-J: job name, useful for the overview in bjobs and for bpeek
-W: wall-clock time limit for the job (hh:mm)
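When submissions are scripted, the bsub flags listed above can be assembled in Python. The sketch below is a hypothetical helper (bsub_command and to_shell are not part of this repository); it only builds and quotes the argument list, and passes the mem value through unchanged, in whatever unit the cluster's LSF configuration uses.

```python
import shlex


def bsub_command(script_args, job_name, out_file, mem=12000, hours=4, gpus=1):
    """Assemble (but do not submit) an LSF bsub command like those in this README.

    Hypothetical helper; mem is passed through as-is, in whatever unit the
    cluster's LSF configuration uses for rusage[mem=...].
    """
    return [
        "bsub",
        "-o", out_file,                                   # file collecting the job's output
        "-R", f"rusage[mem={mem},ngpus_excl_p={gpus}]",  # resource requirements
        "-J", job_name,                                   # job name, used by bjobs / bpeek -J
        "-W", f"{hours}:00",                              # wall-clock limit as hours:minutes
        *script_args,                                     # the script and its arguments
    ]


def to_shell(cmd):
    """Quote each argument so the command can be pasted into a shell."""
    return " ".join(shlex.quote(arg) for arg in cmd)


if __name__ == "__main__":
    cmd = bsub_command(["./train.sh", "first_try", "16", "5", "5"],
                       job_name="train_Justers", out_file="test.out")
    print(to_shell(cmd))
    # To actually submit: subprocess.run(cmd, check=True)
```

Running it prints bsub -o test.out -R 'rusage[mem=12000,ngpus_excl_p=1]' -J train_Justers -W 4:00 ./train.sh first_try 16 5 5, which matches the JUSTers training submission except for the output directory. Passing a list (rather than one string) to subprocess avoids shell-quoting problems with the brackets in the -R argument.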

Interactive Generation - KaLM

Execute the following locally (not on the cluster). It allows you to interactively submit input sentences to the trained model and see the output.

./evaluate_kalm.sh $SCRATCH/KaLM/trained_models/checkpoint1.pt
...
2020-11-25 18:21:19 | INFO | fairseq_cli.interactive | Type the input sentence and press return:
The submarine is delicious.
...
Output: There is no way to be eaten in the sky.

Evaluation

To evaluate a model with the implemented scores, use the executable evaluate.sh. Pass the arguments ref_path and pred_path, pointing to the references and to your model's predictions, and set the boolean for each metric you want to compute.

Note that MoverScore and BERTScore can only be computed on a GPU (as in the command below), whereas METEOR can only be computed on a CPU. It is therefore currently not possible to compute MoverScore or BERTScore together with METEOR in a single run. To combine all scores in a single .csv, first run the script with all metrics except METEOR set to True. Then run it again with all metrics set to False except METEOR.

bsub -o test.out -R "rusage[mem=12000,ngpus_excl_p=1]" -J evaluation_scores -W 4:00 ./evaluate.sh data100/references_complete.csv data100/kalm.csv
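The GPU/CPU constraint on the metrics can be made concrete with a small Python sketch that splits the requested metrics into separate runs. The helper plan_runs and the DEVICE table are hypothetical; only the device requirements of MoverScore, BERTScore and METEOR come from this README, and any other metric is assumed to run on either device.

```python
# Device requirements stated in this README. Metrics not listed here are
# assumed (for this sketch) to run on either GPU or CPU.
DEVICE = {"MoverScore": "gpu", "BERTScore": "gpu", "METEOR": "cpu"}


def plan_runs(metrics):
    """Split requested metrics into evaluate.sh runs, each limited to one device.

    Returns a list of (device, [metrics]) pairs, mirroring the two-pass recipe:
    everything except METEOR on the GPU first, then METEOR alone on the CPU.
    """
    if not any(DEVICE.get(m) == "gpu" for m in metrics):
        # No GPU-only metric requested: one CPU run covers everything.
        return [("cpu", list(metrics))]
    runs = [("gpu", [m for m in metrics if DEVICE.get(m) != "cpu"])]
    cpu_only = [m for m in metrics if DEVICE.get(m) == "cpu"]
    if cpu_only:
        runs.append(("cpu", cpu_only))
    return runs


if __name__ == "__main__":
    for device, group in plan_runs(["MoverScore", "BERTScore", "METEOR"]):
        print(device, group)
```

For the three metrics named above this prints gpu ['MoverScore', 'BERTScore'] and then cpu ['METEOR'], i.e. exactly the two evaluate.sh runs the paragraph describes.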

Visualization

First compute the automated scores of the generated outputs by running the evaluate.sh script described above. To create the scatter plot matrices together with the correlation coefficients, run visualize_scores.py. It reads the .csv created above and writes the matrix to a .png file. No GPU is needed.

python Visualization/visualize_scores.py
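The correlation coefficients shown in the plot can be illustrated with a pure-Python sketch that computes a Pearson correlation matrix over per-sentence metric scores. This is not the code of visualize_scores.py, which may use a different coefficient or library; the function names, metric names and toy scores are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long, non-constant score lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def correlation_matrix(scores):
    """Map {metric: per-sentence scores} to {metric_a: {metric_b: r}}."""
    names = list(scores)
    return {a: {b: pearson(scores[a], scores[b]) for b in names} for a in names}


if __name__ == "__main__":
    # Toy scores for four generated explanations (not real results).
    toy = {
        "metric_a": [0.1, 0.4, 0.35, 0.8],
        "metric_b": [0.2, 0.5, 0.3, 0.9],
    }
    for a, row in correlation_matrix(toy).items():
        print(a, {b: round(r, 3) for b, r in row.items()})
```

Each off-diagonal entry corresponds to one scatter plot in the matrix; values near 1 mean two metrics rank the generated explanations similarly, values near -1 mean they disagree systematically.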

Notes

bjobs: lists current jobs
bbjobs: lists current jobs with a better overview
bjobs -d: lists recently finished jobs
conda list: lists all packages installed in the active conda environment
bpeek -J JOBNAME: shows the most recent output of a running job
An activated environment is automatically picked up by the submission system.

Working with submodules

The submodules are separate git repositories. Any change inside KaLM must first be added, committed and pushed inside KaLM.
Then, in a second step, run git add KaLM in the main folder and commit this change. To update submodules that were changed, run

git submodule update
