This repository contains implementations of two different approaches for Natural Language Understanding (NLU) Evidence Detection:
- Co-Attention Siamese Deep Learning Model
- ModernBERT SBERT Dual Embedding Model
Given a claim and a piece of evidence, the task is to determine whether the evidence is relevant to the claim.
NLU-EvidenceDetection/
├── training_data/ # Training data for all models
├── CoAttentionSiameseDeepLearning.ipynb # Notebook for Siamese model training
├── CoAttentionSiameseDeepLearning.keras # Saved Siamese model
├── CoAttentionSiameseEvaluation.ipynb # Evaluation notebook for Siamese model
├── CoAttentionSiameseInference.ipynb # Inference notebook for Siamese model
├── CoAttentionSiamese_model_card.md # Model card with details for Siamese model
├── Group_7_B.csv # Test data predictions for non-transformer model
├── Group_7_C.csv # Test data predictions for transformer model
├── README.md # This file
├── modernbert_sbert_dual_embedding_model_card.md # Model card for ModernBERT+SBERT model
├── modernbert_sbert_embeddings.ipynb # ModernBERT+SBERT embeddings training notebook
├── modernbert_sbert_embeddings_evaluation.ipynb # ModernBERT+SBERT model evaluation
├── modernbert_sbert_embeddings_inference.ipynb # ModernBERT+SBERT model inference
└── poster.pdf # Poster for information regarding both models and results
Each notebook in this repository has its own dependency installation cells at the beginning that install the specific packages needed for that particular notebook. This makes it easier to run individual notebooks without installing every dependency for the entire repository, and it avoids conflicting dependencies.
If running on a local machine, it is recommended to first create a virtual environment before running the notebooks, so as not to interfere with global dependencies.
Example of creating a virtual environment in the current directory:
`python -m venv .venv`
When running the notebooks:
- Execute the dependency installation cells first
- The notebooks should be self-contained with all necessary code and instructions
This model utilizes a Siamese neural network architecture with co-attention mechanisms to detect evidence in text. The model takes pairs of text (claim and potential evidence) and predicts whether the second text provides evidence for the first.
- Siamese Network: Dual-path network that processes the two texts separately before comparing them
- Multi-Headed Co-Attention Mechanism: Allows the claim to attend to the evidence, and vice versa
- Embedding Process (see the sketch below):
  - Text inputs are first pre-processed by removing '[ref]', '[ref' and 'ref]' markers
  - Each text is encoded by Sentence-BERT into a 384-dimensional vector
  - A shared Siamese encoder maps both the claim and evidence embeddings into task-relevant representations
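A minimal sketch of this pre-processing and Sentence-BERT encoding step, assuming the `sentence-transformers` package and the `all-MiniLM-L6-v2` model named above (the example texts are illustrative):

```python
import re
from sentence_transformers import SentenceTransformer

# Sentence encoder used for the input embeddings (384-dimensional output).
sbert = SentenceTransformer("all-MiniLM-L6-v2")

def strip_refs(text: str) -> str:
    """Remove the '[ref]', '[ref' and 'ref]' markers before encoding."""
    return re.sub(r"\[ref\]|\[ref|ref\]", "", text).strip()

claim = "Vaccination reduces the spread of infectious diseases."
evidence = "A 2019 study [ref] reported lower transmission rates among vaccinated groups."

# Claim and evidence are encoded separately; the shared Siamese encoder then maps
# both 384-dimensional vectors into a task-specific space before co-attention.
claim_vec, evidence_vec = sbert.encode([strip_refs(claim), strip_refs(evidence)])
print(claim_vec.shape)  # (384,)
```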
- Setup Environment:
  - Install the dependencies: `pip install tensorflow==2.17.0 numpy==1.26.4 pandas==2.2.3 scikit-learn==1.5.2 sentence-transformers==3.4.1 regex==2024.9.11 optuna==4.1.0`
- Training:
  - Open `CoAttentionSiameseDeepLearning.ipynb` in Jupyter Notebook or Google Colab
  - Run the dependency installation cells at the beginning of the notebook
  - Follow the instructions to load the training data and train the model
  - The trained model will be saved as a `.keras` file
- Evaluation:
  - Use `CoAttentionSiameseEvaluation.ipynb` to evaluate model performance
  - The notebook includes confusion matrix visualization and performance metrics
- Inference:
  - Use `CoAttentionSiameseInference.ipynb` for making predictions on new data (a loading sketch follows this list)
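Outside the notebooks, the saved model can also be loaded directly with Keras. A hedged sketch follows; the exact input format and any custom-layer registration are defined in the notebooks, so the details below are assumptions:

```python
import numpy as np
import tensorflow as tf

# Load the saved Siamese model shipped in this repository. If the co-attention layers
# were not registered as serializable in the training notebook, they must be passed
# here via `custom_objects`.
model = tf.keras.models.load_model("CoAttentionSiameseDeepLearning.keras", compile=False)

# Assuming the model takes the claim and evidence SBERT embeddings as two inputs
# (see the embedding sketch above); the authoritative pipeline is in
# CoAttentionSiameseInference.ipynb.
claim_vec = np.zeros((1, 384), dtype="float32")     # placeholder claim embedding
evidence_vec = np.zeros((1, 384), dtype="float32")  # placeholder evidence embedding
prob = model.predict([claim_vec, evidence_vec])
print(prob)  # predicted probability that the evidence supports the claim
```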
The model achieved the following metrics on the development set:
- Accuracy: 84.48%
- Macro F1-Score: 80.12%
- Macro Precision: 80.97%
- Macro Recall: 79.40%
- Weighted F1-Score: 84.27%
The model was trained for 23 epochs with optimized hyperparameters determined through Bayesian optimization.
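The metric names correspond to scikit-learn's macro and weighted averages; a small reproduction sketch with placeholder labels (the real labels and predictions come from the evaluation notebook):

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Placeholder arrays; in practice y_true holds the dev-split gold labels and y_pred
# the model's predictions from CoAttentionSiameseEvaluation.ipynb.
y_true = [1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 1, 0, 0]

print("Accuracy:       ", accuracy_score(y_true, y_pred))
print("Macro F1:       ", f1_score(y_true, y_pred, average="macro"))
print("Macro Precision:", precision_score(y_true, y_pred, average="macro"))
print("Macro Recall:   ", recall_score(y_true, y_pred, average="macro"))
print("Weighted F1:    ", f1_score(y_true, y_pred, average="weighted"))
```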
This model leverages pre-trained BERT models (specifically Sentence-BERT and ModernBERT) to create embeddings for text pairs and then determines evidence relationships using these embeddings.
- Dual Embedding Approach: Combines contextualized embeddings from ModernBERT-base with sentence embeddings from SBERT (all-MiniLM-L6-v2)
- Text Processing (see the sketch below):
  - Removes reference tags
  - Normalizes accented characters using unidecode
  - Cleans irregular spacing around punctuation
  - Normalizes whitespace
- Training Approach:
  - Fine-tuned using QLoRA (Quantized Low-Rank Adaptation) with 4-bit quantization and flash-attention for efficiency
  - Uses both synonym replacement and class weights to address class imbalance
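A rough sketch of that text-processing pipeline; the exact regular expressions live in the notebook, so this version only approximates them:

```python
import re
from unidecode import unidecode

def clean_text(text: str) -> str:
    """Approximate the pre-processing steps listed above."""
    text = re.sub(r"\[ref\]|\[ref|ref\]", "", text)  # remove reference tags
    text = unidecode(text)                           # normalize accented characters
    text = re.sub(r"\s+([,.;:!?])", r"\1", text)     # clean spacing before punctuation
    text = re.sub(r"\s+", " ", text)                 # normalize whitespace
    return text.strip()

print(clean_text("Café prices rose ,  according to one report [ref] ."))
# -> "Cafe prices rose, according to one report."
```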
- Setup Environment:
  - Install the dependencies: `pip install torch==2.6.0+cu126 transformers peft bitsandbytes flash-attn sentence-transformers scikit-learn numpy pandas unidecode`
- Training/Fine-tuning:
  - Open `modernbert_sbert_embeddings.ipynb` in Jupyter Notebook or Google Colab
  - Run the dependency installation cells at the beginning of the notebook
  - Follow the notebook to load the data and fine-tune the model (a configuration sketch follows this list)
- Evaluation:
  - Use `modernbert_sbert_embeddings_evaluation.ipynb` to evaluate model performance
- Inference:
  - Use `modernbert_sbert_embeddings_inference.ipynb` for making predictions on new data
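The actual fine-tuning setup lives in `modernbert_sbert_embeddings.ipynb`; purely as an illustration of what QLoRA (4-bit quantization plus LoRA adapters) with flash-attention looks like with `transformers`/`peft`, here is a sketch. The LoRA rank, target modules, and head configuration are assumptions, not the notebook's exact values:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit (QLoRA-style) quantization config; requires a CUDA GPU and bitsandbytes.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-base")
base_model = AutoModelForSequenceClassification.from_pretrained(
    "answerdotai/ModernBERT-base",
    num_labels=2,
    quantization_config=bnb_config,
    attn_implementation="flash_attention_2",  # needs the flash-attn package
)

# Attach low-rank adapters to the quantized backbone; rank/alpha/targets are illustrative.
base_model = prepare_model_for_kbit_training(base_model)
lora_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.1,
                         target_modules="all-linear", task_type="SEQ_CLS")
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()
```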
The model achieved the following metrics on the development set:
- Accuracy: 87.38%
- Macro F1-Score: 84.79%
- Macro Precision: 83.76%
- Macro Recall: 86.14%
- Weighted F1-Score: 87.59%
- Matthews Correlation Coefficient: 0.6986
The model converts probabilities to binary predictions using an optimal threshold of 0.5433, determined on the validation data.
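Applying that threshold and computing the Matthews Correlation Coefficient is straightforward with NumPy and scikit-learn (placeholder values shown):

```python
import numpy as np
from sklearn.metrics import matthews_corrcoef

THRESHOLD = 0.5433  # tuned on the validation split, as noted above

probs = np.array([0.12, 0.61, 0.48, 0.91])  # placeholder model probabilities
y_pred = (probs >= THRESHOLD).astype(int)   # -> [0, 1, 0, 1]

y_true = np.array([0, 1, 0, 1])             # placeholder gold labels
print(matthews_corrcoef(y_true, y_pred))
```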
- The training data is located in the `training_data/` directory
- The dataset consists of 30K pairs of texts drawn from emails, news articles, and blog posts
- For the Siamese model, 21K pairs were used for training and 6K for validation, with class weighting to handle imbalance (72% negative samples)
- For the ModernBERT model, both class weighting and data augmentation (synonym replacement for the positive class) were applied to address class imbalance
- Both models utilize Sentence-BERT embeddings from `all-MiniLM-L6-v2`
- The ModernBERT implementation also uses the ModernBERT-base model from HuggingFace
The weights of the models are stored as follows:
- The ModernBERT implementation is stored on HuggingFace:
  - DualEncoderModernBERT: https://huggingface.co/ddosdub/DualEncoderModernBERT
- The Co-Attention Siamese model is stored in the GitHub repository itself: https://github.com/chuongg3/NLU-EvidenceDetection
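To pull the ModernBERT weights locally outside the inference notebook, the HuggingFace Hub client can be used, e.g.:

```python
from huggingface_hub import snapshot_download

# Download the fine-tuned DualEncoderModernBERT files into the local HF cache;
# how they are loaded afterwards is shown in modernbert_sbert_embeddings_inference.ipynb.
local_dir = snapshot_download(repo_id="ddosdub/DualEncoderModernBERT")
print(local_dir)
```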
The following tools and techniques were instrumental in achieving the reported performance metrics while keeping the development workflow efficient:
- Hyperparameter Optimization: Bayesian optimization with Optuna, including trial pruning (automated early stopping), was used to tune the models' hyperparameters (see the sketch after this list).
- Class Imbalance Handling: The original training dataset had class imbalance, which was addressed through class weighting and data augmentation techniques.
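For reference, the Optuna pattern looks roughly like the sketch below; the search space and the training step are placeholders, not the values used in the notebooks:

```python
import optuna

def objective(trial: optuna.Trial) -> float:
    # Illustrative search space only; the real ranges live in the training notebooks.
    lr = trial.suggest_float("learning_rate", 1e-5, 1e-2, log=True)
    dropout = trial.suggest_float("dropout", 0.1, 0.5)

    best_val_f1 = 0.0
    for epoch in range(20):
        # Placeholder for one epoch of training + dev evaluation; a real objective
        # would train the model here and report macro F1 on the validation split.
        val_f1 = 1.0 - abs(lr - 1e-3) - 0.1 * dropout
        best_val_f1 = max(best_val_f1, val_f1)

        trial.report(val_f1, step=epoch)
        if trial.should_prune():  # trial pruning acts as automated early stopping
            raise optuna.TrialPruned()
    return best_val_f1

# TPE is Optuna's Bayesian-style sampler; MedianPruner drops unpromising trials early.
study = optuna.create_study(
    direction="maximize",
    sampler=optuna.samplers.TPESampler(),
    pruner=optuna.pruners.MedianPruner(),
)
study.optimize(objective, n_trials=50)
print(study.best_params)
```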
Claude and other generative AI models were used for debugging, document proofreading, and improvements.
If you use this code or models in your research, please cite:
@misc{NLU-EvidenceDetection,
author = {Tuan Chuong Goh and Dhruv Sharma},
title = {NLU-Evidence Detection},
year = {2025},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/chuongg3/NLU-EvidenceDetection}}
}
For the Co-Attention mechanism implementation, please also cite:
@inproceedings{lu2016hierarchical,
title={Hierarchical question-image co-attention for visual question answering},
author={Lu, Jiasen and Yang, Jianwei and Batra, Dhruv and Parikh, Devi},
booktitle={Advances in neural information processing systems},
pages={289--297},
year={2016},
url={https://proceedings.neurips.cc/paper_files/paper/2016/file/9dcb88e0137649590b755372b040afad-Paper.pdf}
}
For the ModernBERT model, please cite:
@misc{modernbert,
title={Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference},
author={Benjamin Warner and Antoine Chaffin and Benjamin Clavié and Orion Weller and Oskar Hallström and Said Taghadouini and Alexis Gallagher and Raja Biswas and Faisal Ladhak and Tom Aarsen and Nathan Cooper and Griffin Adams and Jeremy Howard and Iacopo Poli},
year={2024},
eprint={2412.13663},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2412.13663}
}