manveertamber/enhancing_domain_adaptation

CADET Embeddings

We present CADET, a framework for fine-tuning embedding models for retrieval on specific corpora using diverse synthetic queries and cross-encoder listwise distillation.
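As a rough illustration of what listwise distillation from a cross-encoder can look like, the sketch below computes a KL-divergence loss between the teacher's and the student's score distributions over one query's candidate passages. This is a minimal sketch under stated assumptions, not the code in this repository: the function names, the `temperature` parameter, and the choice of KL divergence as the loss are all hypothetical, and it uses plain Python rather than a deep learning framework.

```python
import math


def log_softmax(scores, temperature=1.0):
    """Numerically stable log-softmax over a list of scores."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_sum_exp for s in scaled]


def listwise_kl_loss(student_scores, teacher_scores, temperature=1.0):
    """KL(teacher || student) over one query's candidate list.

    student_scores: embedding-model similarities (hypothetical input).
    teacher_scores: cross-encoder relevance scores (hypothetical input).
    Both lists must be aligned to the same candidate passages.
    """
    if len(student_scores) != len(teacher_scores):
        raise ValueError("score lists must have the same length")
    log_p = log_softmax(teacher_scores, temperature)  # teacher distribution
    log_q = log_softmax(student_scores, temperature)  # student distribution
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


# Illustrative scores for one query and three candidate passages.
teacher = [4.0, 1.5, -2.0]
student = [0.62, 0.58, 0.31]
loss = listwise_kl_loss(student, teacher)
```

The loss is zero when the student's distribution over the list matches the teacher's and grows as the two rankings diverge, which is why the whole candidate list, rather than a single positive/negative pair, drives the update.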

We will continue to refine this codebase. For questions or support, please reach out to mtamber@uwaterloo.ca.


Overview

Directories

  • encoding/
    Contains scripts to encode corpora and evaluate models.

  • query_generation/
    Includes scripts for generating synthetic queries.

  • reranker/
    Contains code for reranking.

  • training_scripts/
    Contains scripts for fine-tuning models.


If you use CADET, please cite the following paper:

  @article{tamber2025teaching,
    title={Teaching Dense Retrieval Models to Specialize with Listwise Distillation and LLM Data Augmentation},
    author={Tamber, Manveer Singh and Kazi, Suleman and Sourabh, Vivek and Lin, Jimmy},
    journal={arXiv:2502.19712},
    year={2025}
  }
