This is a prototype for Document-Level Rare Language Detection using Lists of Keywords, with a focus on French-based Creoles.
The most important files are experiment.py, which has the overall pipeline logic, and indexing.py and scoring.py, which isolate the simple functions used to execute the main steps. Main.rs in the folder "rust_version" is a rewrite of the indexing portion in Rust (it runs roughly 2 to 3x faster than the Python version).
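To give a rough idea of what keyword-list detection with an indexing step and a scoring step can look like, here is a minimal sketch. It is not the code from indexing.py or scoring.py: the function names (build_index, score_document, detect), the threshold value, and the tiny keyword lists are all assumptions made up for illustration.

```python
import re
from collections import Counter


def build_index(keyword_lists):
    """Indexing step (hypothetical): map each lowercased keyword
    to the set of language labels whose list contains it."""
    index = {}
    for lang, words in keyword_lists.items():
        for word in words:
            index.setdefault(word.lower(), set()).add(lang)
    return index


def score_document(text, index):
    """Scoring step (hypothetical): fraction of the document's tokens
    that appear in each language's keyword list."""
    tokens = re.findall(r"\w+", text.lower())
    hits = Counter()
    for token in tokens:
        for lang in index.get(token, ()):
            hits[lang] += 1
    total = max(len(tokens), 1)  # avoid division by zero on empty text
    return {lang: count / total for lang, count in hits.items()}


def detect(text, index, threshold=0.2):
    """Return the best-scoring language label, or None if no language
    reaches the (assumed) threshold."""
    scores = score_document(text, index)
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None


# Toy keyword lists, for illustration only.
toy_lists = {
    "hat": ["mwen", "nou", "yo"],   # Haitian Creole
    "fra": ["je", "nous", "ils"],   # French
}
toy_index = build_index(toy_lists)
print(detect("Mwen renmen nou", toy_index))
```

Normalizing by document length keeps long documents from winning on raw hit counts alone; the real pipeline may weight or filter keywords differently.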
I will be updating the documentation in the coming days.