Canada Labour Research Assistant (CLaRA)
An LLM-powered assistant that directly quotes retrieved passages
Key Features • Quick start • Use Case & Portability • Telemetry & API Calls • Contributions • Acknowledgements
The Canada Labour Research Assistant (CLaRA) is a privacy-first LLM-powered research assistant that directly quotes sources to mitigate hallucinations and construct context-grounded answers to questions about Canadian labour laws, standards, and regulations. It can be run locally and without any Internet connection, thus guaranteeing the confidentiality of your conversations.
Two builds are offered out of the box:
- one running on an Ollama serving backend, suitable for experimentation or a small number of users;
- one running on a vLLM serving backend, suitable for use cases requiring more scalability.
✅ Retrieval-Augmented Generation (RAG) to infuse context in each query.
✅ Chunking strategy to improve question answering.
✅ Metadata leveraging to improve question answering and make the information easily verifiable.
✅ Reranking to prioritize relevant sources when a query mentions legal provisions.
✅ Dynamic context window allocation to prevent source document chunks from getting truncated, and manage memory efficiently.
✅ Performance optimizations to reduce latency (database caching, tokenizer caching, response streaming).
✅ Runs Locally on CPUs and/or consumer-grade GPUs, making it suitable for small and medium enterprises/organizations.
✅ Production-Ready for multiple scenarios with two builds offered out-of-the-box (Ollama or vLLM).
✅ Runs offline with no Internet connection required (see instructions further below).
✅ Guaranteed confidentiality as a result of local-and-offline runtime mode.
✅ Minimalist set of base dependencies for more portability and resilience (see pyproject.toml).
✅ Bring-Your-Own-Model with Ollama (see its supported pre-trained models) and with vLLM (see the supported pre-trained models here).
✅ Bring-Your-Own-Inference-Provider and easily switch between two inference modes (local vs. remote) in the UI.
✅ RAG-enabled conversation history that includes previous document chunks for deeper context and research.
✅ UI Databases Dropdown to easily swap between databases on-the-fly.
✅ On-the-Fly LoRA Adapters for your Fine-Tuned Models. With vLLM, simply pass the path to your fine-tuned LoRA adapter (see the sketch after this list).
✅ 3 runtime modes: normal, evaluation (to assess the LLM answers), or profiling (to track performance).
✅ Evaluation mode (still in early development) allows you to measure the quality of generated responses.
✅ Profiling mode provides analytics to measure the impact of each function call and of each component added to or removed from the architecture.
✅ Streamlined Installation process in one easy step (Ollama build only; we've nonetheless streamlined the installation of the vLLM build as well, see quick start - build #2 below).
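As an illustration of the underlying mechanism for the LoRA feature above (not the repository's exact launch command), a LoRA adapter is typically attached to a vLLM server as shown below; the model name and adapter path are placeholders:

```bash
# Sketch only: the model name and adapter path below are placeholders.
vllm serve meta-llama/Llama-3.2-3B-Instruct \
  --enable-lora \
  --lora-modules my-adapter=/path/to/your/lora_adapter
```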
How to set up this system for 100% local-and-off-the-Internet inference
Because models require tokenizers, and because the open-source models we use both for document embedding and for LLM inference are hosted on Hugging Face, libraries such as sentence-transformers pull the models and tokenizers on the first call, then save a copy in a local cache to speed up future runs (see e.g. the definition of the SentenceTransformer class).
Once the models are downloaded, you can keep using these libraries locally, without any Internet connection. The same can be done with the LLM's tokenizer to avoid unnecessary external calls (this is what we've done with the .tokenizers folder).
For LLM inference, you can do the same: download the LLM model, store its main files locally, then run the system completely offline.
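For instance, one way to prepare a machine for fully offline use is to pre-download the models while online, then force offline mode. The model name below is a placeholder; substitute the embedding model and LLM your build is configured with:

```bash
# While still online, pre-populate the local Hugging Face cache
# (placeholder model name; use the models configured in your build).
huggingface-cli download sentence-transformers/all-MiniLM-L6-v2

# Then force Hugging Face libraries to read from the local cache only,
# so no external calls are made at runtime.
export HF_HUB_OFFLINE=1
```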
Build #1 - Ollama local server & (optional) remote server
Ensure you have Ollama installed and a bash terminal available. Then, clone this repo and cd into the new directory:
git clone https://github.com/pierreolivierbonin/Canada-Labour-Research-Assistant.git
cd canada-labour-research-assistant
Run ./full_install_pipeline_ollama.sh.
If you prefer to do it one step at a time:
Install the virtual environment by running the following command in your bash terminal:
./setup/ollama_build/install_venv.sh
Make sure your virtual environment is activated. Then, create the database by running the following command in a terminal:
./setup/create_or_update_database.sh
You are now ready to launch the application with:
./run_app_ollama.sh
You can now enter the mode of your choice in the console to run the application.
The default mode is 'local', i.e. local mode, which runs the application entirely on your machine, thereby protecting your privacy and data.
Should you want to use remote mode to take advantage of third-party compute for larger models and workloads, you can do so and switch between the two modes on the fly through the UI's toggle button. Please note that the privacy of your conversations is no longer guaranteed if you do so.
To enable remote mode, simply add the necessary credentials in .streamlit/secrets.toml, following the format below:
authorization = "<api_key>"
api_url = "<api_url>"
Then, enter 'remote' in the console when launching the app. Streamlit will pick up those credentials and use them to call the API you chose as your third-party inference provider. You can always switch back to local mode later on through the UI if needed.
Build #2 - vLLM local server & (optional) remote server
For Windows users: install WSL2 to have a Linux kernel.
Then, install the drivers as appropriate to run GPU paravirtualization on WSL-Ubuntu.
If you intend to use LoRA adapters, install jq by running sudo apt-get install jq.
Run ./full_install_pipeline_vllm.sh.
If you prefer to do it one step at a time:
Install the virtual environment by running:
source ./setup/vllm_build/install_venv.sh
Activate your virtual environment with source .venv/bin/activate, then run:
source ./setup/create_or_update_database.sh
Launch the application with:
source ./run_app_vllm.sh
By default, local mode is used: the application runs entirely on your machine, thereby protecting your privacy and data.
Please note: while running on WSL, vLLM sometimes has trouble releasing memory once you shut down or close your terminal. To make sure your memory is released, run wsl --shutdown in another terminal.
Should you want to use remote mode to take advantage of third-party compute for larger models and workloads, you can do so and switch between the two modes on the fly through the UI's toggle button. Please note that the privacy of your conversations is no longer guaranteed if you do so.
To enable remote mode, simply add the necessary credentials in .streamlit/secrets.toml, following the format below:
authorization = "<api_key>"
api_url = "<api_url>"
Once this setup is completed, you will be able to switch to remote mode via the UI.
The application can be customized for your own use case by creating new databases. To add a new database:
- Create a JSON configuration file in the collections/ folder
- Update VectorDBDataFiles.included_databases in db_config.py to include your database
- Run the database creation script; your database will be automatically created and included in the app
See below for detailed instructions.
Refer to collections/example.json for a template config file, or collections/example_with_comments.txt for a detailed commented example with path format explanations. A minimal illustrative sketch is also shown after the detailed steps below.
- Create or edit database configuration files in the collections/ folder. Each database is defined by a JSON file in this directory.
- Configure database metadata in your JSON file:
  - name: The database identifier
  - is_default: Set to true to make this database the default selection in the UI
  - save_html: Set to true to save HTML content locally
  - languages: List of language codes (e.g., ["en", "fr"]) that your database supports
  - ressource_name: Dictionary mapping language codes to display names for the UI (e.g., {"en": "Labour", "fr": "Travail"})
- Add your data sources using these supported formats, organized by language:
  - Web pages: Add URLs under the "page" key. Each web page entry is an array with format ["NAME", "URL", depth], where:
    - depth = 0: Extract only the page itself
    - depth = 1: Extract the page and all links within it
    - depth = 2: Extract the page, all links within it, and links within those links (2 levels deep)
    - Maximum depth is 2
  - Legal pages: Add law URLs under the "law" key as arrays with format ["name", "URL"]
  - IPG pages: Add IPG URLs under the "ipg" key, organized by language
  - PDF files: Add URLs or local file/folder paths under the "pdf" key, organized by language. Local paths can point anywhere on your computer, using OS-appropriate formats.
  - Page blacklist: Add URLs to exclude under the "page_blacklist" key, organized by language
  - Note: Data sources must be organized by language codes (e.g., "en", "fr"). You can support one or more languages per database.
- Add your database to the application by updating VectorDBDataFiles.included_databases in db_config.py. Add your database name (it must match the "name" field in your JSON file) to the list. For example: included_databases = ["labour", "equity", "transport", "your_new_database"]
The following data source formats are supported:
- External PDFs: Direct URLs to PDF files
- Local PDFs: Absolute file paths to local PDFs anywhere on your computer (folder paths are supported to include all PDFs in a directory). Use OS-appropriate path formats (e.g., C:/Documents/file.pdf on Windows, /home/user/Documents/file.pdf on Linux/Mac)
- Web pages: URLs to web content (supports blacklisting specific pages)

Important: PDF files can be located anywhere on your computer (including the application folder, but avoid the static/ folder, as it is managed automatically). Just specify the paths; the database creation script will automatically import and process them.
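For illustration, here is a minimal sketch of what such a configuration file could look like. All names and URLs are placeholders, and the exact nesting of keys may differ in your version, so treat collections/example.json as the authoritative template:

```json
{
  "name": "your_new_database",
  "is_default": false,
  "save_html": true,
  "languages": ["en", "fr"],
  "ressource_name": {"en": "My Topic", "fr": "Mon sujet"},
  "page": {
    "en": [["Example topic page", "https://example.org/en/topic", 1]],
    "fr": [["Page d'exemple", "https://example.org/fr/sujet", 1]]
  },
  "law": {
    "en": [["Example Act", "https://example.org/en/example-act"]]
  },
  "pdf": {
    "en": ["https://example.org/en/report.pdf", "/home/user/Documents/policies/"]
  },
  "page_blacklist": {
    "en": ["https://example.org/en/topic/page-to-exclude"]
  }
}
```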
Once you have created your JSON configuration file and updated VectorDBDataFiles.included_databases, run the database creation script:
./setup/create_or_update_database.sh
This script will automatically:
- Process all databases listed in VectorDBDataFiles.included_databases
- Extract content from all configured sources in your JSON files
- Create vector databases for RAG (Retrieval-Augmented Generation)

Additional notes:
- PDF files are automatically downloaded to the static/ folder for offline access
- Static files are accessible via app/static/... URLs within the application
- Removing a database JSON file doesn't delete its files from the static/ folder
The solution is designed so you can easily verify the information used by the LLM to construct its responses. To do so, the 'direct quotations' mode formats and highlights relevant passages taken from the sources. You can click on these passages to go directly to the source and validate the information.
Using the current webcrawling configuration, you can create the following databases and swap between them in the UI. Each database includes the following documents:
Labour Database:
- Canada Labour Code (CLC)
- Canada Labour Standards and Regulations (CLSR)
- Interpretations, Policies, and Guidelines (IPGs)
- Canada webpages on topics covering: labour standards, occupational health and safety, etc.
Equity Database:
- Workplace equity, etc.
Transport Database:
- Acts and regulations related to transport.
In an effort to ensure the highest standards of privacy protection, we have tested and confirmed that the system works offline, without any required Internet connection, thus guaranteeing your conversations remain private.
In addition, we have researched and taken the following measures:
- ChromaDB allows disabling telemetry, and we've done just that by following the instructions here.
- Ollama does not have any telemetry. See this explainer.
- Streamlit allows disabling telemetry, and we've done just that by setting gatherUsageStats to 'false'. See this explainer.
- vLLM allows opting out of telemetry via the DO_NOT_TRACK environment variable, and we've done just that. See the doc.
- Hugging Face allows disabling calls to its website via the HF_HUB_OFFLINE environment variable, and we've done just that. See this PR.
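For reference, these opt-outs correspond to the following settings (already applied in this repository; the sketch below is only a reminder in case you deploy the components in a different environment):

```bash
export ANONYMIZED_TELEMETRY=False   # ChromaDB: disable anonymized telemetry
export DO_NOT_TRACK=1               # vLLM: opt out of usage tracking
export HF_HUB_OFFLINE=1             # Hugging Face: no calls to the Hub at runtime

# Streamlit: set in .streamlit/config.toml
#   [browser]
#   gatherUsageStats = false
```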
See the open issues for a full list of proposed features (and known issues).
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!
- Fork the Project
- Create your Feature Branch (git checkout -b feature/AmazingFeature)
- Commit your Changes (git commit -m 'Add some AmazingFeature')
- Push to the Branch (git push origin feature/AmazingFeature)
- Open a Pull Request
Distributed under the MIT License. See LICENSE for more information.
Special thanks to Hadi Hojjati (@hhojjati98) for the stimulating discussions, brainstorming sessions, and general advice; both of us greatly appreciated them.
We would like to thank everyone who participates in conducting open research as well as sharing knowledge and code.
In particular, we are grateful to the creators and contributors who made it possible to build CLaRA:
- Web crawling and HTML processing: Beautiful Soup
- PDF files content extraction: PyMuPDF
- GPU Paravirtualization: NVIDIA
- Llama3.2-Instruct model: Meta
- Vector database: Chroma
- LLM inference serving: Ollama & vLLM
- Embedding models: SentenceTransformers and Hugging Face
- User Interface: Streamlit
We are grateful to, and would like to acknowledge, the AI research community. In particular, we drew ideas and inspiration from the following papers, articles, and conference presentations:
Bengio, Yoshua. "Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path?". Presentation given at the World Summit AI Canada on April 16 (2025).
Bengio, Yoshua, Michael Cohen, Damiano Fornasiere, Joumana Ghosn, Pietro Greiner, Matt MacDermott, Sören Mindermann et al. "Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path?" arXiv preprint arXiv:2502.15657 (2025).
He, Jia, Mukund Rungta, David Koleczek, Arshdeep Sekhon, Franklin X. Wang, and Sadid Hasan. "Does Prompt Formatting Have Any Impact on LLM Performance?" arXiv preprint arXiv:2411.10541 (2024). https://arxiv.org/abs/2411.10541.
Laban, Philippe, Tobias Schnabel, Paul N. Bennett, and Marti A. Hearst. "SummaC: Re-visiting NLI-based models for inconsistency detection in summarization." Transactions of the Association for Computational Linguistics 10 (2022): 163-177. arXiv: https://arxiv.org/abs/2111.09525. Repository: https://github.com/tingofurro/summac.
Lin, Chin-Yew, and Franz Josef Och. "Automatic evaluation of machine translation quality using longest common subsequence and skip-bigram statistics." In Proceedings of the 42nd annual meeting of the association for computational linguistics (ACL-04), pp. 605-612. https://aclanthology.org/P04-1077.pdf. 2004.
Wikipedia. "ROUGE (metric)." Online. https://en.wikipedia.org/wiki/ROUGE_(metric). 2023.
Wikipedia. "Longest common subsequence". Online. https://en.wikipedia.org/wiki/Longest_common_subsequence. 2025.
Yeung, Matt. "Deterministic Quoting: Making LLMs Safer for Healthcare." Online. https://mattyyeung.github.io/deterministic-quoting (2024).
If you draw inspiration or use this solution, please cite the following work:
@misc{clara-2025,
author = {Bonin, Pierre-Olivier and Allard, Marc-André},
title = {Canada Labour Research Assistant (CLaRA)},
howpublished = {\url{https://github.com/pierreolivierbonin/Canada-Labour-Research-Assistant}},
year = {2025},
}