Canada Labour Research Assistant (CLaRA)
An LLM-powered assistant that directly quotes retrieved passages
Key Features • Quick start • Use Case & Portability • Telemetry & API Calls • Contributions • Acknowledgements
The Canada Labour Research Assistant (CLaRA) is a privacy-first LLM-powered research assistant that directly quotes sources to mitigate hallucinations and construct context-grounded answers to questions about Canadian labour laws, standards, and regulations. It can be run locally and without any Internet connection, thus guaranteeing the confidentiality of your conversations.
Two builds are offered out of the box:
- one running on an Ollama serving backend, suitable for experimentation or a small number of users;
- one running on a vLLM serving backend, suitable for use cases requiring more scalability.
✅ Retrieval-Augmented Generation (RAG) to infuse context in each query.
✅ Chunking strategy to improve question answering.
✅ Metadata leveraging to improve question answering and make the information easily verifiable.
✅ Reranking to prioritize relevant sources when a query mentions legal provisions.
✅ Dynamic context window allocation to prevent source document chunks from getting truncated, and manage memory efficiently.
✅ Performance optimizations to reduce latency (database caching, tokenizer caching, response streaming).
✅ Runs Locally on CPUs and/or consumer-grade GPUs, making it suitable for small and medium enterprises/organizations.
✅ Production-Ready for multiple scenarios with two builds offered out-of-the-box (Ollama or vLLM).
✅ Runs offline with no Internet connection required (see instructions further below).
✅ Guaranteed confidentiality as a result of local-and-offline runtime mode.
✅ Minimalist set of base dependencies for more portability and resilience (see pyproject.toml).
✅ Bring-Your-Own-Model with Ollama (see its supported pre-trained models) and with vLLM (see the supported pre-trained models here).
✅ Bring-Your-Own-Inference-Provider and easily switch between two inference modes (local vs. remote) in the UI.
✅ RAG-enabled conversation history that includes previous document chunks for deeper context and research.
✅ UI Databases Dropdown to easily swap between databases on-the-fly.
✅ On-the-Fly LoRA Adapters for your Fine-Tuned Models. With vLLM, simply pass the path to your fine-tuned LoRA adapter (see the sketch after this list).
✅ 3 runtime modes: normal, evaluation (to assess the LLM answers), or profiling (to track performance).
✅ Evaluation mode (still in early development) allows you to measure the quality of generated responses.
✅ Profiling mode provides analytics to measure the impact of each function call and of each component added to or removed from the architecture.
✅ Streamlined Installation process in one easy step (Ollama build only; we've nonetheless streamlined the installation of the vLLM build as well, see quick start - build #2 below).
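As an illustration of the underlying mechanism for the LoRA feature above (not the repository's exact launch command), a LoRA adapter is typically attached to a vLLM server as shown below; the model name and adapter path are placeholders:

```bash
# Sketch only: the model name and adapter path below are placeholders.
vllm serve meta-llama/Llama-3.2-3B-Instruct \
  --enable-lora \
  --lora-modules my-adapter=/path/to/your/lora_adapter
```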
How to set up this system for 100% local-and-off-the-Internet inference
Because models require tokenizers, and because the open-source models we use both for document embedding and for LLM inference are hosted on Hugging Face, libraries such as sentence-transformers pull the models and tokenizers on the first call, then save a copy in a local cache to speed up future runs (see e.g. the definition of the SentenceTransformer class).
Once the models are downloaded, you can keep using these libraries locally, without any Internet connection. The same can be done with the LLM's tokenizer to avoid unnecessary external calls (this is what we've done with the .tokenizers folder).
For LLM inference, you can do the same: download the LLM model, store its main files locally, then run the system completely offline.
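For instance, one way to prepare a machine for fully offline use is to pre-download the models while online, then force offline mode. The model name below is a placeholder; substitute the embedding model and LLM your build is configured with:

```bash
# While still online, pre-populate the local Hugging Face cache
# (placeholder model name; use the models configured in your build).
huggingface-cli download sentence-transformers/all-MiniLM-L6-v2

# Then force Hugging Face libraries to read from the local cache only,
# so no external calls are made at runtime.
export HF_HUB_OFFLINE=1
```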
Build #1 - Ollama local server & (optional) remote server
Ensure you have Ollama installed and a bash terminal available. Then, clone this repo and cd into the new directory:
git clone https://github.com/pierreolivierbonin/Canada-Labour-Research-Assistant.git
cd canada-labour-research-assistant
Run ./full_install_pipeline_ollama.sh.
If you prefer to do it one step at a time:
Install the virtual environment by running the following command in your bash terminal:
./setup/ollama_build/install_venv.sh
Make sure your virtual environment is activated. Then, create the database by running the following command in a terminal:
./setup/create_or_update_database.sh
You are now ready to launch the application with:
./run_app_ollama.sh
You can now enter the mode of your choice in the console to run the application.
The default mode is 'local', i.e. local mode, which runs the application entirely on your machine, thereby protecting your privacy and data.
Should you want to use remote mode to take advantage of third-party compute for larger models and workloads, you can do so and switch between the two modes on the fly through the UI's toggle button. Please note that the privacy of your conversations is no longer guaranteed if you do so.
To enable remote mode, simply add the necessary credentials in .streamlit/secrets.toml, following the format below:
authorization = "<api_key>"
api_url = "<api_url>"
Then, enter 'remote' in the console when launching the app. Streamlit will pick up those credentials and use them to call the API you chose as your third-party inference provider. You can always switch back to local mode later on through the UI if needed.
Build #2 - vLLM local server & (optional) remote server
For Windows users: install WSL2 to have a Linux kernel.
Then, install the drivers as appropriate to run GPU paravirtualization on WSL-Ubuntu.
If you intend to use LoRA adapters, install jq by running sudo apt-get install jq.
Run ./full_install_pipeline_vllm.sh.
If you prefer to do it one step at a time:
Install the virtual environment by running:
source ./setup/vllm_build/install_venv.sh
Activate your virtual environment with source .venv/bin/activate, then run:
source ./setup/create_or_update_database.sh
Launch the application with:
source ./run_app_vllm.sh
By default, local mode is used: the application runs entirely on your machine, thereby protecting your privacy and data.
Please note: while running on WSL, vLLM sometimes has trouble releasing memory once you shut down or close your terminal. To make sure your memory is released, run wsl --shutdown in another terminal.
Should you want to use remote mode to take advantage of third-party compute for larger models and workloads, you can do so and switch between the two modes on the fly through the UI's toggle button. Please note that the privacy of your conversations is no longer guaranteed if you do so.
To enable remote mode, simply add the necessary credentials in .streamlit/secrets.toml, following the format below:
authorization = "<api_key>"
api_url = "<api_url>"
Once this setup is completed, you will be able to switch to remote mode via the UI.
The application can be customized for your own use case by creating new databases. To add a new database:
- Create a JSON configuration file in the collections/ folder
- Update VectorDBDataFiles.included_databases in db_config.py to include your database
- Run the database creation script; your database will be automatically created and included in the app
See below for detailed instructions.
Refer to collections/example.json for a template config file, or collections/example_with_comments.txt for a detailed commented example with path format explanations. A minimal illustrative sketch is also shown after the detailed steps below.
- Create or edit database configuration files in the collections/ folder. Each database is defined by a JSON file in this directory.
- Configure database metadata in your JSON file:
  - name: The database identifier
  - is_default: Set to true to make this database the default selection in the UI
  - save_html: Set to true to save HTML content locally
  - languages: List of language codes (e.g., ["en", "fr"]) that your database supports
  - ressource_name: Dictionary mapping language codes to display names for the UI (e.g., {"en": "Labour", "fr": "Travail"})
- Add your data sources using these supported formats, organized by language:
  - Web pages: Add URLs under the "page" key. Each web page entry is an array with format ["NAME", "URL", depth], where:
    - depth = 0: Extract only the page itself
    - depth = 1: Extract the page and all links within it
    - depth = 2: Extract the page, all links within it, and links within those links (2 levels deep)
    - Maximum depth is 2
  - Legal pages: Add law URLs under the "law" key as arrays with format ["name", "URL"]
  - IPG pages: Add IPG URLs under the "ipg" key, organized by language
  - PDF files: Add URLs or local file/folder paths under the "pdf" key, organized by language. Local paths can point anywhere on your computer, using OS-appropriate formats.
  - Page blacklist: Add URLs to exclude under the "page_blacklist" key, organized by language
  - Note: Data sources must be organized by language codes (e.g., "en", "fr"). You can support one or more languages per database.
- Add your database to the application by updating VectorDBDataFiles.included_databases in db_config.py. Add your database name (it must match the "name" field in your JSON file) to the list. For example: included_databases = ["labour", "equity", "transport", "your_new_database"]
The following data source formats are supported:
- External PDFs: Direct URLs to PDF files
- Local PDFs: Absolute file paths to local PDFs anywhere on your computer (folder paths are supported to include all PDFs in a directory). Use OS-appropriate path formats (e.g., C:/Documents/file.pdf on Windows, /home/user/Documents/file.pdf on Linux/Mac)
- Web pages: URLs to web content (supports blacklisting specific pages)

Important: PDF files can be located anywhere on your computer (including the application folder, but avoid the static/ folder, as it is managed automatically). Just specify the paths; the database creation script will automatically import and process them.
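For illustration, here is a minimal sketch of what such a configuration file could look like. All names and URLs are placeholders, and the exact nesting of keys may differ in your version, so treat collections/example.json as the authoritative template:

```json
{
  "name": "your_new_database",
  "is_default": false,
  "save_html": true,
  "languages": ["en", "fr"],
  "ressource_name": {"en": "My Topic", "fr": "Mon sujet"},
  "page": {
    "en": [["Example topic page", "https://example.org/en/topic", 1]],
    "fr": [["Page d'exemple", "https://example.org/fr/sujet", 1]]
  },
  "law": {
    "en": [["Example Act", "https://example.org/en/example-act"]]
  },
  "pdf": {
    "en": ["https://example.org/en/report.pdf", "/home/user/Documents/policies/"]
  },
  "page_blacklist": {
    "en": ["https://example.org/en/topic/page-to-exclude"]
  }
}
```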
Once you have created your JSON configuration file and updated VectorDBDataFiles.included_databases, run the database creation script:
./setup/create_or_update_database.sh
This script will automatically:
- Process all databases listed in VectorDBDataFiles.included_databases
- Extract content from all configured sources in your JSON files
- Create vector databases for RAG (Retrieval-Augmented Generation)

Additional notes:
- PDF files are automatically downloaded to the static/ folder for offline access
- Static files are accessible via app/static/... URLs within the application
- Removing a database JSON file doesn't delete its files from the static/ folder
The solution is designed so you can easily verify the information used by the LLM to construct its responses. To do so, the 'direct quotations' mode formats and highlights relevant passages taken from the sources. You can click on these passages to go directly to the source and validate the information.
Using the current webcrawling configuration, you can create the following databases and swap between them in the UI. Each database includes the following documents:
Labour Database:
- Canada Labour Code (CLC)
- Canada Labour Standards and Regulations (CLSR)
- Interpretations, Policies, and Guidelines (IPGs)
- Canada webpages on topics covering: labour standards, occupational health and safety, etc.
Equity Database:
- Workplace equity, etc.
Transport Database:
- Acts and regulations related to transport.
In an effort to ensure the highest standards of privacy protection, we have tested and confirmed that the system works offline, without any required Internet connection, thus guaranteeing your conversations remain private.
In addition, we have researched and taken the following measures:
- ChromaDB allows disabling telemetry, and we've done just that by following the instructions here.
- Ollama does not have any telemetry. See this explainer.
- Streamlit allows disabling telemetry, and we've done just that by setting gatherUsageStats to 'false'. See this explainer.
- vLLM allows opting out of telemetry via the DO_NOT_TRACK environment variable, and we've done just that. See the doc.
- Hugging Face allows disabling calls to its website via the HF_HUB_OFFLINE environment variable, and we've done just that. See this PR.
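For reference, these opt-outs correspond to the following settings (already applied in this repository; the sketch below is only a reminder in case you deploy the components in a different environment):

```bash
export ANONYMIZED_TELEMETRY=False   # ChromaDB: disable anonymized telemetry
export DO_NOT_TRACK=1               # vLLM: opt out of usage tracking
export HF_HUB_OFFLINE=1             # Hugging Face: no calls to the Hub at runtime

# Streamlit: set in .streamlit/config.toml
#   [browser]
#   gatherUsageStats = false
```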
See the open issues for a full list of proposed features (and known issues).
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!
- Fork the Project
- Create your Feature Branch (git checkout -b feature/AmazingFeature)
- Commit your Changes (git commit -m 'Add some AmazingFeature')
- Push to the Branch (git push origin feature/AmazingFeature)
- Open a Pull Request
Distributed under the MIT License. See LICENSE for more information.
Special thanks to Hadi Hojjati (@hhojjati98) for the stimulating discussions, brainstorming sessions, and general advice; both of us greatly appreciated them.
We would like to thank everyone who participates in conducting open research as well as sharing knowledge and code.
In particular, we are grateful to the creators and contributors who made it possible to build CLaRA:
- Web crawling and HTML processing: Beautiful Soup
- PDF files content extraction: PyMuPDF
- GPU Paravirtualization: NVIDIA
- Llama3.2-Instruct model: Meta
- Vector database: Chroma
- LLM inference serving: Ollama & vLLM
- Embedding models: SentenceTransformers and Hugging Face
- User Interface: Streamlit
We are grateful to, and would like to acknowledge, the AI research community. In particular, we drew ideas and inspiration from the following papers, articles, and conference presentations:
Bengio, Yoshua. "Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path?". Presentation given at the World Summit AI Canada on April 16 (2025).
Bengio, Yoshua, Michael Cohen, Damiano Fornasiere, Joumana Ghosn, Pietro Greiner, Matt MacDermott, Sören Mindermann et al. "Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path?" arXiv preprint arXiv:2502.15657 (2025).
He, Jia, Mukund Rungta, David Koleczek, Arshdeep Sekhon, Franklin X. Wang, and Sadid Hasan. "Does Prompt Formatting Have Any Impact on LLM Performance?" arXiv preprint arXiv:2411.10541 (2024). https://arxiv.org/abs/2411.10541.
Laban, Philippe, Tobias Schnabel, Paul N. Bennett, and Marti A. Hearst. "SummaC: Re-visiting NLI-based models for inconsistency detection in summarization." Transactions of the Association for Computational Linguistics 10 (2022): 163-177. arXiv: https://arxiv.org/abs/2111.09525. Repository: https://github.com/tingofurro/summac.
Lin, Chin-Yew, and Franz Josef Och. "Automatic evaluation of machine translation quality using longest common subsequence and skip-bigram statistics." In Proceedings of the 42nd annual meeting of the association for computational linguistics (ACL-04), pp. 605-612. https://aclanthology.org/P04-1077.pdf. 2004.
Wikipedia. "ROUGE (metric)." Online. https://en.wikipedia.org/wiki/ROUGE_(metric). 2023.
Wikipedia. "Longest common subsequence". Online. https://en.wikipedia.org/wiki/Longest_common_subsequence. 2025.
Yeung, Matt. "Deterministic Quoting: Making LLMs Safer for Healthcare." Online. https://mattyyeung.github.io/deterministic-quoting (2024).
If you draw inspiration or use this solution, please cite the following work:
@misc{clara-2025,
author = {Bonin, Pierre-Olivier and Allard, Marc-André},
title = {Canada Labour Research Assistant (CLaRA)},
howpublished = {\url{https://github.com/pierreolivierbonin/Canada-Labour-Research-Assistant}},
year = {2025},
}