This repository contains all code, experiments, and documentation related to the master's thesis of Tomáš Mlynář. The project focuses on the adaptation of large language models (LLMs), covering their training, evaluation, and benchmarking, with a particular emphasis on Czech language resources and evaluation frameworks.
All published models and datasets are available on the Hugging Face Hub.
datasets_creation/ # Scripts and notebooks for dataset creation and preprocessing
evaluation/ # Evaluation scripts, benchmarks, and analysis notebooks
scripts/ # Shell scripts for running experiments and evaluations
training/ # Training scripts and notebooks (pretraining, finetuning, NLI, etc.)
- Clone the repository:
git clone https://gitlab.fel.cvut.cz/factchecking/master-thesis-repository-tomas-mlynar.git
cd master-thesis-repository-tomas-mlynar
- (Recommended) Create and activate a Python virtual environment (three requirements files are available for different components):
python3 -m venv venv
source venv/bin/activate
- Install dependencies (from the desired requirements file):
pip install -r master_venv_requirements.txt # main venv for the project
pip install -r unsloth_venv_requirements.txt # for training with Unsloth
pip install -r wildbench_venv_requirements.txt # for evaluation with WildBench
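Because the three requirements files target different components, it may be convenient to keep each in its own virtual environment. The following is a minimal sketch of that setup, not a script shipped with the repository. The helper names `venv_dir_for` and `setup_env` are hypothetical. The directory names (`master_venv/`, `unsloth_venv/`, `wildbench_venv/`) are an assumption derived from the file names.

```shell
#!/bin/sh
# Sketch: one virtual environment per requirements file.
# Assumption: each environment directory is named after its requirements file,
# e.g. master_venv_requirements.txt -> master_venv/ (not stated by the repository).

# Derive the environment directory name by stripping the "_requirements.txt" suffix.
venv_dir_for() {
    printf '%s\n' "${1%_requirements.txt}"
}

# Create the environment and install the listed dependencies into it.
setup_env() {
    req="$1"
    dir="$(venv_dir_for "$req")"
    python3 -m venv "$dir" && "$dir/bin/pip" install -r "$req"
}

for req in master_venv_requirements.txt \
           unsloth_venv_requirements.txt \
           wildbench_venv_requirements.txt; do
    if [ -f "$req" ]; then
        setup_env "$req"
    else
        # Skip quietly when run outside the repository root.
        echo "skipping: $req not found" >&2
    fi
done
```

Run it from the repository root, then activate whichever environment a given task needs, for example `source unsloth_venv/bin/activate` before training with Unsloth.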
- WildBench for evaluation scripts and benchmarks.
- Supervisors, collaborators, and the Czech Technical University in Prague for support and guidance.