- Install uv
- Set up the environment:
uv sync
- Download the dataset from Kaggle
- Move the downloaded file to data/raw/
Run the data processing script:
uv run src/process.py
Run the model training script:
uv run src/train_model.py
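The two steps above can also be chained from a small driver script. The following is a minimal sketch, not part of the repository: the names `STEPS`, `build_command`, and `run_pipeline` are hypothetical, and it assumes `uv` is on your PATH and that you run it from the project root.

```python
import subprocess

# Script paths are taken from the steps above; the helper names are hypothetical.
STEPS = ["src/process.py", "src/train_model.py"]


def build_command(script, overrides=()):
    """Build the argv list for `uv run <script> [overrides...]`."""
    return ["uv", "run", script, *overrides]


def run_pipeline():
    """Run processing, then training, stopping at the first failing step."""
    for script in STEPS:
        # check=True raises CalledProcessError if the script exits non-zero.
        subprocess.run(build_command(script), check=True)
```

Calling `run_pipeline()` runs data processing before training, so training does not start if processing fails.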
Both scripts use Hydra for configuration management. The default configuration lives in the conf/main.yaml file. You can override any configuration parameter from the command line. For example:
# Override test size in process.py
uv run src/process.py process.test_size=0.3
# Override hyperparameters in train_model.py
uv run src/train_model.py train.hyperparameters.svm__C=10
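To make the override syntax concrete, here is a conceptual sketch of how a dotted `key=value` argument maps onto a nested configuration. It is not Hydra's implementation (Hydra's override grammar is richer), and the default values in `DEFAULTS` are hypothetical stand-ins, since the contents of conf/main.yaml are not shown here.

```python
import ast
from copy import deepcopy

# Hypothetical defaults standing in for conf/main.yaml; the real values may differ.
DEFAULTS = {
    "process": {"test_size": 0.2},
    "train": {"hyperparameters": {"svm__C": 1.0}},
}


def parse_value(text):
    """Interpret the value as a Python literal when possible, else keep the string."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return text


def apply_override(config, override):
    """Return a copy of config with one 'dotted.key=value' override applied."""
    dotted_key, sep, raw_value = override.partition("=")
    if not sep:
        raise ValueError(f"expected key=value, got {override!r}")
    result = deepcopy(config)
    node = result
    *parents, leaf = dotted_key.split(".")
    for name in parents:
        node = node[name]  # raises KeyError if the path does not exist
    if leaf not in node:
        raise KeyError(dotted_key)
    node[leaf] = parse_value(raw_value)
    return result


if __name__ == "__main__":
    cfg = apply_override(DEFAULTS, "process.test_size=0.3")
    cfg = apply_override(cfg, "train.hyperparameters.svm__C=10")
    print(cfg)
```

Each dot in the key descends one level into the configuration, so `train.hyperparameters.svm__C=10` changes only that one leaf and leaves every other default in place.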
To see all available configuration options, use the --help flag:
# View configuration options for process.py
uv run src/process.py --help
# View configuration options for train_model.py
uv run src/train_model.py --help