This document summarizes the HMC Torch project, a hierarchical multi-label classification network implemented in PyTorch. The project is currently at version 0.0.1 and includes various Jupyter notebooks, scripts, and configuration files.
- Version: 0.0.1
- Main Changes:
  - Removed the main function from the training file to simplify the code structure.
  - Updated `.gitignore` to improve version control management.
This section outlines the key components of the project to help users navigate the codebase. The project contains the following key files and directories:
- Notebooks:
  - `Dataset.ipynb`: Handles dataset loading and preprocessing.
  - `Executer-model.ipynb`: Contains the model execution logic.
  - `Inference.ipynb`: Used for making predictions with the trained model.
- Scripts:
  - `executer.py`: Core execution script for the model.
- Configuration:
  - `pyproject.toml`: Project configuration file.
  - `poetry.lock`: Dependency lock file.
- Documentation:
  - `README.md`: Provides an overview and instructions for the project.
  - `LICENSE`: Licensing information for the project.
Before setting up the project, ensure you have the following prerequisites installed and configured:
It is recommended to use a virtual environment to manage dependencies. Run the following command to create one:

```shell
python -m venv .venv
```
- Command Prompt (Windows):

  ```shell
  .\.venv\Scripts\activate.bat
  ```

- PowerShell (Windows):

  ```shell
  .\.venv\Scripts\Activate.ps1
  ```

- Linux/macOS:

  ```shell
  source .venv/bin/activate
  ```
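After activation, you can confirm that the interpreter in use belongs to the virtual environment. This is just a quick sanity check, not part of the project's scripts:

```python
import sys

# When a virtual environment is active, sys.prefix points inside it,
# while sys.base_prefix still points at the base interpreter.
in_venv = sys.prefix != sys.base_prefix
print("virtual environment active:", in_venv)
```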
Poetry is used for dependency management. Install it using pip or another method:

```shell
pip install poetry
```
If you plan to use a GPU for training, configure the project with the following commands:

```shell
poetry source add pytorch-gpu https://download.pytorch.org/whl/cu118 --priority=explicit &&
poetry source remove pytorch-cpu || true
```
If you plan to use a CPU for training, configure the project with the following commands:

```shell
poetry source add pytorch-cpu https://download.pytorch.org/whl/cpu --priority=explicit &&
poetry source remove pytorch-gpu || true
```
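For reference, `poetry source add` records a source block in `pyproject.toml`. For the CPU case it looks approximately like the following (the exact layout can vary with your Poetry version):

```toml
[[tool.poetry.source]]
name = "pytorch-cpu"
url = "https://download.pytorch.org/whl/cpu"
priority = "explicit"
```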
These steps will ensure that the project is configured for your selected hardware.
Once the virtual environment is activated and Poetry is installed, run the following command to install all project dependencies:

```shell
poetry install --no-root --with dev
```
By completing these steps, your environment will be fully prepared to run the HMC Torch project.
To download the dataset, you can use the Kaggle CLI:

```shell
pip install kaggle
kaggle datasets download brunosette/gene-ontology-original
```

Alternatively, download it directly with curl using your Kaggle API credentials:

```shell
# Export your Kaggle username and API key
# export KAGGLE_USERNAME=<YOUR USERNAME>
# export KAGGLE_KEY=<YOUR KAGGLE KEY>
curl -L -u $KAGGLE_USERNAME:$KAGGLE_KEY \
  -o ~/Downloads/gene-ontology-original.zip \
  https://www.kaggle.com/api/v1/datasets/download/brunosette/gene-ontology-original
```

Then extract the archive into the project's data directory:

```shell
mkdir data
unzip gene-ontology-original.zip -d data/
```
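The same endpoint can also be scripted from Python. Below is a hypothetical helper (not part of the repository) that just reconstructs the download URL used by the curl command above:

```python
def kaggle_download_url(owner: str, dataset: str) -> str:
    # Kaggle's public dataset download endpoint, as used by the curl command above.
    return f"https://www.kaggle.com/api/v1/datasets/download/{owner}/{dataset}"

url = kaggle_download_url("brunosette", "gene-ontology-original")
print(url)
# → https://www.kaggle.com/api/v1/datasets/download/brunosette/gene-ontology-original
```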
To execute the training process, follow these steps:

Before running the training script, ensure it has the necessary execution permissions:

```shell
chmod +x train.sh
```

You can run the training script with the desired device configuration:

- For CPU execution:

  ```shell
  ./train.sh --device cpu
  ```

- For GPU execution:

  ```shell
  ./train.sh --device gpu
  ```

These commands will initiate the training process using the specified hardware.
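Presumably `train.sh` forwards the `--device` flag to a Python entry point. A minimal sketch of how such a flag could be parsed (hypothetical, not the project's actual code):

```python
import argparse

# Hypothetical sketch of --device flag handling; the project's real
# entry point may parse arguments differently.
parser = argparse.ArgumentParser(description="HMC training launcher (sketch)")
parser.add_argument("--device", choices=["cpu", "gpu"], default="cpu",
                    help="hardware to run training on")

args = parser.parse_args(["--device", "gpu"])
print(args.device)  # → gpu
```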
To deploy the project locally, you only need to run the deployment script, as it automates all the steps outlined above. Ensure the script has execution permissions and specify the desired hardware configuration:

- Make the script executable:

  ```shell
  chmod +x deploy_local.sh
  ```

- Run the deployment script:
  - For GPU deployment:

    ```shell
    ./deploy_local.sh cuda
    ```

  - For CPU deployment:

    ```shell
    ./deploy_local.sh cpu
    ```

By following these instructions, the deployment script will handle the setup process, including dependency installation and environment configuration, based on your hardware preferences.
The project has a total of 8 commits, with the latest updates made on August 25, 2024. Notable commits include:
- Remove main func in train file: Simplified the training process.
- Update project: General updates across various files.
The HMC Torch project is structured to facilitate the development and implementation of hierarchical multi-label classification models using PyTorch. The recent updates have streamlined the codebase and improved project organization.
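To illustrate what "hierarchical multi-label" means in practice: in label hierarchies such as the Gene Ontology, a positive label implies that all of its ancestors are also positive. A small illustrative sketch of that consistency constraint, using toy labels rather than the project's actual code:

```python
def propagate_to_ancestors(labels, parent):
    """If a label is positive, mark every ancestor positive as well --
    the consistency constraint of hierarchical multi-label classification."""
    closed = set(labels)
    for label in labels:
        node = label
        while node in parent:  # walk up the hierarchy until the root
            node = parent[node]
            closed.add(node)
    return closed

# Toy hierarchy: C is a child of B, B is a child of A.
parent = {"C": "B", "B": "A"}
print(sorted(propagate_to_ancestors({"C"}, parent)))  # → ['A', 'B', 'C']
```

A model's raw predictions can violate this constraint, so post-processing of this kind (or a hierarchy-aware loss) is a common ingredient in HMC systems.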