This is the official repository of the paper "Total Disentanglement of Font Images into Style and Character Class Features," accepted at ICDAR 2025.
total_disen/
├── src/
│ ├── pretraining.py # Pretraining script
│ ├── finetuning.py # Finetuning script
│ ├── models.py # Model definitions
│ ├── dataset.py # Dataset classes and data loading functions
│ ├── metrics.py # Evaluation metrics
│ ├── feature_visualization.py # Feature visualization using PCA
│ └── font_generation.py # Font generation by combining features
├── sample_data/ # Sample dataset for testing
│ ├── train/ # Training fonts (6 fonts)
│ ├── valid/ # Validation fonts (2 fonts)
│ └── test/ # Test fonts (1 font)
├── checkpoints/ # Model outputs (created during training)
├── results/ # Analysis outputs (created during analysis)
├── requirements.txt # Python dependencies
├── Dockerfile # Docker configuration
└── README.md # This file
The required dataset structure is as follows:
sample_data/
├── train/
│ ├── font1/
│ │ ├── A.png
│ │ ├── B.png
│ │ └── ... (26 letters A-Z)
│ ├── font2/
│ ├── font3/
│ └── ... (multiple fonts for training)
├── valid/
│ ├── font1/
│ ├── font2/
│ └── ... (fonts for validation)
└── test/
  ├── font1/
  ├── font2/
  └── ... (fonts for testing)
Each font directory contains 26 PNG images (A.png through Z.png) at 64x64 resolution.
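For reference, here is a minimal sketch of how such a layout can be iterated with PIL and NumPy; the repository's own loading logic lives in src/dataset.py and may differ:

```python
import os
import string
import numpy as np
from PIL import Image

def load_font_dir(font_dir):
    """Load the 26 glyph images (A.png .. Z.png) of one font as a (26, 64, 64) array."""
    glyphs = []
    for letter in string.ascii_uppercase:
        img = Image.open(os.path.join(font_dir, f"{letter}.png")).convert("L")
        glyphs.append(np.asarray(img, dtype=np.float32) / 255.0)
    return np.stack(glyphs)

# Example: iterate over all training fonts in the sample dataset.
train_root = "sample_data/train"
fonts = {name: load_font_dir(os.path.join(train_root, name))
         for name in sorted(os.listdir(train_root))}
print(len(fonts), "fonts loaded")
```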
Download the sample dataset here: sample_data.zip
Download the trained model checkpoints here: checkpoints.zip
- Build the Docker image:
docker build -t font-disentanglement .
- Run the container interactively
For GPU support (recommended):
docker run -it --rm --gpus all \
  -v $(pwd)/src:/workspace/src \
  -v $(pwd)/sample_data:/workspace/sample_data \
  -v $(pwd)/checkpoints:/workspace/checkpoints \
  -v $(pwd)/results:/workspace/results \
  font-disentanglement bash
For CPU-only:
docker run -it --rm \
  -v $(pwd)/src:/workspace/src \
  -v $(pwd)/sample_data:/workspace/sample_data \
  -v $(pwd)/checkpoints:/workspace/checkpoints \
  -v $(pwd)/results:/workspace/results \
  font-disentanglement bash
This will open a bash shell inside the container where you can run the training and analysis scripts.
If you prefer not to use Docker:
pip install -r requirements.txt
pip install torch torchvision
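To confirm the environment after installation (including GPU visibility), a quick check:

```python
import torch
print(torch.__version__)
print(torch.cuda.is_available())  # True if a CUDA-capable GPU is usable
```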
If you want to skip training and use the trained models directly:
- Download the sample dataset and trained model checkpoints (links above)
- Extract the checkpoints to the `checkpoints/` directory
- Jump directly to the Analysis Scripts section
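As a quick sanity check that an extracted checkpoint is readable, a minimal snippet, assuming the checkpoints are standard PyTorch .pth files (the exact contents depend on how the training scripts save them):

```python
import torch

# Load the finetuned checkpoint on CPU and inspect its contents.
state = torch.load("checkpoints/my_finetuning/best_model.pth", map_location="cpu")
print(type(state))                       # typically a state_dict or a dict of tensors
if isinstance(state, dict):
    print(list(state.keys())[:10])       # first few parameter names
```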
Train the initial disentanglement model using sample data:
python src/pretraining.py \
--zdim 256 \
--batch_size 32 \
--num_epochs 50 \
--device cpu \
--font sample_data \
--save_folder my_pretraining
Key pretraining arguments:
- `--w_class`: Classification loss weight (default: 0.001)
- `--w_rec`: Reconstruction loss weight (default: 1)
- `--w_rec2`: Reconstruction loss weight for cross generation (default: 1)
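For intuition, the weights combine the individual loss terms into a single training objective. The following is a schematic sketch with placeholder loss values, not the exact code in pretraining.py:

```python
import torch

# Placeholder loss values just to illustrate how the weights are combined;
# the real losses come from the model outputs in pretraining.py.
reconstruction_loss = torch.tensor(0.12)    # weighted by --w_rec
cross_generation_loss = torch.tensor(0.30)  # weighted by --w_rec2
classification_loss = torch.tensor(2.1)     # weighted by --w_class

w_rec, w_rec2, w_class = 1.0, 1.0, 0.001    # the default weights

total_loss = (w_rec * reconstruction_loss
              + w_rec2 * cross_generation_loss
              + w_class * classification_loss)
print(float(total_loss))  # 0.12 + 0.30 + 0.0021 = 0.4221
```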
After pretraining, finetune the model:
python src/finetuning.py \
--pretrain_model checkpoints/my_pretraining/best_model.pth \
--save_folder my_finetuning \
--zdim 256 \
--batch_size 32 \
--num_epochs 50 \
--device cpu \
--font sample_data
Key finetuning arguments:
- `--w_f`: Font style variance loss weight (default: 1)
- `--w_c`: Content variance loss weight (default: 1)
- `--w_class`: Classification loss weight (default: 0.001)
- `--w_rec`: Reconstruction loss weight (default: 1)
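The variance-weighted terms can be thought of as penalties that keep the style features consistent within a font and the character features consistent within a class. A small illustrative sketch of that idea (an assumed formulation, not the exact implementation in finetuning.py):

```python
import torch

# Toy style features: 4 glyphs from the same font, style vectors of dimension 8.
z_f = torch.randn(4, 8)

# Variance of the style features across the font's glyphs; driving this toward
# zero encourages one shared style vector per font (the --w_f term).
font_variance_loss = z_f.var(dim=0, unbiased=False).mean()

# Analogously, character features of the same letter across different fonts
# would be collected and penalized for spread (the --w_c term).
print(float(font_variance_loss))
```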
Visualize learned font and character features using PCA:
python src/feature_visualization.py \
--model_path checkpoints/my_finetuning/best_model.pth \
--dataset sample_data/test \
--device cpu \
--output_dir results/feature_visualization \
--show_images
Generates:
- Font-colored scatter plots (both with images and dots)
- Character-colored scatter plots (both with images and dots)
- Comparison plot (side-by-side font and character coloring)
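Conceptually, the script projects the learned feature vectors to 2-D with PCA and colors them by font or by character. A minimal sketch of that idea using scikit-learn and matplotlib, with synthetic stand-in features (the script itself extracts real features from the model):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Synthetic stand-ins for extracted features: 3 fonts x 26 letters, 256-dim each.
rng = np.random.default_rng(0)
features = rng.normal(size=(3 * 26, 256))
font_ids = np.repeat(np.arange(3), 26)

# Project to 2-D and color each point by its font.
coords = PCA(n_components=2).fit_transform(features)
plt.scatter(coords[:, 0], coords[:, 1], c=font_ids, cmap="tab10", s=12)
plt.title("Font-colored PCA of latent features (synthetic example)")
plt.savefig("font_pca_example.png")  # example output filename
```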
Generate new font styles by combining character features from one font with style features from another:
python src/font_generation.py \
--content_dir sample_data/test/BreeSerif-Regular \
--style_ref_dir sample_data/test/PressStart2P-Regular \
--style_ref_char A \
--model_path checkpoints/my_finetuning/best_model.pth \
--device cpu \
--output_dir results/font_generation
Key generation arguments:
- `--content_dir`: Directory of content font (provides character shapes)
- `--style_ref_dir`: Directory of style reference font (provides style)
- `--style_ref_char`: Character to extract style from (e.g., 'A', 'B', etc.)
Generates:
- Visualization showing: Content → Style Reference → Generated
- Metrics CSV with distance measurements for each character
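Under the hood, generation amounts to encoding both fonts, keeping the character features from the content font and the style features from the style reference, and decoding the combination. Below is a schematic sketch of that recombination step; the encoder, decoder, and the half-and-half latent split are stand-ins for the actual architectures defined in src/models.py:

```python
import torch
import torch.nn as nn

# Stand-in encoder/decoder just to show the feature swap.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 256))
decoder = nn.Sequential(nn.Linear(256, 64 * 64), nn.Sigmoid())

content_img = torch.rand(1, 1, 64, 64)  # e.g. a glyph from the content font
style_img = torch.rand(1, 1, 64, 64)    # e.g. a glyph from the style reference font

z_content = encoder(content_img)
z_style = encoder(style_img)

# Assumption for illustration: the first half of the latent holds character
# features (z_c) and the second half holds font style features (z_f).
z_c, _ = z_content.chunk(2, dim=1)
_, z_f = z_style.chunk(2, dim=1)

generated = decoder(torch.cat([z_c, z_f], dim=1)).reshape(1, 1, 64, 64)
print(generated.shape)  # torch.Size([1, 1, 64, 64])
```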
Common training arguments:
- `--zdim`: Latent space dimension (default: 256)
- `--batch_size`: Training batch size (default: 32)
- `--num_epochs`: Number of training epochs (default: 50-100)
- `--device`: Device to use ('cpu' or 'cuda:0')
- `--font`: Dataset to use ('sample_data' for included sample)

Common analysis arguments:
- `--dataset`: Path to dataset directory (for feature_visualization.py)
- `--content_dir`: Specific font directory for content (for font_generation.py)
- `--style_ref_dir`: Specific font directory for style reference (for font_generation.py)
- `--show_images`: Generate scatter plots with actual character images
Saved in user-specified directories under `checkpoints/`:
checkpoints/{save_folder}/
├── best_model.pth # Best model checkpoint
├── least_model.pth # Latest model checkpoint
├── {dataset}_train_loss.csv # Training loss history
├── {dataset}_loss.png # Training loss plots
└── param.txt # Training parameters
Saved in user-specified directories under `results/`:
results/feature_visualization/
├── font_pca_dots.png # Font-colored feature scatter plot
├── char_pca_dots.png # Character-colored feature scatter plot
├── font_pca_images.png # Font-colored with actual images (if --show_images)
├── char_pca_images.png # Character-colored with actual images (if --show_images)
└── comparison_pca.png # Side-by-side comparison
results/font_generation/
├── generation_{content}_to_{style}_{char}.png # Generation visualization
└── metrics_{content}_{style}_{char}.csv # Detailed metrics
The disentanglement model separates font representations into:
- Character features (z_c): Shape and structure of individual letters
- Font features (z_f): Style characteristics like thickness, serif, slant
This separation allows for flexible font generation by combining character shapes from one font with style characteristics from another.
If you use this code in your research, please cite our paper:
# Coming soon - ICDAR 2025 proceedings