CTLab-ITMO/Text2SVG

Text2SVG

This repo contains code for the three key steps required to generate SVG images with an LLM:

  • src/optimization -- SVG image optimization and cleaning;
  • src/captioning -- generation of high-quality captions with a VLM;
  • src/training -- fine-tuning an LLM with Unsloth.

Prerequisites

pip install .
npm install -g svgo

Optimization

We run optimization in two stages:

  • Initial optimization of the raw SVG: path conversion and rescaling of coordinates to the [0, 256] × [0, 256] range.
  • SVGO optimization and cleanup.

optimize_dir
  --input_dir        # Directory containing original SVG files.
  --output_dir       # Directory to store optimized SVGs.
  --cubic_only       # Enable conversion of segments to cubic.
  --normalize_points # Enable normalization of points.
  --normalize_scale  # Normalization scale.
  --normalize_to_int # Round coordinates to integers after normalization.
  --num_threads      # Number of threads for the Python optimization stage.
  --svgo_config      # Path to an SVGO configuration file.
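
The second stage relies on the SVGO command-line tool installed in Prerequisites. As a rough illustration only (this is not the repository's code), the sketch below shows how an `svgo` call over a whole folder might be assembled from Python. The helper names `build_svgo_command` and `run_svgo` are hypothetical; `-f`, `-o`, and `--config` are standard SVGO CLI options.

```python
import shutil
import subprocess
from typing import List, Optional


def build_svgo_command(input_dir: str, output_dir: str,
                       svgo_config: Optional[str] = None) -> List[str]:
    """Build an `svgo` command that optimizes every SVG in `input_dir`.

    Hypothetical helper: the repository may invoke SVGO differently.
    """
    cmd = ["svgo", "-f", input_dir, "-o", output_dir]
    if svgo_config is not None:
        # Point SVGO at a custom configuration file.
        cmd += ["--config", svgo_config]
    return cmd


def run_svgo(input_dir: str, output_dir: str,
             svgo_config: Optional[str] = None) -> None:
    """Run SVGO, failing early if the binary is not on PATH."""
    if shutil.which("svgo") is None:
        raise RuntimeError("svgo not found; install it with `npm install -g svgo`")
    subprocess.run(build_svgo_command(input_dir, output_dir, svgo_config),
                   check=True)
```

Building the command as a list (rather than a shell string) avoids quoting problems with directory names that contain spaces.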

Example

Before optimization:

Initial image

<svg version="1.1" xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 20 20">
<title>power</title>
<path d="M10.625 1.681c0-0.345-0.28-0.625-0.625-0.625s-0.625 0.28-0.625 0.625v8.125c0 0.345 0.28 0.625 0.625 0.625s0.625-0.28 0.625-0.625v-8.125z"></path>
<path d="M7.12 2.881c0.318-0.135 0.466-0.502 0.33-0.82s-0.502-0.466-0.82-0.33c-3.156 1.343-5.38 4.436-5.38 8.075 0 4.845 3.905 8.75 8.75 8.75s8.75-3.905 8.75-8.75c0-3.639-2.225-6.732-5.38-8.075-0.318-0.135-0.685 0.013-0.82 0.33s0.013 0.685 0.33 0.82c2.719 1.157 4.62 3.814 4.62 6.925 0 4.155-3.345 7.5-7.5 7.5s-7.5-3.345-7.5-7.5c0-3.111 1.9-5.768 4.62-6.925z"></path>
</svg>

After optimization:

Optimized image

<svg viewBox="0 0 256 256">
  <path d="M136 22Q135 14 128 14T120 22V126Q121 133 128 134 135 133 136 126z"/>
  <path d="M91 37Q98 33 95 26 92 20 85 22C44 39 16 79 16 126 16 188 66 238 128 238S240 188 240 126C240 79 212 39 171 22Q164 20 161 26 158 33 165 37C200 52 224 86 224 126 224 179 181 222 128 222S32 179 32 126C32 86 56 52 91 37"/>
</svg>
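
The coordinate change in this example is consistent with a uniform rescale from the original 20 × 20 viewBox to a 256-unit canvas followed by rounding. The sketch below is a minimal illustration under that assumption; the function `normalize_points` and its parameters are hypothetical and only loosely mirror the `--normalize_scale` and `--normalize_to_int` options.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float]
ViewBox = Tuple[float, float, float, float]  # (min_x, min_y, width, height)


def normalize_points(points: Iterable[Point], viewbox: ViewBox,
                     scale: float = 256.0, to_int: bool = True) -> List[Point]:
    """Map points from `viewbox` coordinates onto a [0, scale] canvas."""
    min_x, min_y, width, height = viewbox
    # Assumption: one factor for both axes, so the aspect ratio is preserved.
    factor = scale / max(width, height)
    result = []
    for x, y in points:
        nx = (x - min_x) * factor
        ny = (y - min_y) * factor
        if to_int:
            # Rounding shortens the path string at the cost of precision.
            nx, ny = round(nx), round(ny)
        result.append((nx, ny))
    return result


# Start point of the first path in the "before" SVG (viewBox "0 0 20 20").
print(normalize_points([(10.625, 1.681)], (0, 0, 20, 20)))  # [(136, 22)]
```

The result matches the `M136 22` that opens the first path of the optimized SVG.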

Captioning

We caption SVG images with a vision-language model (VLM); the default is Qwen/Qwen2-VL-7B-Instruct.

caption_dir
  --dataset     # Dataset for captioning. Should contain columns: 'svg_name' and 'svg_contents'.
  --start_index # Start index (inclusive) for the subset.
  --end_index   # End index (exclusive) for the subset.
  --batch_size  # Number of examples to process in a single batch.
  --max_samples # Limit total number of processed samples after subset selection.
  --hf_repo     # Push to this private HF dataset repo (e.g., 'username/my_repo').
  --model_path  # Path for VL model.
  --output_csv  # Path to the local CSV file where results will be stored.
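
The subset options compose in a fixed order: the `[start_index, end_index)` slice is taken first, `max_samples` then caps the result, and the remainder is split into batches. The sketch below illustrates those semantics on an in-memory list of rows; `iter_batches` is a hypothetical helper, not the script's actual implementation, and the default `batch_size` is an assumption.

```python
from typing import Dict, Iterator, List, Optional, Sequence

Row = Dict[str, str]  # expected keys: 'svg_name' and 'svg_contents'


def iter_batches(rows: Sequence[Row], start_index: int = 0,
                 end_index: Optional[int] = None,
                 max_samples: Optional[int] = None,
                 batch_size: int = 8) -> Iterator[List[Row]]:
    """Yield batches of rows after subset selection."""
    # start_index is inclusive, end_index is exclusive (Python slice rules).
    subset = list(rows[start_index:end_index])
    if max_samples is not None:
        # The cap applies after the subset has been selected.
        subset = subset[:max_samples]
    for i in range(0, len(subset), batch_size):
        yield subset[i:i + batch_size]


rows = [{"svg_name": f"{i}.svg", "svg_contents": "<svg/>"} for i in range(10)]
for batch in iter_batches(rows, start_index=2, end_index=9,
                          max_samples=5, batch_size=2):
    print([row["svg_name"] for row in batch])
```

With these arguments the rows at indices 2 through 6 are processed, in batches of two, two, and one.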

Example

Generated caption: Black power button with a diagonal line. The button has a circular shape with a rectangular line bisecting it.

Optimized image

Training

There are currently several scripts for training and evaluation. All training is handled via the Unsloth framework.

python3 run_training.py

Example of current generation

Initial image

About

Repo for generating SVGs from text prompts
