VisionConverter is a library for converting object detection annotation datasets between popular formats. It simplifies dataset interoperability for machine learning and computer vision projects.
Key Features:
- Bidirectional conversion between supported formats
- Unified internal representation ensures consistent and reliable transformations
Conversion Process:
- Load the input dataset from the specified path
- Transform it into the internal representation
- Convert from internal representation to target output format
- Save the converted dataset to the desired output location
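Every format converts only to and from the shared internal representation. Supporting N formats therefore takes one reader and one writer per format (2N in total), not a dedicated converter for each of the N(N-1) ordered format pairs. The plain-Python sketch below illustrates this hub-and-spoke idea. `NeutralBox`, `READERS`, `WRITERS`, `convert`, and the two toy formats are hypothetical names for illustration only. They are not VisionConverter's actual API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


# Hypothetical neutral record: one box in absolute pixels (top-left corner + size).
@dataclass
class NeutralBox:
    label: str
    x: float
    y: float
    width: float
    height: float


# Each format registers a reader (format -> neutral) and a writer (neutral -> format).
READERS: Dict[str, Callable[[object], List[NeutralBox]]] = {}
WRITERS: Dict[str, Callable[[List[NeutralBox]], object]] = {}


def convert(data: object, src: str, dst: str) -> object:
    """Convert between two formats by routing through the neutral representation."""
    neutral = READERS[src](data)   # transform the loaded input into the neutral form
    return WRITERS[dst](neutral)   # transform the neutral form into the target format


# Toy "coco_like" format: list of (label, [x, y, width, height]) tuples.
READERS["coco_like"] = lambda rows: [NeutralBox(lbl, *bbox) for lbl, bbox in rows]
WRITERS["coco_like"] = lambda boxes: [
    (b.label, [b.x, b.y, b.width, b.height]) for b in boxes
]

# Toy "voc_like" format: list of (label, xmin, ymin, xmax, ymax) tuples.
READERS["voc_like"] = lambda rows: [
    NeutralBox(lbl, x1, y1, x2 - x1, y2 - y1) for lbl, x1, y1, x2, y2 in rows
]
WRITERS["voc_like"] = lambda boxes: [
    (b.label, b.x, b.y, b.x + b.width, b.y + b.height) for b in boxes
]
```

For example, `convert([("cat", [10, 20, 30, 40])], "coco_like", "voc_like")` yields `[("cat", 10, 20, 40, 60)]`. Adding a third format would only require registering one more reader and writer. Every existing format then becomes reachable automatically.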
Install the latest release from PyPI:
pip install vision-converter
Clone the repository and install the package:
git clone https://github.com/GCousido/vision-converter.git
cd vision-converter
pip install .
For development, install in editable mode with the testing dependencies:
git clone https://github.com/GCousido/vision-converter.git
cd vision-converter
pip install -e ".[dev]"
You can use VisionConverter as a Python library to convert datasets programmatically.
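from vision_converter import YoloFormat, YoloConverter, CocoFormat, CocoConverter, NeutralFormat
# Load the YOLO dataset from disk
yolo_dataset: YoloFormat = YoloFormat.read_from_folder("./dataset/yolo")
# Convert it to the internal (neutral) representation
internal_dataset: NeutralFormat = YoloConverter.toNeutral(yolo_dataset)
# Convert the neutral representation to COCO
coco_dataset: CocoFormat = CocoConverter.fromNeutral(internal_dataset)
# Write the COCO dataset to the output folder
coco_dataset.save("./dataset/coco")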
The CLI provides a simple interface for converting datasets:
vconverter --input-format <INPUT_FORMAT> --input-path <INPUT_PATH> --output-format <OUTPUT_FORMAT> --output-path <OUTPUT_PATH> <OPTIONS>
- `--input-format`: Source format
- `--input-path`: Path to the folder containing the input dataset
- `--output-format`: Target format
- `--output-path`: Path to save the converted dataset
Options:
- `--copy-images`: Copy image files to the output directory.
- `--symlink-images`: Create symbolic links to the original images in the output directory.
Convert a YOLO dataset to COCO:
vconverter --input-format yolo --input-path ./datasets/yolo --output-format coco --output-path ./datasets/coco
Convert Pascal VOC to YOLO:
vconverter --input-format pascal_voc --input-path ./datasets/pascalvoc --output-format yolo --output-path ./datasets/yolo
Convert COCO to Pascal VOC and copy the images to the output directory:
vconverter --input-format coco --input-path ./datasets/coco --output-format pascal_voc --output-path ./datasets/pascalvoc --copy-images
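For scripted batch jobs, the CLI can also be driven from Python. The sketch below uses only the flags documented above and assumes `vconverter` is on your `PATH`. `build_command` and `run_conversion` are hypothetical helper names, not part of VisionConverter.

```python
import subprocess
from typing import List, Optional


def build_command(input_format: str, input_path: str, output_format: str,
                  output_path: str, images: Optional[str] = None) -> List[str]:
    """Assemble a vconverter argument list; images is None, "copy" or "symlink"."""
    cmd = [
        "vconverter",
        "--input-format", input_format,
        "--input-path", input_path,
        "--output-format", output_format,
        "--output-path", output_path,
    ]
    if images == "copy":
        cmd.append("--copy-images")
    elif images == "symlink":
        cmd.append("--symlink-images")
    elif images is not None:
        raise ValueError(f"images must be None, 'copy' or 'symlink', got {images!r}")
    return cmd


def run_conversion(*args, **kwargs) -> None:
    """Run one conversion; raises CalledProcessError if vconverter exits non-zero."""
    subprocess.run(build_command(*args, **kwargs), check=True)
```

A loop such as `for split in ("train", "val"): run_conversion("yolo", f"./datasets/yolo/{split}", "coco", f"./datasets/coco/{split}")` would then convert several splits in one go. The per-split folder layout in that example is illustrative.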
Format | Input | Output | Parameter Value | Description |
---|---|---|---|---|
YOLO | ✅ | ✅ | yolo | YOLO format (.txt files with normalized coordinates and classes.txt for class names) |
COCO | ✅ | ✅ | coco | Microsoft COCO format (.json with absolute coordinates) |
Pascal VOC | ✅ | ✅ | pascal_voc | Pascal Visual Object Classes format (.xml files with absolute coordinates) |
CreateML | ✅ | ✅ | createml | Apple CreateML format (.json with center-based bounding boxes and absolute coordinates) |
TensorFlow CSV | ✅ | ✅ | tensorflow_csv | TensorFlow Object Detection CSV format (.csv with absolute coordinates) |
LabelMe | ✅ | ✅ | labelme | LabelMe JSON format (.json files with shape annotations and optional embedded image data) |
VGG | ✅ | ✅ | vgg | VGG Image Annotator format (.json with multiple shape types and region attributes) |
YOLO:
- File Structure: One `.txt` file per image, with the same basename as the image
- Annotation Format: one line per object, `<class_id> <x_center> <y_center> <width> <height>`
- Coordinates: Normalized values between 0 and 1 (relative to the image size)
- Additional Files: `classes.txt` containing class names, one per line
Expected input file structure:
dataset/
├── images/
│   img1.jpg
│   img2.jpg
├── labels/
│   img1.txt
│   img2.txt
│   classes.txt
Generated output file structure:
dataset/
├── images/
├── labels/
│   img1.txt
│   img2.txt
│   classes.txt
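The helpers below sketch the arithmetic a YOLO reader or writer has to perform. They convert normalized center/size values to absolute corner coordinates and back. Because YOLO label files do not store the image size, the sketch assumes the width and height are known. `yolo_to_voc` and `voc_to_yolo` are hypothetical names, not VisionConverter functions.

```python
from typing import Tuple


def yolo_to_voc(line: str, img_w: int, img_h: int) -> Tuple[int, float, float, float, float]:
    """Parse one YOLO label line into (class_id, xmin, ymin, xmax, ymax) in pixels."""
    class_id, xc, yc, w, h = line.split()
    # Undo the normalization by scaling with the image size.
    xc, w = float(xc) * img_w, float(w) * img_w
    yc, h = float(yc) * img_h, float(h) * img_h
    # Convert center + size into corner coordinates.
    return int(class_id), xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2


def voc_to_yolo(class_id: int, xmin: float, ymin: float, xmax: float, ymax: float,
                img_w: int, img_h: int) -> str:
    """Format pixel corner coordinates as a normalized YOLO label line."""
    xc = (xmin + xmax) / 2 / img_w
    yc = (ymin + ymax) / 2 / img_h
    w = (xmax - xmin) / img_w
    h = (ymax - ymin) / img_h
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"
```

For a 640x480 image, `yolo_to_voc("0 0.5 0.5 0.25 0.5", 640, 480)` gives `(0, 240.0, 120.0, 400.0, 360.0)`, and `voc_to_yolo` maps that box back to the original line. The six-decimal precision is an arbitrary choice for this sketch.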
COCO:
- File Structure: Single `.json` file containing all annotations
- Annotation Format: JSON with `images`, `annotations` and `categories` arrays
- Coordinates: Absolute pixel values `[x, y, width, height]`, where `(x, y)` is the top-left corner of the box
- Metadata: Includes dataset `info`, `licenses`, and category definitions
Expected input file structure:
dataset/
├── images/
│   img1.jpg
│   img2.jpg
├── annotations.json
Generated output file structure:
dataset/
├── images/
├── annotations.json
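The following sketch illustrates the COCO layout described above by assembling a minimal annotation dictionary. `build_coco` is a hypothetical helper. The 1-based ids and the `area`/`iscrowd` fields follow common COCO practice but are assumptions here. Fields such as `segmentation` are omitted.

```python
import json
from typing import List, Tuple


def build_coco(images: List[Tuple[str, int, int]],
               boxes: List[Tuple[int, int, float, float, float, float]],
               class_names: List[str]) -> dict:
    """images: (file_name, width, height); boxes: (image_index, class_index, x, y, w, h)."""
    coco = {
        "info": {"description": "example dataset"},
        "licenses": [],
        "images": [
            {"id": i + 1, "file_name": name, "width": w, "height": h}
            for i, (name, w, h) in enumerate(images)
        ],
        "categories": [{"id": c + 1, "name": name} for c, name in enumerate(class_names)],
        "annotations": [],
    }
    for ann_id, (img_idx, cls_idx, x, y, w, h) in enumerate(boxes, start=1):
        coco["annotations"].append({
            "id": ann_id,
            "image_id": img_idx + 1,
            "category_id": cls_idx + 1,
            "bbox": [x, y, w, h],  # absolute pixels: top-left corner, width, height
            "area": w * h,
            "iscrowd": 0,
        })
    return coco
```

Passing the result to `json.dump(coco, f)` would write it as `annotations.json`.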
Pascal VOC:
- File Structure: One `.xml` file per image, sharing the basename with the image file
- Annotation Format: XML structure with bounding box coordinates and class names
- Coordinates: Absolute pixel values `<xmin>`, `<ymin>`, `<xmax>`, `<ymax>`
- Metadata: Rich annotation metadata, including image `size`, object attributes (`difficult`, `truncated`, `occluded`), and `source` info
Expected input file structure:
dataset/
├── JPEGImages/
│   img1.jpg
│   img2.jpg
├── Annotations/
│   img1.xml
│   img2.xml
├── ImageSets/
Generated output file structure:
dataset/
├── JPEGImages/
├── Annotations/
│   img1.xml
│   img2.xml
├── ImageSets/
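The following is a minimal sketch of reading one Pascal VOC annotation with the standard library's `xml.etree.ElementTree`. It extracts the image size and each object's class, `difficult` flag, and corner coordinates. `read_voc` and `SAMPLE` are hypothetical names, and the sample XML is a hand-written illustration, not a file from the project.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def read_voc(xml_text: str) -> Tuple[str, Tuple[int, int], List[dict]]:
    """Return (filename, (width, height), objects) from one VOC XML document."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    width, height = int(size.findtext("width")), int(size.findtext("height"))
    objects = []
    for obj in root.findall("object"):
        bb = obj.find("bndbox")
        objects.append({
            "name": obj.findtext("name"),
            # Missing <difficult> tags are treated as "0" (not difficult).
            "difficult": obj.findtext("difficult", default="0") == "1",
            "box": tuple(float(bb.findtext(t)) for t in ("xmin", "ymin", "xmax", "ymax")),
        })
    return root.findtext("filename"), (width, height), objects


SAMPLE = """<annotation>
  <filename>img1.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>cat</name>
    <difficult>0</difficult>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>40</xmax><ymax>60</ymax></bndbox>
  </object>
</annotation>"""
```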
CreateML:
- File Structure: Single `.json` file containing all annotations, plus an `images/` folder with the image files
- Annotation Format: JSON array with one entry per image, each containing the image filename and an annotations array
- Coordinates: Absolute pixel values, with bounding boxes defined by center coordinates and dimensions `{x_center, y_center, width, height}`
Expected input file structure:
dataset/
├── images/
│   img1.jpg
│   img2.jpg
├── annotations.json
Generated output file structure:
dataset/
├── images/
├── annotations.json
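The key difference from COCO is that CreateML boxes are anchored at their center. The sketch below builds one image entry from top-left boxes. It assumes the key names `image`, `annotations`, `label` and `coordinates`, with `x`/`y` holding the center, as in Apple's Create ML object-detection JSON. Check these names against your own files. `createml_entry` and `to_createml_json` are hypothetical helpers.

```python
import json
from typing import Iterable, List, Tuple


def createml_entry(image_name: str,
                   boxes: Iterable[Tuple[str, float, float, float, float]]) -> dict:
    """Build one entry from (label, xmin, ymin, width, height) boxes in pixels."""
    return {
        "image": image_name,
        "annotations": [
            {
                "label": label,
                # Shift the top-left corner by half the box size to get the center.
                "coordinates": {"x": x + w / 2, "y": y + h / 2, "width": w, "height": h},
            }
            for label, x, y, w, h in boxes
        ],
    }


def to_createml_json(entries: List[dict]) -> str:
    """Serialize the entries as a top-level JSON array, one element per image."""
    return json.dumps(entries, indent=2)
```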
TensorFlow CSV:
- File Structure: Single `.csv` file containing all annotations
- Annotation Format: CSV with columns for image metadata and bounding box coordinates
- Coordinates: Absolute pixel values `<xmin>`, `<ymin>`, `<xmax>`, `<ymax>`
- Required Columns: `filename`, `width`, `height`, `class`, `xmin`, `ymin`, `xmax`, `ymax`
- Features: Human-readable, directly compatible with the TensorFlow Object Detection API, and supports multiple objects per image (one row per object)
Expected input file structure:
dataset/
├── images/
│   img1.jpg
│   img2.jpg
├── annotations.csv
Generated output file structure:
dataset/
├── images/
├── annotations.csv
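Each CSV row describes one object, so an image with several objects spans several rows. The sketch below writes rows with the required columns and groups them back by filename using the standard `csv` module. `write_tf_csv` and `read_tf_csv` are hypothetical helpers, and the in-memory buffer stands in for `annotations.csv`.

```python
import csv
import io
from typing import Dict, Iterable, List, TextIO, Tuple

COLUMNS = ["filename", "width", "height", "class", "xmin", "ymin", "xmax", "ymax"]
Box = Tuple[str, float, float, float, float]


def write_tf_csv(records: Iterable[dict], stream: TextIO) -> None:
    """Write a header plus one row per object; each record is a dict keyed by COLUMNS."""
    writer = csv.DictWriter(stream, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(records)


def read_tf_csv(stream: TextIO) -> Dict[str, List[Box]]:
    """Group rows by filename into (class, xmin, ymin, xmax, ymax) tuples."""
    grouped: Dict[str, List[Box]] = {}
    for row in csv.DictReader(stream):
        box = (row["class"], *(float(row[k]) for k in ("xmin", "ymin", "xmax", "ymax")))
        grouped.setdefault(row["filename"], []).append(box)
    return grouped


# Round-trip two objects from the same image through an in-memory buffer.
buf = io.StringIO(newline="")
write_tf_csv(
    [
        {"filename": "img1.jpg", "width": 640, "height": 480, "class": "cat",
         "xmin": 10, "ymin": 20, "xmax": 40, "ymax": 60},
        {"filename": "img1.jpg", "width": 640, "height": 480, "class": "dog",
         "xmin": 100, "ymin": 120, "xmax": 200, "ymax": 220},
    ],
    buf,
)
buf.seek(0)
boxes = read_tf_csv(buf)
```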
LabelMe:
- File Structure: One `.json` file per image, containing annotations and image metadata
- Annotation Format: JSON with a `shapes` array, where each shape has `label`, `points`, `shape_type`, `group_id`, `flags`, and an optional `description`
- Coordinates: Absolute pixel values for the `points` that define each shape (e.g., polygons, rectangles)
- Image Data: Optional `base64`-encoded image data embedded in the `imageData` field
- Metadata: Includes `version`, `flags`, `imagePath`, `imageHeight`, and `imageWidth`
Expected input file structure:
dataset/
├── img1.jpg
├── img1.json
├── img2.jpg
├── img2.json
Generated output file structure:
dataset/
├── img1.jpg
├── img1.json
├── img2.jpg
├── img2.json
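LabelMe shapes are point lists rather than boxes, so a converter targeting a box format has to derive one. The sketch below assumes two layouts. A `rectangle` stores two opposite corners in any order. A `circle` stores its center followed by a point on the circumference. Any other shape falls back to the extent of its points. It also decodes the optional `imageData` field. `shape_to_bbox` and `decode_image_data` are hypothetical helpers.

```python
import base64
import math
from typing import Optional, Tuple


def shape_to_bbox(shape: dict) -> Tuple[str, float, float, float, float]:
    """Axis-aligned (label, xmin, ymin, xmax, ymax) for a LabelMe shape dict."""
    points = shape["points"]
    if shape.get("shape_type") == "circle":
        # Assumed layout: [center, point on the circumference].
        (cx, cy), (px, py) = points
        r = math.hypot(px - cx, py - cy)
        return shape["label"], cx - r, cy - r, cx + r, cy + r
    # Rectangles (two opposite corners) and polygons (all vertices): take the extent.
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return shape["label"], min(xs), min(ys), max(xs), max(ys)


def decode_image_data(annotation: dict) -> Optional[bytes]:
    """Return the embedded image bytes, or None when imageData is empty or missing."""
    data = annotation.get("imageData")
    return base64.b64decode(data) if data else None
```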
VGG:
- File Structure: Single `.json` file containing all annotations in the VIA metadata structure
- Annotation Format: JSON with `_via_img_metadata` containing image entries, each with a `regions` array of shape annotations
- Coordinates: Absolute pixel values, with support for 6 shape types: `rect`, `circle`, `ellipse`, `polygon`, `polyline`, `point`
- Shape Types:
  - Rectangle: `{x, y, width, height}`, the top-left corner and dimensions
  - Circle: `{cx, cy, r}`, the center coordinates and radius
  - Ellipse: `{cx, cy, rx, ry, theta}`, the center, radii, and rotation angle
  - Polygon: `{all_points_x[], all_points_y[]}`, arrays of vertex coordinates
  - Polyline: `{all_points_x[], all_points_y[]}`, arrays of line point coordinates
  - Point: `{cx, cy}`, single point coordinates
- Metadata: Includes `file_attributes` for image-level data, `region_attributes` for annotation-level data, and optional VIA project settings
Expected input file structure:
dataset/
├── images/
│   img1.jpg
│   img2.jpg
├── annotations.json
Generated output file structure:
dataset/
├── images/
├── annotations.json
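Box-based formats such as COCO or Pascal VOC need an axis-aligned box for each of the VIA shape types listed above. The following is a minimal sketch of that derivation. It assumes VIA 2-style `shape_attributes` dicts whose `name` key holds the shape type, and an ellipse `theta` given in radians. For a rotated ellipse it uses the exact half-extents sqrt(rx²cos²θ + ry²sin²θ) horizontally and sqrt(rx²sin²θ + ry²cos²θ) vertically. `via_region_bbox` is a hypothetical helper, not necessarily how VisionConverter handles non-rectangular shapes.

```python
import math
from typing import Tuple


def via_region_bbox(shape: dict) -> Tuple[float, float, float, float]:
    """Axis-aligned (xmin, ymin, xmax, ymax) for a VIA shape_attributes dict."""
    name = shape["name"]
    if name == "rect":
        x, y = shape["x"], shape["y"]
        return x, y, x + shape["width"], y + shape["height"]
    if name == "circle":
        cx, cy, r = shape["cx"], shape["cy"], shape["r"]
        return cx - r, cy - r, cx + r, cy + r
    if name == "ellipse":
        cx, cy, rx, ry = shape["cx"], shape["cy"], shape["rx"], shape["ry"]
        theta = shape.get("theta", 0.0)  # rotation, assumed to be in radians
        half_w = math.sqrt((rx * math.cos(theta)) ** 2 + (ry * math.sin(theta)) ** 2)
        half_h = math.sqrt((rx * math.sin(theta)) ** 2 + (ry * math.cos(theta)) ** 2)
        return cx - half_w, cy - half_h, cx + half_w, cy + half_h
    if name in ("polygon", "polyline"):
        xs, ys = shape["all_points_x"], shape["all_points_y"]
        return min(xs), min(ys), max(xs), max(ys)
    if name == "point":
        # A point has no extent, so its box is degenerate.
        return shape["cx"], shape["cy"], shape["cx"], shape["cy"]
    raise ValueError(f"unsupported VIA shape: {name}")
```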
This project is licensed under the MIT License - see the LICENSE file for details.