AMD Nitro-T

Nitro-T is a family of text-to-image diffusion models focused on highly efficient training. The models achieve scores on image generation benchmarks that are competitive with previous efficiency-focused models, while requiring less than 1 day of training from scratch on 32 AMD Instinct™ MI300X GPUs.

This repository provides training and data preparation scripts to reproduce our results. We hope this codebase for efficient diffusion model training enables researchers to iterate faster on ideas and lowers the barrier for independent developers to build custom models.

The models can be found on Hugging Face:

Nitro-T-0.6B, a 512px DiT-based model
Nitro-T-1.2B, a 1024px MMDiT-based model

Environment

The codebase is implemented using PyTorch. Follow the official instructions to install it in your compute environment.

Docker image

When running on AMD Instinct™ GPUs, it is recommended to use the public PyTorch ROCm images to get optimized performance out of the box.

docker pull rocm/pytorch-training

Dependencies

pip install -r requirements.txt
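
After installing, it can help to confirm which PyTorch build is active before launching a long run. The snippet below is a minimal sketch and not part of this repository; the helper name describe_torch_backend is hypothetical. It relies on the fact that ROCm builds of PyTorch set torch.version.hip and expose AMD GPUs through the torch.cuda namespace.

```python
import importlib.util


def describe_torch_backend() -> dict:
    """Summarize the PyTorch install (hypothetical helper, not from the repo).

    All fields keep their empty defaults if torch is not installed.
    """
    info = {"torch": None, "hip": None, "gpu_available": False, "gpu_count": 0}
    if importlib.util.find_spec("torch") is None:
        return info

    import torch

    info["torch"] = torch.__version__
    # torch.version.hip is a version string on ROCm builds and None otherwise.
    info["hip"] = getattr(torch.version, "hip", None)
    # ROCm builds report AMD GPUs through the torch.cuda API.
    info["gpu_available"] = torch.cuda.is_available()
    info["gpu_count"] = torch.cuda.device_count()
    return info


if __name__ == "__main__":
    print(describe_torch_backend())
```

Inside the ROCm image, a non-empty "hip" field and a "gpu_count" matching the number of visible GPUs suggest the environment is set up as intended.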

Preparing the training dataset

The Nitro-T models were trained on a dataset of ~35M images consisting of both real and synthetic data sources that are openly available on the internet. Use the scripts in core/datasets/scripts to download and pre-process the dataset. The scripts are based on the excellent MicroDiT repo and modified for our use case.

Training the models

Launch a training run using this script:

bash scripts/run_train.sh

Use the config files to control the training process:

configs/accelerate.yaml: Set the multi-GPU / multi-node distributed training setup, torch compile, etc.
configs/default_config.yaml: Set the training hyperparameters and dataset paths.
Experiment-specific configs override the values in default_config.yaml.
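
To illustrate what "override" means here, the sketch below merges an experiment config over a default config using plain dictionaries. It is an assumption about the semantics, not the repository's actual loader: the merge_config helper and all keys (train, lr, batch_size, data, path) are hypothetical and chosen only for illustration.

```python
import copy


def merge_config(base: dict, override: dict) -> dict:
    """Return a new dict where values in `override` replace those in `base`.

    Nested dicts are merged recursively; everything else is replaced outright.
    Hypothetical helper, not taken from the repo.
    """
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical keys standing in for the contents of default_config.yaml.
default_config = {"train": {"lr": 1e-4, "batch_size": 256}, "data": {"path": "data/"}}
# Hypothetical experiment-specific config that changes only the learning rate.
experiment_config = {"train": {"lr": 3e-4}}

config = merge_config(default_config, experiment_config)
print(config)
```

Under this reading, an experiment config only needs to list the values that differ from the defaults; everything else is inherited unchanged.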

Minimal inference example

You must use diffusers>=0.34 to load the model from the Hugging Face Hub (Issue).

import torch
from diffusers import DiffusionPipeline
from transformers import AutoModelForCausalLM

# Inference only, so disable gradient tracking globally.
torch.set_grad_enabled(False)

# ROCm builds of PyTorch address AMD GPUs through the 'cuda' device type.
device = torch.device('cuda:0')
dtype = torch.bfloat16
resolution = 512
MODEL_NAME = "amd/Nitro-T-0.6B"

# Load the text encoder separately and pass it to the pipeline.
text_encoder = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B", torch_dtype=dtype)
pipe = DiffusionPipeline.from_pretrained(
    MODEL_NAME,
    text_encoder=text_encoder,
    torch_dtype=dtype,
    trust_remote_code=True,  # allows the pipeline code hosted with the model to run
)
pipe.to(device)

image = pipe(
    prompt="The image is a close-up portrait of a scientist in a modern laboratory. He has short, neatly styled black hair and wears thin, stylish eyeglasses. The lighting is soft and warm, highlighting his facial features against a backdrop of lab equipment and glowing screens.",
    height=resolution, width=resolution,
    num_inference_steps=20,
    guidance_scale=4.0,
).images[0]

# Save the first (and only) generated image.
image.save("output.png")

Examples of images generated by Nitro-T


Images generated by Nitro-T-1.2B at 1024px resolution


Images generated by Nitro-T-0.6B at 512px resolution

Acknowledgements

We would like to thank MicroDiT for sparking the idea for this project and providing easy dataset processing scripts, and Diffusers for providing modular building blocks for diffusion models.

License

This project is licensed under the MIT License.

