AMD Nitro-T


(Image: sample generations from Nitro-T)

Nitro-T is a family of text-to-image diffusion models focused on highly efficient training. Our models achieve competitive scores on image generation benchmarks compared to prior models that target efficient training, while requiring less than one day of training from scratch on 32 AMD Instinct™ MI300X GPUs.

This repository provides training and data preparation scripts to reproduce our results. We hope this codebase for efficient diffusion model training enables researchers to iterate faster on ideas and lowers the barrier for independent developers to build custom models.

The models can be found on Hugging Face:

  • Nitro-T-0.6B (512px)
  • Nitro-T-1.2B (1024px)

Environment

The codebase is implemented in PyTorch. Follow the official instructions to install it in your compute environment.

Docker image

When running on AMD Instinct™ GPUs, it is recommended to use the public PyTorch ROCm images to get optimized performance out of the box.

docker pull rocm/pytorch-training
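
Whether PyTorch is installed directly or used from the ROCm container, a quick sanity check (using only standard PyTorch calls) confirms that the GPUs are visible; torch.version.hip is set on ROCm builds and None otherwise.

import torch

# Print the PyTorch build and whether GPUs are visible.
print("PyTorch version:", torch.__version__)
print("ROCm/HIP build:", torch.version.hip)   # None on non-ROCm builds
print("GPU available:", torch.cuda.is_available())
print("GPU count:", torch.cuda.device_count())
if torch.cuda.is_available():
    print("Device 0:", torch.cuda.get_device_name(0))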

Dependencies

pip install -r requirements.txt

Preparing the training dataset

The Nitro-T models were trained on a dataset of ~35M images drawn from both real and synthetic sources that are openly available on the internet. Use the scripts in core/datasets/scripts to download and pre-process the dataset. The scripts are based on the excellent MicroDiT repo and were modified for our use case.

Training the models

Launch a training run using this script:

bash scripts/run_train.sh

Use the config files to control the training process:

  • configs/accelerate.yaml: sets the multi-GPU / multi-node distributed training setup, torch compile, etc.
  • configs/default_config.yaml: sets the training hyperparameters and dataset paths.
  • Experiment-specific configs override the values in default_config.yaml (see the sketch below).
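
As a rough, hypothetical illustration of this override behavior (the real merge logic lives in the training code, and configs/my_experiment.yaml is a made-up name), an experiment config only needs to list the keys it changes:

import yaml  # assumes PyYAML is available

# Load the shared defaults.
with open("configs/default_config.yaml") as f:
    config = yaml.safe_load(f)

# Load a hypothetical experiment-specific config that only lists overrides.
with open("configs/my_experiment.yaml") as f:
    overrides = yaml.safe_load(f) or {}

# Simplified (shallow) merge: experiment values win over the defaults.
config.update(overrides)
print(config)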

Minimal inference example

You must use diffusers>=0.34 to load the model from the Hugging Face Hub (Issue).

import torch
from diffusers import DiffusionPipeline
from transformers import AutoModelForCausalLM

torch.set_grad_enabled(False)

device = torch.device('cuda:0')
dtype = torch.bfloat16
resolution = 512
MODEL_NAME = "amd/Nitro-T-0.6B"

# Nitro-T uses a Llama 3.2 1B text encoder; load it separately and pass it to the pipeline.
text_encoder = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B", torch_dtype=dtype)
pipe = DiffusionPipeline.from_pretrained(
    MODEL_NAME,
    text_encoder=text_encoder,
    torch_dtype=dtype,
    trust_remote_code=True,  # required to load the custom pipeline code from the Hub
)
pipe.to(device)

image = pipe(
    prompt="The image is a close-up portrait of a scientist in a modern laboratory. He has short, neatly styled black hair and wears thin, stylish eyeglasses. The lighting is soft and warm, highlighting his facial features against a backdrop of lab equipment and glowing screens.",
    height=resolution, width=resolution,
    num_inference_steps=20,
    guidance_scale=4.0,
).images[0]

image.save("output.png")
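
To generate at 1024px with the larger model, the same script should work with just the model name and resolution changed (the Hub id below assumes the 1.2B checkpoint follows the same amd/ naming as the 0.6B one):

MODEL_NAME = "amd/Nitro-T-1.2B"  # assumed Hub id, following the amd/Nitro-T-0.6B pattern
resolution = 1024                # Nitro-T-1.2B targets 1024px generation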

Examples of images generated by Nitro-T

(Image) Images generated by Nitro-T-1.2B at 1024px resolution
(Image) Images generated by Nitro-T-0.6B at 512px resolution

Acknowledgements

We would like to thank MicroDiT for sparking the idea for this project and providing easy dataset processing scripts, and Diffusers for providing modular building blocks for diffusion models.

License

Copyright (c) 2025 Advanced Micro Devices, Inc. All Rights Reserved.

This project is licensed under the MIT License.
