Releases: NVIDIA/physicsnemo
v1.2.0
PhysicsNeMo General Release v1.2.0
Added
- Diffusion Transformer (DiT) model. The DiT model can be accessed in physicsnemo.experimental.models.dit.DiT. ⚠️ Warning: Experimental feature subject to future API changes.
- Improved documentation for diffusion models and diffusion utils.
- Safe API to override __init__'s arguments saved in checkpoint file with Module.from_checkpoint("chkpt.mdlus", override_args=set(...)).
- PyTorch Geometric MeshGraphNet backend.
- Functionality in DoMINO to take arbitrary number of scalar or vector global parameters and encode them using class ParameterModel
- TopoDiff model and example.
- Added ability for DoMINO model to return volume neighbors.
- Added functionality in DoMINO recipe to introduce physics residual losses.
- Diffusion models, metrics, and utils: implementation of Student-t distribution for EDM-based diffusion models (t-EDM). This feature is adapted from the paper Heavy-Tailed Diffusion Models, Pandey et al. This includes a new EDM preconditioner (tEDMPrecondSuperRes), a loss function (tEDMResidualLoss), and a new option in corrdiff diffusion_step. ⚠️ This is an experimental feature that can be accessed through the physicsnemo.experimental module; it might also be subject to API changes without notice.
- Bumped Ruff version from 0.0.290 to 0.12.5. Replaced Black with ruff-format.
- Domino improvements with Unet attention module and user configs
- Hybrid MeshGraphNet for modeling structural deformation
- Enabled TransformerEngine backend in the transolver model.
- Inference code for x-meshgraphnet example for external aerodynamics.
- Added a new example for external_aerodynamics: training transolver on irregular mesh data for DrivaerML surface data.
- Added a new example for external aerodynamics for finetuning pretrained models.
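The t-EDM entry in the list above replaces Gaussian noise with Student-t noise. As a rough illustration of that distribution only (this is not the PhysicsNeMo implementation, and the function name and arguments are hypothetical), a Student-t sample with nu degrees of freedom can be drawn as a standard normal divided by sqrt(V / nu), where V is chi-squared with nu degrees of freedom:

```python
import math
import random


def sample_student_t(nu: float, n: int, seed: int = 0) -> list[float]:
    """Draw n Student-t samples with nu degrees of freedom (illustrative only)."""
    if nu <= 0:
        raise ValueError("nu must be positive")
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        # chi-squared(nu) is a Gamma distribution with shape nu/2 and scale 2.
        v = rng.gammavariate(nu / 2.0, 2.0)
        samples.append(z / math.sqrt(v / nu))
    return samples


# Small nu gives heavy tails; very large nu approaches a standard normal.
heavy_tailed = sample_student_t(nu=3.0, n=5)
near_gaussian = sample_student_t(nu=1e6, n=5)
```

Smaller values of nu put more mass in the tails, which is the property the heavy-tailed formulation relies on.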
Changed
- Diffusion utils: physicsnemo.utils.generative renamed into physicsnemo.utils.diffusion
- Diffusion models: in CorrDiff model wrappers (EDMPrecondSuperResolution and UNet), the arguments profile_mode and amp_mode cannot be overridden by from_checkpoint. They are now properties that can be dynamically changed after the model instantiation with, for example, model.amp_mode = True and model.profile_mode = False.
- Updated healpix data module to use correct DistributedSampler target for test data loader
- Existing DGL-based vortex shedding example has been renamed to vortex_shedding_mgn_dgl. Added new vortex_shedding_mgn example that uses PyTorch Geometric instead.
- HEALPixLayer can now use earth2grid HEALPix padding ops, if desired
- Migrated Vortex Shedding Reduced Mesh example to PyTorch Geometric.
- CorrDiff example: fixed bugs when training regression UNet.
- Diffusion models: fixed bugs related to gradient checkpointing on non-square images.
- Diffusion models: created a separate class Attention for clarity and modularity. Updated UNetBlock accordingly to use the Attention class instead of custom attention logic. This will update the model architecture for SongUNet-based diffusion models. Changes are not BC-breaking and are transparent to the user.
- ⚠️ BC-breaking: refactored the automatic mixed precision (AMP) API in layers and models defined in physicsnemo/models/diffusion/ for improved usability. Note: it is now, not only possible, but required to explicitly set model.amp_mode = True in order to use the model in a torch.autocast clause. This applies to all SongUNet-based models.
- Diffusion models: fixed and improved API to enable fp16 forward pass in UNet and EDMPrecondSuperResolution model wrappers; fp16 forward pass can now be toggled/untoggled by setting model.use_fp16 = True.
- Diffusion models: improved API for Apex group norm. SongUNet-based models will automatically perform conversion of the input tensors to torch.channels_last memory format when model.use_apex_gn is True. New warnings are raised when attempting to use Apex group norm on CPU.
- Diffusion utils: systematic compilation of patching operations in stochastic_sampler for improved performance.
- CorrDiff example: added option for Student-t EDM (t-EDM) in train.py and generate.py. When training a CorrDiff diffusion model, this feature can be enabled with the hydra overrides ++training.hp.distribution=student_t and ++training.hp.nu_student_t=<nu_value>. For generation, this feature can be enabled with similar overrides: ++generation.distribution=student_t and ++generation.nu_student_t=<nu_value>.
- CorrDiff example: the parameters P_mean and P_std (used to compute the noise level sigma) are now configurable. They can be set with the hydra overrides ++training.hp.P_mean=<P_mean_value> and ++training.hp.P_std=<P_std_value> for training (and similar ones with training.hp replaced by generation for generation).
- Diffusion utils: patch-based inference and lead time support with deterministic sampler.
- Existing DGL-based XAeroNet example has been renamed to xaeronet_dgl. Added new xaeronet example that uses PyTorch Geometric instead.
- Updated the deforming plate example to use the Hybrid MeshGraphNet model.
- ⚠️ BC-breaking: Refactored the transolver model to improve readability and performance, and extend to more use cases.
- Diffusion models: improved lead time support for SongUNetPosLtEmbd and EDMLoss. Lead-time embeddings can now be used with/without positional embeddings.
- Diffusion models: consolidate ApexGroupNorm and GroupNorm in models/diffusion/layers.py with a factory get_group_norm that can be used to instantiate either one of them. get_group_norm is now the recommended way to instantiate a GroupNorm layer in SongUNet-based and other diffusion models.
- Physicsnemo models: improved checkpoint loading API in Module.from_checkpoint that now exposes a strict parameter to raise error on missing/unexpected keys, similar to that used in torch.nn.Module.load_state_dict.
- Migrated Hybrid MGN and deforming plate example to PyTorch Geometric.
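The strict option on Module.from_checkpoint noted above is described as behaving like the one in torch.nn.Module.load_state_dict, where strict mode raises on missing or unexpected keys and non-strict mode only reports them. The pure-Python sketch below illustrates that comparison; check_state_keys is a hypothetical helper written for this note, not part of PhysicsNeMo or PyTorch:

```python
def check_state_keys(expected, found, strict=True):
    """Compare checkpoint keys against model keys (illustrative sketch)."""
    # Keys the model needs but the checkpoint does not provide.
    missing = sorted(set(expected) - set(found))
    # Keys the checkpoint provides but the model does not know about.
    unexpected = sorted(set(found) - set(expected))
    if strict and (missing or unexpected):
        raise RuntimeError(
            f"missing keys: {missing}; unexpected keys: {unexpected}"
        )
    return missing, unexpected


model_keys = ["conv.weight", "conv.bias"]
checkpoint_keys = ["conv.weight", "norm.weight"]

# Non-strict mode reports the mismatch instead of raising.
missing, unexpected = check_state_keys(model_keys, checkpoint_keys, strict=False)
```

Non-strict loading is mainly useful when a checkpoint is known to differ from the current architecture in an expected way.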
Fixed
- Bug fixes in DoMINO model in sphere sampling and tensor reshaping
- Bug fixes in DoMINO utils random sampling and test.py
- Optimized DoMINO config params based on DrivAer ML
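The CorrDiff Hydra overrides listed under Changed for this release (t-EDM and P_mean/P_std) follow one pattern: keys sit under training.hp for training and under generation for generation. A small sketch that assembles them is shown below. The helper names are hypothetical, and the numeric values are placeholders rather than recommended settings:

```python
# Config prefixes as quoted in the release notes.
_PREFIXES = {"training": "training.hp", "generation": "generation"}


def _prefix(stage: str) -> str:
    if stage not in _PREFIXES:
        raise ValueError(f"stage must be one of {sorted(_PREFIXES)}")
    return _PREFIXES[stage]


def tedm_overrides(stage: str, nu) -> list[str]:
    """Hydra overrides that enable Student-t EDM (t-EDM)."""
    p = _prefix(stage)
    return [f"++{p}.distribution=student_t", f"++{p}.nu_student_t={nu}"]


def noise_level_overrides(stage: str, p_mean, p_std) -> list[str]:
    """Hydra overrides that set P_mean and P_std for the noise level sigma."""
    p = _prefix(stage)
    return [f"++{p}.P_mean={p_mean}", f"++{p}.P_std={p_std}"]


# Placeholder values, chosen only to show the resulting command line.
command = (
    ["python", "train.py"]
    + tedm_overrides("training", 5)
    + noise_level_overrides("training", 0.0, 1.2)
)
print(" ".join(command))
```

Passing "generation" instead of "training" yields the corresponding overrides for generate.py.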
v1.1.1
v1.1.0
PhysicsNeMo (Core) General Release v1.1.0
Added
- Added ReGen score-based data assimilation example
- General purpose patching API for patch-based diffusion
- New positional embedding selection strategy for CorrDiff SongUNet models
- Added Multi-Storage Client to allow checkpointing to/from Object Storage
Changed
- Simplified CorrDiff config files, updated default values
- Refactored CorrDiff losses and samplers to use the patching API
- Support for non-square images and patches in patch-based diffusion
- ERA5 download example updated to use current file format convention and restricts global statistics computation to the training set
- Support for training custom StormCast models and various other improvements for StormCast
- Updated CorrDiff training code to support multiple patch iterations to amortize regression cost and usage of torch.compile
- Refactored physicsnemo/models/diffusion/layers.py to optimize data type casting workflow, avoiding unnecessary casting under autocast mode
- Refactored Conv2d to enable fusion of conv2d with bias addition
- Refactored GroupNorm, UNetBlock, SongUNet, SongUNetPosEmbd to support usage of Apex GroupNorm, fusion of activation with GroupNorm, and AMP workflow.
- Updated SongUNetPosEmbd to avoid unnecessary HtoD Memcpy of pos_embd
- Updated from_checkpoint to accommodate conversion between Apex optimized ckp and non-optimized ckp
- Refactored CorrDiff NVTX annotation workflow to be configurable
- Refactored ResidualLoss to support patch-accumulating training for amortizing regression costs
- Explicit handling of Warp device for ball query and sdf
- Merged SongUNetPosLtEmb with SongUNetPosEmb, add support for batch>1
- Add lead time embedding support for positional_embedding_selector. Enable arbitrary positioning of probabilistic variables
- Enable lead time aware regression without CE loss
- Bumped minimum PyTorch version from 2.0.0 to 2.4.0, to minimize support surface for physicsnemo.distributed functionality.
Dependencies
- Made nvidia.dali an optional dependency
v1.0.1
v1.0.0
PhysicsNeMo (Core) General Release v1.0.0
Added
- DoMINO model architecture, datapipe and training recipe
- Added matrix decomposition scheme to improve graph partitioning
- DrivAerML dataset support in FIGConvNet example.
- Retraining recipe for DoMINO from a pretrained model checkpoint
- Prototype support for domain parallelism using ShardTensor (new).
- Enable DeviceMesh initialization via DistributedManager.
- Added Datacenter CFD use case.
- Add leave-in profiling utilities to physicsnemo, to easily enable torch/python/nsight
profiling in all aspects of the codebase.
Changed
- Refactored StormCast training example
- Enhancements and bug fixes to DoMINO model and training example
- Enhancement to parameterize DoMINO model with inlet velocity
- Moved non-dimensionalization out of domino datapipe to datapipe in domino example
- Updated utils in physicsnemo.launch.logging to avoid unnecessary wandb and mlflow imports
- Moved to experiment-based Hydra config in Lagrangian-MGN example
- Make data caching optional in MeshDatapipe
- The use of older importlib_metadata library is removed
Deprecated
- ProcessGroupConfig is tagged for future deprecation in favor of DeviceMesh.
Fixed
- Update pytests to skip when the required dependencies are not present
- Bug in data processing script in domino training example
- Fixed NCCL_ASYNC_ERROR_HANDLING deprecation warning
Dependencies
- Remove the numpy dependency upper bound
- Moved pytz and nvtx to optional
- Update the base image for the Dockerfile
- Introduce Multi-Storage Client (MSC) as an optional dependency.
- Introduce wrapt as an optional dependency, needed when using ShardTensor's automatic domain parallelism
v0.9.0
Modulus (core) general release v0.9.0
Added
- FIGConvUNet model and example.
- The Transolver model.
- The XAeroNet model.
- Incorporated CorrDiff-GEFS-HRRR model into CorrDiff, with lead-time aware SongUNet and
cross entropy loss.
Changed
- Refactored EDMPrecondSRV2 preconditioner and fixed the bug related to the metadata
- Extended the checkpointing utility to store metadata.
- Corrected missing export of logging function used by transolver model
v0.8.0
Modulus (core) general release v0.8.0
Added
- Graph Transformer processor for GraphCast/GenCast.
- Utility to generate STL from Signed Distance Field.
- Metrics for CAE and CFD domain such as integrals, drag, and turbulence invariances and spectrum.
- Added gradient clipping to StaticCapture utilities.
- Bistride Multiscale MeshGraphNet example.
Changed
- Refactored CorrDiff training recipe for improved usability
- Fixed timezone calculation in datapipe cosine zenith utility.
v0.7.0
Modulus (core) general release v0.7.0
Added
- Code logging for CorrDiff via Wandb.
- Augmentation pipeline for CorrDiff.
- Regression output as additional conditioning for CorrDiff.
- Learnable positional embedding for CorrDiff.
- Support for patch-based CorrDiff training and generation (stochastic sampling only)
- Enable CorrDiff multi-gpu generation
- Diffusion model for fluid data super-resolution (CMU contribution).
- The Virtual Foundry GraphNet.
- A synthetic dataloader for global weather prediction models, demonstrated on GraphCast.
- Sorted Empirical CDF CRPS algorithm
- Support for history, cos zenith, and downscaling/upscaling in the ERA5 HDF5 dataloader.
- An example showing how to train a "tensor-parallel" version of GraphCast on a Shallow-Water-Equation example.
- 3D UNet
- AeroGraphNet example of training of MeshGraphNet on Ahmed body and DrivAerNet datasets.
- Warp SDF routine
- DLWP HEALPix model
- Pangu Weather model
- Fengwu model
- SwinRNN model
- Modulated AFNO model
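The sorted empirical CDF CRPS entry in the list above refers to scoring an ensemble against an observation. One common estimator, sketched here as an assumption about the general technique rather than as the library's implementation, is CRPS = mean|x_i - y| - 0.5 * mean|x_i - x_j|, where sorting the members lets the pairwise term be computed in a single pass. The function name is hypothetical:

```python
def crps_ensemble(members: list[float], observation: float) -> float:
    """Empirical CRPS of an ensemble against one observation (illustrative)."""
    n = len(members)
    if n == 0:
        raise ValueError("ensemble must not be empty")
    xs = sorted(members)
    # First term: mean absolute error of the members against the observation.
    mae = sum(abs(x - observation) for x in xs) / n
    # Second term: 0.5 * mean over all pairs of |x_i - x_j|. For sorted values,
    # sum_{i,j} |x_i - x_j| = 2 * sum_i (2i - n - 1) * x_(i) with 1-based i.
    spread = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1)) / (n * n)
    return mae - spread


score = crps_ensemble([1.0, 2.0, 3.0], observation=2.0)
```

Sorting costs O(n log n), compared with O(n^2) for evaluating every pair of members directly.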
Changed
- Raise ModulusUndefinedGroupError when querying undefined process groups
- Changed Indexing error in examples/cfd/swe_nonlinear_pino for modulus loss function
- Safeguarding against uninitialized usage of DistributedManager
Removed
- Remove mlflow from deployment image
Fixed
- Fixed bug in the partitioning logic for distributing graph structures intended for distributed message-passing.
- Fixed bugs for corrdiff diffusion training of EDMv1 and EDMv2
Dependencies
- Update DALI to CUDA 12 compatible version.
- Update minimum python version to 3.10
v0.6.0
Modulus (core) general release v0.6.0
Added
- Added citation file
- Link to the CWA dataset
- ClimateDatapipe: an improved datapipe for HDF5/NetCDF4 formatted climate data
- Performance optimizations to CorrDiff
- Physics-Informed Nonlinear Shallow Water Equations example
- Warp neighbor search routine with a minimal example
- Strict option for loading Modulus checkpoints
- Regression only or diffusion only inference for CorrDiff
- Support for organization level model files on NGC file system
- Physics-Informed Magnetohydrodynamics example
Changed
- Updated Ahmed Body and Vortex Shedding examples to use Hydra config
- Added more config options to FCN AFNO example
- Moved positional embedding in CorrDiff from the dataloader to network architecture
Deprecated
- modulus.models.diffusion.preconditioning.EDMPrecondSR. Use EDMPrecondSRV2 instead
Removed
- Pickle dependency for CorrDiff
Fixed
- Consistent handling of single GPU runs in DistributedManager
- Output location of objects downloaded with NGC file system
- Bug in scaling the conditional input in CorrDiff deterministic sampler
Dependencies
- Updated DGL build in Dockerfile
- Updated default base image
- Moved Onnx from optional to required dependencies
- Optional Makani dependency required for SFNO model
v0.5.0
Modulus (core) general release v0.5.0
Added
- Distributed process group configuration mechanism.
- DistributedManager utility to instantiate process groups based on a process group config.
- Helper functions to facilitate distributed training with shared parameters.
- Brain anomaly detection example.
- Updated Frechet Inception Distance to use Wasserstein 2-norm with improved stability.
- Molecular Dynamics example.
- Improved usage of GraphPartition, added more flexible ways of defining a partitioned graph.
- Physics-Informed Stokes Flow example.
Changed
- MLFLow logging such that only proc 0 logs to MLFlow.
- FNO given separate methods for constructing lift and spectral encoder layers.
Removed
- The experimental SFNO
Dependencies
- Removed experimental SFNO dependencies
- Added CorrDiff dependencies (cftime, einops, pyspng)
- Made tqdm a required dependency