Hi Modulus Community!
I want to propose that Modulus incorporate a core taxonomy and ontology for PDE data and model declarations.
Context
I’ve been working with various parts of NVIDIA Modulus and noticed that data compatibility between different operator-learning methods (FNO, AFNO, PINNs, etc.) often requires manual reformatting—resampling unstructured meshes onto grids, extracting point clouds, handling boundary data, etc. These steps are crucial but can be inconsistent or ad hoc. Moreover, for advanced AutoML or workflow pipelines, we lack a consistent way to match a given dataset to the right neural surrogate or automate conversions when needed.
Idea
Rather than building a large "modulus-transform" library in-house, I propose we define a Core Taxonomy & Ontology for describing PDE data—and expand that idea to let each model also declare its data requirements. This would be a minimal but standardized way for data sets to “announce”:
… plus other optional fields like boundary, is_transient, etc. (See the Appendix below for a detailed field listing.)
Key Motivations
AutoML Workflows: Make it easier to select the right model for a given dataset or PDE problem—potentially enabling a “descriptor → model” matching engine in the future.
Automated Transformations: We can enable automated transformations (e.g., unstructured mesh → uniform grid) by exposing an ontology interface that external libraries can plug into. This fosters more sophisticated workflows that build on Modulus.
Interoperability: Encourage clearer data definitions so diverse neural operator implementations (FNO, WNO, DiffusionNet, etc.) and research repositories can collaborate seamlessly.
Why This Matters
Clarity: Each dataset can include a small YAML/JSON descriptor summarizing how it’s structured (e.g., dimension, geometry type, boundary info).
Automatic Checking: If a user tries to feed an unstructured mesh into AFNO, Modulus can detect a mismatch and suggest external tools for mesh→grid conversions.
No Need for a Full Library: We simply define the interface—transformation tasks can be performed by external open-source projects (PyVista, VTK, Open3D) if desired.
Easier Model Selection: Each Modulus operator can declare a snippet of which data types it supports. If your dataset descriptor doesn’t match, you receive a clear alert (e.g., “Needs a 2D uniform grid”) or a pointer to an alternative model.
Friction Points So Far
Mismatch in Data Requirements: Some models need uniform grids (FNO, WNO), others want unstructured meshes (DiffusionNet), and others use collocation points (PINNs). Without explicit descriptors, errors or haphazard re-sampling scripts abound.
Manual, Ad-Hoc Conversions: We frequently code one-off transformations (mesh → grid, point → grid, etc.), guess boundary handling, and end up cluttering our workflow with repetitive tasks instead of focusing on PDE modeling.
Lack of Interoperability: Switching from one surrogate model to another is unclear if we don’t know which data conversions are feasible or necessary. You might have a robust unstructured mesh but discover the model needs a uniform 2D grid.
Impact on Experimentation & Collaboration: Without a formal descriptor stating dimension, geometry, and boundary flags, HPC teams or new collaborators struggle to replicate or extend existing projects. Even trying “the same dataset on two PDE surrogates” can become a hassle.
Envisioned Changes
Data Descriptor: A standard .yaml or .json file accompanies each dataset.
For example:
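As a hedged sketch, here is what such a descriptor could contain for a 3D unstructured surface mesh. It is shown as a Python dict for brevity (on disk it would be the equivalent `.yaml`/`.json` file), and all field names and values are illustrative, following the taxonomy proposed in the Appendix:

```python
# Hypothetical dataset descriptor (illustrative values; on disk this would
# be the equivalent .yaml or .json file).
descriptor_mesh = {
    "dimension": 3,
    "geometry_type": "mesh",        # "grid" | "mesh" | "point"
    "uniform": False,               # unstructured connectivity
    "cell_type": "triangle",
    "representation": {"vertices": "[N, 3]", "faces": "[M, 3]"},
    "boundary": True,               # boundary faces are labeled
    "channels": 4,                  # e.g., pressure + 3 velocity components
    "is_transient": False,
}
```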
Model Declarations: Each built-in operator (FNO, PINNs, etc.) has a compact snippet describing its accepted formats (e.g., “grid, uniform=true, dimension=2 or 3”).
Integrating External Tools: If mismatch occurs, you can invoke a minimal “adapter” or CLI hook (e.g., “Mesh to Uniform Grid” with PyVista). Modulus itself only interfaces with these transformations, so we avoid building or maintaining a massive transformation library in-house.
Benefits
Streamlined User Experience: No guesswork on array shapes or boundary labeling—datasets explicitly state their structure.
Multi-Model Experimentation: If you want to try AFNO and DiffusionNet on the same domain, you know what conversions (if any) are needed.
Reproducibility: PDE data sets shared with a descriptor remove ambiguity—everyone knows exactly how the geometry is stored.
Request & Next Steps
Feedback: Is this feasible, or do you see potential snags? Any must-have fields missing in the descriptor?
Suggestions: Are there library routines (PyVista/VTK/Open3D) worth referencing as “recommended” transforms?
Use Cases: Please share if you’ve faced painful integration or confusion about data formats in Modulus—that’s exactly what this proposal aims to fix.
Below, I’ll outline in the Appendix how a Core Taxonomy & Ontology can define the minimal fields for PDE data, along with short examples of model declarations and a “master table” of popular PDE surrogates. Let me know your thoughts on the approach!
Thanks!
Georg Maerz
Appendix
Table of Contents:
1. Motivation, Use Cases, and Benefits
2. Core Idea and Proposed Solution Outline
   - 2.A. The Concept: “Descriptors” for Datasets and Models
   - 2.B. Model Declarations: “Accepted Formats”
   - 2.C. Benefits of This Approach
   - 2.D. Implementation Sketch
3. Deep-Dive on Taxonomy & Ontology
   - 3.1. Taxonomy Fields
   - 3.2. Data Representation
   - 3.3. External Tools: “Adapters”
4. Overview of Model Classification (Master Table of Models vs. Accepted Data + Four Example Models)
   - 4.1. Master Table
   - 4.2. Four Quick Examples
1. Motivation, Use Cases, and Benefits
A well-defined taxonomy for physics-based AI data isn’t just an academic exercise—it enables practical solutions in real-world workflows, from industrial design optimization to academic PDE research. By clearly identifying how each dataset (and model) is structured, we can reduce guesswork, streamline transformations, and facilitate AutoML scenarios. Below are key ways this ontology helps:
1.1 Automated Model Selection
Scenario: A user has a 3D non-uniform surface mesh and wants to see which neural PDE surrogates (FNO, WNO, DiffusionNet, etc.) can handle it directly. How the Taxonomy Helps:
By tagging the dataset with fields like dimension: 3, geometry_type: "mesh", uniform: false, etc., a tool (or “ontology engine”) can instantly tell which models list “3D unstructured mesh” in their accepted data.
Benefit: Quick, automatic identification of surrogates that match the dataset—and if none match, the system suggests a transformation step or an alternative approach. This fosters an AutoML-style pipeline for PDE data.
1.2 Data Transformations and Interchange
Scenario: Converting from a volumetric mesh (e.g., tetrahedral elements) to a uniform grid (for a spectral-based model), or from a surface mesh to a point cloud (for a point-based model). How the Taxonomy Helps:
The ontology describes both the source (e.g., “3D_non_uniform_volumetric_mesh”) and the intended target (“3D_grid”).
Benefit: A user can rely on consistent “source → target” descriptors to invoke standard transformation utilities (e.g., VTK, PyVista) without writing ad-hoc scripts each time—reducing friction in multi-step PDE workflows.
1.3 Multi-Model Experimentation
Scenario: You want to compare how FNO, WNO, and DiffusionNet perform on the same dataset. How the Taxonomy Helps:
A single descriptor can specify the domain geometry, boundary info, and channels. An “ontology engine” can see if FNO/WNO require uniform grids or if DiffusionNet uses an unstructured mesh.
Benefit: Allows fair, robust experimentation—users can systematically check which conversions are needed (if any) to apply multiple models to the same PDE domain.
1.4 Reproducibility and Collaboration
Scenario: A research team shares PDE data and training scripts on GitHub. Another team wants to replicate or extend the results using a different PDE solver or neural architecture. How the Taxonomy Helps:
If the dataset is labeled with a standard descriptor (e.g., “3D_point_cloud with boundary labeling, is_transient: true”), new collaborators know exactly what data shape they’re dealing with.
Benefit: Fewer misunderstandings about array layouts, boundary conditions, or spacing. Studies become simpler to reproduce—everyone is on the same page about data definitions.
1.5 Extensibility for New Models
Scenario: A novel PDE surrogate emerges, requiring specialized data (multi-block structured grids, spherical geodesic tiling, etc.). How the Taxonomy Helps:
That method can be added to the “Model vs. Data Table,” specifying exactly which geometry types it accepts. If new geometry types are needed, they can be incorporated with minimal disruption to existing definitions.
Benefit: The taxonomy evolves naturally, preserving consistency as the community adds or modifies PDE surrogates.
1.6 Support for Industrial & HPC Workflows
Scenario: In aerospace or automotive industries, massive HPC simulations produce million-element meshes or large spatiotemporal datasets. Engineers want to apply neural surrogates for design optimization or digital twins. How the Taxonomy Helps:
Because each dataset is explicitly classified, HPC engineers can build pipelines that automatically convert solver outputs into ML-ready formats—or feed them into an AutoML system for PDE surrogates.
Benefit: Scalability—industrial workflows no longer rely on one-off data manipulations. The taxonomy ensures a consistent approach that scales to big data scenarios.
2. Core Idea and Proposed Solution Outline
2.A. The Concept: “Descriptors” for Datasets and Models
Rather than continuing with one-off data scripts, we introduce a lightweight “descriptor” file for each PDE dataset and for each entry in the model zoo. This descriptor (in JSON, YAML, or similar) captures minimal metadata:
dimension (e.g., 1, 2, or 3 for the spatial domain)
geometry_type ("grid", "mesh", or "point")
uniform (true/false for spacing/connectivity)
representation (e.g., [N, H, W, C] for grids, (vertices, faces) for meshes)
boundary (true if boundary conditions/labels are explicitly stored)
channels (number of PDE variables or feature channels)
coordinate_mapping (string or null: how discrete indices map to physical coordinates, e.g., "implicit uniform" or the name of a coordinate array)
cell_type (string or null: the element shape in a mesh, e.g., "triangle", "tetra", "quad"; null if not applicable, such as for grids)
plus optional fields like decimation_level, is_transient, etc.
The fields above are the so-called Taxonomy. (These fields are detailed in Section 3, Taxonomy Fields.)
Why? So each dataset “announces” how it is structured, and each model in the model zoo “announces” which inputs it accepts (i.e., which transformation a workflow would need to perform).
If your data is a 2D uniform grid, the descriptor might look like this:
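A minimal sketch of such a descriptor (shown as a Python dict; on disk it would be the equivalent YAML/JSON file; values are illustrative):

```python
# Descriptor for a 2D uniform grid dataset (illustrative values).
descriptor_2d_grid = {
    "dimension": 2,
    "geometry_type": "grid",
    "uniform": True,
    "representation": {"array_layout": "[N, H, W, C]"},
    "coordinate_mapping": "implicit uniform",
    "boundary": False,
    "channels": 1,
    "is_transient": False,
}
```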
2.B. Model Declarations: “Accepted Formats”
On the flip side, each neural PDE model or operator in Modulus can declare which data format(s) it supports. For example, a Fourier-based operator like FNO might say:
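A sketch of such a declaration (the format and names are hypothetical, not an existing Modulus API):

```python
# Accepted formats a Fourier-based operator like FNO might declare:
# each entry is one data layout the model can consume directly.
fno_accepted_formats = [
    {"geometry_type": "grid", "uniform": True, "dimension": 2},
    {"geometry_type": "grid", "uniform": True, "dimension": 3},
]
```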
When you load FNO, Modulus checks: “Does your dataset’s descriptor match geometry_type: "grid", dimension: 2, uniform: true?” If yes—great. If not, it says “FNO expects a 2D uniform grid, but your data is a 3D unstructured mesh—please re-sample or choose a different model.”
This avoids building a massive transformation library in-house. We only define the interface—the “language” each dataset speaks and each model requires. Actual mesh→grid, grid→point-cloud transformations can be done with open-source libraries like PyVista, VTK, or Open3D, if needed.
Some models can accept multiple formats if they are flexible (e.g., an “FNO_2D” plus an “FNO_3D” variant, or partially uniform grids).
2.C. Benefits of This Approach
Immediate Clarity: If you see dimension: 3 and geometry_type: "mesh", you know you’re dealing with an unstructured domain in 3D. Models that only do uniform grids or collocation points are off the table (or need re-sampling).
AutoML-Style Pipelines: In principle, one could build an automated “data→model” matching system. If Modulus sees dimension=2/uniform grid, it might suggest “FNO or AFNO.” If geometry_type=“mesh,” it might suggest “DiffusionNet” or “MeshGraphNet.”
Lightweight: We’re not rewriting code to handle each possible transform. We’re just documenting data structures and letting the user (or external scripts) handle conversions if needed.
Scalability: As new PDE surrogates come online (DeepONet, PDE-Transformer, etc.), they add a snippet describing accepted data. As new PDE data sets appear, they provide a descriptor. Everything remains consistent without a huge refactor.
2.D. Implementation Sketch
Data Loaders in Modulus:
Could parse a .yaml or .json descriptor for each dataset.
Compare it against the “accepted_formats” of the chosen model.
Either proceed or prompt a “format mismatch” warning.
Choose a PDE model in Modulus. If it matches, train. If not, convert externally or pick another model.
No Full Library:
If your data is a surface mesh but you want a 2D uniform grid for WNO, Modulus might just say: “Mismatch. Try PyVista to voxelize the mesh.”
We avoid huge in-house transformations, focusing on interfaces and easy checks.
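The matching step described above could be sketched as follows (all names hypothetical; this is an illustration of the check, not an existing Modulus API):

```python
def is_compatible(descriptor, accepted_formats):
    """Return True if the dataset descriptor satisfies any accepted format.

    A format matches when every field it constrains has an allowed value
    in the descriptor (a constraint may list several allowed values).
    """
    for fmt in accepted_formats:
        ok = True
        for field, allowed in fmt.items():
            allowed_values = allowed if isinstance(allowed, (list, tuple)) else [allowed]
            if descriptor.get(field) not in allowed_values:
                ok = False
                break
        if ok:
            return True
    return False

# Hypothetical declaration and descriptors:
fno_formats = [{"geometry_type": "grid", "uniform": True, "dimension": [2, 3]}]
mesh_data = {"geometry_type": "mesh", "uniform": False, "dimension": 3}
grid_data = {"geometry_type": "grid", "uniform": True, "dimension": 2}

assert not is_compatible(mesh_data, fno_formats)  # would trigger a mismatch warning
assert is_compatible(grid_data, fno_formats)      # proceed to training
```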
Next: I’ll share a short “master table” summarizing models vs. data structures, plus four quick examples (AFNO, PINNs, WNO, DiffusionNet) showing how each might declare accepted formats. Then we can discuss feedback, potential pitfalls, and next steps!
3. Deep-Dive on Taxonomy & Ontology
3.1. Taxonomy Fields
We introduce a minimal set of fields—listed in the table below—to consistently describe the shape, connectivity, and additional metadata of PDE data. Each field is a key–value pair capturing an aspect of the dataset’s domain geometry or PDE variables.
| Field | Meaning & Possible Values |
| --- | --- |
| `dimension` | Integer in {1, 2, 3, …}. The spatial dimensionality of the domain, e.g., 2 for a planar field, 3 for volumetric. |
| `geometry_type` | Categorical: one of `"point"`, `"grid"`, `"mesh"`. Describes whether data is raw points, a structured lattice, or an unstructured mesh. |
| `uniform` | Boolean: `true` if spacing or topology is regular, `false` if non-uniform/unstructured. |
| `representation` | Nested object clarifying how coordinates, adjacency, or array layouts are stored (e.g., `[N, H, W, C]`, adjacency lists, etc.). |
| `is_transient` | Boolean: `true` if the data includes multiple time steps within one descriptor, `false` otherwise. |
| `boundary` | Boolean: indicates whether the data explicitly labels boundary nodes, faces, or conditions. Useful for PDE setups requiring BC enforcement. |
| `cell_type` | String or `null`: describes the element shape in a mesh (e.g., `"triangle"`, `"tetra"`, `"quad"`). `null` if not applicable (e.g., grid). |
| `decimation` | Boolean: indicates whether the data has been downsampled (coarsened). Often relevant for HPC or multi-resolution pipelines. |
| `decimation_level` | Integer or float: optional field quantifying the ratio or factor of decimation. |
| `channels` | Integer or string: how many PDE variables or feature channels each point/element holds (e.g., velocity components, scalar fields). |
| `coordinate_mapping` | String or `null`: how discrete indices map to physical coordinates (e.g., `"implicit uniform"`, or the name of a coordinate array). |
Why These Fields?
Real PDE data can vary wildly in geometry (points vs. grids vs. meshes), uniformity (structured vs. unstructured), and required PDE metadata (boundary conditions, material properties, etc.).
These fields strike a balance between minimalism (so it’s easy to fill out) and completeness (to meaningfully distinguish different data formats).
3.2. Data Representation
While the taxonomy fields (dimension, geometry type, uniformity, etc.) describe the conceptual layout of a dataset, the actual storage of PDE data can vary widely. In practice, these variations determine how easily data can be loaded, transformed, or fed into a physics-based AI model. Below, we outline typical representations for grids, meshes, and point sets, along with transient data handling.
Uniform Grids (Structured)
Shape: For a 2D field, data might be stored as a 3D array (N, H, W) (or (N, H, W, C) if there are multiple channels).
Index-to-Coordinate Mapping: Often implicit—for example, each grid cell or point sits at (x0 + i·Δx, y0 + j·Δy).
Implementation Detail: If stored in NumPy or PyTorch, the tensor shape might be [..., H, W], where the exact order of dimensions depends on user convention (e.g., channels_last vs. channels_first).
Taxonomy Example:
geometry_type: "grid",
uniform: true,
representation: array_layout: "[N, H, W, C]",
coordinate_mapping: "implicit uniform".
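The "implicit uniform" mapping can be sketched in a few lines: grid indices (i, j) map to physical coordinates without storing any coordinate array (origin and spacing values below are illustrative):

```python
# Implicit uniform coordinate mapping for a structured grid (illustrative values).
x0, y0 = 0.0, 0.0   # physical origin
dx, dy = 0.5, 0.25  # grid spacing per axis

def index_to_coord(i, j):
    """Map grid indices (i, j) to physical coordinates (x, y)."""
    return (x0 + i * dx, y0 + j * dy)

assert index_to_coord(0, 0) == (0.0, 0.0)
assert index_to_coord(3, 2) == (1.5, 0.5)
```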
Non-Uniform Grids (Structured but Variable Spacing)
Shape: Similar array structure (N, H, W, …), but rows/columns can have unequal spacing Δx_i, Δy_j.
Index-to-Coordinate Mapping: Typically stored in separate arrays (e.g., a 1D array for x coordinates and another for y, or a 2D coordinate mesh).
Taxonomy Example:
geometry_type: "grid",
uniform: false,
coordinate_mapping: "[x(i), y(j)]" (separate coordinate arrays per axis), etc.
Unstructured Meshes
Vertices: An array storing the spatial coordinates of each node, e.g. (N, 2) for 2D or (N, 3) for 3D.
Faces/Cells: A separate array listing which vertex indices make up each element. For surfaces, these might be triangles (M, 3) or quads (M, 4). For volumetric meshes, tetrahedra (M, 4) or hexahedra (M, 8).
Adjacency: Optional but often used for graph neural networks or to speed up neighbor queries. Could be stored as a list of edges, a node→neighbors dictionary, or a sparse matrix.
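The vertices/faces/adjacency layout above can be sketched concretely (plain Python lists stand in for (N, 3) and (M, 3) arrays; the two-triangle mesh is illustrative):

```python
# Unstructured surface mesh: vertex coordinates plus triangular faces
# given as index triples into the vertex list.
vertices = [
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (0.0, 1.0, 0.0),
    (1.0, 1.0, 0.0),
]
faces = [(0, 1, 2), (1, 3, 2)]  # each entry indexes into `vertices`

# Optional adjacency (node -> neighbors), derivable from the faces;
# useful for graph neural networks or neighbor queries.
adjacency = {i: set() for i in range(len(vertices))}
for a, b, c in faces:
    for u, v in ((a, b), (b, c), (a, c)):
        adjacency[u].add(v)
        adjacency[v].add(u)

assert adjacency[1] == {0, 2, 3}  # vertex 1 touches vertices 0, 2, and 3
```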
Point Clouds
Shape: Typically an array (N, d), where d is the embedding dimension (2D or 3D).
No Connectivity: The points have no explicit adjacency, so PDE-based operations (if any) might rely on nearest-neighbor searches or custom approaches.
Taxonomy Example:
geometry_type: "point",
uniform: false (usually random or sensor-based),
representation: array_layout: "[N, d]".
Transient Data (Multiple Time Steps)
Single vs. Multi-File: Some pipelines store each time step in a separate file; others stack them in a 4D or 5D array (e.g., (N_time, H, W, C)).
Taxonomy Attribute: is_transient: true indicates that the dataset descriptor includes multiple time frames in one structure.
Implementation Detail: An index in the first dimension might correspond to time, e.g., (t, x, y, channels).
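A tiny sketch of that single-structure layout (plain nested lists stand in for an (N_time, H, W, C) array; shapes are illustrative):

```python
# Transient dataset stored in one structure, indexed (t, x, y, c).
N_time, H, W, C = 3, 4, 4, 2
field = [[[[0.0] * C for _ in range(W)] for _ in range(H)] for _ in range(N_time)]

snapshot_t1 = field[1]  # the full spatial field at time step t = 1
assert len(field) == N_time
assert len(snapshot_t1) == H and len(snapshot_t1[0]) == W
```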
Boundary or Auxiliary Annotations
For PDE boundary conditions, a user may add arrays labeling boundary nodes or prescribing Dirichlet/Neumann values. This might appear as:
A boolean mask array boundary_mask for each node or grid cell.
A dictionary specifying which edges/faces are walls, inlets, or symmetry boundaries.
Taxonomy: boundary: true, plus a note in representation describing how boundary info is stored.
Decimation / Multi-Scale
If the data has been downsampled (for instance, from a high-resolution CFD simulation to a coarser grid), an additional field like decimation_level can track how aggressive the reduction was.
Multi-scale or multi-resolution workflows may store hierarchical meshes or multiple grid sizes, but for simplicity, each descriptor focuses on a single resolution.
How This Connects to the Taxonomy
Each of these representation strategies ties back to the fields in the data descriptor. For instance:
dimension + geometry_type clarify if we’re dealing with (N, H, W) arrays on a uniform grid or (vertices, faces) in a mesh.
uniform + representation indicate if we have consistent spacing or must store coordinates.
boundary signals if PDE boundary annotations are included.
is_transient determines whether time steps are embedded in the same data structure.
By consistently encoding these details, we can quickly see whether a dataset (say, an unstructured surface mesh with boundary info) is compatible with a given model (e.g., a graph-based PDE surrogate) or if we need a data transformation (e.g., re-sampling that mesh onto a uniform grid for a Fourier-based operator).
Overall, a clear data representation—in line with the taxonomy fields—makes physics-based ML pipelines more automatable and transparent, ensuring that each step from raw solver output (or sensor measurement) to neural network training is well-defined.
The next section, External Tools: “Adapters,” outlines how external libraries can carry out the transformations implied by the taxonomy, so that Modulus only defines the interface rather than maintaining the transformation code itself.
3.3 External Tools: “Adapters”
VTK / PyVista
Open-source libraries that handle mesh → grid or mesh → point cloud transformations.
Modulus could provide adapters: Python classes or scripts that read the descriptor, call PyVista/VTK operations (e.g., “surface to volumetric,” “voxelization,” etc.), then generate an updated descriptor for the output.
Open3D / PCL
Tools for point clouds: downsampling, normal estimation, boundary detection.
If Modulus sees “geometry_type: point,” “uniform: false,” and a user wants a structured grid, an adapter calls Open3D’s nearest-neighbor interpolation or something similar.
Custom HPC Codes
In large-scale HPC contexts, specialized C++ or parallel meshing codes might perform the conversions.
Modulus can define a minimal CLI or Python interface that passes the descriptor fields so the external tool can run the appropriate pipeline.
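A minimal sketch of what such an adapter interface could look like (all names are hypothetical; the actual voxelization would be delegated to PyVista/VTK/Open3D, which is stubbed out here):

```python
# Hypothetical adapter interface: each adapter declares the descriptor fields
# it consumes (source) and produces (target), and delegates the real work
# to an external tool.
class Adapter:
    source = {}  # descriptor fields the adapter requires
    target = {}  # descriptor fields the output will have

    def applies_to(self, descriptor):
        return all(descriptor.get(k) == v for k, v in self.source.items())

    def run(self, data, descriptor):
        raise NotImplementedError

class MeshToUniformGrid(Adapter):
    source = {"geometry_type": "mesh"}
    target = {"geometry_type": "grid", "uniform": True}

    def run(self, data, descriptor):
        # A real implementation would call an external tool here
        # (e.g., a PyVista voxelization), then emit the updated descriptor.
        new_descriptor = {**descriptor, **self.target, "cell_type": None}
        return data, new_descriptor

adapter = MeshToUniformGrid()
assert adapter.applies_to({"geometry_type": "mesh", "dimension": 3})
_, out = adapter.run(None, {"geometry_type": "mesh", "dimension": 3})
assert out["geometry_type"] == "grid" and out["uniform"] is True
```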
4. Overview of Model Classification (Master Table of Models vs. Accepted Data + Four Example Models)
To illustrate how each PDE surrogate can declare its data requirements—and how users know if their dataset fits—here is a master table showing typical “Accepted Data” for several well-known models in physics-based AI. After that, four quick sub-sections give more detailed examples.
4.1. Master Table
| Model | Accepted Data | Notes |
| --- | --- | --- |
| AFNO | 2D/3D uniform grid (structured) | Adaptive Fourier Neural Operator. Extends FNO with adaptive frequency weighting. Ideal for global PDE phenomena on a regular lattice. |
| FNO | 2D/3D (often uniform grid, sometimes partially non-uniform) | Fourier Neural Operator. Uses global Fourier transforms; typically no explicit boundary labeling. Good for parametric PDE families on grids. |
| WNO | 2D/3D uniform grid (structured, wavelet-based) | Wavelet Neural Operator. Replaces the FFT with wavelet transforms for local/multi-scale features. Still data-driven; typically no direct PDE boundary labeling. |
| DiffusionNet | 2D/3D unstructured mesh (often surface, can be volumetric) | Graph-like diffusion approach. Requires explicit vertices, faces, adjacency. Good for manifold PDEs or shape analysis on complex geometries. |
| — | — | Large-scale GNN for weather/climate. Time-evolving data, geodesic tiling. Designed for planet-scale PDE forecasting. |
| PointNet | 2D/3D point cloud (no adjacency) | Classification/regression on raw points. Not inherently PDE-oriented, but can adapt if PDE fields are stored as scattered points. |
| PINNs | Collocation points in 1D/2D/3D + boundary/initial-condition info | Physics-Informed Neural Networks. PDE constraints in the loss. Works well even with minimal labeled data, focusing on PDE residuals at domain points. |
4.2. Four Quick Examples
Here’s a brief demonstration of how four of these models might specify their “accepted_formats,” plus a minimal example descriptor that satisfies each:
Result: PINNs rely on PDE residual enforcement at collocation points + boundary. If your data is basically a set of scattered points in 3D, with boundary labels, it’s fully compatible.
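That PINN case can be sketched end to end (declaration format and values are hypothetical, following the proposed taxonomy):

```python
# What a PINN model entry might declare, and a descriptor that satisfies it.
pinn_accepted = {
    "geometry_type": "point",  # collocation points
    "dimension": [1, 2, 3],
    "boundary": True,          # boundary/initial-condition info required
}
dataset = {
    "dimension": 3,
    "geometry_type": "point",
    "uniform": False,
    "boundary": True,
    "channels": 1,
}

# Field-by-field compatibility check: scattered 3D points with boundary
# labels satisfy every constraint PINNs declare.
assert dataset["geometry_type"] == pinn_accepted["geometry_type"]
assert dataset["dimension"] in pinn_accepted["dimension"]
assert dataset["boundary"] == pinn_accepted["boundary"]
```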
reacted with thumbs up emoji reacted with thumbs down emoji reacted with laugh emoji reacted with hooray emoji reacted with confused emoji reacted with heart emoji reacted with rocket emoji reacted with eyes emoji
Uh oh!
There was an error while loading. Please reload this page.
-
Hi Modulus Community!
I want to propose that Modulus incorporates a core taxonomy and ontology for PDE data and model declarations.
Context
I’ve been working with various parts of NVIDIA Modulus and noticed that data compatibility between different operator-learning methods (FNO, AFNO, PINNs, etc.) often requires manual reformatting—resampling unstructured meshes onto grids, extracting point clouds, handling boundary data, etc. These steps are crucial but can be inconsistent or ad hoc. Moreover, for advanced AutoML or workflow pipelines, we lack a consistent way to match a given dataset to the right neural surrogate or automate conversions when needed.
Idea
Rather than building a large "
modulus-transform
" library in-house, I propose we define a Core Taxonomy & Ontology for describing PDE data—and expand that idea to let each model also declare its data requirements. This would be a minimal but standardized way for data sets to “announce”:(See the Appendix below for a detailed field listing.)
Key Motivations
Why This Matters
Friction Points So Far
Envisioned Changes
Data Descriptor: A standard
.yaml
or.json
file accompanies each dataset.For example:
Model Declarations: Each built-in operator (FNO, PINNs, etc.) has a compact snippet describing its accepted formats (e.g., “grid, uniform=true, dimension=2 or 3”).
Integrating External Tools: If mismatch occurs, you can invoke a minimal “adapter” or CLI hook (e.g., “Mesh to Uniform Grid” with PyVista). Modulus itself only interfaces with these transformations, so we avoid building or maintaining a massive transformation library in-house.
Benefits
Request & Next Steps
Below, I’ll outline in the Appendix how a Core Taxonomy & Ontology can define the minimal fields for PDE data, along with short examples of model declarations and a “master table” of popular PDE surrogates. Let me know your thoughts on the approach!
Thanks!
Georg Maerz
Appendix
Table-of-content:
1. Motivation, Use Cases, and Benefits
A well-defined taxonomy for physics-based AI data isn’t just an academic exercise—it enables practical solutions in real-world workflows, from industrial design optimization to academic PDE research. By clearly identifying how each dataset (and model) is structured, we can reduce guesswork, streamline transformations, and facilitate AutoML scenarios. Below are key ways this ontology helps:
1.1 Automated Model Selection
Scenario: A user has a 3D non-uniform surface mesh and wants to see which neural PDE surrogates (FNO, WNO, DiffusionNet, etc.) can handle it directly.
How the Taxonomy Helps:
dimension: 3
,geometry_type: "mesh"
,uniform: false
, etc., a tool (or “ontology engine”) can instantly tell which models list “3D unstructured mesh” in their accepted data.1.2 Data Transformations and Interchange
Scenario: Converting from a volumetric mesh (e.g., tetrahedral elements) to a uniform grid (for a spectral-based model), or from a surface mesh to a point cloud (for a point-based model).
How the Taxonomy Helps:
1.3 Multi-Model Experimentation
Scenario: You want to compare how FNO, WNO, and DiffusionNet perform on the same dataset.
How the Taxonomy Helps:
1.4 Reproducibility and Collaboration
Scenario: A research team shares PDE data and training scripts on GitHub. Another team wants to replicate or extend the results using a different PDE solver or neural architecture.
How the Taxonomy Helps:
1.5 Extensibility for New Models
Scenario: A novel PDE surrogate emerges, requiring specialized data (multi-block structured grids, spherical geodesic tiling, etc.).
How the Taxonomy Helps:
1.6 Support for Industrial & HPC Workflows
Scenario: In aerospace or automotive industries, massive HPC simulations produce million-element meshes or large spatiotemporal datasets. Engineers want to apply neural surrogates for design optimization or digital twins.
How the Taxonomy Helps:
2. Core Idea and Proposed Solution Outline
2.A. The Concept: “Descriptors” for Datasets
Rather than continuing with one-off data scripts, we introduce a lightweight “descriptor” file for each PDE dataset and for each entry in a model zoo. This descriptor (in JSON, YAML, or similar) captures minimal metadata:
- dimension (1, 2, or 3 for the spatial domain)
- geometry_type ("grid", "mesh", or "point")
- uniform (true/false for spacing/connectivity)
- representation ([N, H, W, C] for grids, (vertices, faces) for meshes)
- boundary (true if boundary conditions/labels are explicitly stored)
- coordinate_mapping (String or null: how we map discrete indices to physical coordinates, e.g., "implicit uniform", or the name of a coordinate array)
- cell_type (String or null: describes element shape in a mesh, e.g., "triangle", "tetra", "quad"; null if not applicable, e.g., grid)
- optional fields such as decimation_level, is_transient, etc.

The fields above are the so-called Taxonomy. (These fields are detailed in Section 3, Taxonomy Fields.)
Why? So each dataset “announces” how it’s structured or each model in the model zoo "announces" which input it can take (i.e., which transformation a workflow should do).
If your data is a 2D uniform grid, the descriptor might look like this:
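For illustration, such a descriptor might look like the following (the field values are examples following the proposed taxonomy, not a finalized schema):

```yaml
# Example dataset descriptor (illustrative, not a fixed schema)
dimension: 2
geometry_type: "grid"
uniform: true
representation:
  array_layout: "[N, H, W, C]"
coordinate_mapping: "implicit uniform"
boundary: false
is_transient: false
```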
2.B. Model Declarations: “Accepted Formats”
On the flip side, each neural PDE model or operator in Modulus can declare what data format(s) it supports. For example, a Fourier-based operator like FNO might say:
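A sketch of such a declaration, using the proposed taxonomy fields (this is not existing Modulus metadata):

```yaml
# FNO model declaration (illustrative)
model: "FNO"
accepted_formats:
  - geometry_type: "grid"
    dimension: 2
    uniform: true
  - geometry_type: "grid"
    dimension: 3
    uniform: true
```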
When you load FNO, Modulus checks: “Does your dataset’s descriptor match geometry_type: "grid", dimension: 2, uniform: true?” If yes—great. If not, it says “FNO expects a 2D uniform grid, but your data is a 3D unstructured mesh—please re-sample or choose a different model.”
This avoids building a massive transformation library in-house. We only define the interface—the “language” each dataset speaks and each model requires. Actual mesh→grid or grid→point-cloud transformations can be done with open-source libraries like PyVista, VTK, or Open3D, if needed.
Some models can accept multiple formats if they are flexible (e.g., “FNO_2D” plus “FNO_3D”, or partially uniform data).
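As a sketch of how this matching could work (the helper function and field names below are illustrative assumptions, not current Modulus API), the check reduces to comparing a dataset descriptor against each declared accepted format:

```python
# Hypothetical sketch: match a dataset descriptor against a model's
# declared accepted formats. Field names follow the proposed taxonomy;
# nothing here is existing Modulus API.

def is_compatible(descriptor, accepted_formats):
    """Return True if the descriptor satisfies at least one accepted format.

    A format matches when every field it constrains has the same value in
    the dataset descriptor; fields a format omits are unconstrained.
    """
    return any(
        all(descriptor.get(field) == value for field, value in fmt.items())
        for fmt in accepted_formats
    )

# An FNO-like model declaring two accepted grid formats:
fno_accepts = [
    {"geometry_type": "grid", "dimension": 2, "uniform": True},
    {"geometry_type": "grid", "dimension": 3, "uniform": True},
]

grid_data = {"dimension": 2, "geometry_type": "grid", "uniform": True}
mesh_data = {"dimension": 3, "geometry_type": "mesh", "uniform": False}

print(is_compatible(grid_data, fno_accepts))  # True
print(is_compatible(mesh_data, fno_accepts))  # False: re-sample or pick another model
```

A selection tool could run this check over an entire model zoo to produce the “which models fit my data” answer from Section 1.1.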
2.C. Benefits of This Approach
If a descriptor says dimension: 3 and geometry_type: "mesh", you know you’re dealing with an unstructured domain in 3D. Models that only accept uniform grids or collocation points are off the table (or need re-sampling).
2.D. Implementation Sketch
Provide a .yaml or .json descriptor for each dataset, and have each model declare its accepted formats using the same core fields (dimension, geometry_type, uniform, etc.).
Next: I’ll share a short “master table” summarizing models vs. data structures, plus four quick examples (AFNO, PINNs, WNO, DiffusionNet) showing how each might declare accepted formats. Then we can discuss feedback, potential pitfalls, and next steps!
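The implementation sketch above could start as small as a loader that verifies the core taxonomy fields are present (a minimal sketch; the function name and required-field set are my assumptions, and a YAML loader would work the same way as the JSON example hinted at in the comment):

```python
# Minimal fields the proposed taxonomy would require; optional fields such
# as is_transient or decimation_level are allowed but not enforced here.
REQUIRED_FIELDS = {"dimension", "geometry_type", "uniform", "representation"}

def validate_descriptor(descriptor):
    """Check that the core taxonomy fields are present; return the descriptor."""
    missing = REQUIRED_FIELDS - descriptor.keys()
    if missing:
        raise ValueError(f"descriptor missing required fields: {sorted(missing)}")
    return descriptor

# In practice the dict would come from a .json/.yaml file, e.g.:
#   descriptor = validate_descriptor(json.load(open("dataset.json")))
example = {
    "dimension": 2,
    "geometry_type": "grid",
    "uniform": True,
    "representation": {"array_layout": "[N, H, W, C]"},
}
validate_descriptor(example)  # passes; raises ValueError on missing fields
```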
3. Deep-Dive on Taxonomy & Ontology
3.1. Taxonomy Fields
We introduce a minimal set of fields—listed below—to consistently describe the shape, connectivity, and additional metadata of PDE data. Each field is a key–value pair capturing an aspect of the dataset’s domain geometry or PDE variables.
- dimension: spatial dimension of the domain (1, 2, or 3).
- geometry_type: "grid", "mesh", or "point".
- uniform: true if spacing or topology is regular, false if non-uniform/unstructured.
- representation: how the data is laid out ([N, H, W, C], adjacency lists, etc.).
- boundary: true if boundary conditions/labels are explicitly stored.
- is_transient: true if the data includes multiple time steps within one descriptor, false otherwise.
- cell_type: element shape in a mesh ("triangle", "tetra", "quad"); null if not applicable (e.g., grid).
- coordinate_mapping: how discrete indices map to physical coordinates ("implicit uniform", or the name of a coordinate array).

Why These Fields?
3.2. Data Representation
While the taxonomy fields (dimension, geometry type, uniformity, etc.) describe the conceptual layout of a dataset, the actual storage of PDE data can vary widely. In practice, these variations determine how easily data can be loaded, transformed, or fed into a physics-based AI model. Below, we outline typical representations for grids, meshes, and point sets, along with transient data handling.
Uniform Grids (Structured)
Data stored as arrays [..., H, W], where the exact order of dimensions depends on user convention (e.g., channels_last vs. channels_first). Descriptor: geometry_type: "grid", uniform: true, representation: array_layout: "[N, H, W, C]", coordinate_mapping: "implicit uniform".
Non-Uniform Grids (Structured but Variable Spacing)
Coordinates stored explicitly (e.g., one array for x coordinates and another for y, or a 2D coordinate mesh). Descriptor: geometry_type: "grid", uniform: false, representation: coordinate_mapping: "[x(i), y(j)]", etc.
Unstructured Meshes
Vertices stored as (N, 2) for 2D or (N, 3) for 3D; faces as triangles (M, 3) or quads (M, 4). For volumetric meshes, tetrahedra (M, 4) or hexahedra (M, 8). Descriptor: geometry_type: "mesh", uniform: false, representation: vertices: (N, 3), faces: (M, 3), adjacency: "list".
Point Clouds
An array (N, d), where d is the embedding dimension (2D or 3D). Descriptor: geometry_type: "point", uniform: false (usually random or sensor-based), representation: array_layout: "[N, d]".
Transient Data (Multiple Time Steps)
is_transient: true indicates that the dataset descriptor includes multiple time frames in one structure, e.g., arrays shaped (t, x, y, channels).
Boundary or Auxiliary Annotations
For example, a boundary_mask for each node or grid cell. Descriptor: boundary: true, plus a note in representation describing how boundary info is stored.
Decimation / Multi-Scale
An optional decimation_level can track how aggressive the reduction was.
3.2.2. How This Connects to the Taxonomy
Each of these representation strategies ties back to the fields in the data descriptor: for instance, (N, H, W) arrays on a uniform grid, or (vertices, faces) in a mesh.
By consistently encoding these details, we can quickly see whether a dataset (say, an unstructured surface mesh with boundary info) is compatible with a given model (e.g., a graph-based PDE surrogate) or if we need a data transformation (e.g., re-sampling that mesh onto a uniform grid for a Fourier-based operator).
Overall, a clear data representation—in line with the taxonomy fields—makes physics-based ML pipelines more automatable and transparent, ensuring that each step from raw solver output (or sensor measurement) to neural network training is well-defined.
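To make the “compatible or transform” decision concrete, here is a toy dispatcher. The transformation names (resample_mesh_to_grid, etc.) are hypothetical placeholders for steps an external adapter library would perform, not Modulus functions:

```python
# Toy sketch: decide whether data can be fed to a model directly, or which
# conversion an external adapter (PyVista, Open3D, ...) should perform.
# Transform names below are illustrative placeholders.

KNOWN_TRANSFORMS = {
    ("mesh", "grid"): "resample_mesh_to_grid",
    ("mesh", "point"): "extract_point_cloud",
    ("grid", "point"): "flatten_grid_to_points",
}

def plan_pipeline(dataset_geometry, model_geometry):
    """Return None if no conversion is needed, else the adapter step to run."""
    if dataset_geometry == model_geometry:
        return None
    try:
        return KNOWN_TRANSFORMS[(dataset_geometry, model_geometry)]
    except KeyError:
        raise ValueError(
            f"no known transform {dataset_geometry} -> {model_geometry}"
        )

print(plan_pipeline("mesh", "grid"))  # resample_mesh_to_grid
print(plan_pipeline("grid", "grid"))  # None: feed the model directly
```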
Below is the “External Tools: Adapters” part of the document. It outlines how external libraries can plug into the workflows described above (model selection, data transformations, multi-model experimentation), and highlights how a standardized data description benefits the broader physics-based AI community.
3.3 External Tools: “Adapters”
VTK / PyVista
Open3D / PCL
High-Performance Tools (e.g., HPC meshing libraries)
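One way such tools could plug in is through a small adapter interface that Modulus defines; everything else (the actual conversion) lives in the external library. The class and attribute names below are my assumptions, a sketch of the contract rather than a proposed API:

```python
from abc import ABC, abstractmethod

class DataAdapter(ABC):
    """Hypothetical plug-in interface for external conversion tools.

    An adapter declares which taxonomy constraints its input must satisfy
    and which fields describe its output; a workflow engine can then chain
    adapters to bridge a dataset descriptor and a model declaration.
    """

    consumes: dict = {}   # taxonomy constraints on the input data
    produces: dict = {}   # taxonomy fields describing the produced data

    @abstractmethod
    def convert(self, data):
        """Transform `data` (e.g., via PyVista/Open3D under the hood)."""

class MeshToGridAdapter(DataAdapter):
    consumes = {"geometry_type": "mesh"}
    produces = {"geometry_type": "grid", "uniform": True}

    def convert(self, data):
        # A real implementation would call an external resampling library;
        # here we only tag the payload to illustrate the contract.
        return {"geometry_type": "grid", "uniform": True, "payload": data}
```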
4. Overview of Model Classification (Master Table of Models vs. Accepted Data + Four Example Models)
To illustrate how each PDE surrogate can declare its data requirements—and how users know if their dataset fits—here’s a master table showing typical “Accepted Data” for eight well-known models in physics-based AI. After that, we’ll do four quick sub-sections to give more detailed examples.
4.1. Master Table
- Extends FNO with adaptive frequency weighting.
- Ideal for global PDE phenomena on a regular lattice.
- Uses global Fourier transforms; typically no explicit boundary labeling.
- Good for parametric PDE families on grids.
- Replaces FFT with wavelet transforms for local/multi-scale features.
- Still data-driven, typically no direct PDE boundary labeling.
- Requires explicit vertices, faces, adjacency.
- Good for manifold PDEs or shape analysis on complex geometries.
- Node-edge message passing.
- Handles complex domain connectivity (fluid-structure interaction, etc.).
- Time-evolving data, geodesic tiling.
- Designed for planet-scale PDE forecasting.
- Not inherently PDE-oriented, but can adapt if PDE fields are stored as scattered points.
- PDE constraints in the loss.
- Works well even with minimal labeled data, focusing on PDE residuals at domain points.
4.2. Four Quick Examples
Here’s a brief demonstration of how four of these models might specify their “accepted_formats,” plus a minimal example descriptor that satisfies each:
A. AFNO (Adaptive Fourier Neural Operator)
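A plausible AFNO declaration plus a matching descriptor (values are illustrative assumptions):

```yaml
# AFNO: uniform regular lattices (illustrative)
model: "AFNO"
accepted_formats:
  - geometry_type: "grid"
    dimension: 2
    uniform: true
---
# A descriptor that satisfies it
dimension: 2
geometry_type: "grid"
uniform: true
representation:
  array_layout: "[N, H, W, C]"
coordinate_mapping: "implicit uniform"
```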
B. PINNs
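For PINNs, which sample collocation points and enforce PDE residuals in the loss, the declaration might read (again, illustrative values):

```yaml
# PINNs: collocation points in the domain (illustrative)
model: "PINN"
accepted_formats:
  - geometry_type: "point"
    dimension: 2
  - geometry_type: "point"
    dimension: 3
---
dimension: 2
geometry_type: "point"
uniform: false
representation:
  array_layout: "[N, d]"
boundary: true  # boundary points labeled for BC loss terms
```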
C. WNO (Wavelet Neural Operator)
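WNO replaces the FFT with wavelet transforms but still operates on structured grids, so its sketch looks much like FNO’s (illustrative values):

```yaml
# WNO: wavelet transforms on structured grids (illustrative)
model: "WNO"
accepted_formats:
  - geometry_type: "grid"
    uniform: true
---
dimension: 2
geometry_type: "grid"
uniform: true
representation:
  array_layout: "[N, H, W, C]"
```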
D. DiffusionNet
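DiffusionNet needs explicit mesh connectivity, so its declaration would constrain cell_type as well (illustrative values):

```yaml
# DiffusionNet: surface meshes with explicit connectivity (illustrative)
model: "DiffusionNet"
accepted_formats:
  - geometry_type: "mesh"
    dimension: 3
    cell_type: "triangle"
---
dimension: 3
geometry_type: "mesh"
uniform: false
cell_type: "triangle"
representation:
  vertices: "(N, 3)"
  faces: "(M, 3)"
```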