A parameter-free method for estimating material novelty is introduced, leveraging mutual information to analyze inter-material distances along chemical and structural axes. This approach derives data-driven weight functions from the mutual information profile, enabling the computation of quantitative novelty scores based on local density without requiring preset cutoffs. The methodology is validated on diverse materials datasets, demonstrating its effectiveness in identifying and differentiating chemical and structural novelty to guide materials discovery.
For detailed methodology, validation results, and theoretical background, see the peer-reviewed paper: "Mutual Information-Informed Novelty Estimation Of Materials Along Chemical And Structural Axes" published in Digital Discovery (2025).
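The local-density idea summarized above can be illustrated with a toy sketch. This is not the MINOV implementation: the exponential `placeholder_weight` is a hypothetical stand-in for the weight function that the method derives from the mutual information profile, and treating the lowest-density material as the most novel is an assumption made here for illustration.

```python
import numpy as np


def weighted_density(dist_matrix, weight_fn):
    """Sum weight_fn(distance) over all *other* materials, per material."""
    d = np.asarray(dist_matrix, dtype=float)
    weights = np.array(weight_fn(d), dtype=float)  # copy so the input is untouched
    np.fill_diagonal(weights, 0.0)  # a material is not its own neighbor
    return weights.sum(axis=1)


def placeholder_weight(d, scale=1.0):
    # Hypothetical stand-in: smooth decay with distance, no hard cutoff.
    # The real method derives this function from the data instead.
    return np.exp(-d / scale)


# Toy 1-D "materials": three clustered points and one isolated point
points = np.array([0.0, 0.1, 0.2, 5.0])
dist = np.abs(points[:, None] - points[None, :])  # pairwise distance matrix

density = weighted_density(dist, placeholder_weight)
novelty_rank = np.argsort(density)  # lowest density first (assumed most novel)
```

Here the isolated point receives the lowest density and therefore ranks first. Because the weight decays smoothly, no distance cutoff has to be chosen by hand.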
# Clone the repository
git clone https://github.com/AndrewFalkowski/MINov.git
cd MINov
# Create and activate conda environment
conda env create -f environment.yml
conda activate MINOV
All required libraries (numpy, pandas, scipy, scikit-learn, matplotlib, matminer, pymatgen) and their versions are specified in environment.yml.
Core functionality is provided by the scripts in the MINOV folder. Novelty can be computed over a DataFrame of pymatgen structure objects by calling the compute_MI_novelty function, as shown in the code snippet below. Arguments are provided for selecting specific distance metrics, loading precomputed internal or external distance metrics, and specifying save paths.
from MINOV.novelty import compute_MI_novelty

# Where 'mat_data' is a pandas DataFrame with 'structure' and 'formula' columns
# containing pymatgen structure objects and formula strings, respectively
data, mi_data = compute_MI_novelty(
    data=mat_data,  # df of pymatgen structure objects and formulae
    compute_metrics=["lostop"],  # list of distance metrics to compute
    precomputed_metrics={"elmd": "perovskite_dataset_elmd_dm.npy"},  # load precomputed
    data_dir="precomputed",  # path to folder with precomputed metrics
    data_prefix="perovskite_dataset",  # prefix for labeling purposes
)

# outputs:
# data - a df containing material information and computed densities for each metric
# mi_data - a dictionary with computed MI profile data for each metric
Further usage examples are available in the Jupyter notebooks described below.
Three Jupyter notebooks demonstrate the application of this method:
- perovskite_novelty.ipynb: Shows the method applied to a controlled dataset containing three distinct perovskite crystal systems: cubic, tetragonal, and orthorhombic. The data for this notebook is available in data/perovskite_dataset.
- diverse_novelty.ipynb: Demonstrates the method using a structurally diverse dataset of materials with varying degrees of similarity. Shows how the method distinguishes between different types of novelty. The data for this notebook is available in data/diverse_dataset.
- Li_novelty.ipynb: Applies the method to lithium-containing compounds from the GNOME dataset, analyzing them relative to known materials in the Materials Project database. The data for this notebook is available in data/MP_Li_dataset and data/GNOME_Li_dataset.
NOTE: The GNOME and MP datasets are pulled from v2023.11.1 of the database. As the database has since changed, we provide all structure files used in the analysis in the /data folder.
@article{falkowski2025mutual,
title={Mutual Information Informed Novelty Estimation of Materials Along Chemical and Structural Axes},
author={Falkowski, Andrew R and Sparks, Taylor D},
journal={Digital Discovery},
year={2025},
publisher={RSC}
}