This repository contains Python code accompanying the paper "Assessing data-driven predictions of band gap and electrical conductivity for transparent conducting materials".
- Clone this repository:
git clone https://github.com/fedeotto/tcms
- Install a new conda environment from env.yml:
conda env create -f env.yml
- Activate the new environment:
conda activate tcms
Rename the .env.template file to .env and update the paths it contains to match your setup.
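To check the edited file before running anything, the following is a minimal sketch of reading it with the standard library only. It assumes the .env file holds plain KEY=VALUE lines; the variable names DATA_DIR and MODELS_DIR are hypothetical placeholders, not names taken from this repository, and the code here may load the file differently.

```python
def parse_env(text):
    """Parse KEY=VALUE lines into a dict, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


# Hypothetical file content; the real keys are defined in .env.template.
example = '# paths\nDATA_DIR=/tmp/data\nMODELS_DIR="./models"\n'
print(parse_env(example))
```

For the real file, pass its content instead, for example parse_env(open(".env").read()), and confirm that every listed path exists on your machine.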
Due to licensing restrictions associated with MPDS data and confidentiality agreements tied to funding, we are unable to publicly release the full experimental datasets used in this study (band gap and conductivity). However, we provide representative data that can be used as a demonstration to run the proposed pipeline.
First, download the updated version of the UCSB dataset at the following link (te_expt.xlsx) and move it into the data folder. Then, run the script prepare_data.py via:
python -m data.prepare_data
This will automatically create the conductivity.xlsx and bandgap.xlsx datasets in the data folder, which can be used to run the code. Band gap data is automatically extracted using matminer (matbench_expt_gap).
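To inspect the band gap source data independently of prepare_data.py, the sketch below loads matbench_expt_gap through matminer's load_dataset and summarizes the gap values. It is an illustration only and does not reproduce any filtering the script may apply; the summarize_gaps helper is hypothetical, and the column names "composition" and "gap expt" are those documented for this matminer dataset.

```python
from statistics import median


def summarize_gaps(gaps):
    """Return the count, median, and maximum of a sequence of band gaps (eV)."""
    values = [float(g) for g in gaps]
    return {"count": len(values), "median": median(values), "max": max(values)}


def load_expt_gap():
    """Load the matbench_expt_gap dataset as a pandas DataFrame."""
    # Imported lazily so summarize_gaps works without matminer installed.
    from matminer.datasets import load_dataset

    # Downloads the data on first use; expected columns are
    # "composition" and "gap expt" (experimental gap in eV).
    return load_dataset("matbench_expt_gap")
```

With the tcms environment active, summarize_gaps(load_expt_gap()["gap expt"]) gives a quick overview of the values.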
We provide access to trained models (CrabNet and Random Forest) on the full data presented in the paper at the following link (GDrive).
Trained models can be used to predict electrical conductivity and band gap from arbitrary chemical compositions. You can reproduce the results illustrated in Table 4 of the paper using trained models via:
python main.py action=screen model=crabnet ++screen.screen_path=datasets/tcms.xlsx
We include Jupyter notebooks illustrating the analysis presented in the paper:
- att_coeff.ipynb: contains the analysis of the CrabNet attention coefficients in the leave-one-TCM-out task described in the paper.
- bandgap_prediction.ipynb: compares band gap prediction across different settings, namely Random Forest, CrabNet with no fine-tuning, and CrabNet fine-tuned on Materials Project band gap data.
- parity.ipynb: contains additional visualizations and plots of the electrical conductivity and band gap predictions.
It is also possible to fit or evaluate new machine learning models on the available data. For example, to fit a new CrabNet model on conductivity data, run:
python main.py action=fit model=crabnet data=conductivity
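When several runs are needed, the command lines can be assembled programmatically. The sketch below only builds and prints the commands in the key=value form shown above; build_command is a hypothetical helper, and data=bandgap is an assumption based on the bandgap.xlsx file name rather than an option confirmed by this README.

```python
import shlex


def build_command(action, model, data=None, extra=()):
    """Assemble a main.py invocation as an argument list."""
    args = ["python", "main.py", f"action={action}", f"model={model}"]
    if data is not None:
        args.append(f"data={data}")
    # Additional overrides, e.g. "++screen.screen_path=datasets/tcms.xlsx".
    args.extend(extra)
    return args


# "bandgap" is assumed here; only "conductivity" appears in the README.
for data in ("conductivity", "bandgap"):
    print(shlex.join(build_command("fit", "crabnet", data)))
```

Each argument list can be passed to subprocess.run(args, check=True) from the repository root to execute the run.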