This repository contains a custom Python script that performs dimensionality reduction analysis on the molecular geometries stored in the datasets of the WS22 database, which is hosted in the ZENODO repository (https://doi.org/10.5281/zenodo.6985377).
The script works in three steps:
First, a built-in function is used to convert the Cartesian coordinates of the molecular geometries into
a pairwise distance descriptor of size
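As an illustration of this kind of conversion, the following is a minimal sketch and not the script's own built-in function. It assumes the coordinates are available as a NumPy array of shape (n_geometries, n_atoms, 3) and that each unique interatomic distance is kept once; the function name `pairwise_distance_descriptor` and the output size stated in its docstring are properties of this sketch only.

```python
import numpy as np


def pairwise_distance_descriptor(coords):
    """Map Cartesian coordinates to unique interatomic distances.

    coords:  array of shape (n_geometries, n_atoms, 3)
    returns: array of shape (n_geometries, n_atoms * (n_atoms - 1) // 2)
    """
    coords = np.asarray(coords, dtype=float)
    # Difference vectors between every pair of atoms, for each geometry.
    diff = coords[:, :, None, :] - coords[:, None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # Keep only the strict upper triangle (i < j) so each pair appears once.
    i, j = np.triu_indices(coords.shape[1], k=1)
    return dist[:, i, j]
```

Because the descriptor depends only on interatomic distances, it is unchanged when a geometry is translated or rotated as a whole.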
To run this script, the following packages should be installed:
- python3 (tested with version 3.8.6)
- glob (part of the Python standard library, no separate installation needed)
- numpy
- pandas
- scikit-learn (imported as sklearn)
After downloading the desired NPZ datasets from the ZENODO repository to a local directory, one can run the script directly from a Linux terminal as follows:
python dimred.py
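Before running the script, it can help to check which arrays the downloaded NPZ files actually contain. The sketch below is a hypothetical helper (`list_npz_arrays` is not part of dimred.py) that assumes the files sit in one local directory; it reports the array names and shapes as stored, without assuming any particular key names.

```python
import glob
import os

import numpy as np


def list_npz_arrays(directory="."):
    """Return {file name: {array name: shape}} for every .npz file found."""
    summary = {}
    for path in sorted(glob.glob(os.path.join(directory, "*.npz"))):
        # NpzFile objects can be used as context managers to close the archive.
        with np.load(path) as data:
            summary[os.path.basename(path)] = {
                name: data[name].shape for name in data.files
            }
    return summary


if __name__ == "__main__":
    for filename, arrays in list_npz_arrays(".").items():
        print(filename, arrays)
```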
The output of the script is a zipped CSV file. For each molecular dataset, it contains two columns with the calculated principal components and one additional column with the corresponding labels of the molecular conformations, taken from the original datasets.
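A table of this shape could be produced roughly as follows. This is a sketch under stated assumptions, not the script's actual code: the function name `project_and_save` and the column names `PC1`, `PC2`, and `label` are hypothetical, and any preprocessing the script may apply before the projection is omitted.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA


def project_and_save(descriptors, labels, out_path):
    """Project descriptors onto two principal components and write a zipped CSV."""
    components = PCA(n_components=2).fit_transform(np.asarray(descriptors))
    table = pd.DataFrame(
        {
            "PC1": components[:, 0],   # hypothetical column name
            "PC2": components[:, 1],   # hypothetical column name
            "label": list(labels),     # conformation labels from the dataset
        }
    )
    # Write the table as a single CSV file inside a zip archive.
    table.to_csv(out_path, index=False, compression="zip")
    return table
```

The resulting file can be read back with `pandas.read_csv(out_path, compression="zip")`.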