Machine learning for phase prediction in HEAs
Welcome to the repository for our High Entropy Alloys (HEAs) dataset! This manual will guide you through the organization and structure of the repository, as well as provide essential information on the dataset's content and processing.
Folder Structure
The repository is organized into several folders, each serving a specific purpose. Let's start by exploring the Dataset folder, which is central to our project.
Inside the Dataset folder, you will find the following key files and information:
Dataset_HEAs.xlsx: This Excel file serves as the initial repository for our HEA data. It contains a compilation of information gathered from various known and available literature sources.
dataset11252_79.csv.csv: After consolidating the data from the Excel file, we created this CSV file, which holds a total of 11,252 HEA records collected from diverse sources.
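As a quick sanity check after downloading the repository, the record count of the CSV file can be compared against the figure quoted above. The following is a minimal sketch using only Python's standard library; the path Dataset/dataset11252_79.csv.csv, the helper name count_records, and the presence of a single header row are assumptions, not details confirmed by the repository.

```python
import csv
from pathlib import Path


def count_records(csv_path):
    """Return (n_records, n_columns) for a CSV file with one header row."""
    # utf-8-sig also reads plain UTF-8 and drops a byte-order mark if present.
    with open(csv_path, newline="", encoding="utf-8-sig") as handle:
        reader = csv.reader(handle)
        header = next(reader, None)  # assumed: first row holds column names
        if header is None:
            return 0, 0
        # Count non-empty data rows only.
        n_records = sum(1 for row in reader if row)
    return n_records, len(header)


if __name__ == "__main__":
    # Assumed location of the file inside the Dataset folder.
    path = Path("Dataset") / "dataset11252_79.csv.csv"
    if path.exists():
        n_records, n_columns = count_records(path)
        print(f"{path.name}: {n_records} records, {n_columns} columns")
    else:
        print(f"{path} not found; adjust the path to your local copy")
```

If the printed record count differs from 11,252, the most likely causes are a different header layout or a modified local copy of the file.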
Data Cleaning Process
Our dataset has undergone rigorous cleaning to ensure its quality and reliability. The process consists of multiple stages, each refining the dataset further. Here's a summary:
First Stage Cleaning (as explained in the article): In this initial stage, we applied the cleaning techniques described in the article, which reduced the dataset to 10,387 records. The CSV file produced by this stage is stored in the repository as dataset10387_70.csv.
Subsequent Stages (Implemented in Python Script): Following the initial cleaning, further refinement steps were performed using a Python script. These stages addressed various aspects of data quality and consistency. After they were completed, the dataset contains 5,677 high-quality HEA records.
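The record counts reported above can be turned into a short tally of how many records each stage removed. This is a minimal sketch: the stage labels and the stage_summary helper are illustrative names, and only the three counts come from this manual.

```python
# Record counts quoted in this manual; the labels are illustrative.
STAGES = [
    ("compiled CSV", 11252),
    ("after first-stage cleaning", 10387),
    ("after Python-script stages", 5677),
]


def stage_summary(stages):
    """Return (label, count, removed, retained_fraction) for each stage."""
    initial = stages[0][1]
    summary = []
    previous = initial
    for label, count in stages:
        removed = previous - count  # records dropped relative to the prior stage
        summary.append((label, count, removed, count / initial))
        previous = count
    return summary


for label, count, removed, retained in stage_summary(STAGES):
    print(f"{label:<28} {count:>6} records  removed: {removed:>5}  retained: {retained:.1%}")
```

The first stage removes 865 records and the scripted stages remove a further 4,710, leaving roughly half of the compiled records.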