Embrace the world of large language models!

Description

This repository stores the source code and data for the paper 'A Prompt-Engineered Large Language Model, Deep Learning Workflow for Materials Classification' published in Materials Today.
arXiv link: https://arxiv.org/abs/2401.17788
Materials Today link: https://www.sciencedirect.com/science/article/abs/pii/S1369702124002001?via%3Dihub

The repository is organized into five main folders: the metallic glasses database source data (original_data), the large language model code (llm), the classification models (classification_models), model interpretation and visualization (interpretability_and_visualization), and supplemetary_data_for_revision.

Follow the steps below to set up the environment.

Step 1: Configure Python environment and libraries

We recommend running all code in a Python virtual environment.

If you have not installed Python yet, we recommend installing it through Anaconda: Anaconda Installation

To create and activate a new conda environment, use the following commands:

conda create --name bmg python=3.10
conda activate bmg

Then install the required Python packages:

pip install -r requirements.txt
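
As an optional sanity check after installation, the sketch below reports which packages listed in a requirements file are not yet installed and whether the interpreter matches the Python 3.10 used above. The helper names missing_packages and python_matches are hypothetical and not part of this repository, and the parser only handles simple "name" or "name==version" style lines.

```python
import re
import sys
from importlib import metadata


def missing_packages(requirements_text):
    """Return the package names from a requirements file that are not installed."""
    missing = []
    for raw in requirements_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and pip options such as "-r other.txt".
        if not line or line.startswith(("#", "-")):
            continue
        # Take the leading package name and ignore version specifiers.
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match is None:
            continue
        name = match.group(0)
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


def python_matches(major=3, minor=10):
    """Check that the running interpreter has the expected major.minor version."""
    return sys.version_info[:2] == (major, minor)
```

From the repository root with the bmg environment active, calling missing_packages(open("requirements.txt").read()) should return an empty list once installation has succeeded.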

Step 2: Get a Gemini API key from Google

If you also want to generate text data with Gemini, first apply for a free API key from Google Dev.

Then paste the key into the .env file in the llm folder:

GOOGLE_API_KEY='xxxxx'
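
For illustration, here is a minimal, dependency-free sketch of how a script could read this key. How the code in the llm folder actually loads it is not shown here, so the helper names load_env_file and get_google_api_key and the default path llm/.env are assumptions. The python-dotenv package is a common alternative for the same task.

```python
import os
from pathlib import Path


def load_env_file(path):
    """Parse simple KEY=VALUE lines from a .env file into a dict."""
    values = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Ignore blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def get_google_api_key(env_path="llm/.env"):
    """Return the key from the process environment, falling back to the .env file."""
    key = os.environ.get("GOOGLE_API_KEY")
    if not key and Path(env_path).is_file():
        key = load_env_file(env_path).get("GOOGLE_API_KEY")
    # 'xxxxx' is the placeholder from the template above, not a real key.
    if not key or key == "xxxxx":
        raise RuntimeError("GOOGLE_API_KEY is not set; add it to the .env file")
    return key
```

Keeping the key in .env rather than in source code helps avoid committing it by accident.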

Step 3: Download Hugging Face Pre-trained Models

Our classification model is fine-tuned from pre-trained models, so to repeat the training process yourself you first need to obtain the pre-trained model files.

You can load a model directly by following the official guide.

To download a pre-trained model from Hugging Face to a local folder instead, use the following code:

from huggingface_hub import snapshot_download
snapshot_download(repo_id="xxx", local_dir="xxx")

Replace repo_id with the name of the model you want to download and local_dir with the folder where it should be stored.

repo_id of MatSciBERT: m3rg-iitd/matscibert
repo_id of Longformer: allenai/longformer-base-4096
repo_id of BERT: bert-base-cased
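
To fetch all three models in one go, the snippet above can be wrapped in a loop. This is only a sketch: the folder name pretrained_models, the short keys in MODEL_REPOS, and the function download_models are hypothetical choices, not paths the training scripts are known to expect. The download argument exists so the loop can be tried out without network access.

```python
from pathlib import Path

# repo_id values are the ones listed above; the short keys are illustrative.
MODEL_REPOS = {
    "matscibert": "m3rg-iitd/matscibert",
    "longformer": "allenai/longformer-base-4096",
    "bert": "bert-base-cased",
}


def download_models(target_root="pretrained_models", download=None):
    """Download every model in MODEL_REPOS into its own subfolder of target_root."""
    if download is None:
        # Imported lazily so the module loads even without huggingface_hub.
        from huggingface_hub import snapshot_download as download
    paths = {}
    for name, repo_id in MODEL_REPOS.items():
        local_dir = Path(target_root) / name
        download(repo_id=repo_id, local_dir=str(local_dir))
        paths[name] = local_dir
    return paths
```

Calling download_models() with no arguments uses snapshot_download and returns a dict mapping each short name to its local folder.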

If you only want to run inference with MgBERT, you can use the model weights file in the checkpoint folder:

cd classification_models/different_BERT/checkpoint

Then load the weights with the inference_template file in the interpretability_and_visualization folder.

Quick start:

Single composition:
1. Download the MgBERT weights to the repository root directory: https://figshare.com/articles/software/MgBERT_pth/26879239
2. Configure the environment as shown in Step 1.
3. Run all cells in MgBERT_LLM_Classification_for_Materials_Science/single_inference_test.ipynb.
4. (Optional) By default, the notebook tests the composition Mg59.5Cu22.9Ag6.6Gd11. To test another composition, replace the content of MgBERT_LLM_Classification_for_Materials_Science/test.txt with an AI-generated description produced with our prompt template.

Multiple compositions:
1. Download the MgBERT weights to the repository root directory: https://figshare.com/articles/software/MgBERT_pth/26879239
2. Configure the environment as shown in Step 1.
3. Run all cells in MgBERT_LLM_Classification_for_Materials_Science/multiple_inferences.ipynb.
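
When testing a composition other than Mg59.5Cu22.9Ag6.6Gd11, it can help to check the formula string before generating a description for it. The sketch below is not part of the repository. It assumes formulas are written as element symbols, each followed by an amount in atomic percent, and that the amounts should sum to 100, as they do for the test composition. The names parse_composition and sums_to_100 are hypothetical.

```python
import re

# One element symbol followed by a number, e.g. "Mg59.5" or "Gd11".
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)")


def parse_composition(formula):
    """Split a formula such as 'Mg59.5Cu22.9Ag6.6Gd11' into {element: amount}."""
    parts = _TOKEN.findall(formula)
    # Reject strings with characters the pattern could not account for.
    if "".join(symbol + amount for symbol, amount in parts) != formula:
        raise ValueError(f"cannot parse composition: {formula!r}")
    # Note: a repeated element symbol would overwrite the earlier entry.
    return {symbol: float(amount) for symbol, amount in parts}


def sums_to_100(composition, tol=1e-6):
    """Check that the amounts add up to 100 within a small tolerance."""
    return abs(sum(composition.values()) - 100.0) <= tol
```

For the test composition, parse_composition yields four elements (Mg, Cu, Ag, Gd) and sums_to_100 returns True.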
