Contains the notebooks Prompting.ipynb and RAG_new.ipynb, used for creating datasets.
The test sets are test_metal_hydride_250_gemma2_9B.xlsx and test_metal_hydride_250_llama3_8B.xlsx.
The final dataset is final_extracted_1611.xlsx.
A subset of the dataset generated using this framework, showing hydrogen storage properties of various alloys and compounds extracted from paper abstracts.
The paper is available here link.
Open RAG_new.ipynb in Google Colab.
Connect to a T4 GPU runtime.
After installing the libraries, open the terminal.
#open the terminal
!pip install colab-xterm
%load_ext colabxterm
%xterm
Run the following commands in the terminal to load the LLM.
curl -fsSL https://ollama.com/install.sh | sh
ollama serve & ollama pull llama3
ollama serve & ollama run llama3
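Once the server is running, the model can also be queried from a notebook cell over Ollama's local REST API. The following is a minimal sketch, not the notebook's own code: it assumes the server listens on the default address http://localhost:11434, and the helper names build_payload and ask_ollama are hypothetical.

```python
import json
import urllib.request

# Assumption: default local endpoint of the server started by `ollama serve`.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(prompt, model="llama3"):
    """Build the JSON body for a single, non-streaming generation request."""
    return {"model": model, "prompt": prompt, "stream": False}


def ask_ollama(prompt, model="llama3", url=OLLAMA_URL, timeout=120):
    """Send the prompt to the local Ollama server and return the generated text."""
    data = json.dumps(build_payload(prompt, model)).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # With "stream": False the server returns one JSON object whose
        # "response" field holds the full completion.
        return json.loads(response.read().decode("utf-8"))["response"]
```

Calling ask_ollama("Say hello") should return a short completion if the server is up and llama3 has been pulled; a connection error usually means `ollama serve` is not running.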
Load the file containing the abstracts.
selected_df = pd.read_csv("selected_df.xls")
Define the query.
query = """
Describe all the parameters of the material discussed in the text.
If no information is available just write "N/A".
The output should be concise and in the format as below:
Name of Alloy :
Hydrogen storage capacity :
Temperature :
Pressure :
Experimental Conditions :
"""
Follow the notebook to extract the data.
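Because the query asks for a fixed "Field : value" layout, the model's answer can be turned into a table row with a small parser. The sketch below is illustrative only and may differ from what the notebook does; parse_response and FIELDS are hypothetical names, and it assumes the model repeats the field labels exactly as given in the query.

```python
# Field labels copied from the query template.
FIELDS = [
    "Name of Alloy",
    "Hydrogen storage capacity",
    "Temperature",
    "Pressure",
    "Experimental Conditions",
]


def parse_response(text):
    """Map each requested field to the value the model wrote, or "N/A"."""
    record = {field: "N/A" for field in FIELDS}
    for line in text.splitlines():
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if not sep:
            continue  # Not a "Field : value" line.
        key = key.strip().lower()
        for field in FIELDS:
            if key == field.lower():
                # An empty value is treated the same as a missing one.
                record[field] = value.strip() or "N/A"
    return record
```

Each returned dictionary has the same five keys, so a list of them can be passed directly to pd.DataFrame to build the extracted dataset.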
Open Prompting.ipynb in Google Colab.
Load the LLM.
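from unsloth import FastLanguageModel

# max_seq_length, dtype, and load_in_4bit must be defined before this call.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-3-8b-bnb-4bit",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)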
Use this prompt or modify it.
text_instruct = """
Describe all the parameters of the material discussed in the text.
If no information is available just write "N/A".
The output should be concise and in the format as below:
Name of Alloy :
Hydrogen storage capacity :
Temperature :
Pressure :
Experimental Conditions :
"""
Load the file containing the abstracts.
selected_df = pd.read_csv("selected_df.xls")
Follow the notebook to extract the specified data.
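One way to pair the instruction with each abstract before generation is sketched below. This is an assumption about the workflow, not the notebook's code: build_prompt and build_prompts are hypothetical helpers, and the "Text:" / "Answer:" layout is an illustrative choice.

```python
def build_prompt(instruction, abstract):
    """Join the fixed instruction with one abstract into a single prompt."""
    return f"{instruction.strip()}\n\nText:\n{abstract.strip()}\n\nAnswer:\n"


def build_prompts(instruction, abstracts):
    """Build one prompt per non-empty abstract, preserving order."""
    return [
        build_prompt(instruction, abstract)
        for abstract in abstracts
        # Skip missing values (e.g. NaN from pandas) and blank strings.
        if isinstance(abstract, str) and abstract.strip()
    ]
```

For example, build_prompts(text_instruct, selected_df["abstract"]) would yield one prompt per row, where the column name "abstract" is a placeholder for whichever column holds the abstracts.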
We used some of the alloys extracted from the dataset to test whether they can be used with ML models.
For this purpose, we used the HYST paper.
The predictions of the models are provided in Dataset_testing_HYST.csv.