After running n evaluations, the outputs are placed in the "dataset_ntimes" folder, organized by model name, with the following structure:
dataset_ntimes/
├── model_1/
│   └── model_1_nt.json
├── model_2/
│   └── model_2_nt.json
└── ...
Set dir_path=path/to/dataset_ntimes in 1_Split_filename.py and run it to split each model's output into separate task files based on the "filename" field.
Set file_dir=path/to/dataset_ntimes in 2_Extract.py and run it to extract the answers from the model responses and generate JSONL files, which are placed in the result_ntimes folder with the following structure:
result_ntimes/
├── model_1/
│   ├── 0shot/
│   │   ├── 1t/
│   │   │   └── task_dir_1/
│   │   │       ├── task_1.json
│   │   │       ├── task_1.jsonl
│   │   │       ├── task_2.json
│   │   │       ├── task_2.jsonl
│   │   │       └── ...
│   │   └── 2t/
│   │       └── task_dir_1/
│   │           ├── task_1.json
│   │           ├── task_1.jsonl
│   │           └── ...
│   └── 3shot/
│       └── ...
└── ...
Set root_path=path/to/result_ntimes in 3_Evaluate.py and run it to obtain the evaluation results.
Modify the folder_path and output_file in 1_prompt_chem.py, then run the file to submit for LLM evaluation.
Modify the file_path in 2_L1_task_eval.py and run it to obtain metrics for multiple-choice, true/false, fill-in-the-blank, short answer, and calculation tasks. The results are saved to an Excel file.
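Tabulating per-task scores into an Excel sheet could look like this sketch; the `summarize` helper and the input dict are hypothetical, and `DataFrame.to_excel` requires the openpyxl package to be installed:

```python
import pandas as pd

def summarize(scores: dict, excel_path: str = "") -> pd.DataFrame:
    """Build one row per task from {task_name: accuracy} (hypothetical input
    shape) and optionally save the table to Excel."""
    df = pd.DataFrame(sorted(scores.items()), columns=["task", "accuracy"])
    df = df.round(4)  # round() only touches the numeric column
    if excel_path:
        df.to_excel(excel_path, index=False)  # needs openpyxl
    return df
```

Returning the DataFrame as well as writing the file makes the table easy to inspect or aggregate further in a notebook.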
Modify the folder_path and excel_path in 3_other_task_eval.py, then run the file to obtain metrics for abstract writing, outlining, reaction intermediates, single-step synthesis, multi-step synthesis, and physicochemical property tasks. The results are saved to an Excel file.
Run 1_prompt_chem to obtain the evaluation data from the LLM.
Run 2_LLM_evaluate to obtain the evaluation results from the LLM.
Run 3_code_evaluate to obtain the evaluation results using code.
https://huggingface.co/datasets/Ooo1/ChemEval
The ChemEval dataset is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) License.
Please cite our paper if you use our dataset.
@article{huang2024chemeval,
  title={ChemEval: A Comprehensive Multi-Level Chemical Evaluation for Large Language Models},
  author={Huang, Yuqing and Zhang, Rongyang and He, Xuesong and Zhi, Xuyang and Wang, Hao and Li, Xin and Xu, Feiyang and Liu, Deguang and Liang, Huadong and Li, Yi and others},
  journal={arXiv preprint arXiv:2409.13989},
  year={2024}
}