
CMMLU Inference Script

This project evaluates the performance of its models on the CMMLU benchmark, which contains 11K multiple-choice questions covering 67 subjects. The following describes how to run predictions on the CMMLU dataset.

Data Preparation

Download the evaluation dataset from the official CMMLU source and unzip it into the data folder:

wget https://huggingface.co/datasets/haonan-li/cmmlu/resolve/main/cmmlu_v1_0_1.zip
unzip cmmlu_v1_0_1.zip -d data

Place the data folder under the scripts/cmmlu directory of this project.
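
As a quick sanity check, you can list the unpacked data. This is a sketch that assumes the archive unpacks into dev and test subfolders of per-subject CSV files, which is how CMMLU is distributed; adjust the paths if your copy differs:

cd scripts/cmmlu
ls data                # expect dev and test subfolders
ls data/test | head    # per-subject CSV files, e.g. arts.csv, nutrition.csv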

Run the Prediction Script

Execute the following script:

model_path=path/to/llama-3-chinese
output_path=path/to/your_output_dir

cd scripts/cmmlu
python eval.py \
    --model_path ${model_path} \
    --few_shot False \
    --with_prompt True \
    --output_dir ${output_path} \
    --input_dir data

Parameter Explanation

  • model_path: Directory of the model to evaluate (a full Llama-3-Chinese or Llama-3-Chinese-Instruct model, not a LoRA)
  • few_shot: Whether to use few-shot evaluation (see the example invocation after this list)
  • ntrain: When few_shot=True, the number of few-shot instances (5-shot: ntrain=5); ignored when few_shot=False
  • with_prompt: Whether to wrap the model input with the instruction template (intended for the Llama-3-Chinese-Instruct model)
  • n_times: Number of times the evaluation is repeated; a corresponding number of folders will be created under output_dir
  • load_in_4bit: Load the model with 4-bit quantization
  • use_flash_attention_2: Use flash-attn2 to accelerate inference; otherwise SDPA is used for acceleration
  • output_dir: Specify the output path for the evaluation results
  • input_dir: Specify the path for the evaluation data
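
For reference, a 5-shot run on a base model (so without the instruction template) with 4-bit loading could look like the sketch below. The paths are placeholders, and the flag spellings simply mirror the parameter list above, so verify them against eval.py before running:

model_path=path/to/llama-3-chinese
output_path=path/to/your_output_dir

cd scripts/cmmlu
python eval.py \
    --model_path ${model_path} \
    --few_shot True \
    --ntrain 5 \
    --with_prompt False \
    --load_in_4bit True \
    --output_dir ${output_path} \
    --input_dir data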

Evaluation Output

  • After the model prediction is complete, directories outputs/take* are generated, where * is a number ranging from 0 to n_times-1; each directory stores the results of one of the n_times decoding runs.

  • outputs/take* contains two JSON files, submission.json and summary.json.

  • submission.json stores the model's answers for each question, formatted as follows (see the spot-check example below):
{
    "arts": {
        "0": "A",
        "1": "B",
        ...
    },
    "nutrition": {
        "0": "B",
        "1": "A",
        ...
    },
    ...
}
  • summary.json contains the model's results on the 67 subjects, the 5 major categories, and the overall average. For example, the All field in the JSON file shows the overall average performance:
  "All": {
    "score": 0.39984458642721465,
    "num": 11582,
    "correct": 4631.0
  }

where score is the accuracy, num is the total number of test samples, and correct is the number of correct answers.
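
For the example above, score = correct / num = 4631 / 11582 ≈ 0.3998, matching the reported value.

To spot-check the raw predictions for a single subject, you can pull one block out of submission.json, for example with jq. The take0 path below follows the output layout described above; adjust it to wherever your results were written:

jq '.arts' ${output_path}/take0/submission.json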
