
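evalscope 1.0: benchmarks with multiple subsets (a subset_list), such as mmlu, use a default subset value of 'all', which raises an error, and the official site does not document how to use it #876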

@jingjingpiggy

Description


self = <evalscope.benchmarks.mmlu.mmlu_adapter.MMLUAdapter object at 0x7f6ce13b38f0>
load_func = functools.partial(<bound method DefaultDataAdapter.load_subset of <evalscope.benchmarks.mmlu.mmlu_adapter.MMLUAdapter object at 0x7f6ce13b38f0>>, data_loader=<class 'evalscope.api.dataset.loader.RemoteDataLoader'>)
is_fewshot = False
(Pdb) l
198 load_func (Callable[[str], Dataset]): Function to load individual subsets
199
200 Returns:
201 DatasetDict: Dictionary containing all loaded subsets
202 """
203 -> if self.reformat_subset:
204 # Load only the default subset
205 subset_data = load_func(self.default_subset)
206 # Reformat the subset to create multiple subsets based on sample keys
207 # NOTE: subset_list and limit is applied here if specified
208 limit = self.few_shot_num if is_fewshot else self.limit
(Pdb) n

/usr/local/lib/python3.12/site-packages/evalscope/api/benchmark/adapters/default_data_adapter.py(205)load_subsets()
-> subset_data = load_func(self.default_subset)
(Pdb) self.default_subset
'all'
(Pdb)
'all'
(Pdb) c
2025-10-15 20:21:24 - evalscope - INFO: Loading dataset modelscope/mmlu from modelscope > subset: all > split: test ...
Traceback (most recent call last):
File "/mnt/workspace/E2Eeval/eval_public.py", line 68, in <module>
run_task(task_cfg=task_cfg)
File "/usr/local/lib/python3.12/site-packages/evalscope/run.py", line 28, in run_task
return run_single_task(task_cfg, run_time)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/evalscope/run.py", line 41, in run_single_task
result = evaluate_model(task_cfg, outputs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/evalscope/run.py", line 144, in evaluate_model
res_dict = evaluator.eval()
^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/evalscope/evaluator/evaluator.py", line 94, in eval
dataset_dict = self.benchmark.load_dataset()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/evalscope/api/benchmark/adapters/default_data_adapter.py", line 62, in load_dataset
self.test_dataset, self.fewshot_dataset = self.load()
^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/evalscope/api/benchmark/adapters/default_data_adapter.py", line 81, in load
return self.load_from_remote()
^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/evalscope/api/benchmark/adapters/default_data_adapter.py", line 93, in load_from_remote
test_dataset = self.load_subsets(test_load_func)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/evalscope/api/benchmark/adapters/default_data_adapter.py", line 205, in load_subsets
subset_data = load_func(self.default_subset)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/evalscope/api/benchmark/adapters/default_data_adapter.py", line 252, in load_subset
dataset = loader.load()
^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/evalscope/api/dataset/loader.py", line 88, in load
dataset = MsDataset.load(
^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/modelscope/msdatasets/ms_dataset.py", line 308, in load
with load_dataset_with_ctx(
File "/usr/local/lib/python3.12/contextlib.py", line 137, in __enter__
return next(self.gen)
^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/modelscope/msdatasets/utils/hf_datasets_util.py", line 1475, in load_dataset_with_ctx
dataset_res = DatasetsWrapperHF.load_dataset(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/modelscope/msdatasets/utils/hf_datasets_util.py", line 991, in load_dataset
builder_instance = DatasetsWrapperHF.load_dataset_builder(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/modelscope/msdatasets/utils/hf_datasets_util.py", line 1164, in load_dataset_builder
builder_instance: DatasetBuilder = builder_cls(
^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/datasets/builder.py", line 343, in __init__
self.config, self.config_id = self._create_builder_config(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/site-packages/datasets/builder.py", line 570, in _create_builder_config
raise ValueError(
ValueError: BuilderConfig 'all' not found. Available: ['high_school_european_history', 'business_ethics', 'clinical_knowledge', 'medical_genetics', 'high_school_us_history', 'high_school_physics', 'high_school_world_history', 'virology', 'high_school_microeconomics', 'econometrics', 'college_computer_science', 'high_school_biology', 'abstract_algebra', 'professional_accounting', 'philosophy', 'professional_medicine', 'nutrition', 'global_facts', 'machine_learning', 'security_studies', 'public_relations', 'professional_psychology', 'prehistory', 'anatomy', 'human_sexuality', 'college_medicine', 'high_school_government_and_politics', 'college_chemistry', 'logical_fallacies', 'high_school_geography', 'elementary_mathematics', 'human_aging', 'college_mathematics', 'high_school_psychology', 'formal_logic', 'high_school_statistics', 'international_law', 'high_school_mathematics', 'high_school_computer_science', 'conceptual_physics', 'miscellaneous', 'high_school_chemistry', 'marketing', 'professional_law', 'management', 'college_physics', 'jurisprudence', 'world_religions', 'sociology', 'us_foreign_policy', 'high_school_macroeconomics', 'computer_security', 'moral_scenarios', 'moral_disputes', 'electrical_engineering', 'astronomy', 'college_biology']

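I really don't know where this 'all' comes from. Why isn't the official documentation kept up to date, and how is this supposed to be used?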
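To make the failure easier to discuss, here is a minimal standalone sketch of the control flow shown in the pdb listing above. It does not import evalscope. `fake_load`, `AVAILABLE`, and this simplified `load_subsets` are hypothetical stand-ins, and the per-subject branch is only my assumption about what happens when `reformat_subset` is false. The point it illustrates: when `reformat_subset` is true, only `default_subset` ('all') is requested, so the dataset being loaded must actually expose a config with that name.

```python
from typing import Callable, Dict, List

# A few of the config names listed in the ValueError above (not the full list).
AVAILABLE = ['abstract_algebra', 'anatomy', 'astronomy']


def fake_load(subset: str) -> List[str]:
    """Hypothetical stand-in for load_func: rejects unknown config names."""
    if subset not in AVAILABLE:
        raise ValueError(
            f"BuilderConfig '{subset}' not found. Available: {AVAILABLE}")
    return [f'{subset}-sample']


def load_subsets(load_func: Callable[[str], List[str]],
                 subset_list: List[str],
                 default_subset: str = 'all',
                 reformat_subset: bool = True) -> Dict[str, List[str]]:
    """Simplified sketch of the branch at line 203 of the listing."""
    if reformat_subset:
        # Only the default subset is requested, as on line 205 of the listing.
        return {default_subset: load_func(default_subset)}
    # Assumed alternative path: load each named subset individually.
    return {name: load_func(name) for name in subset_list}


# Reproduces the shape of the reported error: 'all' is not an available config.
try:
    load_subsets(fake_load, AVAILABLE)
    error_message = ''
except ValueError as exc:
    error_message = str(exc)

# Loading per-subject configs succeeds because each name exists.
per_subject = load_subsets(fake_load, AVAILABLE, reformat_subset=False)
```

If this reading is right, the question is whether the dataset id being loaded (`modelscope/mmlu` in the log) is expected to provide an 'all' config, or whether the adapter should fall back to the per-subject names when it does not.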
