Description
I am reporting a persistent and difficult-to-debug issue where helm-run fails with a ValueError: tokenizer_name is None when adding a new, standard Hugging Face model.
After a lengthy debugging session, I have concluded that this is not a user configuration error but unexpected behavior in how HELM handles the prod_env directory, which appears to be auto-created and to override otherwise correct configurations.
The Problem in Detail
When a new model is correctly defined in src/helm/config/model_deployments.yaml, running helm-run still fails. The traceback points to the WindowService receiving a None value for tokenizer_name.
The logs show that even if the prod_env directory is deleted or renamed beforehand, helm-run appears to find or recreate it, and then logs Running in local mode with base path: prod_env. This seems to invalidate the configurations loaded from src/helm/config/.
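This side effect can be observed directly; below is a minimal sketch (assumptions: helm-run is on PATH, and my_eval.conf exists as described in the reproduction steps further down):
Python
# Sketch: observe helm-run recreating prod_env as a side effect.
# Assumes helm-run is on PATH and my_eval.conf exists (see "Steps to Reproduce").
import os
import subprocess

assert not os.path.exists("prod_env"), "start from a clean tree"
subprocess.run(
    ["helm-run", "--conf-paths", "my_eval.conf", "--suite", "my_suite",
     "--max-eval-instances", "1"],
    check=False,  # the run is expected to fail with ValueError: tokenizer_name is None
)
# On the affected setup, the directory has reappeared by this point.
print("prod_env exists after run:", os.path.exists("prod_env"))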
Environment
Hardware: NVIDIA V100 GPU machine
Python Version: 3.10
HELM Installation Method: editable install from a source clone (pip install -e ., per the steps below)
Model: meta-llama/Llama-4-Scout-17B-16E-Instruct (https://huggingface.co/meta-llama/Llama-4-Scout-17B-16E-Instruct)
Steps to Reproduce
Start with a clean clone of the crfm-helm repository.
Bash
git clone https://github.com/stanford-crfm/helm.git
cd helm
Set up the environment (e.g., pip install -e .).
Ensure no prod_env directory exists.
Bash
rm -rf prod_env
Add the following complete and correct model deployment configuration to the end of src/helm/config/model_deployments.yaml:
YAML
- name: meta-llama/Llama-4-Scout-17B-16E-Instruct
  model: meta-llama/Llama-4-Scout-17B-16E-Instruct
  tokenizer_name: meta-llama/Llama-4-Scout-17B-16E-Instruct
  client_spec:
    class_name: helm.proxy.clients.huggingface_client.HuggingFaceClient
    args:
      pretrained_model_name_or_path: meta-llama/Llama-4-Scout-17B-16E-Instruct
      trust_remote_code: true
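As a quick sanity check (independent of HELM's registry logic), the appended entry can be parsed with PyYAML to confirm it carries tokenizer_name:
Python
# Sanity check: confirm the appended entry parses and carries tokenizer_name.
# Requires PyYAML; this does not exercise HELM's registry logic.
import yaml

with open("src/helm/config/model_deployments.yaml") as f:
    config = yaml.safe_load(f)

entry = next(
    d for d in config["model_deployments"]
    if d["name"] == "meta-llama/Llama-4-Scout-17B-16E-Instruct"
)
print(entry["tokenizer_name"])  # expected: meta-llama/Llama-4-Scout-17B-16E-Instruct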
Use any standard run spec, for example, by creating a file named my_eval.conf with the following content:
entries: [
  {description: "MMLU test", priority: 1, groups: ["mmlu"]},
]
Execute the helm-run command:
Bash
helm-run --conf-paths my_eval.conf --suite my_suite --models-to-run meta-llama/Llama-4-Scout-17B-16E-Instruct --max-eval-instances 1
Expected Behavior
The helm-run command should start successfully, load the specified model, and run the evaluation without configuration errors.
Actual Behavior (The Bug)
The command fails with the ValueError: tokenizer_name is None traceback. Crucially, a prod_env directory is observed to be present during or after the run, and the log shows Running in local mode with base path: prod_env.
(Please paste the full traceback from your failed helm-run command here)
Evidence and Diagnostics (The "Smoking Gun")
To verify that the YAML configuration itself is loaded correctly by HELM's registry system, I ran a separate diagnostic script, which loads the configuration and finds the correct tokenizer_name. This shows the configuration file is correct; the information is lost or ignored later in the helm-run process.
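The script itself is not attached; a minimal reconstruction that would produce equivalent output looks like the sketch below (get_model_deployment and register_builtin_configs_from_helm_package are assumptions based on recent HELM versions; exact import paths may differ):
Python
# Hypothetical reconstruction of the diagnostic script (original not attached).
# The registry helpers below are assumptions; names may differ across HELM versions.
from helm.benchmark.config_registry import register_builtin_configs_from_helm_package
from helm.benchmark.model_deployment_registry import get_model_deployment

DEPLOYMENT = "meta-llama/Llama-4-Scout-17B-16E-Instruct"

print("--- Starting Final HELM Debug Script ---")
register_builtin_configs_from_helm_package()  # loads src/helm/config/*.yaml
deployment = get_model_deployment(DEPLOYMENT)
print("Here is the FINAL configuration HELM actually sees for your model:")
print(vars(deployment))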
Output of the successful diagnostic script:
--- Starting Final HELM Debug Script ---
Here is the FINAL configuration HELM actually sees for your model:
{'name': 'meta-llama/Llama-4-Scout-17B-16E-Instruct', 'client_spec': ClientSpec(class_name='helm.proxy.clients.huggingface_client.HuggingFaceClient', args={'pretrained_model_name_or_path': 'meta-llama/Llama-4-Scout-17B-16E-Instruct', 'trust_remote_code': True}), 'model_name': 'meta-llama/Llama-4-Scout-17B-16E-Instruct', 'tokenizer_name': 'meta-llama/Llama-4-Scout-17B-16E-Instruct', 'window_service_spec': None, 'max_sequence_length': None, 'max_request_length': None, 'max_sequence_and_generated_tokens_length': None, 'deprecated': False}
Confirmed Workaround
The only way to make the run succeed is to manually disable the prod_env directory right before running the command:
Bash
mv prod_env prod_env_disabled
helm-run ... # This now works correctly
This confirms the issue is tied to the prod_env override logic.
Thank you for looking into this issue. For reference, the full contents of my configuration files are pasted below.
(helm) junyao@goofy-1:~/helm$ cat src/helm/config/model_deployments.yaml
model_deployments:
  - name: meta-llama/Llama-4-Scout-17B-16E-Instruct
    model: meta-llama/Llama-4-Scout-17B-16E-Instruct
    tokenizer_name: meta-llama/Llama-4-Scout-17B-16E-Instruct
    client_spec:
      class_name: helm.proxy.clients.huggingface_client.HuggingFaceClient
      args:
        pretrained_model_name_or_path: meta-llama/Llama-4-Scout-17B-16E-Instruct
        trust_remote_code: true
    window_service_spec:
      class_name: helm.benchmark.window_services.local_window_service.LocalWindowService
      args:
        tokenizer_name: meta-llama/Llama-4-Scout-17B-16E-Instruct
(helm) junyao@goofy-1:~/helm$ cat src/helm/benchmark/run_specs/my_llama4_eval.conf
entries: [
  {
    name: "mmlu:computer_security",
    description: "mmlu:subject=computer_security,method=multiple_choice_joint,model=meta-llama/Llama-4-Scout-17B-16E-Instruct",
    priority: 1,
    scenario_spec: {
      class_name: "helm.benchmark.scenarios.mmlu_scenario.MMLUScenario",
      args: {
        subject: "computer_security"
      }
    },
    adapter_spec: {
      method: "multiple_choice_joint",
      model: "meta-llama/Llama-4-Scout-17B-16E-Instruct",
      model_deployment: "meta-llama/Llama-4-Scout-17B-16E-Instruct"
    },
    metric_specs: [
      {
        class_name: "helm.benchmark.metrics.basic_metrics.BasicGenerationMetric",
        args: {
          names: ["exact_match", "quasi_exact_match", "prefix_exact_match", "quasi_prefix_exact_match"]
        }
      },
      {
        class_name: "helm.benchmark.metrics.basic_metrics.BasicReferenceMetric"
      },
      {
        class_name: "helm.benchmark.metrics.basic_metrics.InstancesPerSplitMetric"
      }
    ]
  }
]
(helm) junyao@goofy-1:~/helm$ cat src/helm/config/tokenizer_configs.yaml
tokenizer_configs:
  - name: meta-llama/Llama-4-Scout-17B-16E-Instruct
    tokenizer: meta-llama/Llama-4-Scout-17B-16E-Instruct
    tokenizer_spec:
      class_name: AutoTokenizer
      tokenizer_name: meta-llama/Llama-4-Scout-17B-16E-Instruct
(helm) junyao@goofy-1:~/helm$ cat src/helm/config/model_metadata.yaml
models:
  - name: meta-llama/Llama-4-Scout-17B-16E-Instruct
    display_name: Llama-4 Scout 17B
    creator_organization_name: Meta
    description: |
      Instruction-tuned Llama-4 Scout 17B model (16 experts), multimodal (text+image input, text output).
    access: gated
    release_date: 2024-01-30
    license: Llama 4 Community License
    tags:
      - TEXT_MODEL_TAG
      - INSTRUCTION_FOLLOWING_MODEL_TAG
    tokenizer_name: meta-llama/Llama-4-Scout-17B-16E-Instruct
(helm) junyao@goofy-1:~/helm$