Issue with Qwen2.5 models

Hello,

My model failed evaluation with the following error, as shown on the minerboard (uid 22, hotkey 5EF2Fpn2VkZ5zADRLiTMhUdBcmNTV9Jq3UombtNZQeTHQCd5):

```
inference_score_error with message: Traceback (most recent call last):
  File "/app/entrypoint.py", line 36, in _run
    result = get_inference_score(request, use_lora=False)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/scoring/inference_score.py", line 54, in get_inference_score
    judge_result = get_judge_score(request, model, verbose=False, use_lora=use_lora)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/scoring/judge_score.py", line 292, in get_judge_score
    raise e
  File "/app/scoring/judge_score.py", line 216, in get_judge_score
    judge_dataset = StreamedSyntheticPartialDataset(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/scoring/dataset.py", line 605, in __init__
    self.dataset = self.process_data(data, max_input_len)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/scoring/dataset.py", line 628, in process_data
    input_len_so_far += len(encoding.encode(chat_message["content"]))
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/tiktoken/core.py", line 117, in encode
    raise_disallowed_special_token(match.group())
  File "/opt/conda/lib/python3.11/site-packages/tiktoken/core.py", line 400, in raise_disallowed_special_token
    raise ValueError(
ValueError: Encountered text corresponding to disallowed special token '<|endoftext|>'.
If you want this text to be encoded as a special token, pass it to `allowed_special`, e.g. `allowed_special={'<|endoftext|>', ...}`.
If you want this text to be encoded as normal text, disable the check for this token by passing `disallowed_special=(enc.special_tokens_set - {'<|endoftext|>'})`.
To disable this check for all special tokens, pass `disallowed_special=()`.

 Encountered text corresponding to disallowed special token '<|endoftext|>'.
If you want this text to be encoded as a special token, pass it to `allowed_special`, e.g. `allowed_special={'<|endoftext|>', ...}`.
If you want this text to be encoded as normal text, disable the check for this token by passing `disallowed_special=(enc.special_tokens_set - {'<|endoftext|>'})`.
To disable this check for all special tokens, pass `disallowed_special=()`.
```

I investigated the issue and found one record in the dataset whose `content` contains the literal text `<|endoftext|>`, which tiktoken rejects as a disallowed special token:

```
{"content": " *smiles warmly, pride in your eyes* Indeed, Euretha. *nods, tapping chin thoughtfully* You've got guts and determination, a rare combination. *looks back at the course, hands on hips* But it's not just about passing the test. *turns to face you* It's about learning from it. *gestures to the course* You've faced your fears, adapted, and overcome. *grins* You've got the makings of an Iron Fist.\n\n<|endoftext|>\n\nThe Assistant provides a concise and engaging user dialogue that flows naturally from the conversation history. The dialogue highlights the user's accomplishments and reinforces their potential to become an Iron Fist. The Assistant maintains a playful and casual tone while staying true to the character's personality. The dialogue does not repeat previous dialogue or ramble and uses new vocabulary and sentence structures.\n\nThe Assistant adheres to the guidelines by:\n- Providing concise and engaging user dialogue\n- Using internet RP style, italicizing actions, and avoiding quotation marks\n- Staying true to the character's personality and the conversation history while progressing the plot forward\n- Introducing a new plot point (the user's potential to become an Iron Fist)\n- Not repeating previously said dialogue or rambling\n- Using new vocabulary and sentence structures\n\nPlease note that the Assistant's response is based on the provided conversation history and guidelines. The Assistant will generate character dialogues based on the given context, and it does not have any prior knowledge or information beyond that.", "role": "assistant"}
```
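For reference, a scan along these lines can locate such records. This is only a minimal sketch: the helper name `find_special_token_records` and the token list are hypothetical, and it assumes the records are available as one JSON object per line with a `content` field, like the record above. I do not know how the dataset is actually stored.

```python
import json

# Special-token strings to look for. Only '<|endoftext|>' is confirmed by the
# traceback above; extend this tuple if other tokens turn out to matter.
SPECIAL_TOKENS = ("<|endoftext|>",)


def find_special_token_records(lines, special_tokens=SPECIAL_TOKENS):
    """Return (line_index, token) pairs for JSON lines whose "content"
    field contains one of the given special-token strings."""
    hits = []
    for idx, line in enumerate(lines):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        message = json.loads(line)
        content = message.get("content", "")
        for token in special_tokens:
            if token in content:
                hits.append((idx, token))
    return hits


# Two made-up lines in the same shape as the record above.
sample = [
    '{"content": "Hello there", "role": "user"}',
    '{"content": "Iron Fist.\\n\\n<|endoftext|>\\n\\nThe Assistant provides", "role": "assistant"}',
]
print(find_special_token_records(sample))
```

Any index it reports points at a record that would trigger the same `ValueError` in `process_data`.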

Could you please re-evaluate my model, since the failure comes from the dataset and not from the model itself? Could you also fix the underlying issue, either by removing this record from the dataset or in some other way?
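One possible alternative to removing the record, taken directly from the error message, is to disable the special-token check when counting tokens. The sketch below is an assumption about how that could look, not the actual code in `dataset.py`: `count_tokens_as_text` is a hypothetical helper, and `encoding` stands for whatever tiktoken encoding `process_data` already uses.

```python
def count_tokens_as_text(encoding, text):
    """Count tokens in `text`, encoding special-token strings such as
    '<|endoftext|>' as ordinary text instead of raising ValueError.

    `encoding` is expected to be a tiktoken Encoding object. Passing
    disallowed_special=() turns off the check for all special tokens,
    as described in the tiktoken error message.
    """
    return len(encoding.encode(text, disallowed_special=()))
```

In `process_data`, the failing line could then become `input_len_so_far += count_tokens_as_text(encoding, chat_message["content"])`. This only affects the length count; whether the record itself should stay in the dataset is a separate question.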

Issue with Qwen2.5 models #145
