[CI Failure]: Language Models Test (Extended Pooling) #20461

Closed
@DarkLight1337

Name of failing test

See below

Basic information

  • Flaky test
  • Can reproduce locally
  • Caused by external libraries (e.g. bug in transformers)

🧪 Describe the failing test

Remaining failures:

FAILED models/language/pooling/test_scoring.py::test_cross_encoder_1_to_1[cross-encoder/ms-marco-MiniLM-L-6-v2] - assert 9.265625 == 1.0 ± 1.0e-02
  comparison failed
  Obtained: 9.265625
  Expected: 1.0 ± 1.0e-02
FAILED models/language/pooling/test_scoring.py::test_cross_encoder_1_to_N[cross-encoder/ms-marco-MiniLM-L-6-v2] - assert 9.265625 == 1.0 ± 1.0e-02
  comparison failed
  Obtained: 9.265625
  Expected: 1.0 ± 1.0e-02
FAILED models/language/pooling/test_scoring.py::test_cross_encoder_N_to_N[cross-encoder/ms-marco-MiniLM-L-6-v2] - assert 9.265625 == 1.0 ± 1.0e-02
  comparison failed
  Obtained: 9.265625
  Expected: 1.0 ± 1.0e-02
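One possible reading of the three scoring failures above, stated here as an assumption rather than a confirmed root cause: 9.265625 looks like a raw classifier logit, while the test expects a score already squashed into [0, 1]. The sketch below only checks the arithmetic. `within_tolerance` is a hypothetical helper that stands in for the test's approximate comparison with a tolerance of 1.0e-02.

```python
import math


def sigmoid(x: float) -> float:
    """Standard logistic function, mapping a logit to the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def within_tolerance(obtained: float, expected: float = 1.0, tol: float = 1e-2) -> bool:
    """Hypothetical stand-in for the test's `expected ± tol` comparison."""
    return math.isclose(obtained, expected, rel_tol=0.0, abs_tol=tol)


# Value reported as "Obtained" in the failure log.
raw_score = 9.265625

# The raw value is far outside 1.0 ± 1.0e-02, matching the reported failure.
raw_matches = within_tolerance(raw_score)

# ASSUMPTION: if the value is an unnormalized logit, applying a sigmoid
# brings it inside the expected tolerance.
activated_matches = within_tolerance(sigmoid(raw_score))

print(f"raw score within tolerance: {raw_matches}")
print(f"sigmoid(raw score) within tolerance: {activated_matches}")
```

If this reading holds, the mismatch would come from whether an activation is applied to the score on one side of the comparison, not from the model weights. The log alone does not confirm that.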

Fixed by #20168:

FAILED models/language/pooling/test_embedding.py::test_models[False-sentence-transformers/all-MiniLM-L12-v2] - pydantic_core._pydantic_core.ValidationError: 1 validation error for ModelConfig
  Value error, User-specified max_model_len (512) is greater than the derived max_model_len (max_position_embeddings=128 or model_max_length=None in model's config.json). This may lead to incorrect model outputs or CUDA errors. To allow overriding this maximum, set the env var VLLM_ALLOW_LONG_MAX_MODEL_LEN=1 [type=value_error, input_value=ArgsKwargs((), {'model': ...attention_dtype': None}), input_type=ArgsKwargs]
    For further information visit https://errors.pydantic.dev/2.11/v/value_error
FAILED models/language/pooling/test_embedding.py::test_models[False-sentence-transformers/stsb-roberta-base-v2] - pydantic_core._pydantic_core.ValidationError: 1 validation error for ModelConfig
  Value error, User-specified max_model_len (512) is greater than the derived max_model_len (max_position_embeddings=75 or model_max_length=None in model's config.json). This may lead to incorrect model outputs or CUDA errors. To allow overriding this maximum, set the env var VLLM_ALLOW_LONG_MAX_MODEL_LEN=1 [type=value_error, input_value=ArgsKwargs((), {'model': ...attention_dtype': None}), input_type=ArgsKwargs]
    For further information visit https://errors.pydantic.dev/2.11/v/value_error
FAILED models/language/pooling/test_gte.py::test_embed_models_mteb[model_info9] - RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
FAILED models/language/pooling/test_gte.py::test_embed_models_mteb[model_info10] - RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
FAILED models/language/pooling/test_gte.py::test_embed_models_correctness[model_info9] - RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
FAILED models/language/pooling/test_gte.py::test_embed_models_correctness[model_info10] - RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
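The two `max_model_len` validation errors above describe a simple rule: a user-specified length larger than the length derived from the model config is rejected unless `VLLM_ALLOW_LONG_MAX_MODEL_LEN=1` is set. A minimal sketch of that rule follows. `resolve_max_model_len` is a hypothetical function written for illustration, not vLLM's actual implementation, and only the environment variable name and the numbers are taken from the log.

```python
import os
from typing import Mapping, Optional


def resolve_max_model_len(
    user_max_len: Optional[int],
    derived_max_len: int,
    env: Mapping[str, str] = os.environ,
) -> int:
    """Hypothetical sketch of the check described by the error message."""
    if user_max_len is None:
        # Nothing requested: fall back to the value derived from the config.
        return derived_max_len
    if user_max_len > derived_max_len:
        # The override variable name comes from the error message in the log.
        if env.get("VLLM_ALLOW_LONG_MAX_MODEL_LEN") == "1":
            return user_max_len
        raise ValueError(
            f"User-specified max_model_len ({user_max_len}) is greater than "
            f"the derived max_model_len ({derived_max_len})."
        )
    return user_max_len


# Values from the first failure: 512 requested, 128 derived from the config.
try:
    resolve_max_model_len(512, 128, env={})
    rejected = False
except ValueError:
    rejected = True

# With the override set, the larger value is accepted.
overridden = resolve_max_model_len(
    512, 128, env={"VLLM_ALLOW_LONG_MAX_MODEL_LEN": "1"}
)
```

Under this rule, a test that requests 512 tokens for a model whose config yields 128 or 75 fails at configuration time, before any inference runs, which is consistent with the `ValidationError` shown in the log.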

📝 History of failing test

Failing since 27th June, e.g. https://buildkite.com/organizations/vllm/analytics/suites/ci-1/tests/7c3bdcad-8f70-86e6-b83a-d0f0ab07fd71?period=7days&tags=scm.branch%3Amain

CC List.

@noooop can you take a look at this?

Metadata

Labels: ci-failure (Issue about an unexpected test failure in CI)

Status: Done