
[Bug]: Whisper not working on 0.9.2 docker image #20671

Open
@andrePankraz

Description

Your current environment

Docker image 0.9.2 on an NVIDIA L40S.
(The Docker image has to be modified because the librosa dependency is missing.)

docker-compose.yml:

services:
  vllm-whisper-large-v3:
    # Must modify image for <= v0.9.2
    # ImportError: Please install vllm[audio] for audio support
    # image: vllm/vllm-openai:v0.9.2
    image: vllm/vllm-openai-audio:v0.9.2
    build:
      context: .
    container_name: vllm-whisper-large-v3
    environment:
      - HF_TOKEN=$HF_TOKEN
      - VLLM_NO_USAGE_STATS=1
    ipc: host
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['2']
              capabilities: [ gpu ]
    network_mode: host
    volumes:
      - /mnt/sda/huggingface:/root/.cache/huggingface
      - .:/opt/vllm
    command:
      - --port=8006
      - --disable-log-requests
      - --model=openai/whisper-large-v3
      - --gpu-memory-utilization=0.40
      - --swap-space=5
    restart: unless-stopped

Dockerfile (build context for the service above):

# Use the base vLLM image
FROM vllm/vllm-openai:v0.9.2

RUN apt-get update \
    && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
    libsndfile1 \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

RUN pip install --upgrade --no-cache-dir \
    #        "git+https://github.com/huggingface/transformers.git" \
    "librosa>=0.10,<0.11"

🐛 Describe the bug

  • Cannot start the image; see the log below.
  • Whisper worked on 0.9.1, so this is a regression (a re-test request sketch follows this list).
  • It would also be nice to add librosa to the standard image - it doesn't make the image noticeably larger (relative to the >20 GB it already is...).
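
For re-testing the regression on 0.9.1 vs 0.9.2, a transcription request might look like the sketch below (assuming vLLM's OpenAI-compatible /v1/audio/transcriptions route; port 8006 comes from the compose file above and sample.wav is a placeholder audio file):

$ curl http://localhost:8006/v1/audio/transcriptions \
    -F model=openai/whisper-large-v3 \
    -F file=@sample.wav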

Log:

$ docker logs -f vllm-whisper-large-v3
INFO 07-09 02:08:41 [__init__.py:244] Automatically detected platform cuda.
INFO 07-09 02:08:44 [api_server.py:1395] vLLM API server version 0.9.2
INFO 07-09 02:08:44 [cli_args.py:325] non-default args: {'port': 8006, 'model': 'openai/whisper-large-v3', 'gpu_memory_utilization': 0.4, 'swap_space': 5.0, 'disable_log_requests': True}
INFO 07-09 02:08:49 [config.py:841] This model supports multiple tasks: {'generate', 'classify', 'reward', 'embed', 'transcription'}. Defaulting to 'transcription'.
INFO 07-09 02:08:49 [config.py:1472] Using max model len 448
WARNING 07-09 02:08:49 [arg_utils.py:1735] ['WhisperForConditionalGeneration'] is not supported by the V1 Engine. Falling back to V0.
INFO 07-09 02:08:50 [api_server.py:268] Started engine process with PID 266
INFO 07-09 02:08:53 [__init__.py:244] Automatically detected platform cuda.
INFO 07-09 02:08:54 [llm_engine.py:230] Initializing a V0 LLM engine (v0.9.2) with config: model='openai/whisper-large-v3', speculative_config=None, tokenizer='openai/whisper-large-v3', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config={}, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=448, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto,  device_config=cuda, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=openai/whisper-large-v3, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=None, chunked_prefill_enabled=False, use_async_output_proc=True, pooler_config=None, compilation_config={"level":0,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":[],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"use_cudagraph":true,"cudagraph_num_of_warmups":0,"cudagraph_capture_sizes":[256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"max_capture_size":256,"local_cache_dir":null}, use_cached_outputs=True,
INFO 07-09 02:08:55 [cuda.py:363] Using Flash Attention backend.
INFO 07-09 02:08:55 [parallel_state.py:1076] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
INFO 07-09 02:08:55 [model_runner.py:1171] Starting to load model openai/whisper-large-v3...
INFO 07-09 02:08:56 [weight_utils.py:292] Using model weights format ['*.safetensors']
INFO 07-09 02:08:57 [weight_utils.py:345] No model.safetensors.index.json found in remote.
Loading safetensors checkpoint shards:   0% Completed | 0/3 [00:00<?, ?it/s]
Loading safetensors checkpoint shards:  33% Completed | 1/3 [00:03<00:07,  3.90s/it]
Loading safetensors checkpoint shards:  67% Completed | 2/3 [00:10<00:05,  5.75s/it]
Loading safetensors checkpoint shards: 100% Completed | 3/3 [00:12<00:00,  3.98s/it]
Loading safetensors checkpoint shards: 100% Completed | 3/3 [00:12<00:00,  4.27s/it]

INFO 07-09 02:09:09 [default_loader.py:272] Loading weights took 12.84 seconds
INFO 07-09 02:09:10 [model_runner.py:1203] Model loading took 2.8764 GiB and 13.830811 seconds
Process SpawnProcess-1:
ERROR 07-09 02:09:12 [engine.py:458] Received a CachedWhisperTokenizerFast for argument tokenizer, but a WhisperTokenizer was expected.
ERROR 07-09 02:09:12 [engine.py:458] Traceback (most recent call last):
ERROR 07-09 02:09:12 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 446, in run_mp_engine
ERROR 07-09 02:09:12 [engine.py:458]     engine = MQLLMEngine.from_vllm_config(
ERROR 07-09 02:09:12 [engine.py:458]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 07-09 02:09:12 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 133, in from_vllm_config
ERROR 07-09 02:09:12 [engine.py:458]     return cls(
ERROR 07-09 02:09:12 [engine.py:458]            ^^^^
ERROR 07-09 02:09:12 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 87, in __init__
ERROR 07-09 02:09:12 [engine.py:458]     self.engine = LLMEngine(*args, **kwargs)
ERROR 07-09 02:09:12 [engine.py:458]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 07-09 02:09:12 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/llm_engine.py", line 268, in __init__
ERROR 07-09 02:09:12 [engine.py:458]     self._initialize_kv_caches()
ERROR 07-09 02:09:12 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/vllm/engine/llm_engine.py", line 413, in _initialize_kv_caches
ERROR 07-09 02:09:12 [engine.py:458]     self.model_executor.determine_num_available_blocks())
ERROR 07-09 02:09:12 [engine.py:458]     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 07-09 02:09:12 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/executor_base.py", line 104, in determine_num_available_blocks
ERROR 07-09 02:09:12 [engine.py:458]     results = self.collective_rpc("determine_num_available_blocks")
ERROR 07-09 02:09:12 [engine.py:458]               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 07-09 02:09:12 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 57, in collective_rpc
ERROR 07-09 02:09:12 [engine.py:458]     answer = run_method(self.driver_worker, method, args, kwargs)
ERROR 07-09 02:09:12 [engine.py:458]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 07-09 02:09:12 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/vllm/utils/__init__.py", line 2736, in run_method
ERROR 07-09 02:09:12 [engine.py:458]     return func(*args, **kwargs)
ERROR 07-09 02:09:12 [engine.py:458]            ^^^^^^^^^^^^^^^^^^^^^
ERROR 07-09 02:09:12 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 07-09 02:09:12 [engine.py:458]     return func(*args, **kwargs)
ERROR 07-09 02:09:12 [engine.py:458]            ^^^^^^^^^^^^^^^^^^^^^
ERROR 07-09 02:09:12 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/vllm/worker/worker.py", line 256, in determine_num_available_blocks
ERROR 07-09 02:09:12 [engine.py:458]     self.model_runner.profile_run()
ERROR 07-09 02:09:12 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 07-09 02:09:12 [engine.py:458]     return func(*args, **kwargs)
ERROR 07-09 02:09:12 [engine.py:458]            ^^^^^^^^^^^^^^^^^^^^^
ERROR 07-09 02:09:12 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/vllm/worker/enc_dec_model_runner.py", line 312, in profile_run
ERROR 07-09 02:09:12 [engine.py:458]     max_mm_tokens = self.mm_registry.get_max_multimodal_tokens(
ERROR 07-09 02:09:12 [engine.py:458]                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 07-09 02:09:12 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/registry.py", line 183, in get_max_multimodal_tokens
ERROR 07-09 02:09:12 [engine.py:458]     return sum(self.get_max_tokens_by_modality(model_config).values())
ERROR 07-09 02:09:12 [engine.py:458]                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 07-09 02:09:12 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/registry.py", line 170, in get_max_tokens_by_modality
ERROR 07-09 02:09:12 [engine.py:458]     mm_limits = self.get_mm_limits_per_prompt(model_config)
ERROR 07-09 02:09:12 [engine.py:458]                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 07-09 02:09:12 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/registry.py", line 206, in get_mm_limits_per_prompt
ERROR 07-09 02:09:12 [engine.py:458]     processor = self.create_processor(model_config, disable_cache=False)
ERROR 07-09 02:09:12 [engine.py:458]                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 07-09 02:09:12 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/registry.py", line 281, in create_processor
ERROR 07-09 02:09:12 [engine.py:458]     return factories.build_processor(ctx, cache=cache)
ERROR 07-09 02:09:12 [engine.py:458]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 07-09 02:09:12 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/registry.py", line 88, in build_processor
ERROR 07-09 02:09:12 [engine.py:458]     return self.processor(info, dummy_inputs_builder, cache=cache)
ERROR 07-09 02:09:12 [engine.py:458]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 07-09 02:09:12 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/processing.py", line 1152, in __init__
ERROR 07-09 02:09:12 [engine.py:458]     self.data_parser = self._get_data_parser()
ERROR 07-09 02:09:12 [engine.py:458]                        ^^^^^^^^^^^^^^^^^^^^^^^
ERROR 07-09 02:09:12 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/whisper.py", line 680, in _get_data_parser
ERROR 07-09 02:09:12 [engine.py:458]     feature_extractor = self.info.get_feature_extractor()
ERROR 07-09 02:09:12 [engine.py:458]                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 07-09 02:09:12 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/whisper.py", line 643, in get_feature_extractor
ERROR 07-09 02:09:12 [engine.py:458]     hf_processor = self.get_hf_processor()
ERROR 07-09 02:09:12 [engine.py:458]                    ^^^^^^^^^^^^^^^^^^^^^^^
ERROR 07-09 02:09:12 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/whisper.py", line 637, in get_hf_processor
ERROR 07-09 02:09:12 [engine.py:458]     return self.ctx.get_hf_processor(WhisperProcessor)
ERROR 07-09 02:09:12 [engine.py:458]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 07-09 02:09:12 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/vllm/inputs/registry.py", line 138, in get_hf_processor
ERROR 07-09 02:09:12 [engine.py:458]     return super().get_hf_processor(
ERROR 07-09 02:09:12 [engine.py:458]            ^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 07-09 02:09:12 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/vllm/inputs/registry.py", line 96, in get_hf_processor
ERROR 07-09 02:09:12 [engine.py:458]     return cached_processor_from_config(
ERROR 07-09 02:09:12 [engine.py:458]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 07-09 02:09:12 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/vllm/transformers_utils/processor.py", line 110, in cached_processor_from_config
ERROR 07-09 02:09:12 [engine.py:458]     return cached_get_processor(
ERROR 07-09 02:09:12 [engine.py:458]            ^^^^^^^^^^^^^^^^^^^^^
ERROR 07-09 02:09:12 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/vllm/transformers_utils/processor.py", line 72, in get_processor
ERROR 07-09 02:09:12 [engine.py:458]     processor = processor_factory.from_pretrained(
ERROR 07-09 02:09:12 [engine.py:458]                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 07-09 02:09:12 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/transformers/processing_utils.py", line 1308, in from_pretrained
ERROR 07-09 02:09:12 [engine.py:458]     return cls.from_args_and_dict(args, processor_dict, **kwargs)
ERROR 07-09 02:09:12 [engine.py:458]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 07-09 02:09:12 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/transformers/processing_utils.py", line 1109, in from_args_and_dict
ERROR 07-09 02:09:12 [engine.py:458]     processor = cls(*args, **valid_kwargs)
ERROR 07-09 02:09:12 [engine.py:458]                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 07-09 02:09:12 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/transformers/models/whisper/processing_whisper.py", line 41, in __init__
ERROR 07-09 02:09:12 [engine.py:458]     super().__init__(feature_extractor, tokenizer)
ERROR 07-09 02:09:12 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/transformers/processing_utils.py", line 551, in __init__
ERROR 07-09 02:09:12 [engine.py:458]     self.check_argument_for_proper_class(attribute_name, arg)
ERROR 07-09 02:09:12 [engine.py:458]   File "/usr/local/lib/python3.12/dist-packages/transformers/processing_utils.py", line 569, in check_argument_for_proper_class
ERROR 07-09 02:09:12 [engine.py:458]     raise TypeError(
ERROR 07-09 02:09:12 [engine.py:458] TypeError: Received a CachedWhisperTokenizerFast for argument tokenizer, but a WhisperTokenizer was expected.
Traceback (most recent call last):
  File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 460, in run_mp_engine
    raise e from None
  File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 446, in run_mp_engine
    engine = MQLLMEngine.from_vllm_config(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 133, in from_vllm_config
    return cls(
           ^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 87, in __init__
    self.engine = LLMEngine(*args, **kwargs)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/engine/llm_engine.py", line 268, in __init__
    self._initialize_kv_caches()
  File "/usr/local/lib/python3.12/dist-packages/vllm/engine/llm_engine.py", line 413, in _initialize_kv_caches
    self.model_executor.determine_num_available_blocks())
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/executor/executor_base.py", line 104, in determine_num_available_blocks
    results = self.collective_rpc("determine_num_available_blocks")
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 57, in collective_rpc
    answer = run_method(self.driver_worker, method, args, kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/utils/__init__.py", line 2736, in run_method
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/worker/worker.py", line 256, in determine_num_available_blocks
    self.model_runner.profile_run()
  File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/worker/enc_dec_model_runner.py", line 312, in profile_run
    max_mm_tokens = self.mm_registry.get_max_multimodal_tokens(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/registry.py", line 183, in get_max_multimodal_tokens
    return sum(self.get_max_tokens_by_modality(model_config).values())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/registry.py", line 170, in get_max_tokens_by_modality
    mm_limits = self.get_mm_limits_per_prompt(model_config)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/registry.py", line 206, in get_mm_limits_per_prompt
    processor = self.create_processor(model_config, disable_cache=False)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/registry.py", line 281, in create_processor
    return factories.build_processor(ctx, cache=cache)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/registry.py", line 88, in build_processor
    return self.processor(info, dummy_inputs_builder, cache=cache)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/multimodal/processing.py", line 1152, in __init__
    self.data_parser = self._get_data_parser()
                       ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/whisper.py", line 680, in _get_data_parser
    feature_extractor = self.info.get_feature_extractor()
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/whisper.py", line 643, in get_feature_extractor
    hf_processor = self.get_hf_processor()
                   ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/whisper.py", line 637, in get_hf_processor
    return self.ctx.get_hf_processor(WhisperProcessor)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/inputs/registry.py", line 138, in get_hf_processor
    return super().get_hf_processor(
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/inputs/registry.py", line 96, in get_hf_processor
    return cached_processor_from_config(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/transformers_utils/processor.py", line 110, in cached_processor_from_config
    return cached_get_processor(
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/transformers_utils/processor.py", line 72, in get_processor
    processor = processor_factory.from_pretrained(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/processing_utils.py", line 1308, in from_pretrained
    return cls.from_args_and_dict(args, processor_dict, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/processing_utils.py", line 1109, in from_args_and_dict
    processor = cls(*args, **valid_kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/models/whisper/processing_whisper.py", line 41, in __init__
    super().__init__(feature_extractor, tokenizer)
  File "/usr/local/lib/python3.12/dist-packages/transformers/processing_utils.py", line 551, in __init__
    self.check_argument_for_proper_class(attribute_name, arg)
  File "/usr/local/lib/python3.12/dist-packages/transformers/processing_utils.py", line 569, in check_argument_for_proper_class
    raise TypeError(
TypeError: Received a CachedWhisperTokenizerFast for argument tokenizer, but a WhisperTokenizer was expected.
[rank0]:[W709 02:09:13.530824767 ProcessGroupNCCL.cpp:1476] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 1495, in <module>
    uvloop.run(run_server(args))
  File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 109, in run
    return __asyncio.run(
           ^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
  File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 61, in wrapper
    return await main
           ^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 1431, in run_server
    await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
  File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 1451, in run_server_worker
    async with build_async_engine_client(args, client_config) as engine_client:
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 158, in build_async_engine_client
    async with build_async_engine_client_from_engine_args(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 291, in build_async_engine_client_from_engine_args
    raise RuntimeError(
RuntimeError: Engine process failed to start. See stack trace for the root cause.

