Description
Environment
```bash
pip install vllm==0.11.0
pip install PyMuPDF img2pdf einops easydict addict Pillow
pip install flash_attn==2.8.1 --no-build-isolation
```

In vLLM 0.11.0, the legacy v0 engine (AsyncLLMEngine/LLMEngine) has been internally redirected to the new v1 engine:
```python
# async_llm_engine.py & llm_engine.py
from vllm.v1.engine.async_llm import AsyncLLM

AsyncLLMEngine = AsyncLLM  # type: ignore
```

Additionally, per-request logits processors are no longer supported directly. Instead, the latest vLLM introduces global logits processors via the new AdapterLogitsProcessor interface (vllm.v1.sample.logits_processor.AdapterLogitsProcessor), which allows per-request logic to be wrapped and adapted.
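You can confirm this aliasing in your own environment with a quick check (a minimal snippet, assuming vllm==0.11.0 as installed above):

```python
# Sanity check: in vLLM 0.11.0 the legacy v0 entry point resolves
# to the v1 engine class.
from vllm.engine.async_llm_engine import AsyncLLMEngine
from vllm.v1.engine.async_llm import AsyncLLM

assert AsyncLLMEngine is AsyncLLM
```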
To enable DeepSeek-OCR compatibility with vLLM 0.11.0, the following changes are required:
1. Add v1-Compatible Logits Processor Adapter
Create ngram_norepeat_v1_adapter.py in DeepSeek-OCR-vllm/process/:
```python
from vllm.v1.sample.logits_processor import AdapterLogitsProcessor

from .ngram_norepeat import NoRepeatNGramLogitsProcessor


class NoRepeatNGramAdaptor(AdapterLogitsProcessor):
    def is_argmax_invariant(self) -> bool:
        # Banning n-gram continuations can change which token has the highest
        # logit, so this processor is not argmax-invariant; returning False
        # keeps it from being skipped under greedy (temperature=0.0) sampling.
        return False

    def new_req_logits_processor(self, params):
        # Requests that supply no extra_args get no per-request processor.
        if not params.extra_args:
            return None
        # Per-request configuration arrives via SamplingParams.extra_args
        # (see step 3c below).
        return NoRepeatNGramLogitsProcessor(
            ngram_size=params.extra_args["ngram_size"],
            window_size=params.extra_args["window_size"],
            whitelist_token_ids=params.extra_args["whitelist_token_ids"],
        )
```
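For reference, the wrapped per-request processor ships in DeepSeek-OCR-vllm/process/ngram_norepeat.py and is not reproduced in this issue. Below is only an illustrative sketch of its expected shape, assuming the v0-style `(output_token_ids, logits)` calling convention and the constructor arguments used above:

```python
import torch


class NoRepeatNGramLogitsProcessor:
    """Illustrative sketch only; see process/ngram_norepeat.py for the real code."""

    def __init__(self, ngram_size, window_size, whitelist_token_ids=None):
        self.ngram_size = ngram_size
        self.window_size = window_size
        self.whitelist = whitelist_token_ids or set()

    def __call__(self, output_token_ids: list[int],
                 logits: torch.Tensor) -> torch.Tensor:
        # Only the most recent `window_size` generated tokens are considered.
        window = output_token_ids[-self.window_size:]
        if len(window) < self.ngram_size:
            return logits
        # The (ngram_size - 1)-token prefix the next token would extend.
        prefix = tuple(window[-(self.ngram_size - 1):])
        # Ban every token that completed this prefix earlier in the window,
        # unless it is whitelisted (e.g. structural markup tokens).
        for i in range(len(window) - self.ngram_size + 1):
            if tuple(window[i:i + self.ngram_size - 1]) == prefix:
                banned = window[i + self.ngram_size - 1]
                if banned not in self.whitelist:
                    logits[banned] = float("-inf")
        return logits
```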
2. Update DeepSeek-OCR Core Code
a. Handle the v0/v1 SamplingMetadata import (line 14 in deepseek_ocr.py)

```python
try:
    # v0 location
    from vllm.model_executor import SamplingMetadata
except ImportError:
    # v1 location
    from vllm.v1.sample.metadata import SamplingMetadata
```

b. Update the _call_hf_processor signatures (lines ~154 and ~231)
Both instances should accept `**kwargs` to align with v1's tokenizer kwargs handling:

```python
def _call_hf_processor(
    self,
    prompt: str,
    mm_data: Mapping[str, object],
    mm_kwargs: Mapping[str, object],
    **kwargs,  # tokenizer kwargs in v1
) -> BatchFeature:
    ...
```

c. Propagate `**kwargs` in _cached_apply_hf_processor (lines ~231–254)
Ensure kwargs are forwarded in the overridden caching method:

```python
def _cached_apply_hf_processor(
    self,
    prompt: Union[str, list[int]],
    mm_data_items: MultiModalDataItems,
    hf_processor_mm_kwargs: Mapping[str, object],
    **kwargs,  # forward to the underlying processor
) -> tuple[list[int], MultiModalKwargs, bool]:
    if mm_data_items.get_count("image", strict=False) > 2:
        return self._apply_hf_processor_main(
            prompt=prompt,
            mm_items=mm_data_items,
            hf_processor_mm_kwargs=hf_processor_mm_kwargs,
            enable_hf_prompt_update=True,
            **kwargs,
        )
    return super()._cached_apply_hf_processor(
        prompt=prompt,
        mm_data_items=mm_data_items,
        hf_processor_mm_kwargs=hf_processor_mm_kwargs,
        **kwargs,
    )
```

3. Update Inference Script for v1 Engine
In run_dpsk_ocr_image.py:
a. Enable the v1 engine explicitly (optional but recommended):

```python
import os

os.environ['VLLM_USE_V1'] = '1'
```

b. Initialize the engine with the v1-compatible logits processor:
```python
engine_args = AsyncEngineArgs(
    model=MODEL_PATH,
    hf_overrides={"architectures": ["DeepseekOCRForCausalLM"]},
    block_size=256,
    max_model_len=8192,
    enforce_eager=False,
    trust_remote_code=True,
    tensor_parallel_size=1,
    gpu_memory_utilization=0.75,
    # Register the global (engine-level) logits processor by its import path.
    logits_processors=["process.ngram_norepeat_v1_adapter:NoRepeatNGramAdaptor"],
)
engine = AsyncLLMEngine.from_engine_args(engine_args)
```

c. Pass the n-gram parameters via extra_args in SamplingParams:
```python
sampling_params = SamplingParams(
    temperature=0.0,
    max_tokens=8192,
    skip_special_tokens=False,
    # Consumed by NoRepeatNGramAdaptor.new_req_logits_processor above.
    extra_args={
        "ngram_size": 30,
        "window_size": 90,
        "whitelist_token_ids": {128821, 128822},
    },
)
```

Important: the OCR processor must not be invoked before the vLLM engine is fully initialized. Specifically, avoid calling
`DeepseekOCRProcessor().tokenize_with_images(...)` before engine startup. This issue does not occur with the v0 engine.
Instead, ensure engine initialization completes first, then process the image:

```python
engine = AsyncLLMEngine.from_engine_args(engine_args)

# After the engine is created
if '<image>' in PROMPT:
    image_features = DeepseekOCRProcessor().tokenize_with_images(
        images=[image], bos=True, eos=True, cropping=CROP_MODE
    )
else:
    image_features = ''
```
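For completeness, here is a minimal sketch of submitting a request once everything above is in place. The prompt/multi-modal plumbing is illustrative (the exact request format follows run_dpsk_ocr_image.py in the DeepSeek-OCR repo); the `request_id` value and the asyncio wrapper are assumptions:

```python
import asyncio


async def run_ocr() -> str:
    # AsyncLLM.generate is an async generator that yields incremental
    # RequestOutput objects; the last one holds the complete text.
    final_output = None
    async for output in engine.generate(
        prompt={
            "prompt": PROMPT,
            "multi_modal_data": {"image": image_features},  # illustrative
        },
        sampling_params=sampling_params,
        request_id="ocr-0",  # hypothetical id; any unique string works
    ):
        final_output = output
    return final_output.outputs[0].text


print(asyncio.run(run_ocr()))
```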