
Enable DeepSeek-OCR support in latest vLLM 0.11.0 (v1 Engine) with custom modifications #231

@wangwk97

Description

Environment

pip install vllm==0.11.0
pip install PyMuPDF img2pdf einops easydict addict Pillow
pip install flash_attn==2.8.1 --no-build-isolation

In vLLM 0.11.0, the legacy v0 engine entry points (AsyncLLMEngine/LLMEngine) are internally aliased to the new v1 engine:

# async_llm_engine.py & llm_engine.py
from vllm.v1.engine.async_llm import AsyncLLM
AsyncLLMEngine = AsyncLLM  # type: ignore

Additionally, per-request logits processors are no longer supported directly. Instead, vLLM now registers logits processors globally at engine construction, and the new AdapterLogitsProcessor interface (vllm.v1.sample.logits_processor.AdapterLogitsProcessor) adapts per-request logic to this global scheme.
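
For orientation, the object being adapted is a v0-style per-request logits processor: a callable that takes the tokens generated so far plus the next-step logits and returns (possibly modified) logits. A minimal sketch of that shape, assuming the two-argument form (a three-argument form that also receives prompt token ids exists as well), with hypothetical names:

import torch

class MinimalPerRequestProcessor:
    # Hypothetical illustration only; NoRepeatNGramLogitsProcessor in this
    # repo plays this role for real.
    def __call__(self, output_token_ids: list[int], logits: torch.Tensor) -> torch.Tensor:
        # A real processor would inspect output_token_ids and mask banned
        # tokens with float("-inf") before returning the logits.
        return logits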

To enable DeepSeek-OCR compatibility with vLLM 0.11.0, the following changes are required:


1. Add v1-Compatible Logits Processor Adapter

Create ngram_norepeat_v1_adapter.py in DeepSeek-OCR-vllm/process/:

from .ngram_norepeat import NoRepeatNGramLogitsProcessor
from vllm.v1.sample.logits_processor import AdapterLogitsProcessor


class NoRepeatNGramAdaptor(AdapterLogitsProcessor):
    def is_argmax_invariant(self) -> bool:
        # Note: n-gram blocking masks tokens and can therefore change the
        # argmax. True is what was validated here, but if the engine skips
        # argmax-invariant processors for greedy requests, False may be safer.
        return True

    def new_req_logits_processor(self, params):
        # Called once per request: build the per-request processor from the
        # parameters carried in SamplingParams.extra_args (see step 3c).
        return NoRepeatNGramLogitsProcessor(
            ngram_size=params.extra_args["ngram_size"],
            window_size=params.extra_args["window_size"],
            whitelist_token_ids=params.extra_args["whitelist_token_ids"],
        )
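
If some requests might reach the engine without these settings, a defensive variant of new_req_logits_processor (an assumption for robustness, not something the repo requires) can return None so the adapter simply attaches no processor to that request:

    def new_req_logits_processor(self, params):
        extra = params.extra_args or {}
        if "ngram_size" not in extra:
            # No n-gram settings on this request: attach no processor.
            return None
        return NoRepeatNGramLogitsProcessor(
            ngram_size=extra["ngram_size"],
            window_size=extra["window_size"],
            whitelist_token_ids=extra["whitelist_token_ids"],
        )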

2. Update DeepSeek-OCR Core Code

a. Handle v0/v1 SamplingMetadata import (line 14 in deepseek_ocr.py)

# Prefer the v0 location; fall back to the v1 module on vLLM 0.11.0+.
try:
    from vllm.model_executor import SamplingMetadata
except ImportError:
    from vllm.v1.sample.metadata import SamplingMetadata

b. Update _call_hf_processor signatures (lines ~154 and ~231)

Both instances should accept **kwargs to align with v1’s tokenizer kwargs handling:

def _call_hf_processor(
    self,
    prompt: str,
    mm_data: Mapping[str, object],
    mm_kwargs: Mapping[str, object],
    **kwargs,  # tokenizer kwargs in v1
) -> BatchFeature:
    ...

c. Propagate **kwargs in _cached_apply_hf_processor (lines ~231–254)

Ensure kwargs are forwarded in the overridden caching method:

def _cached_apply_hf_processor(
    self,
    prompt: Union[str, list[int]],
    mm_data_items: MultiModalDataItems,
    hf_processor_mm_kwargs: Mapping[str, object],
    **kwargs  # forward to underlying processor
) -> tuple[list[int], MultiModalKwargs, bool]:
    if mm_data_items.get_count("image", strict=False) > 2:
        return self._apply_hf_processor_main(
            prompt=prompt,
            mm_items=mm_data_items,
            hf_processor_mm_kwargs=hf_processor_mm_kwargs,
            enable_hf_prompt_update=True,
            **kwargs
        )
    return super()._cached_apply_hf_processor(
        prompt=prompt,
        mm_data_items=mm_data_items,
        hf_processor_mm_kwargs=hf_processor_mm_kwargs,
        **kwargs
    )

3. Update Inference Script for v1 Engine

In run_dpsk_ocr_image.py:

a. Pin the v1 engine explicitly (optional, since the v0 entry points already redirect to v1, but harmless):

import os
os.environ['VLLM_USE_V1'] = '1'

b. Initialize engine with v1-compatible logits processor:

engine_args = AsyncEngineArgs(
    model=MODEL_PATH,
    hf_overrides={"architectures": ["DeepseekOCRForCausalLM"]},
    block_size=256,
    max_model_len=8192,
    enforce_eager=False,
    trust_remote_code=True,
    tensor_parallel_size=1,
    gpu_memory_utilization=0.75,
    # Registered globally at engine construction; format is "module.path:ClassName".
    logits_processors=["process.ngram_norepeat_v1_adapter:NoRepeatNGramAdaptor"],
)
engine = AsyncLLMEngine.from_engine_args(engine_args)

c. Pass n-gram parameters via extra_args in SamplingParams:

sampling_params = SamplingParams(
    temperature=0.0,
    max_tokens=8192,
    skip_special_tokens=False,
    extra_args={  # read per request by NoRepeatNGramAdaptor.new_req_logits_processor
        "ngram_size": 30,
        "window_size": 90,
        "whitelist_token_ids": {128821, 128822}
    }
)
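
Putting the pieces together, generation follows the usual AsyncLLM pattern. A minimal sketch; the exact prompt payload (in particular the multi_modal_data dict) depends on the repo's prompt construction, so treat it as illustrative:

import asyncio

async def generate_once(prompt_payload, request_id="deepseek-ocr-0"):
    final_output = None
    # engine.generate is an async generator yielding incremental RequestOutputs.
    async for request_output in engine.generate(prompt_payload, sampling_params, request_id):
        final_output = request_output
    return final_output.outputs[0].text

# text = asyncio.run(generate_once(
#     {"prompt": PROMPT, "multi_modal_data": {"image": image_features}}))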

Important: The OCR processor must not be invoked before the vLLM engine is fully initialized. Specifically, avoid calling DeepseekOCRProcessor().tokenize_with_images(...) before engine startup. This issue does not occur with the v0 engine.

Instead, ensure engine initialization completes first, then process the image:

engine = AsyncLLMEngine.from_engine_args(engine_args)
# After engine is created
if '<image>' in PROMPT:
    image_features = DeepseekOCRProcessor().tokenize_with_images(
        images=[image], bos=True, eos=True, cropping=CROP_MODE
    )
else:
    image_features = ''
