vLLM: make <|ref|>…</|ref|> locate prompts work in run_dpsk_ocr_image.py #117

aryanrahar · 2025-10-23T13:23:25Z

Summary
This PR updates the vLLM image runner to correctly handle locate/“rec” prompts that use <|ref|>…<|/ref|>. It attaches the per-request n-gram logits processor and preserves special tokens only when reference mode is active. It also replaces the remaining eval(...) calls with ast.literal_eval(...) for safer parsing.
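To illustrate why the eval(...) to ast.literal_eval(...) swap matters, here is a minimal sketch. The helper name `parse_boxes` and the sample coordinate strings are hypothetical and are not taken from run_dpsk_ocr_image.py; the only assumption is that the parsed text is a Python-style list of integer boxes.

```python
import ast


def parse_boxes(text: str) -> list[list[int]]:
    """Parse a string such as "[[10, 20, 110, 220]]" into a list of boxes.

    Hypothetical helper for illustration; not the script's actual code.
    """
    try:
        # literal_eval accepts only Python literals (numbers, strings, lists,
        # tuples, dicts, ...) and never calls functions or reads attributes.
        value = ast.literal_eval(text)
    except (ValueError, SyntaxError) as exc:
        raise ValueError(f"not a literal coordinate list: {text!r}") from exc

    if not isinstance(value, list):
        raise ValueError(f"expected a list of boxes, got {type(value).__name__}")

    boxes = []
    for box in value:
        is_sequence = isinstance(box, (list, tuple))
        if not is_sequence or not all(isinstance(v, int) for v in box):
            raise ValueError(f"expected a sequence of ints, got {box!r}")
        boxes.append(list(box))
    return boxes
```

With eval(...), a string like "__import__('os').getcwd()" would be executed; with this helper it is rejected with a ValueError before anything runs.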

Why
Users running locate/“rec” prompts via the script were not getting the expected results because special tokens were stripped from the output and the per-request n-gram logits processor wasn’t attached. The change keeps default OCR behavior unchanged and enables reference mode only when requested.

What’s changed
- Add CLI flags:
  - --prompt to pass a prompt without editing config.py
  - --ref-mode to force reference/locate behavior
- In stream_generate(...):
  - Detect reference mode if --ref-mode is set or the prompt contains both <|ref|> and <|/ref|>
  - When in reference mode, attach NoRepeatNGramLogitsProcessor(ngram_size=30, window_size=90, whitelist_token_ids={128821,128822}) and set skip_special_tokens=False
  - Otherwise, leave defaults (no logits processor; skip_special_tokens=True)
- Replace unsafe eval(...) with ast.literal_eval(...) where coordinates/geometry are parsed
- Minor robustness: initialize final_output and avoid duplicate image loads
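For readers unfamiliar with no-repeat n-gram filtering, the sketch below shows one way a windowed ban with a whitelist could be computed. It is an assumption-laden illustration, not the repository's NoRepeatNGramLogitsProcessor: the function name `banned_next_tokens` is hypothetical, it works on plain Python lists rather than tensors, and it only returns the token ids that would be masked instead of editing logits.

```python
def banned_next_tokens(token_ids, ngram_size, window_size,
                       whitelist_token_ids=frozenset()):
    """Return token ids that would repeat an n-gram seen in the recent window.

    Hypothetical helper: a pure-Python sketch of the idea, under the
    assumption that the search is limited to the last `window_size` tokens.
    """
    if ngram_size < 2:
        raise ValueError("ngram_size must be at least 2")
    if len(token_ids) < ngram_size:
        return set()

    # The last (ngram_size - 1) tokens form the prefix of the next n-gram.
    prefix = tuple(token_ids[-(ngram_size - 1):])

    # Only n-grams that start inside the trailing window are considered.
    start = max(0, len(token_ids) - window_size)
    end = len(token_ids) - ngram_size + 1

    banned = set()
    for i in range(start, end):
        ngram = tuple(token_ids[i:i + ngram_size])
        if ngram[:-1] == prefix:
            # Emitting this token next would reproduce an earlier n-gram.
            banned.add(ngram[-1])

    # Whitelisted ids are never banned, even if they would repeat an n-gram.
    return banned - set(whitelist_token_ids)
```

With the values from this PR (ngram_size=30, window_size=90), a real processor would set the logits of the returned ids to negative infinity before sampling, while the two whitelisted ids 128821 and 128822 stay available.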

Files touched
DeepSeek-OCR-master/DeepSeek-OCR-vllm/run_dpsk_ocr_image.py

How to use
python DeepSeek-OCR-master/DeepSeek-OCR-vllm/run_dpsk_ocr_image.py \
  --input path/to/your.png \
  --prompt "\nLocate <|ref|>title<|/ref|> in the image." \
  --ref-mode
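The flags used in this command could be declared roughly as follows. This is a sketch under assumptions, not the script's actual parser: `build_parser`, the help strings, the defaults, and whether --input is required are all hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the flags shown above (hypothetical defaults)."""
    parser = argparse.ArgumentParser(
        description="Run OCR on a single image (illustrative parser only)."
    )
    # Assumption: the image path is mandatory.
    parser.add_argument("--input", required=True, help="path to the input image")
    # Assumption: None means "fall back to the prompt defined in config.py".
    parser.add_argument("--prompt", default=None, help="prompt override")
    # store_true makes the flag False unless it is passed on the command line.
    parser.add_argument(
        "--ref-mode",
        action="store_true",
        help="force reference/locate behavior",
    )
    return parser
```

Calling `build_parser().parse_args(["--input", "page.png", "--ref-mode"])` yields a namespace where `ref_mode` is True and `prompt` is None, so the caller can fall back to the configured prompt.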

Testing
Syntax sanity check (CPU, compile only): python -m py_compile DeepSeek-OCR-master/DeepSeek-OCR-vllm/run_dpsk_ocr_image.py
Functional (GPU): run the command above; outputs include result_ori.mmd, result.mmd, images/*.jpg (crops), and result_with_boxes.jpg.

Backward compatibility
Default OCR behavior is unchanged unless --ref-mode is provided or <|ref|>…<|/ref|> is detected in the prompt.
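The compatibility rule can be summarized as a small predicate plus a settings switch. The sketch below is illustrative only: `is_reference_mode` and `generation_settings` are hypothetical names, the closing tag is spelled as in the "How to use" command, and the returned dict merely describes the settings rather than constructing vLLM objects.

```python
REF_OPEN = "<|ref|>"
REF_CLOSE = "<|/ref|>"  # assumption: spelling taken from the usage command


def is_reference_mode(prompt: str, force: bool = False) -> bool:
    """True if reference mode is forced or both ref tags appear in the prompt."""
    return force or (REF_OPEN in prompt and REF_CLOSE in prompt)


def generation_settings(prompt: str, force: bool = False) -> dict:
    """Describe the per-request settings for a prompt (hypothetical helper)."""
    if is_reference_mode(prompt, force):
        return {
            # Keep special tokens so the ref markup survives decoding.
            "skip_special_tokens": False,
            # Arguments the PR passes to NoRepeatNGramLogitsProcessor.
            "ngram_processor": {
                "ngram_size": 30,
                "window_size": 90,
                "whitelist_token_ids": {128821, 128822},
            },
        }
    # Default OCR path: no logits processor, special tokens stripped.
    return {"skip_special_tokens": True, "ngram_processor": None}
```

A plain OCR prompt without the tags and without the flag takes the second branch, which is what keeps existing invocations unchanged.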

Fixes #114

Eliezermga · 2025-10-23T13:48:21Z

Good improvement overall! It might help to add short docstrings or inline comments around the reference mode detection logic (if --ref-mode or <|ref|>), to clarify how the n-gram logits processor interacts with special tokens. That would make the intent clearer for future contributors.

vllm: add --ref-mode/--prompt; enable n-gram logits for <|ref|> prompts; use literal_eval

Signed-off-by: Aryan Rahar <aryanrahar1@gmail.com>

37be783
