
Add add_logger API to AsyncLLM #20952


Draft
eicherseiji wants to merge 2 commits into main

Conversation

@eicherseiji (Contributor) commented Jul 14, 2025

Essential Elements of an Effective PR Description Checklist

  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing the test command.
  • The test results, such as pasting a before/after comparison or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

Purpose

Problem: The current implementation forces users to choose between the default loggers and custom loggers; it is not possible to use both. The default logger list can be recreated by hand (with a custom logger appended), but that becomes a manual maintenance burden whenever the default loggers change.
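
For concreteness, the workaround today looks roughly like the sketch below. MyCustomStatLogger is a hypothetical user-defined logger, and the stat_loggers argument is the existing custom-logger hook; treat the exact signatures as illustrative rather than authoritative.

# Sketch of the current workaround: keeping the defaults AND a custom logger
# means re-listing the defaults by hand when constructing the engine.
from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.v1.engine.async_llm import AsyncLLM
from vllm.v1.metrics.loggers import (LoggingStatLogger, PrometheusStatLogger,
                                     StatLoggerBase)


class MyCustomStatLogger(StatLoggerBase):
    # Hypothetical logger; the StatLoggerBase interface may differ by version,
    # so the no-op methods below are only meant to satisfy the sketch.
    def __init__(self, vllm_config, engine_index: int = 0):
        pass

    def record(self, *args, **kwargs):
        pass

    def log(self):
        pass

    def log_engine_initialized(self):
        pass


engine_args = AsyncEngineArgs(model="distilbert/distilgpt2", enforce_eager=True)
engine = AsyncLLM.from_engine_args(
    engine_args,
    stat_loggers=[PrometheusStatLogger, LoggingStatLogger, MyCustomStatLogger],
)
# If vLLM's default logger set ever changes, this hand-maintained list
# silently drifts out of date.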

Based on discussion in #14661, I see this was deprioritized to start, but thought it worth raising again as a minimally invasive ergonomics change. Would love to get thoughts on this. Thanks!

Implements feature request #17702.

Ref:

# vllm/vllm/v1/metrics/loggers.py
def setup_default_loggers(
    vllm_config: VllmConfig,
    log_stats: bool,
    engine_num: int,
    custom_stat_loggers: Optional[list[StatLoggerFactory]] = None,
) -> list[list[StatLoggerBase]]:
    """Setup logging and prometheus metrics."""
    if not log_stats:
        return []

    factories: list[StatLoggerFactory]
    if custom_stat_loggers is not None:
        # Passing any custom loggers replaces the defaults below entirely.
        factories = custom_stat_loggers
    else:
        factories = [PrometheusStatLogger]
        if logger.isEnabledFor(logging.INFO):
            factories.append(LoggingStatLogger)
    # ... (per-engine logger instantiation omitted)
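
Under the change proposed here, the defaults would stay registered and a custom logger could be attached after construction. The add_logger name comes from this PR, but its exact signature and constraints are still under review, so the lines below (reusing the hypothetical names from the snippet above) are only a sketch:

# Sketch of the proposed flow: keep the default loggers and append one more,
# without re-listing anything. MyCustomStatLogger and engine_args are the
# hypothetical names from the workaround sketch above.
async def attach_custom_logger():
    engine = AsyncLLM.from_engine_args(engine_args)  # defaults stay in place
    await engine.add_logger(MyCustomStatLogger)      # proposed API
    return engine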

Test Plan

pytest -vs v1/metrics/test_engine_logger_apis.py

Test Result

(base) ray@ip-10-0-251-217:~/default/work/vllm/tests$ pytest -vs v1/metrics/test_engine_logger_apis.py
INFO 07-15 17:16:59 [__init__.py:253] Automatically detected platform cuda.
=================================================================================================== test session starts ===================================================================================================
platform linux -- Python 3.11.11, pytest-8.4.1, pluggy-1.5.0 -- /home/ray/anaconda3/bin/python
cachedir: .pytest_cache
rootdir: /home/ray/default/work/vllm
configfile: pyproject.toml
plugins: asyncio-1.0.0, anyio-3.7.1
asyncio: mode=Mode.STRICT, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 1 item                                                                                                                                                                                                          

v1/metrics/test_engine_logger_apis.py::test_async_llm_add_logger INFO 07-15 17:17:07 [config.py:3485] Downcasting torch.float32 to torch.float16.
INFO 07-15 17:17:07 [config.py:1561] Using max model len 1024
INFO 07-15 17:17:08 [config.py:2380] Chunked prefill is enabled with max_num_batched_tokens=2048.
WARNING 07-15 17:17:08 [cuda.py:103] To see benefits of async output processing, enable CUDA graph. Since, enforce-eager is enabled, async output processor cannot be used
INFO 07-15 17:17:08 [core.py:526] Waiting for init message from front-end.
INFO 07-15 17:17:08 [core.py:69] Initializing a V1 LLM engine (v0.1.dev7731+g480beba) with config: model='distilbert/distilgpt2', speculative_config=None, tokenizer='distilbert/distilgpt2', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config={}, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=1024, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=True, kv_cache_dtype=auto,  device_config=cuda, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=distilbert/distilgpt2, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=False, pooler_config=None, compilation_config={"level":0,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":[],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"use_cudagraph":true,"cudagraph_num_of_warmups":0,"cudagraph_capture_sizes":[],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"max_capture_size":0,"local_cache_dir":null}
INFO 07-15 17:17:11 [parallel_state.py:1090] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
WARNING 07-15 17:17:11 [topk_topp_sampler.py:59] FlashInfer is not available. Falling back to the PyTorch-native implementation of top-p & top-k sampling. For the best performance, please install FlashInfer.
INFO 07-15 17:17:11 [gpu_model_runner.py:1742] Starting to load model distilbert/distilgpt2...
INFO 07-15 17:17:11 [gpu_model_runner.py:1747] Loading model from scratch...
INFO 07-15 17:17:11 [cuda.py:290] Using Flash Attention backend on V1 engine.
INFO 07-15 17:17:11 [weight_utils.py:296] Using model weights format ['*.safetensors']
INFO 07-15 17:17:11 [weight_utils.py:349] No model.safetensors.index.json found in remote.
Loading safetensors checkpoint shards:   0% Completed | 0/1 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00, 12.30it/s]

INFO 07-15 17:17:11 [default_loader.py:272] Loading weights took 0.09 seconds
INFO 07-15 17:17:12 [gpu_model_runner.py:1773] Model loading took 0.1547 GiB and 0.619774 seconds
INFO 07-15 17:17:13 [gpu_worker.py:244] Available KV cache memory: 19.34 GiB
INFO 07-15 17:17:13 [kv_cache_utils.py:728] GPU KV cache size: 1,126,912 tokens
INFO 07-15 17:17:13 [kv_cache_utils.py:732] Maximum concurrency for 1,024 tokens per request: 1100.50x
INFO 07-15 17:17:13 [core.py:172] init engine (profile, create kv cache, warmup model) took 1.24 seconds
PASSED

=================================================================================================== 1 passed in 12.93s ====================================================================================================

(Optional) Documentation Update

Signed-off-by: Seiji Eicher <seiji@anyscale.com>


@gemini-code-assist gemini-code-assist bot left a comment


Summary of Changes

Hello @eicherseiji, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refactors how statistical loggers are integrated into the AsyncLLM engine by introducing a dedicated asynchronous API for adding them. This change provides more flexibility in managing loggers and includes built-in validation to prevent common misconfigurations.

Highlights

  • New API for Logger Management: Introduced a new asynchronous method, add_logger, to the AsyncLLM class. This method allows for dynamically adding statistical loggers to the engine after its initialization.
  • Refactored Logger Initialization: The way statistical loggers are added to AsyncLLM has been changed. Instead of passing stat_loggers directly to the AsyncLLM.from_engine_args constructor, loggers are now added via the new add_logger method post-instantiation.
  • Logger Validation and Duplication Prevention: The add_logger method checks that stat logging is enabled before adding a logger. It also prevents adding multiple loggers of the same type, raising a KeyError if a duplicate is detected (see the sketch after this list).
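
Based on the highlights above, the added method appears to behave roughly as sketched below. Only the add_logger name comes from this PR; the attribute names, the enabled-check exception, and the per-engine bookkeeping are assumptions made purely for illustration.

# Hypothetical sketch of the semantics described above; NOT the PR's actual diff.
# Assumes the factory passed in is a StatLoggerBase subclass, as in vLLM's
# StatLoggerFactory convention.
async def add_logger_sketch(engine, logger_factory):
    # Stat logging must be enabled for an added logger to receive any data.
    if not engine.log_stats:  # assumed attribute name
        raise RuntimeError("Stat logging is disabled for this engine.")
    for engine_idx, per_engine_loggers in enumerate(engine.stat_loggers):
        # One logger per type: a duplicate raises KeyError per the summary above.
        if any(type(lg) is logger_factory for lg in per_engine_loggers):
            raise KeyError(f"{logger_factory.__name__} is already registered.")
        per_engine_loggers.append(logger_factory(engine.vllm_config, engine_idx))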

@mergify mergify bot added the v1 label Jul 14, 2025

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a new add_logger API to AsyncLLM, allowing for dynamic addition of stat loggers after engine initialization. The change is well-contained and includes a corresponding update to the tests to use the new API.

I've found a critical issue in the implementation of the duplicate logger check within the new add_logger method, which I've detailed in a specific comment. Addressing this will ensure the new feature works as intended.

@eicherseiji eicherseiji reopened this Jul 15, 2025
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
@robertgshaw2-redhat (Collaborator) commented

Why do we want this? I think we should be careful about adding a new API like this before 1.0

@eicherseiji (Contributor, Author) commented

> Why do we want this? I think we should be careful about adding a new API like this before 1.0

Hi @robertgshaw2-redhat! Thanks for taking a look. I updated the PR body with a more detailed problem statement. Basically my perspective is that it should be possible to add additional loggers without implicitly removing default ones.

Totally understand if this is not a safe change to make just yet, though. Are there docs or discussions somewhere I can reference for more background on our unstable/alpha API designs pre-1.0? Or maybe just some extra context on why this is too early?
