
Add add_logger API to AsyncLLM #20952


Draft
eicherseiji wants to merge 2 commits into main

Conversation

@eicherseiji (Contributor) commented Jul 14, 2025

Essential Elements of an Effective PR Description Checklist

  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing the test command.
  • The test results, such as pasting a before/after comparison or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

Purpose

Problem: The current implementation forces users to choose between the default loggers and custom loggers; it is not possible to use both. The default logger list can be recreated by hand (with a custom logger appended), but that becomes a manual maintenance burden whenever the default loggers change.
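
For concreteness, the workaround today looks roughly like the sketch below. MyCustomStatLogger is a hypothetical user-defined logger, and the stat_loggers argument is the existing custom-logger hook; treat the exact signatures as illustrative rather than authoritative.

# Sketch of the current workaround: keeping the defaults AND a custom logger
# means re-listing the defaults by hand when constructing the engine.
from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.v1.engine.async_llm import AsyncLLM
from vllm.v1.metrics.loggers import (LoggingStatLogger, PrometheusStatLogger,
                                     StatLoggerBase)


class MyCustomStatLogger(StatLoggerBase):
    # Hypothetical logger; the StatLoggerBase interface may differ by version,
    # so the no-op methods below are only meant to satisfy the sketch.
    def __init__(self, vllm_config, engine_index: int = 0):
        pass

    def record(self, *args, **kwargs):
        pass

    def log(self):
        pass

    def log_engine_initialized(self):
        pass


engine_args = AsyncEngineArgs(model="distilbert/distilgpt2", enforce_eager=True)
engine = AsyncLLM.from_engine_args(
    engine_args,
    stat_loggers=[PrometheusStatLogger, LoggingStatLogger, MyCustomStatLogger],
)
# If vLLM's default logger set ever changes, this hand-maintained list
# silently drifts out of date.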

Based on discussion in #14661, I see this was deprioritized to start, but thought it worth raising again as a minimally invasive ergonomics change. Would love to get thoughts on this. Thanks!

Implements feature request #17702.

Ref:

# vllm/vllm/v1/metrics/loggers.py
def setup_default_loggers(
    vllm_config: VllmConfig,
    log_stats: bool,
    engine_num: int,
    custom_stat_loggers: Optional[list[StatLoggerFactory]] = None,
) -> list[list[StatLoggerBase]]:
    """Setup logging and prometheus metrics."""
    if not log_stats:
        return []

    factories: list[StatLoggerFactory]
    if custom_stat_loggers is not None:
        # Passing any custom loggers replaces the defaults below entirely.
        factories = custom_stat_loggers
    else:
        factories = [PrometheusStatLogger]
        if logger.isEnabledFor(logging.INFO):
            factories.append(LoggingStatLogger)
    # ... (per-engine logger instantiation omitted)
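
Under the change proposed here, the defaults would stay registered and a custom logger could be attached after construction. The add_logger name comes from this PR, but its exact signature and constraints are still under review, so the lines below (reusing the hypothetical names from the snippet above) are only a sketch:

# Sketch of the proposed flow: keep the default loggers and append one more,
# without re-listing anything. MyCustomStatLogger and engine_args are the
# hypothetical names from the workaround sketch above.
async def attach_custom_logger():
    engine = AsyncLLM.from_engine_args(engine_args)  # defaults stay in place
    await engine.add_logger(MyCustomStatLogger)      # proposed API
    return engine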

Test Plan

pytest -vs v1/metrics/test_engine_logger_apis.py

Test Result

(base) ray@ip-10-0-251-217:~/default/work/vllm/tests$ pytest -vs v1/metrics/test_engine_logger_apis.py
INFO 07-15 17:16:59 [__init__.py:253] Automatically detected platform cuda.
=================================================================================================== test session starts ===================================================================================================
platform linux -- Python 3.11.11, pytest-8.4.1, pluggy-1.5.0 -- /home/ray/anaconda3/bin/python
cachedir: .pytest_cache
rootdir: /home/ray/default/work/vllm
configfile: pyproject.toml
plugins: asyncio-1.0.0, anyio-3.7.1
asyncio: mode=Mode.STRICT, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 1 item                                                                                                                                                                                                          

v1/metrics/test_engine_logger_apis.py::test_async_llm_add_logger INFO 07-15 17:17:07 [config.py:3485] Downcasting torch.float32 to torch.float16.
INFO 07-15 17:17:07 [config.py:1561] Using max model len 1024
INFO 07-15 17:17:08 [config.py:2380] Chunked prefill is enabled with max_num_batched_tokens=2048.
WARNING 07-15 17:17:08 [cuda.py:103] To see benefits of async output processing, enable CUDA graph. Since, enforce-eager is enabled, async output processor cannot be used
INFO 07-15 17:17:08 [core.py:526] Waiting for init message from front-end.
INFO 07-15 17:17:08 [core.py:69] Initializing a V1 LLM engine (v0.1.dev7731+g480beba) with config: model='distilbert/distilgpt2', speculative_config=None, tokenizer='distilbert/distilgpt2', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config={}, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=1024, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=True, kv_cache_dtype=auto,  device_config=cuda, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=distilbert/distilgpt2, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=False, pooler_config=None, compilation_config={"level":0,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":[],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"use_cudagraph":true,"cudagraph_num_of_warmups":0,"cudagraph_capture_sizes":[],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"max_capture_size":0,"local_cache_dir":null}
INFO 07-15 17:17:11 [parallel_state.py:1090] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
WARNING 07-15 17:17:11 [topk_topp_sampler.py:59] FlashInfer is not available. Falling back to the PyTorch-native implementation of top-p & top-k sampling. For the best performance, please install FlashInfer.
INFO 07-15 17:17:11 [gpu_model_runner.py:1742] Starting to load model distilbert/distilgpt2...
INFO 07-15 17:17:11 [gpu_model_runner.py:1747] Loading model from scratch...
INFO 07-15 17:17:11 [cuda.py:290] Using Flash Attention backend on V1 engine.
INFO 07-15 17:17:11 [weight_utils.py:296] Using model weights format ['*.safetensors']
INFO 07-15 17:17:11 [weight_utils.py:349] No model.safetensors.index.json found in remote.
Loading safetensors checkpoint shards:   0% Completed | 0/1 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 100% Completed | 1/1 [00:00<00:00, 12.30it/s]

INFO 07-15 17:17:11 [default_loader.py:272] Loading weights took 0.09 seconds
INFO 07-15 17:17:12 [gpu_model_runner.py:1773] Model loading took 0.1547 GiB and 0.619774 seconds
INFO 07-15 17:17:13 [gpu_worker.py:244] Available KV cache memory: 19.34 GiB
INFO 07-15 17:17:13 [kv_cache_utils.py:728] GPU KV cache size: 1,126,912 tokens
INFO 07-15 17:17:13 [kv_cache_utils.py:732] Maximum concurrency for 1,024 tokens per request: 1100.50x
INFO 07-15 17:17:13 [core.py:172] init engine (profile, create kv cache, warmup model) took 1.24 seconds
PASSED

=================================================================================================== 1 passed in 12.93s ====================================================================================================

(Optional) Documentation Update

Signed-off-by: Seiji Eicher <seiji@anyscale.com>


@gemini-code-assist gemini-code-assist bot left a comment


Summary of Changes

Hello @eicherseiji, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refactors how statistical loggers are integrated into the AsyncLLM engine by introducing a dedicated asynchronous API for adding them. This change provides more flexibility in managing loggers and includes built-in validation to prevent common misconfigurations.

Highlights

  • New API for Logger Management: Introduced a new asynchronous method, add_logger, to the AsyncLLM class. This method allows for dynamically adding statistical loggers to the engine after its initialization.
  • Refactored Logger Initialization: The way statistical loggers are added to AsyncLLM has been changed. Instead of passing stat_loggers directly to the AsyncLLM.from_engine_args constructor, loggers are now added via the new add_logger method post-instantiation.
  • Logger Validation and Duplication Prevention: The add_logger method checks that stat logging is enabled before adding a logger. It also prevents adding multiple loggers of the same type, raising a KeyError if a duplicate is detected (see the sketch after this list).
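
Based on the highlights above, the added method appears to behave roughly as sketched below. Only the add_logger name comes from this PR; the attribute names, the enabled-check exception, and the per-engine bookkeeping are assumptions made purely for illustration.

# Hypothetical sketch of the semantics described above; NOT the PR's actual diff.
# Assumes the factory passed in is a StatLoggerBase subclass, as in vLLM's
# StatLoggerFactory convention.
async def add_logger_sketch(engine, logger_factory):
    # Stat logging must be enabled for an added logger to receive any data.
    if not engine.log_stats:  # assumed attribute name
        raise RuntimeError("Stat logging is disabled for this engine.")
    for engine_idx, per_engine_loggers in enumerate(engine.stat_loggers):
        # One logger per type: a duplicate raises KeyError per the summary above.
        if any(type(lg) is logger_factory for lg in per_engine_loggers):
            raise KeyError(f"{logger_factory.__name__} is already registered.")
        per_engine_loggers.append(logger_factory(engine.vllm_config, engine_idx))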

@mergify mergify bot added the v1 label Jul 14, 2025

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a new add_logger API to AsyncLLM, allowing for dynamic addition of stat loggers after engine initialization. The change is well-contained and includes a corresponding update to the tests to use the new API.

I've found a critical issue in the implementation of the duplicate logger check within the new add_logger method, which I've detailed in a specific comment. Addressing this will ensure the new feature works as intended.

@eicherseiji eicherseiji reopened this Jul 15, 2025
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
@robertgshaw2-redhat (Collaborator) commented

Why do we want this? I think we should be careful about adding a new API like this before 1.0

@eicherseiji (Contributor, Author) commented

> Why do we want this? I think we should be careful about adding a new API like this before 1.0

Hi @robertgshaw2-redhat! Thanks for taking a look. I updated the PR body with a more detailed problem statement. Basically my perspective is that it should be possible to add additional loggers without implicitly removing default ones.

Totally understand if this is not a safe change to make just yet, though. Are there docs or discussions somewhere I can reference for more background on our unstable/alpha API designs pre-1.0? Or maybe just some extra context on why this is too early?
