Skip to content

[CI] Tests for graph comparison between vllm and AFTU #286

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

wallashss
Copy link
Collaborator

@wallashss wallashss commented Jul 8, 2025

Description

This PR adds tests that compare graphs generated by vLLM vs AFTU (aiu-fms-testing-utils). The tests do an inference with vLLM and execute inference.py from AFTU. For both execution, the tests dump the compiled graphs to files by setting the env variable DEE_DUMP_GRAPHS=name. Then, they compare the corresponding graphs generated for both method.

The main limitation of this implementation is AFTU is not a proper package. So, I had to make some workarounds:

  • I added an environment variable: VLLM_SPYRE_TEST_AFTU_SCRIPTS_DIR. If set, we can use to inform where is the scripts dirs (where it has the inference.py) to copy from there and execute the script within the tests.
  • If the above env var is not set, then we try to get from the install by importing the module and using the aiu_fms_testing_utils.__file__ variable to find the directory.

I added a marker aftu to ease select the tests of this PR, and also to filter out from other test suite.

wallashss added 3 commits July 7, 2025 15:49
Signed-off-by: Wallas Santos <wallashss@ibm.com>
…-compare-graph

Signed-off-by: Wallas Santos <wallashss@ibm.com>
Signed-off-by: Wallas Santos <wallashss@ibm.com>
Copy link

github-actions bot commented Jul 8, 2025

👋 Hi! Thank you for contributing to vLLM support on Spyre.
Just a reminder: Make sure that your code passes all the linting checks, otherwise your PR won't be able to be merged. To do so, first install the linting requirements, then run format.sh and commit the changes. This can be done with uv directly:

uv sync --frozen --group lint --active --inexact

Or this can be done with pip:

uv pip compile --group lint > requirements-lint.txt
pip install -r requirements-lint.txt
bash format.sh

Now you are good to go 🚀

@wallashss wallashss marked this pull request as draft July 8, 2025 01:48
wallashss added 5 commits July 7, 2025 22:52
Signed-off-by: Wallas Santos <wallashss@ibm.com>
Signed-off-by: Wallas Santos <wallashss@ibm.com>
Signed-off-by: Wallas Santos <wallashss@ibm.com>
Signed-off-by: Wallas Santos <wallashss@ibm.com>
Signed-off-by: Wallas Santos <wallashss@ibm.com>
@wallashss wallashss marked this pull request as ready for review July 8, 2025 13:31
@wallashss wallashss requested a review from ckadner as a code owner July 8, 2025 13:31
@@ -150,6 +151,7 @@ dev = [
"pytest-timeout==2.3.1",
"requests==2.32.3",
"sentence-transformers==3.4.1",
"aiu_fms_testing_utils@git+https://github.com/foundation-model-stack/aiu-fms-testing-utils.git#1a77f630104a5661fff554164c1e536ea08393e3"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wondering if we need to update the uv.lock file as well 🤔

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

Signed-off-by: Wallas Santos <wallashss@ibm.com>
Comment on lines +52 to +53
# Get G1 graphs, it assumes the input_dir has the folder export_dtcompiler
# where are the files
Copy link
Collaborator

@rafvasq rafvasq Jul 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: something like

Suggested change
# Get G1 graphs, it assumes the input_dir has the folder export_dtcompiler
# where are the files
# Get G1 graphs.
# Assumes the 'input_dir' contains 'export_dtcompiler' with the files.

Comment on lines +29 to +35
# Silly regex to find all s#.
# We are only considered those that surrounds by space (whole word)
# or started with space and terminated with comma
# examples:
# ' s1 '
# ' s1,'
# ' s1 s2 '
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a nit: comment could be a bit more succinct, something like:

Suggested change
# Silly regex to find all s#.
# We are only considered those that surrounds by space (whole word)
# or started with space and terminated with comma
# examples:
# ' s1 '
# ' s1,'
# ' s1 s2 '
# Regex to find all 's#' patterns surrounded by spaces,
# or starting with a space and ending with a comma.
# Examples: ' s1 ', ' s1,', ' s1 s2 '


@pytest.mark.aftu
@pytest.mark.parametrize("model", get_spyre_model_list())
@pytest.mark.parametrize("backend", ["sendnn"])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this test can only run on the sendnn backend then we don't need to parameterize it, we can just set the backend as sendnn in the test.

Also, we should mark all tests running with the sendnn backend with pytest.mark.spyre for consistency with the rest of the test suite



def get_aftu_script_dir() -> str:
# TODO: since AFTU is not a lib yet, this function does the best
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does AFTU is not a lib yet mean here? Is the problem that inference.py is not shipped with the AFTU wheel?

@@ -131,6 +131,7 @@ markers = [
"multi: Tests that require >1 cards",
"utils: Tests for utility functions",
"worker: Tests for worker logic",
"aftu: Tests to compare graphs from aiu-fms-testing-utils",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to avoid creating more custom markers unless it's completely necessary. (Unrelated but it looks like utils and worker are unused and we should delete them as well)

These tests seem to be important to catch problems early so I do want them running with our default set of markers if possible

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants