
[core] Deflake test_runtime_env_pip_and_conda_4.py #52750


Merged: 7 commits, May 4, 2025


@edoakes edoakes commented May 2, 2025

Test was timing out sometimes -- let's make it faster.

Updated the slowest test condition to avoid restarting Ray each time, so the runtime_env cache is hit and the env does not have to be installed 3 times.

Before:

================= 8 passed, 1 skipped in 96.56s (0:01:36) ==================

After:

================= 8 passed, 1 skipped in 62.14s (0:01:02) ==================
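
To make the caching argument concrete, here is a toy model of why keeping one cluster alive across cases saves installs. `FakeCluster`, `installs_with_restart`, and `installs_with_shared_cluster` are hypothetical names invented for this sketch; it assumes the cache lives only as long as the cluster does and does not use Ray's actual runtime_env machinery.

```python
class FakeCluster:
    """Hypothetical stand-in for a cluster with its own runtime_env cache."""

    def __init__(self):
        self.env_cache = set()  # envs already installed on this cluster
        self.installs = 0       # number of (expensive) installs performed

    def run_with_env(self, env_key):
        # Only a cache miss triggers an install.
        if env_key not in self.env_cache:
            self.installs += 1
            self.env_cache.add(env_key)


def installs_with_restart(env_key, num_cases):
    """Start a fresh cluster per test case, so every case misses the cache."""
    total = 0
    for _ in range(num_cases):
        cluster = FakeCluster()
        cluster.run_with_env(env_key)
        total += cluster.installs
    return total


def installs_with_shared_cluster(env_key, num_cases):
    """Reuse one cluster, so only the first case pays for the install."""
    cluster = FakeCluster()
    for _ in range(num_cases):
        cluster.run_with_env(env_key)
    return cluster.installs
```

Under these assumptions, three cases with the same env cost three installs with restarts and one install with a shared cluster.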

Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
@edoakes edoakes requested a review from a team May 2, 2025 19:29
@edoakes edoakes added the "go" label (add ONLY when ready to merge, run all tests) May 2, 2025
reason="Requires PR wheels built in CI, so only run on linux CI machines.",
)
@pytest.mark.parametrize("field", ["pip"])
def test_pip_ray_is_overwritten(start_cluster, field):
Collaborator Author:

strange diff rendering because I took this out of the TestGC nesting

edoakes added 2 commits May 2, 2025 14:39
Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>

edoakes commented May 2, 2025

Changed to use ray_start_regular_shared, now completes in 19.06s

Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>

edoakes added 2 commits May 2, 2025 17:50
Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
@@ -116,7 +115,7 @@ class TestGC:
reason="Needs PR wheels built in CI, so only run on linux CI machines.",
)
@pytest.mark.parametrize("field", ["conda", "pip"])
@pytest.mark.parametrize("spec_format", ["file", "python_object"])
Collaborator Author:

a few minor speedups in this file: there's no need to test the GC logic against both the file and python_object spec formats, and the sleep was unneeded

Contributor:

why don't you need to test against file?

Collaborator Author:

there is no meaningful difference in the GC implementation across the two conditions
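
For a sense of what dropping one parametrization saves: stacked `pytest.mark.parametrize` decorators generate the Cartesian product of their value lists, so the case count can be sketched with `itertools.product`. The helper `count_cases` is hypothetical, and the "after" list assumes a single `spec_format` value is kept; which value the PR actually keeps is not shown in the hunk above.

```python
from itertools import product


def count_cases(*param_lists):
    """Number of test cases produced by stacking one parametrize per list."""
    return len(list(product(*param_lists)))


# The decorators in the hunk above: field x spec_format.
before = count_cases(["conda", "pip"], ["file", "python_object"])

# Assumption for illustration: only one spec_format value remains.
after = count_cases(["conda", "pip"], ["python_object"])
```

With these lists, `before` is 4 and `after` is 2, so each remaining `field` value runs once instead of twice.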

@@ -139,9 +138,6 @@ def f():

# Ensure that the runtime env has been installed.
assert ray.get(f.remote())
# Sleep some seconds before checking that we didn't GC. Otherwise this
# check may spuriously pass.
time.sleep(2)
Contributor:

I'm assuming the sleep existed to make sure some code ran after the get. Otherwise the following assert will always pass if it runs directly after, so removing the sleep makes the test stop testing that behavior.

Collaborator Author:

the check on the following line doesn't really need to be there at all IMO except as a sanity check for the testing utils themselves. it's asserting that we don't GC runtime_envs for active jobs. note that:

  • if we did, many other test cases would fail as this is very basic functionality.
  • this is not really a reliable way to test for the behavior. the GC can be arbitrarily delayed so in order to be sure this is checking what we intend, the sleep needs to be arbitrarily long :)

out of an abundance of caution, I updated the PR to perform the check in a more deterministic way: wait for the task to be marked FINISHED, then perform the check a few times in a loop. this should provide the same level of guarantee without the nondeterminism/delay of the sleep
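
A rough sketch of that pattern, using only the standard library: first poll until a state is reached (with a timeout), then re-run the assertion several times. `wait_until` and `assert_holds_repeatedly` are hypothetical helper names for illustration, not the utilities the PR uses, and the attempt count and intervals are arbitrary.

```python
import time


def wait_until(predicate, timeout_s=10.0, interval_s=0.05):
    """Poll `predicate` until it returns True, or raise after `timeout_s`."""
    deadline = time.monotonic() + timeout_s
    while True:
        if predicate():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within timeout")
        time.sleep(interval_s)


def assert_holds_repeatedly(check, attempts=5, interval_s=0.05):
    """Assert that `check` returns True on every one of `attempts` tries."""
    for attempt in range(1, attempts + 1):
        assert check(), f"check failed on attempt {attempt}"
        time.sleep(interval_s)
```

In a test this would read roughly as `wait_until(task_is_finished)` followed by `assert_holds_repeatedly(env_not_gced)`, where both callables are placeholders for whatever the test uses to query task state and GC status. The wait is bounded by the event it depends on rather than by a fixed sleep.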

Contributor:

cool makes sense

Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
@edoakes edoakes enabled auto-merge (squash) May 3, 2025 12:26
@edoakes edoakes disabled auto-merge May 4, 2025 11:39
@edoakes edoakes merged commit 4ba139e into ray-project:master May 4, 2025
5 of 6 checks passed
vickytsang pushed a commit to ROCm/ray that referenced this pull request May 5, 2025
Test was [timing
out](https://buildkite.com/ray-project/postmerge/builds/9892#019691bf-f352-4fbf-a92c-ff277cf7a901/176-1944)
sometimes -- let's make it faster.

Updated the slowest test condition to avoid restarting ray each time,
which allows the runtime_env cache to be hit and not have to install the
env 3 times.

Before:
```bash
================= 8 passed, 1 skipped in 96.56s (0:01:36) ==================
```

After:
```bash
================= 8 passed, 1 skipped in 62.14s (0:01:02) ==================
```

---------

Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
GokuMohandas pushed a commit that referenced this pull request May 8, 2025
zhaoch23 pushed a commit to Bye-legumes/ray that referenced this pull request May 14, 2025
Signed-off-by: zhaoch23 <c233zhao@uwaterloo.ca>