Commit 8e2bff8

Add a warning about the flakyness of XFAILS in the README
1 parent 68470c6 commit 8e2bff8

1 file changed: +27 -4 lines changed

README.md

Lines changed: 27 additions & 4 deletions
@@ -247,14 +247,37 @@ jobs:
 pytest -v -rxXfE --ci -o xfail_strict=True --xfails-file ${GITHUB_WORKSPACE}/your-array-library/array-api-tests-xfails.txt array_api_tests/
 ```
 
+> **Warning**
+> XFAIL tests that use Hypothesis (basically every test in the test suite except
+> those in test_has_names.py) can be flaky, due to the fact that Hypothesis
+> might not always run the test with an input that causes the test to fail.
+> There are several ways to avoid this problem:
+>
+> - Increase the maximum number of examples, e.g., by adding `--max-examples
+> 1000` to the test command (the default is `100`, see below). This will
+> make it more likely that the failing case will be found.
+> - Don't use `-o xfail_strict=True`. This will make it so that if an XFAIL
+> test passes, it will alert you in the test summary but will not cause the
+> test run to register as failed.
+> - Use skips instead of XFAILS. The difference between XFAIL and skip is that
+> a skipped test is never run at all, whereas an XFAIL test is always run
+> but ignored if it fails.
+> - Save the [Hypothesis examples
+> database](https://hypothesis.readthedocs.io/en/latest/database.html)
+> persistently on CI. That way as soon as a run finds one failing example,
+> it will always re-run future runs with that example. But note that the
+> Hypothesis examples database may be cleared when a new version of
+> Hypothesis or the test suite is released.
+
 #### Max examples
 
 The tests make heavy use
 [Hypothesis](https://hypothesis.readthedocs.io/en/latest/). You can configure
-how many examples are generated using the `--max-examples` flag, which defaults
-to 100. Lower values can be useful for quick checks, and larger values should
-result in more rigorous runs. For example, `--max-examples 10_000` may find bugs
-where default runs don't but will take much longer to run.
+how many examples are generated using the `--max-examples` flag, which
+defaults to `100`. Lower values can be useful for quick checks, and larger
+values should result in more rigorous runs. For example, `--max-examples
+10_000` may find bugs where default runs don't but will take much longer to
+run.
 
 
 ## Contributing
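
The warning in this diff says that raising `--max-examples` makes it more likely that the failing case is found. A rough way to see why is the sketch below. It rests on a simplifying assumption that is not from the README: each generated example independently hits a failing input with a fixed probability `p_fail`. Hypothesis does not sample this way, so the numbers are only illustrative. The function name `miss_probability` and the value `0.01` are hypothetical.

```python
def miss_probability(p_fail: float, max_examples: int) -> float:
    """Chance that a run never draws a failing input.

    Simplified model (an assumption, not how Hypothesis really samples):
    each example independently fails with probability p_fail.
    """
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be between 0 and 1")
    if max_examples < 0:
        raise ValueError("max_examples must be non-negative")
    # All max_examples draws must miss the failing input.
    return (1.0 - p_fail) ** max_examples


if __name__ == "__main__":
    p_fail = 0.01  # hypothetical: 1 in 100 generated inputs triggers the bug
    for n in (100, 1000, 10_000):
        # With xfail_strict=True, a miss means the XFAIL test unexpectedly
        # passes and the CI run is reported as failed.
        print(f"--max-examples {n}: miss chance {miss_probability(p_fail, n):.2e}")
```

Under this model, a bug hit by 1% of inputs is missed by about 37% of runs at the default of 100 examples, and by well under 0.01% of runs at 1000 examples. Rarer failing inputs need correspondingly more examples, which is where the other options in the warning (non-strict XFAILs, skips, or a persistent examples database) may be the more practical choice.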
