@@ -247,14 +247,37 @@ jobs:
pytest -v -rxXfE --ci -o xfail_strict=True --xfails-file ${GITHUB_WORKSPACE}/your-array-library/array-api-tests-xfails.txt array_api_tests/
```

+ > **Warning**
+ > XFAIL tests that use Hypothesis (nearly every test in the test suite except
+ > those in `test_has_names.py`) can be flaky, because Hypothesis does not
+ > always run the test with an input that causes it to fail.
+ > There are several ways to avoid this problem:
+ >
+ > - Increase the maximum number of examples, e.g., by adding
+ >   `--max-examples 1000` to the test command (the default is `100`, see
+ >   below). This makes it more likely that a failing case is found.
+ > - Don't use `-o xfail_strict=True`. An XFAIL test that passes is then
+ >   reported in the test summary but does not cause the test run to
+ >   register as failed.
+ > - Use skips instead of XFAILs. A skipped test is never run at all, whereas
+ >   an XFAIL test is always run but ignored if it fails.
+ > - Save the [Hypothesis examples
+ >   database](https://hypothesis.readthedocs.io/en/latest/database.html)
+ >   persistently on CI. Once a run finds a failing example, future runs
+ >   re-run that example. Note that the Hypothesis examples database may be
+ >   cleared when a new version of Hypothesis or the test suite is released.
+
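The first suggestion in the warning can be made concrete with a rough model. The sketch below assumes that each generated example independently triggers the failure with a fixed probability `p`. That is a simplification, since Hypothesis does not draw inputs independently or uniformly, and `chance_of_hitting_failure` is a hypothetical helper that is not part of the test suite.

```python
def chance_of_hitting_failure(p: float, max_examples: int) -> float:
    """Probability that at least one of `max_examples` inputs fails.

    Assumes (as a simplification) that every example independently
    triggers the failure with probability `p`.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if max_examples < 0:
        raise ValueError("max_examples must be non-negative")
    # Complement of "every single example passes".
    return 1.0 - (1.0 - p) ** max_examples


if __name__ == "__main__":
    # Illustrative failure rate only: 1 in 100 inputs hits the bug.
    for n in (100, 1000):
        print(n, round(chance_of_hitting_failure(0.01, n), 4))
```

Under this model, a bug that 1% of inputs trigger is missed by more than a third of runs at the default of 100 examples, which is how a strict XFAIL can pass unexpectedly. At 1000 examples a miss becomes very unlikely.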
#### Max examples

The tests make heavy use of
[Hypothesis](https://hypothesis.readthedocs.io/en/latest/). You can configure
- how many examples are generated using the `--max-examples` flag, which defaults
- to 100. Lower values can be useful for quick checks, and larger values should
- result in more rigorous runs. For example, `--max-examples 10_000` may find bugs
- where default runs don't but will take much longer to run.
+ how many examples are generated using the `--max-examples` flag, which
+ defaults to `100`. Lower values can be useful for quick checks, and larger
+ values should result in more rigorous runs. For example,
+ `--max-examples 10_000` may find bugs where default runs don't but will take
+ much longer to run.
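The `10_000` spelling works wherever the value is converted with Python's `int`, which accepts underscores as digit separators. The sketch below illustrates this with a stand-in `argparse` parser. It assumes the flag is converted with `int`, which the text above does not confirm, and `parse_max_examples` is a hypothetical helper rather than the suite's own option handling.

```python
import argparse


def parse_max_examples(argv: list) -> int:
    """Parse a --max-examples value from a list of command-line arguments."""
    # Hypothetical stand-in parser, not the test suite's real option handling.
    parser = argparse.ArgumentParser()
    # The default of 100 is taken from the text above.
    parser.add_argument("--max-examples", type=int, default=100)
    return parser.parse_args(argv).max_examples


if __name__ == "__main__":
    # int() accepts single underscores between digits.
    print(parse_max_examples(["--max-examples", "10_000"]))
    print(parse_max_examples([]))
```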

## Contributing