Description
Summary
TL;DR:
We observed that, when re-creating new Faker\Generator
s between test cases, a "random" PHP garbage collector run happening on object allocation triggers Faker\Generator::__destruct
which changes the mt_rand
seed, breaking further fixture generation in a non-deterministic way.
Context
I am not sure if I would consider this an actual bug of Faker
. It's more that the specific use case makes it break in some (rare) conditions. We do not use Faker itself directly, but through the symfony integration of Alice
.
We use it to generate test data for phpunit test. Alice populates our test database, using faker to fill in fields that need data, but not predefined specific values. Works great for snapshot test and thelike.
The Problem
With one test, we ran into an issue where the test would sometimes, on some systems (mainly CI) fail due to Faker generating different data for that run. Not always the "same different" data, not always the exact same test case, and not on every run.
Setup
The way its setup looks like this:
For each test case, we "boot" a fresh Symfony stack, recreating all services. In the test configuration, Faker\Generator
is a service in that stack, used by Alice to populate the test data.
So on each test, we create all these instances, including a Faker\Generator
, which also gets a predefined seed,
use Alice to read our fixture definitions and create objects from it, using Faker to populate properties,
start a fresh db transaction, insert that data into our test db, run tests against that DB, and compare the result with snapshots.
Then we rollback the whole transaction, teardown the symfony stack, and start the next test with a fresh stack, (that calls seed
again) and a fresh clean DB.
probable root cause
After some debugging we found that sometimes Faker\Generator::__destruct
is called while Alice creates the test data, on random object creations. I think that this is the PHP GC kicking in, claiming the Faker\Generator
instance used in the previous test run.
This would also fit on the behaviour seen in #272
It also explains why we only see these problems in some cases, only on some hosts, and never when just running one single test alone.
Building a minimal test case I could reproduce that problem, and commenting out the seed()
call in the __destruct
also "fixed" the issue.
Conclusion
Our current workaround is to call gc_collect_cycles
in the phpunit setUp
method, before starting the new Symfony stack. This, as expected triggers Faker\Generator::__destruct
and thereby seed
before the new instance is created and sets the expected seed.
From our observation, that seems to have fixed our breaking tests.
But I am not sure if that really what we want to do here, and in all tests that use Alice/Faker.
The reason these exact test cases failed, and all others using the same mechanism did not, I can only guess.
Most likely it is just the amount of data. We usually try to limit test data to a minimum, so do not created tens or hundreds of entities with it.
For the new one, we required more data, and therefor I figure we just pushed it "over the edge" where this test case would have a considerably higher chance of triggering a GC while building fixtures than every other test.
So for me, there are a few different ways this can be resolved:
- either add
gc_collect_cycles
to every test to be run before building the new stack, or even add it to the kernel itself. - or use Faker differently, preventing multiple instances of if in the first place. Not trivial with the way symfony works in the test cases, and not sure if I like all implications of that.
- do not
seed
on__destruct
. Breaking change, not sure about all implications of that, seems like rats nest of problems. - ?? maybe we are holding something very wrong and all of this is just me being stupid? Always an option!
I file the issue here, and not in on of the upstream projects, because I have to start somewhere and Faker seemed like the correct place.
Versions
Version | |
---|---|
PHP | >8.2.9 (also noticed with 8.2.16 and 8.3.2) |
fakerphp/faker |
1.23.1 |
Self-enclosed code snippet for reproduction
<?php
declare(strict_types=1);
require_once __DIR__ . '/vendor/autoload.php';
$pn1 = null;
$pn2 = null;
for ($i = 0; $i < 3; $i++) {
$faker = Faker\Factory::create('en_US');
$faker->seed(1234567);
$n1 = $faker->numberBetween(1000, 10000);
if ($pn1 !== null && $pn1 !== $n1) {
echo sprintf('DELTA in N1: previous: %d, this time: %d, iteration: %d', $pn1, $n1, $i) . PHP_EOL;
}
$pn1 = $n1;
gc_collect_cycles(); // force GC run
$n2 = $faker->numberBetween(1000, 10000);
if ($pn2 !== null && $pn2 !== $n2) {
echo sprintf('DELTA in N2: previous: %d, this time: %d, iteration: %d', $pn2, $n2, $i) . PHP_EOL;
}
$pn2 = $n2;
var_dump([$n1, $n2]);
}
I use gc_collect_cycles
to urge PHP to collect the dangling Generator
instance here. In reality, this happens "somewhere" down the line as more and more new object are allocated.
Expected output (with PHP 8.3 and thus MT_RAND_MT19937)
array(2) {
[0]=>
int(2429)
[1]=>
int(5714)
}
array(2) {
[0]=>
int(2429)
[1]=>
int(5714)
}
array(2) {
[0]=>
int(2429)
[1]=>
int(5714)
}
Actual output
array(2) {
[0]=>
int(2429)
[1]=>
int(5714)
}
DELTA in N2: previous: 5714, this time: 3052, iteration: 1
array(2) {
[0]=>
int(2429)
[1]=>
int(3052)
}
DELTA in N2: previous: 3052, this time: 2717, iteration: 2
array(2) {
[0]=>
int(2429)
[1]=>
int(2717)
}