Commit 19e887e

Fuzzer: Avoid constantly growing the wasm each time (#7511)
Imagine we start with a fuzz file, then mutate it (using it as initial content and letting the fuzzer make changes, perhaps with `--preserve-imports-and-exports`). We can mutate it again and again, which is what harnesses like Fuzzilli do. It is somewhat bad if we keep growing the file, because then when we explore the space of wasm files, the further we travel, the more we focus on big files for no good reason. Running `--fuzz-passes` is one way to trim the wasm (it might end up running `-O3`), but not reliable.

This PR goes through places that always add code and tries to at least give a chance to not do so. As long as there is a chance to not grow, then a harness can see two testcases of equal coverage and pick the smaller one.

Specific changes:

* If a data segment exists already, do not always add others.
* If an exnref table exists already, reuse it (like we do with funcref).
* Don't always add hashMemory support (which is large).
* Avoid errors in the fuzzer on wasm files without exports. That is now possible, since we no longer always add at least 1 export.

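
The point about a harness choosing between two testcases of equal coverage can be illustrated with a minimal sketch. Everything here is hypothetical: the `Testcase` type, the `smaller_of_equal_coverage` helper, and the file names and sizes are assumptions for illustration, not how Fuzzilli or any real harness is implemented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Testcase:
    """Hypothetical corpus entry: a name, its size, and the coverage it reaches."""
    name: str
    size_in_bytes: int
    coverage: frozenset


def smaller_of_equal_coverage(a: Testcase, b: Testcase) -> Optional[Testcase]:
    """If both testcases reach the same coverage, return the smaller one.

    Returns None when coverage differs, since then size alone should not
    decide which testcase to keep.
    """
    if a.coverage != b.coverage:
        return None
    return a if a.size_in_bytes <= b.size_in_bytes else b


edges = frozenset({"edge1", "edge2"})
original = Testcase("original.wasm", 1200, edges)
grown = Testcase("grown.wasm", 1900, edges)        # a mutation that only added code
not_grown = Testcase("mutated.wasm", 1150, edges)  # a mutation that did not grow

print(smaller_of_equal_coverage(original, grown).name)
print(smaller_of_equal_coverage(original, not_grown).name)
```

If every mutation grew the file, the second case could never occur and the harness would have nothing smaller to select, which is why a mere chance of not growing is enough.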
1 parent b5a4e36 commit 19e887e

6 files changed, +179 -148 lines changed

scripts/fuzz_opt.py

Lines changed: 20 additions & 6 deletions
@@ -1627,7 +1627,11 @@ def handle(self, wasm):
 class ClusterFuzz(TestCaseHandler):
     frequency = 0.1
 
-    def handle(self, wasm):
+    # Use handle_pair over handle because we don't use these wasm files anyhow,
+    # we generate our own using run.py. If we used handle, we'd be called twice
+    # for each iteration (once for each of the wasm files we ignore), which is
+    # confusing.
+    def handle_pair(self, input, before_wasm, after_wasm, opts):
         self.ensure()
 
         # run.py() should emit these two files. Delete them to make sure they
@@ -1651,6 +1655,9 @@ def handle(self, wasm):
         assert os.path.exists(fuzz_file)
         assert os.path.exists(flags_file)
 
+        # We'll use the fuzz file a few times below in commands.
+        fuzz_file = os.path.abspath(fuzz_file)
+
         # Run the testcase in V8, similarly to how ClusterFuzz does.
         cmd = [shared.V8]
         # The flags are given in the flags file - we do *not* use our normal
@@ -1664,20 +1671,27 @@ def handle(self, wasm):
         cmd += get_v8_extra_flags()
         # Run the fuzz file, which contains a modified fuzz_shell.js - we do
         # *not* run fuzz_shell.js normally.
-        cmd.append(os.path.abspath(fuzz_file))
+        cmd.append(fuzz_file)
         # No wasm file needs to be provided: it is hardcoded into the JS. Note
         # that we use run_vm(), which will ignore known issues in our output and
         # in V8. Those issues may cause V8 to e.g. reject a binary we emit that
         # is invalid, but that should not be a problem for ClusterFuzz (it isn't
         # a crash).
         output = run_vm(cmd)
 
-        # Verify that we called something. The fuzzer should always emit at
-        # least one exported function (unless we've decided to ignore the entire
+        # Verify that we called something, if the fuzzer emitted a func export
+        # (rarely, none might exist), unless we've decided to ignore the entire
         # run, or if the wasm errored during instantiation, which can happen due
-        # to a testcase with a segment out of bounds, say).
+        # to a testcase with a segment out of bounds, say.
         if output != IGNORE and not output.startswith(INSTANTIATE_ERROR):
-            assert FUZZ_EXEC_CALL_PREFIX in output
+            # Do the work to find if there were function exports: extract the
+            # wasm from the JS, and process it.
+            run([sys.executable,
+                 in_binaryen('scripts', 'clusterfuzz', 'extract_wasms.py'),
+                 fuzz_file,
+                 'extracted'])
+            if get_exports('extracted.0.wasm', ['func']):
+                assert FUZZ_EXEC_CALL_PREFIX in output
 
     def ensure(self):
         # The first time we actually run, set things up: make a bundle like the
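
The decision made by the new check in `handle_pair` can be sketched as a small pure function, separated from the V8 run and the wasm extraction. This is only an illustration: the three constant values below are placeholders (the real ones are defined in `scripts/fuzz_opt.py` and may differ), and `should_expect_call` is a hypothetical helper name that does not exist in the script.

```python
# Placeholder values, assumed for illustration only. The real constants are
# defined in scripts/fuzz_opt.py and may have different contents.
IGNORE = '<ignore-marker>'
INSTANTIATE_ERROR = '<instantiate-error>'
FUZZ_EXEC_CALL_PREFIX = '<call-prefix>'


def should_expect_call(output, has_func_exports):
    """Return True if the VM output must contain at least one call.

    No call is expected when the run was ignored, when instantiation
    failed, or when the wasm has no function exports at all.
    """
    if output == IGNORE or output.startswith(INSTANTIATE_ERROR):
        return False
    return has_func_exports


def verify(output, has_func_exports):
    """Mirror of the assertion in the diff, using the helper above."""
    if should_expect_call(output, has_func_exports):
        assert FUZZ_EXEC_CALL_PREFIX in output


# A wasm without function exports produces no calls, and that is accepted.
verify('some output without calls', has_func_exports=False)
# A wasm with function exports must show a call in the output.
verify(FUZZ_EXEC_CALL_PREFIX + ' func_0', has_func_exports=True)
```

In the actual change, the `has_func_exports` input comes from running `extract_wasms.py` on the fuzz file and calling `get_exports` on the extracted wasm, which is only done when the output was not ignored and instantiation succeeded.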

src/tools/fuzzing.h

Lines changed: 1 addition & 0 deletions
@@ -174,6 +174,7 @@ class TranslateToFuzzReader {
   Name exnrefTableName;
 
   std::unordered_map<Type, Name> logImportNames;
+  Name hashMemoryName;
   Name throwImportName;
   Name tableGetImportName;
   Name tableSetImportName;

src/tools/fuzzing/fuzzing.cpp

Lines changed: 52 additions & 36 deletions
@@ -26,10 +26,6 @@
 
 namespace wasm {
 
-namespace {
-
-} // anonymous namespace
-
 TranslateToFuzzReader::TranslateToFuzzReader(Module& wasm,
                                              std::vector<char>&& input,
                                              bool closedWorld)
@@ -395,9 +391,12 @@ void TranslateToFuzzReader::setupMemory() {
 
   auto& memory = wasm.memories[0];
   if (wasm.features.hasBulkMemory()) {
-    size_t memCovered = 0;
+    size_t numSegments = upTo(8);
     // need at least one segment for memory.inits
-    size_t numSegments = upTo(8) + 1;
+    if (wasm.dataSegments.empty() && !numSegments) {
+      numSegments = 1;
+    }
+    size_t memCovered = 0;
     for (size_t i = 0; i < numSegments; i++) {
       auto segment = builder.makeDataSegment();
       segment->setName(Names::getValidDataSegmentName(wasm, Name::fromInt(i)),
@@ -417,19 +416,21 @@ void TranslateToFuzzReader::setupMemory() {
       wasm.addDataSegment(std::move(segment));
     }
   } else {
-    // init some data
-    auto segment = builder.makeDataSegment();
-    segment->memory = memory->name;
-    segment->offset =
-      builder.makeConst(Literal::makeFromInt32(0, memory->addressType));
-    segment->setName(Names::getValidDataSegmentName(wasm, Name::fromInt(0)),
-                     false);
-    auto num = upTo(fuzzParams->USABLE_MEMORY * 2);
-    for (size_t i = 0; i < num; i++) {
-      auto value = upTo(512);
-      segment->data.push_back(value >= 256 ? 0 : (value & 0xff));
+    // init some data, especially if none exists before
+    if (!oneIn(wasm.dataSegments.empty() ? 10 : 2)) {
+      auto segment = builder.makeDataSegment();
+      segment->memory = memory->name;
+      segment->offset =
+        builder.makeConst(Literal::makeFromInt32(0, memory->addressType));
+      segment->setName(Names::getValidDataSegmentName(wasm, Name::fromInt(0)),
+                       false);
+      auto num = upTo(fuzzParams->USABLE_MEMORY * 2);
+      for (size_t i = 0; i < num; i++) {
+        auto value = upTo(512);
+        segment->data.push_back(value >= 256 ? 0 : (value & 0xff));
+      }
+      wasm.addDataSegment(std::move(segment));
     }
-    wasm.addDataSegment(std::move(segment));
   }
 }
 
@@ -588,17 +589,27 @@ void TranslateToFuzzReader::setupTables() {
   // When EH is enabled, set up an exnref table.
   if (wasm.features.hasExceptionHandling()) {
     Type exnref = Type(HeapType::exn, Nullable);
-    Address initial = upTo(10);
-    Address max = oneIn(2) ? initial + upTo(4) : Memory::kUnlimitedSize;
-    auto tablePtr =
-      builder.makeTable(Names::getValidTableName(wasm, "exnref_table"),
-                        exnref,
-                        initial,
-                        max,
-                        Type::i32); // TODO: wasm64
-    tablePtr->hasExplicitName = true;
-    table = wasm.addTable(std::move(tablePtr));
-    exnrefTableName = table->name;
+    auto iter =
+      std::find_if(wasm.tables.begin(), wasm.tables.end(), [&](auto& table) {
+        return table->type == exnref;
+      });
+    if (iter != wasm.tables.end()) {
+      // Use the existing one.
+      exnrefTableName = iter->get()->name;
+    } else {
+      // Create a new exnref table.
+      Address initial = upTo(10);
+      Address max = oneIn(2) ? initial + upTo(4) : Memory::kUnlimitedSize;
+      auto tablePtr =
+        builder.makeTable(Names::getValidTableName(wasm, "exnref_table"),
+                          exnref,
+                          initial,
+                          max,
+                          Type::i32); // TODO: wasm64
+      tablePtr->hasExplicitName = true;
+      table = wasm.addTable(std::move(tablePtr));
+      exnrefTableName = table->name;
+    }
   }
 }
 
@@ -1073,6 +1084,11 @@ void TranslateToFuzzReader::addImportSleepSupport() {
 }
 
 void TranslateToFuzzReader::addHashMemorySupport() {
+  // Don't always add this.
+  if (oneIn(2)) {
+    return;
+  }
+
   // Add memory hasher helper (for the hash, see hash.h). The function looks
   // like:
   // function hashMemory() {
@@ -1107,13 +1123,13 @@ void TranslateToFuzzReader::addHashMemorySupport() {
   }
   contents.push_back(builder.makeLocalGet(0, Type::i32));
   auto* body = builder.makeBlock(contents);
-  auto name = Names::getValidFunctionName(wasm, "hashMemory");
+  hashMemoryName = Names::getValidFunctionName(wasm, "hashMemory");
   auto* hasher = wasm.addFunction(builder.makeFunction(
-    name, Signature(Type::none, Type::i32), {Type::i32}, body));
+    hashMemoryName, Signature(Type::none, Type::i32), {Type::i32}, body));
 
-  if (!preserveImportsAndExports) {
+  if (!preserveImportsAndExports && !wasm.getExportOrNull("hashMemory")) {
     wasm.addExport(
-      builder.makeExport(hasher->name, hasher->name, ExternalKind::Function));
+      builder.makeExport("hashMemory", hasher->name, ExternalKind::Function));
     // Export memory so JS fuzzing can use it
     if (!wasm.getExportOrNull("memory")) {
       wasm.addExport(builder.makeExport(
@@ -1321,7 +1337,7 @@ Expression* TranslateToFuzzReader::makeImportSleep(Type type) {
 }
 
 Expression* TranslateToFuzzReader::makeMemoryHashLogging() {
-  auto* hash = builder.makeCall(std::string("hashMemory"), {}, Type::i32);
+  auto* hash = builder.makeCall(hashMemoryName, {}, Type::i32);
   return builder.makeCall(logImportNames[Type::i32], {hash}, Type::none);
 }
 
@@ -2019,7 +2035,7 @@ void TranslateToFuzzReader::addInvocations(Function* func) {
     }
     invocations.push_back(invoke);
     // log out memory in some cases
-    if (oneIn(2)) {
+    if (hashMemoryName && oneIn(2)) {
       invocations.push_back(makeMemoryHashLogging());
     }
   }
@@ -2177,7 +2193,7 @@ Expression* TranslateToFuzzReader::_makeConcrete(Type type) {
 Expression* TranslateToFuzzReader::_makenone() {
   auto choice = upTo(100);
   if (choice < LOGGING_PERCENT) {
-    if (choice < LOGGING_PERCENT / 2) {
+    if (!hashMemoryName || choice < LOGGING_PERCENT / 2) {
       return makeImportLogging();
     } else {
       return makeMemoryHashLogging();
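
The effect of the `setupMemory()` change on growth can be modelled outside the fuzzer. The sketch below is a Python model, not the C++ code: `up_to` and `one_in` are stand-ins for the fuzzer's `upTo` and `oneIn`, assumed here to return a value in [0, x) and to be true with probability 1/x respectively, and the two function names are hypothetical.

```python
import random


def up_to(rng, x):
    """Stand-in for upTo(x); assumed to return an integer in [0, x)."""
    return rng.randrange(x)


def one_in(rng, x):
    """Stand-in for oneIn(x); assumed to be true with probability 1/x."""
    return up_to(rng, x) == 0


def num_new_segments_bulk(rng, num_existing):
    """Bulk-memory path: adding zero segments is allowed if one already exists."""
    num = up_to(rng, 8)
    if num_existing == 0 and num == 0:
        # memory.init needs at least one segment to refer to.
        num = 1
    return num


def adds_segment_non_bulk(rng, num_existing):
    """Non-bulk path: usually add a segment, more often when none exists."""
    return not one_in(rng, 10 if num_existing == 0 else 2)


rng = random.Random(0)
print([num_new_segments_bulk(rng, 0) for _ in range(10)])  # never contains 0
print([num_new_segments_bulk(rng, 3) for _ in range(10)])  # may contain 0
```

Under these assumptions, a module that already has data segments has a 1 in 8 chance of gaining none in the bulk-memory path and a 1 in 2 chance in the other path, whereas before the change both paths always added at least one.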
Lines changed: 30 additions & 30 deletions
@@ -1,35 +1,35 @@
 Metrics
 total
-[exports] : 77
-[funcs] : 111
-[globals] : 21
-[imports] : 5
+[exports] : 65
+[funcs] : 93
+[globals] : 7
+[imports] : 4
 [memories] : 1
-[memory-data] : 5
-[table-data] : 45
+[memory-data] : 23
+[table-data] : 25
 [tables] : 1
 [tags] : 0
-[total] : 9163
-[vars] : 296
-Binary : 620
-Block : 1503
-Break : 288
-Call : 580
-CallIndirect : 101
-Const : 1500
-Drop : 136
-GlobalGet : 772
-GlobalSet : 562
-If : 478
-Load : 126
-LocalGet : 630
-LocalSet : 487
-Loop : 166
-Nop : 78
-RefFunc : 45
-Return : 87
-Select : 75
-Store : 60
-Switch : 2
-Unary : 588
-Unreachable : 279
+[total] : 6800
+[vars] : 256
+Binary : 454
+Block : 1201
+Break : 196
+Call : 205
+CallIndirect : 61
+Const : 1131
+Drop : 88
+GlobalGet : 635
+GlobalSet : 487
+If : 378
+Load : 88
+LocalGet : 406
+LocalSet : 341
+Loop : 148
+Nop : 107
+RefFunc : 25
+Return : 58
+Select : 52
+Store : 41
+Switch : 1
+Unary : 451
+Unreachable : 246
Lines changed: 30 additions & 30 deletions
@@ -1,35 +1,35 @@
 Metrics
 total
-[exports] : 36
-[funcs] : 64
-[globals] : 7
-[imports] : 4
+[exports] : 37
+[funcs] : 59
+[globals] : 4
+[imports] : 6
 [memories] : 1
-[memory-data] : 29
-[table-data] : 19
+[memory-data] : 20
+[table-data] : 28
 [tables] : 1
 [tags] : 0
-[total] : 10813
-[vars] : 208
-Binary : 791
-Block : 1709
-Break : 412
-Call : 382
-CallIndirect : 73
-Const : 1793
-Drop : 77
-GlobalGet : 898
-GlobalSet : 637
-If : 561
-Load : 199
-LocalGet : 908
-LocalSet : 616
-Loop : 248
-Nop : 169
-RefFunc : 19
-Return : 89
-Select : 91
-Store : 80
-Switch : 2
-Unary : 747
-Unreachable : 312
+[total] : 9402
+[vars] : 189
+Binary : 651
+Block : 1534
+Break : 332
+Call : 296
+CallIndirect : 91
+Const : 1666
+Drop : 64
+GlobalGet : 650
+GlobalSet : 582
+If : 506
+Load : 149
+LocalGet : 827
+LocalSet : 497
+Loop : 232
+Nop : 114
+RefFunc : 28
+Return : 81
+Select : 75
+Store : 71
+Switch : 7
+Unary : 657
+Unreachable : 292
