Skip to content

Add syntactic filtering of side conditions after simplification #3870

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jun 3, 2024

Conversation

geo2a
Copy link
Contributor

@geo2a geo2a commented May 15, 2024

This PR introduces a heuristic in the equation application process:

  • after simplifying the predicates in the requires clause, see if an of them is syncatically present in the path condition (which is known to be true) and delete them

@geo2a geo2a self-assigned this May 15, 2024
@geo2a
Copy link
Contributor Author

geo2a commented May 17, 2024

I've deleted the KEVM performance table as out-of-date. Will post a new one.

@geo2a geo2a force-pushed the georgy/second-side-condition-filtering branch from 92e726b to 8d363a8 Compare May 21, 2024 07:22
@geo2a
Copy link
Contributor Author

geo2a commented May 21, 2024

I have three recently re-enabled KEVM proofs failing on the feature branch:

FAILED src/tests/integration/test_prove.py::test_pyk_prove[mcd/flopper-tick-pass-rough-spec.k]
FAILED src/tests/integration/test_prove.py::test_pyk_prove[mcd/flopper-dent-guy-diff-tic-not-0-pass-rough-spec.k]
FAILED src/tests/integration/test_prove.py::test_pyk_prove[mcd/flopper-dent-guy-same-pass-rough-spec.k]

Will investigate mcd/flopper-tick-pass-rough-spec.k, as it seems to fail relatively quickly (140.20s)

@geo2a
Copy link
Contributor Author

geo2a commented May 21, 2024

The mcd/flopper-tick-pass-rough-spec.k fails because Booster gets unknwon from z3 when trying to decide a branch condition:

[booster][execute][term c784633][rewrite 0ee7a2c][detail] EVM.jumpi.true
[booster][execute][term c784633][rewrite 0ee7a2c][match][success] rewrite 0ee7a2c
...
[booster][execute][term c784633][rewrite 0ee7a2c][smt] Checking 1 predicates, given 34 assertions and a substitution of size 0
[booster][execute][term c784633][rewrite 0ee7a2c][smt] Predicates to check: _==Int_(_/Int_(chop(_)_WORD_Int_Int(_*Int_(#lookup(_,_)_EVM-TYPES_Int_Map_Int(VarACCT_ID_STORAGE:SortMap{}, "5"), #lookup(_,_)_EVM-TYPES_Int_Map_Int(VarACCT_ID_STORAGE:SortMap{}, _+Int_(keccak(_)_SERIALIZATION_Int_Bytes(_+Bytes__BYTES-HOOKED_Bytes_Bytes_Bytes(#buf(_,_)_BUF-SYNTAX_Bytes_Int_Int("32", VarABI_id:SortInt{}), "\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL...truncated")), "1")))), #lookup(_,_)_EVM-TYPES_Int_Map_Int(VarACCT_ID_STORAGE:SortMap{}, _+Int_(keccak(_)_SERIALIZATION_Int_Bytes(_+Bytes__BYTES-HOOKED_Bytes_Bytes_Bytes(#buf(_,_)_BUF-SYNTAX_Bytes_Int_Int("32", VarABI_id:SortInt{}), "\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL...truncated")), "1"))), #lookup(_,_)_EVM-TYPES_Int_Map_Int(VarACCT_ID_STORAGE:SortMap{}, "5"))
[booster][execute][term c784633][rewrite 0ee7a2c][smt] Check of Given ∧ P and Given ∧ !P produced (Sat,Unknown)
[Error#booster] SMT returned `Unknown'

and crashing. I've opened an issue to add retries in Booster.

The proof can go on if given a longer timeout, and we see in the logs that the predicate actually would get actually:

...
[booster][execute][term c784633][rewrite 0ee7a2c][smt] Predicates to check: _==Int_(_/Int_(chop(_)_WORD_Int_Int(_*Int_(#lookup(_,_)_EVM-TYPES_Int_Map_Int(VarACCT_ID_STORAGE:SortMap{}, "5"), #lookup(_,_)_EVM-TYPES_Int_Map_Int(VarACCT_ID_STORAGE:SortMap{}, _+Int_(keccak(_)_SERIALIZATION_Int_Bytes(_+Bytes__BYTES-HOOKED_Bytes_Bytes_Bytes(#buf(_,_)_BUF-SYNTAX_Bytes_Int_Int("32", VarABI_id:SortInt{}), "\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL...truncated")), "1")))), #lookup(_,_)_EVM-TYPES_Int_Map_Int(VarACCT_ID_STORAGE:SortMap{}, _+Int_(keccak(_)_SERIALIZATION_Int_Bytes(_+Bytes__BYTES-HOOKED_Bytes_Bytes_Bytes(#buf(_,_)_BUF-SYNTAX_Bytes_Int_Int("32", VarABI_id:SortInt{}), "\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL\NUL...truncated")), "1"))), #lookup(_,_)_EVM-TYPES_Int_Map_Int(VarACCT_ID_STORAGE:SortMap{}, "5"))
[booster][execute][term c784633][rewrite 0ee7a2c][smt] Check of Given ∧ P and Given ∧ !P produced (Sat,Unsat)

@geo2a
Copy link
Contributor Author

geo2a commented May 22, 2024

After increasing the timeout to 1 second, mcd/flopper-tick-pass-rough-spec.k still fails. The other two tests pass.

I think we should discuss this further and possible wait on merging until we have #3884

@geo2a
Copy link
Contributor Author

geo2a commented May 22, 2024

KEVM performance numbers, excluding mcd/flopper-tick-pass-rough-spec.k:

Test georgy-second-side-condition-filterin time master-0f3148308 time (georgy-second-side-condition-filterin/master-0f3148308) time
mcd/vat-muluu-pass-spec.k 105.48 116.19 0.9078233927188226
mcd/flopper-kick-pass-rough-spec.k 153.49 169.05 0.9079562259686483
mcd/end-pack-pass-rough-spec.k 195.32 212.04 0.921146953405018
benchmarks/requires01-a0gt0-spec.k 50.91 54.0 0.9427777777777777
examples/solidity-code-spec.md 122.11 127.34 0.958928851892571
benchmarks/encode-keccak00-spec.k 59.6 62.02 0.9609803289261528
mcd/end-cash-pass-rough-spec.k 303.58 315.39 0.9625542978534513
benchmarks/keccak00-spec.k 56.98 59.15 0.9633136094674556
kontrol/test-expectcalltest-testexpectstaticcall-0-spec.k 54.44 56.39 0.9654194006029437
benchmarks/storagevar02-nooverflow-spec.k 52.1 50.17 1.0384692047040063
erc20/ds/transferFrom-success-2-spec.k 98.47 94.56 1.041349407783418
mcd/flopper-cage-pass-spec.k 62.85 60.23 1.0434999169848913
kontrol/test-allowchangestest-testfailallowcallstoaddress-0-spec.k 58.01 55.58 1.0437207628643397
erc20/ds/transferFrom-success-1-spec.k 103.61 97.09 1.0671541868369554
erc20/ds/approve-success-spec.k 75.24 70.16 1.072405929304447
mcd/vat-addui-pass-spec.k 141.23 131.34 1.0753007461550175
erc20/ds/transfer-success-1-spec.k 86.57 79.04 1.0952682186234817
erc20/ds/transfer-success-2-spec.k 83.56 75.88 1.1012124406958357
mcd/vat-addui-fail-rough-spec.k 81.57 69.68 1.170637198622273
erc20/hkg/approve-spec.k 61.74 52.58 1.1742107265119819
erc20/hkg/transferFrom-success-1-spec.k 79.44 67.55 1.1760177646188008
erc20/hkg/transferFrom-success-2-spec.k 75.37 63.91 1.1793146612423722

@geo2a
Copy link
Contributor Author

geo2a commented May 22, 2024

Kontrol performance:

Test georgy-second-side-condition-filterin time master-0f3148308 time (georgy-second-side-condition-filterin/master-0f3148308) time
FreshCheatcodes.test_address() 43.23 58.98 0.7329603255340793
AddrTest.test_notBuiltinAddress_symbolic(address) 55.83 65.31 0.8548461185117133
StructTypeTest.test_vars((uint8,uint32,bytes32)) 35.72 40.84 0.8746327130264445
ConstructorTest.run_constructor() 96.39 109.73 0.8784288708648501
BlockParamsTest.testFee(uint256) 38.48 42.76 0.8999064546304958
BlockParamsTest.testChainId(uint256) 38.68 42.89 0.9018419211937514
AccountParamsTest.testDealSymbolic(uint256) 43.01 47.68 0.9020553691275167
BlockParamsTest.testRoll(uint256) 38.48 42.54 0.9045604137282557
ExpectRevertTest.testFail_expectRevert_false() 134.45 145.31 0.9252632303351455
FreshBytesTest.test_symbolic_bytes_length 103.04 109.31 0.9426401976031471
BytesTypeTest.test_bytes4(bytes4) 32.35 34.22 0.9453535943892462
BlockParamsTest.testWarp(uint256)-trace_options2 30.48 32.09 0.9498286070426923
HevmTests.prove_require_assert_false 70.82 74.2 0.9544474393530996
CounterTest.testSetNumber(uint256) 145.76 152.02 0.9588212077358241
AccountParamsTest.testNonceSymbolic(uint64) 94.58 98.61 0.9591319338809451
Setup2Test.test_setup() 31.46 32.8 0.9591463414634147
FeeTest.test_fee_setup() 65.07 67.73 0.9607264137014615
StoreTest.testGasLoadColdVM() 39.03 40.46 0.9646564508156203
SetUpTest.testSetupData() 79.38 76.51 1.0375114364135405
StoreTest.testLoadNonExistent() 38.73 36.92 1.0490249187432286
HevmTests.prove_revert 23.31 21.93 1.0629274965800273
ExpectRevertTest.testFail_ExpectRevert_failAndSuccess() 115.08 105.97 1.085967726715108
HevmTests.proveFail_all_branches 71.89 64.9 1.107704160246533
TOTAL 1465.2499999999998 1543.7100000000003 0.9491743915631818

test_foundry_prove.py::test_foundry_refute_node fails on both feature and master branches 🤷

@geo2a geo2a force-pushed the georgy/second-side-condition-filtering branch from 69d0d4f to 3203683 Compare May 22, 2024 14:35
@geo2a
Copy link
Contributor Author

geo2a commented May 30, 2024

Measured KEVM performance again: no big difference, long-running proofs seem to benefit though.

Test georgy-second-side-condition-filterin time master-b6ab7a905 time (georgy-second-side-condition-filterin/master-b6ab7a905) time
mcd/flopper-tick-pass-rough-spec.k 200.67 225.68 0.8891793690180786
mcd/flopper-kick-pass-rough-spec.k 147.07 163.6 0.898960880195599
mcd/end-pack-pass-rough-spec.k 187.88 203.43 0.9235609300496485
mcd/vat-muluu-pass-spec.k 105.56 114.24 0.9240196078431373
benchmarks/requires01-a0gt0-spec.k 51.15 55.28 0.9252894356005789
benchmarks/requires01-a0le0-spec.k 51.0 54.45 0.9366391184573002
erc20/ds/transferFrom-failure-1-d-spec.k 55.1 58.04 0.9493452791178498
benchmarks/overflow00-nooverflow-spec.k 51.71 54.27 0.9528284503408881
benchmarks/structarg01-spec.k 63.09 65.92 0.9570691747572816
examples/erc20-spec.md 179.97 187.77 0.9584598178622783
mcd/dstoken-approve-fail-rough-spec.k 96.62 100.74 0.9591026404605917
benchmarks/storagevar01-spec.k 49.24 51.15 0.9626588465298144
bihu/forwardToHotWallet-success-2-spec.k 60.67 62.89 0.9647002703132453
bihu/functional-spec.k 50.7 48.84 1.038083538083538
erc20/ds/transferFrom-success-2-spec.k 95.69 90.9 1.0526952695269527
erc20/ds/transferFrom-success-1-spec.k 98.98 93.41 1.0596295899796597
erc20/ds/approve-success-spec.k 72.96 68.09 1.0715229842855043
mcd/vat-addui-pass-spec.k 139.59 129.88 1.0747613181398215
erc20/hkg/approve-spec.k 59.66 54.34 1.0979020979020977
erc20/ds/transfer-success-1-spec.k 84.08 75.2 1.1180851063829786
erc20/ds/transfer-success-2-spec.k 81.59 72.85 1.1199725463280714
erc20/hkg/transferFrom-success-2-spec.k 72.77 62.23 1.1693716856821468
erc20/hkg/transferFrom-success-1-spec.k 77.54 66.17 1.1718301345020403
mcd/vat-addui-fail-rough-spec.k 82.32 69.49 1.1846308821413152
TOTAL 2215.6100000000006 2228.86 0.9940552569474981

@geo2a
Copy link
Contributor Author

geo2a commented May 31, 2024

There seems to be an improvement in Kontrol performance, although the proofs are rather short-running:

Test georgy-second-side-condition-filterin time master-b6ab7a905 time (georgy-second-side-condition-filterin/master-b6ab7a905) time
FreshCheatcodes.test_address() 42.9 57.59 0.7449209932279909
AddrTest.test_notBuiltinAddress_symbolic(address) 55.29 63.67 0.8683838542484686
StructTypeTest.test_vars((uint8,uint32,bytes32)) 35.57 39.31 0.9048588145510048
BlockParamsTest.testFee(uint256) 37.71 41.67 0.9049676025917927
BlockParamsTest.testRoll(uint256) 37.7 41.3 0.9128329297820824
AccountParamsTest.testDealSymbolic(uint256) 42.11 46.06 0.9142422926617455
BlockParamsTest.testChainId(uint256) 37.76 41.12 0.9182879377431906
AssertTest.prove_assert_true() 29.39 31.19 0.9422891952548894
FreshBytesTest.test_symbolic_bytes_length 101.17 106.05 0.953983969825554
StoreTest.testLoadNonExistent() 36.65 38.22 0.9589220303506018
ConstructorTest.run_constructor() 91.72 95.23 0.9631418670587
AssertTest.test_assert_true() 34.9 33.35 1.0464767616191903
AssumeTest.test_assume_true(uint256,uint256) 39.85 38.06 1.0470310036784025
HevmTests.proveFail_all_branches 72.68 69.1 1.0518089725036182
HevmTests.prove_revert 29.63 22.4 1.3227678571428572
TOTAL 725.0300000000001 764.3200000000002 0.948594829390831

@geo2a
Copy link
Contributor Author

geo2a commented May 31, 2024

No difference in mx-backend performance.

@geo2a geo2a marked this pull request as ready for review June 3, 2024 06:57
@geo2a geo2a added the automerge label Jun 3, 2024
@rv-jenkins rv-jenkins merged commit 1c85ef0 into master Jun 3, 2024
6 checks passed
@rv-jenkins rv-jenkins deleted the georgy/second-side-condition-filtering branch June 3, 2024 14:42
@ehildenb
Copy link
Member

ehildenb commented Jun 3, 2024

@geo2a what example prompted this change? Was it something on one of our commercial engagements? If so, then we should make sure that example gets upstreamed into one of the relevant test-suites, so that we know we are actually testing the new feature.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants