kimchi: Improve memory usage and parallelism of expr.evaluations #3127


Open: wants to merge 30 commits from fizzixnerd/expr-bench into master

Conversation

Contributor

@Fizzixnerd Fizzixnerd commented Apr 1, 2025

I will be rewriting history in the coming days to clean it up, so that it makes sense from a "story" perspective when reviewing this PR.

I have a couple of checkboxes for reviewers. If they are for you, please feel free to edit and check them (if GitHub allows you to do that); otherwise, leave a comment specifically mentioning the checkbox you are approving and I will check it for you.

Goals

Reduce the memory usage of expr.evaluations without negatively impacting proving speed.

Testing

Runtime

cargo bench expr and cargo test expr are your go-to commands. The second ensures the original implementation matches the new one by calling e.evaluations_iter(&env).collect::<Vec<_>>() on the new iterator interface and comparing the result to the original e.evaluations(&env), using the main Kimchi expression. I have only observed memory usage from top so far, but I will add a proper massif analysis in the coming days, as per @marcbeunardeau88's request.

The first may take ~30-60 minutes to do a full run of 100 samples for each version; if you don't have time for that, a rough estimate can be had by running a single sample and dividing. I am including screenshots of my runs below.
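For concreteness, here is a minimal sketch of the equivalence check that cargo test expr performs, as described above (`setup_expr_and_env` is a hypothetical stand-in for the test's actual setup of the expression and environment; the real test may be structured differently):

```rust
#[test]
fn evaluations_iter_matches_evaluations() {
    // Hypothetical setup helper: builds the main Kimchi expression `e`
    // and its evaluation environment `env`, as the real test does.
    let (e, env) = setup_expr_and_env();

    // Old implementation: materializes all evaluations at once.
    let evals_old = e.evaluations(&env);

    // New implementation: yields evaluations lazily through an iterator,
    // collected here only so the two results can be compared.
    let evals_new = e.evaluations_iter(&env).collect::<Vec<_>>();

    assert_eq!(evals_old, evals_new);
}
```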

Memory

Memory is a little trickier. First, run cargo bench --bench expr; at the start it will print the name of the binary it is going to run, something like target/release/deps/expr-19c2d0431de8290d. Ctrl-C to stop it, then run valgrind --tool=massif target/release/deps/expr-19c2d0431de8290d expr_evals_vec for the old implementation. This should finish in a couple of minutes.

Then run valgrind --tool=massif target/release/deps/expr-19c2d0431de8290d expr_evals_par for the new one. This will take roughly 10x-15x the time of the old implementation, because massif suppresses parallelism.

Runtime Results

For the iterator-based runs, runtime-wise, on a circa-2022 10-core (hyperthreaded) Core i9 laptop:

[screenshot: criterion runtime results, new iterator-based implementation]

(If you do plan to benchmark, close Steam especially. I had it open at first, and those are the red dots near the start at the bottom. Oops! I reran, and the blue dots are the ones used for analysis.)

For the original implementation, on the same computer:

[screenshot: criterion runtime results, original implementation]

Runtime Analysis

The mean runtime, wallclock-wise, of the new implementation is comparable to the old.

CPU-time-wise, it is enormously in favour of the old implementation.

The new implementation saturates the cores of my machine; the old one does not. This has three important implications: runtime could get worse on less capable machines; specific WASM testing for rayon (cc our WASM expert @hattyhattington17) is required before merging into o1js; and runtime could get better on more capable machines.
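For intuition on why core count dominates here, the following sketch shows the general shape of a rayon-style per-row parallel evaluation. It is an illustration consistent with the saturation behaviour described above, not the PR's actual code; all names and signatures are hypothetical:

```rust
use rayon::prelude::*;

// Evaluating an expression independently at each row of the domain is
// embarrassingly parallel, so a parallel map over rows will happily
// saturate every available core.
fn evaluate_rows<F, E>(
    domain_size: usize,
    env: &E,
    eval_at_row: impl Fn(&E, usize) -> F + Sync,
) -> Vec<F>
where
    F: Send,
    E: Sync,
{
    (0..domain_size)
        .into_par_iter()
        .map(|row| eval_at_row(env, row))
        .collect()
}
```

On machines with fewer cores, rayon simply sizes its thread pool down, which is why wallclock time on less capable hardware is the number to watch.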

I recommend someone check the performance characteristics across different CPUs (Macs, etc.) more carefully than I have been able to. Specifically, if you have fewer than 10 cores, an M1+ Mac, or an ARM-based computer, I would be interested in hearing from you. If the reviewers desire, I can test these changes on an old server I have with 72 CPU cores, and on my modern desktop (which I think has around 12).


  • @mrmr1993 would like someone (other than @Fizzixnerd) to test on a Mac and less capable computers before shipping this in o1js
  • @marcbeunardeau88 would like someone (other than @Fizzixnerd) to test on a Mac and less capable computers before shipping this in o1js

Memory Results

MASSIF RESULTS

I used massif-visualizer to make these graphs.

First, the old implementation:

[massif graph: old implementation]

Next the new:

[massif graph: new implementation]

Note that the new implementation took a lot longer to run, seemingly because massif prevents parallelism.

OLD TOP RESULTS

Peak memory usage from top for the old implementation was ~11-12% of my memory (32 GB). This number started low and grew until the evaluations were returned. Peak memory usage from top for the new implementation was ~1.6% of my memory, and stayed constant throughout the calculation. (As a cross-check: ~11-12% of 32 GB is ~3.5-3.8 GB and 1.6% is ~0.5 GB, consistent with the massif peaks below.)

For comparison, Discord uses ~1.8% for me!


Memory Analysis

MASSIF RESULTS

peak old: 3.6 GB
peak new: 0.53 GB

new / old = 0.147

Therefore: peak usage is approximately 15% of what it was originally.


OLD TOP ANALYSIS

Peak memory usage is enormously in favour of the new implementation, using just ~13% of the memory of the original, and without massive spikes in allocation.

Contributor Author

@Fizzixnerd Fizzixnerd left a comment


Initial self-review.

@@ -582,11 +582,11 @@ pub enum FeatureFlag {

impl FeatureFlag {
    fn is_enabled(&self) -> bool {
-        todo!("Handle features")
+        true
Contributor Author


@mrmr1993 I don't think I ever got an answer about what to do here. Currently, this just unconditionally enables all features (since I assume the kimchi expression is correct). An inspection of the IfFeature branch of value_/value indicates this shouldn't be broken, but I don't know whether to revert this or not.
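To make that reasoning concrete, here is a toy model of the IfFeature semantics in question (hypothetical types, not kimchi's actual definitions): with is_enabled hard-coded to true, the enabled branch is always the one evaluated, so nothing breaks as long as every feature really is enabled for the expression being evaluated.

```rust
// Toy model (hypothetical; not kimchi's actual types) of the IfFeature
// branch discussed above.
enum Expr {
    Constant(u64),
    IfFeature {
        enabled: bool, // stands in for FeatureFlag::is_enabled()
        on: Box<Expr>,
        off: Box<Expr>,
    },
}

fn value(e: &Expr) -> u64 {
    match e {
        Expr::Constant(c) => *c,
        // With `enabled` unconditionally true, only `on` is ever evaluated.
        Expr::IfFeature { enabled, on, off } => {
            if *enabled { value(on) } else { value(off) }
        }
    }
}
```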

Comment on lines 272 to 273
assert_eq!(evals1.len(), evals2.len());
assert_eq!(evals1[0], evals2[0]);
Contributor Author


Sanity checks.


assert_eq!(evals1.len(), evals2.len());
assert_eq!(evals1[0], evals2[0]);
assert_eq!(evals1, evals2);
Contributor Author


The real deal.

@Fizzixnerd Fizzixnerd force-pushed the fizzixnerd/expr-bench branch from 7b36bcc to 01dab68 on April 2, 2025 at 14:54
@Fizzixnerd Fizzixnerd marked this pull request as ready for review April 2, 2025 14:59
@Fizzixnerd
Contributor Author

Just gonna fix up CI.

@Fizzixnerd
Contributor Author

The most recent commit removed the tests, since they were now essentially checking that x == x.

Contributor

@Copilot Copilot AI left a comment


Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.

Comments suppressed due to low confidence (1)

kimchi/src/circuits/constraints.rs:278

  • The word 'Iterativelly' is misspelled; consider correcting it to 'Iteratively'.
/// 2. Iterativelly invoke any desired number of steps: `public(), lookup(), runtime(), precomputations()`

@@ -1198,25 +1198,37 @@ fn value<
    env: &Environment,
    cache: &HashMap<CacheId, Evaluations<F, D<F>>>,
    row: usize,
+   inferred_domain: Option<Domain>,
Contributor


The name is a bit confusing, as when this is None the domain is inferred by the function. How about something like res_domain or final_domain?

@dannywillems
Member

I would recommend keeping this PR unmerged until we merge the various commits fixing the o1js CI. @Trivo25 and @querolita are on it.

Member

@dannywillems dannywillems left a comment


Requesting changes for now, since 0fa4e82 was added recently, and to avoid merging it by mistake after @marcbeunardeau88's approval; we do not request a new set of reviews after a new commit has been added. I will take a deeper look within the day.

@@ -0,0 +1,80 @@
use criterion::{criterion_group, criterion_main, Criterion};
Member


This file, together with the change in kimchi/Cargo.toml, can be extracted from this PR and gathered into a single commit; that would provide a good starting point for the changes you are trying to introduce in this patch. A reviewer could then go over the subsequent commits and gradually use the commands you recommend to confirm your claims. In addition, it follows the engineering practices we try to enforce. For instance, this practice has been followed in this PR, and having atomic commits that compile and pass the whole CI ended up being very useful when we had to revert commits.

It seems you started doing this in the first commit, but there are comments that you remove later. Would you mind trying to squash the different commits touching this file to get a clean first commit?
