Skip to content

Conversation

@tgeoghegan
Copy link
Contributor

@tgeoghegan tgeoghegan commented Oct 22, 2025

We can't be certain that this works yet, because we don't yet have the Ligero prover set up which would produce a serialized proof we could compare against a test vector. Still, we can start code review. In parallel, I'm going to work on:

  • making sumcheck::Proof::new also emit the actual witness vector, so that we can extend the test on sumcheck::constraints::ProofConstraints to check that the constraints it generated are consistent with witness values
  • see if I can modify longfellow-zk to spit out a test vector of constraints (a vector of (c, j, k) triples for linear LHS, the b vector for linear RHS and a vector of (x, y, z) triples for the quad constraints) so we can test against that

I also suspect that in the near future, we'll want to move some modules or items out of the sumcheck module, because things like constraints and the witness layout will also be used by the Ligero prover and verifier. longfellow-zk has some of this stuff in a zk-common module, which could work for us.

@tgeoghegan tgeoghegan requested a review from a team as a code owner October 22, 2025 19:02
);

// Transcripts should have received the same sequence of writes.
assert!(proof_transcript.compare_state(&constraint_transcript));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assertion is failing, so I must be missing a write to the transcript somewhere in the constraint generator. Or perhaps I'm wrong to expect the two transcripts to be in the same state.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see why this assertion fails: on the sumcheck prover side, we write all inputs (public and private) to the transcript after doing the 3.1.3 transcript stuff. If I remove that write from Proof::new, then the transcript states sync up as expected.

So this means we have to resolve the inconsistency between sumcheck/prover.h and zk/zk_prover.h so that the proof test vector won't incorporate private inputs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now I refactored the sumcheck prover so we can toggle whether it runs in zk_test.cc compat mode, enabling both tests to pass.

for (layer_index, (circuit_layer, proof_layer)) in
circuit.layers.iter().zip(proof.layers.iter()).enumerate()
{
// Choose alpha and beta for this layer
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bunch of this is exactly the same as in Proof::new, and could be factored out into something akin go longfellow-zk's begin_layer and end_layer methods. But then again it's nice to surface everything that's going on, and enable readers to compare it to the specification.

@tgeoghegan tgeoghegan force-pushed the timg/sumcheck-constraints branch from d6c6f7f to 6defa7e Compare October 27, 2025 14:33
@tgeoghegan tgeoghegan requested review from a team and divergentdave October 27, 2025 14:33
Copy link
Member

@jcjones jcjones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

Comment on lines 215 to 220
for term in &mut constraints.linear_constraint_lhs {
if term.constraint_number == layer_index {
term.constant_factor *=
FE::lagrange_basis_polynomial_i(FE::ONE, challenge[0]);
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think mutating constraints after adding their constituent parts to the data structure makes this harder to follow and line up with the spec. What if we accumulated all the coefficients we need in a temporary data structure, and then only added the constraint after we had looped through all sumcheck rounds, so that we have computed all the known parts?

The per-layer constraint is claim = Q * VL * VR, where claim depends on all the polynomial evaluation pad values for this layer, plus the VL and VR pad values from the last layer, if applicable. Likewise, VL and VR depend on the VL and VR pad values from this layer, since they are the sum of the respective fields of the proof and pad. So, we could build up a representation of the symbolic claim expression as a dense list of 4*logw coefficients for polynomial evaluation pad values, plus 2 coefficients for VL/VR pad values, plus a constant term, to track the claim through the two nested loops. Then, we could combine that, along with a few more expressions for the VL and VR of the current layer, and add the whole layer constraint in one go. Note that we'd only need to compute and use the witness index values at the end as well, when adding to constraints. Does this sound like a useful refactoring?

Comment on lines 6 to 10
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SymbolicExpression<FieldElement> {
constraint_number: usize,
terms: Vec<SymbolicTerm<FieldElement>>,
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, a potential footgun: the derived equality comparisons would say that two SymbolicExpression values are different, even if the mathematical expressions they represent are equal, if they distribute the constant term differently between the known fields of different SymbolicTerms. I assume we're only using the PartialEq implementation in tests, and they're passing, so this may not matter in practice. (Putting my optimization hat on, having only one FieldElement for the constant term in an expression would save memory)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, we could say the same about how PartialEq is sensitive to the ordering of symbolic terms too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I only derived PartialEq, Eq here out of reflex, because I generally find that I want all four of these traits on every struct I define. But nothing breaks if we remove PartialEq, Eq: the tests check the result of the lhs_terms() and known() methods. So I removed the derived traits to avoid the semantic ambiguity you pointed out.

Putting my optimization hat on, having only one FieldElement for the constant term in an expression would save memory

Good call. I was able to achieve this by refactoring a bit more. SymbolicExpression now sums known parts into a single field element, and represents the symbolic parts using a new Symbolic struct, private to the symbolic module, so there's no change to the users.

Comment on lines 10 to 11
/// - polynomials at each layer (2 * 2 * logw elements per circuit layer)
/// - vl, vr and vl * vr for each layer of the circuit (three elements per circuit layer)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clarification:

Suggested change
/// - polynomials at each layer (2 * 2 * logw elements per circuit layer)
/// - vl, vr and vl * vr for each layer of the circuit (three elements per circuit layer)
/// - one-time-pad for polynomials at each layer (2 * 2 * logw elements per circuit layer)
/// - one-time-pad for vl, vr and vl * vr for each layer of the circuit (three elements per circuit layer)

Comment on lines 294 to 307
final_claim += SymbolicTerm::from_known(
claims[0] + gamma * claims[1]
- public_inputs
.iter()
.zip(eq2.iter())
.fold(FE::ZERO, |sum, (public_input_i, eq2_i)| {
sum + *public_input_i * eq2_i
}),
);

constraints
.linear_constraint_lhs
.extend(final_claim.lhs_terms());
constraints.linear_constraint_rhs.push(final_claim.known());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it could be misleading to put this field element in the final_claim known part, then put it in the constraint RHS. While the end result is correct, the semantics of final_claim just before we destructure it don't match up with what the protocol is actually doing. The SymbolicExpression represents some $ax+b$, and if we require the expression to equal zero, then the resulting constraint would be $ax=-b$. We could either flip the sign of the constant term we add up front, and then negate final_claim.known() before adding it to the RHS, or we could just track the RHS separately, giving it its own name.

Suggested change
final_claim += SymbolicTerm::from_known(
claims[0] + gamma * claims[1]
- public_inputs
.iter()
.zip(eq2.iter())
.fold(FE::ZERO, |sum, (public_input_i, eq2_i)| {
sum + *public_input_i * eq2_i
}),
);
constraints
.linear_constraint_lhs
.extend(final_claim.lhs_terms());
constraints.linear_constraint_rhs.push(final_claim.known());
let rhs = (
claims[0] + gamma * claims[1]
- public_inputs
.iter()
.zip(eq2.iter())
.fold(FE::ZERO, |sum, (public_input_i, eq2_i)| {
sum + *public_input_i * eq2_i
}),
);
constraints
.linear_constraint_lhs
.extend(final_claim.lhs_terms());
constraints.linear_constraint_rhs.push(rhs);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is because here we're computing the final claim:

SUM_{i} (eq2[i + npub] * sym_private_inputs[i])
      - sym_layer_pad.vl
      - gamma * sym_layer_pad.vr
    =
      - SUM_{i} (eq2[i] * public_inputs[i])
      + claims[0]
      + gamma * claims[1]

...and the spec has already done the work for us of rearranging everything into an LHS of purely symbolic parts and an RHS of purely known parts. I think it'd fit more neatly into the SymbolicExpression abstraction if it were expressed as

SUM_{i} (eq2[i + npub] * sym_private_inputs[i]) - SUM_{i} (eq2[i] * public_inputs[i])
     + claims[0] - sym_layer_pad.vl
     + gamma * claims[1] - gamma * sym_layer_pad.vr
    = 0

...but then it'd be harder to line up our implementation with the specification, and I think that has a lot of value. It's possible that we could change the spec's descriptions of the per-layer and final constraints so that there's consistency across how they're specified and how we implement them. For now I'll go with your suggestion.

Copy link
Member

@jcjones jcjones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The SymbolicExpression refactoring is great, well done. Feels way more maintainable, too.

The new outstanding issues look fine.

I think this is is good to go. Thanks for letting us tear into it so thoroughly, I'm pleased with the result!

@tgeoghegan tgeoghegan force-pushed the timg/sumcheck-constraints branch from f2c1da5 to 4be42ea Compare October 29, 2025 16:41
Implements generation of linear and quadratic constraints from a padded
sumcheck proof, specified in 6.6 [1] and 4.4 [2].

- `fields`: for efficient computation of Lagrange basis polynomials, we
  add trait `LagrangePolynomialFieldElement`, which defines methods for
  the basis polynomial denomimators and methods for evaluating basis
  polynomials 0, 1, and 2.
- `sumcheck/witness`: computes the layout of the witness vector, needed
  for symbolic manipulation of quantities in constraints and eventually
  for construction of Ligero commitment and proof.
- `sumcheck/symbolic`: allows computing symbolic expressions consisting
  of terms of the form `a + bx`, where `a` is a known quantity, `x` is
  some unknown element of the witness vector and `b` is a constant scale
  factor.
- `sumcheck/bind`: implements the `bindeq` function of 6.2 [3].
- `sumcheck/mod`: sumcheck prover now emits the witness vector and
  Fiat-Shamir transcript so they may be used to validate constraints.
- `sumcheck/constraints`: implements the constraint generation of 6.6.

In order to valiate the correctness of all this, we also modify
`longfellow-zk` to emit a Ligero commitment and constraints when running
`zk_test.cc` [4]. More work on test vector generation from C++ is
needed.

[1]: https://datatracker.ietf.org/doc/html/draft-google-cfrg-libzk-01#section-4.4
[2]: https://datatracker.ietf.org/doc/html/draft-google-cfrg-libzk-01#section-6.6
[3]: https://datatracker.ietf.org/doc/html/draft-google-cfrg-libzk-01#section-6.2
[4]: https://github.com/tgeoghegan/longfellow-zk/tree/constraint-test-vector
@tgeoghegan tgeoghegan force-pushed the timg/sumcheck-constraints branch from 4be42ea to 61bb982 Compare October 29, 2025 16:48
@tgeoghegan tgeoghegan merged commit 6530ff5 into main Oct 29, 2025
3 checks passed
@tgeoghegan tgeoghegan deleted the timg/sumcheck-constraints branch October 29, 2025 16:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants