Skip to content

Add a mathematical constraint system#8816

Open
kripken wants to merge 45 commits into
WebAssembly:mainfrom
kripken:constraint.by.itself
Open

Add a mathematical constraint system#8816
kripken wants to merge 45 commits into
WebAssembly:mainfrom
kripken:constraint.by.itself

Conversation

@kripken

@kripken kripken commented Jun 8, 2026

Copy link
Copy Markdown
Member

This allows defining constraints like { x >= 0 && x <= 100 } and to then check if they
imply something else is true or false, like { x >= 0 && x <= 100 } => { x < 9999 }
(example of a valid inference).

This is the minimal first part of such a system, focusing on ==, !=, and very simple
solving. Putting up for design feedback before I work in depth on the rest.

Next steps are to add >=, < etc., and to add a pass that uses this in a control-flow
aware way, that is, the goal is to optimize things like

if (x > 10) {
   assert(x > 0); // this can be removed
}

This is important to remove userspace bounds checks for Kotlin (and likely Java).

inplace_vector part here is from #8814 (will rebase once it lands).

@kripken kripken requested a review from a team as a code owner June 8, 2026 23:12
@kripken kripken requested review from stevenfontanella and removed request for a team June 8, 2026 23:12

@tlively tlively left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I highly recommend explicitly framing the constraint space as a lattice:

  1. Both and_ and fuzzyOr are effectively merging constraints. You want both (but especially fuzzyOr) to have all the properties of a lattice join operator: monotonicity, associativity, commutativity, idempotency, etc. You also want fuzzyOr to be as precise as possible; it has to lose some precision sometimes, but you only want it to lose as much precision as necessary given the representation of constraints. So you want it to be a least upper bound, i.e. a join.
  2. Making the constraint space a lattice will give you all the nice properties you want for using it in a program analysis: order-independence, guaranteed convergence, etc. It also reduces all the novelty and complexity to just generating the constraints in the first place; getting to the fixed point after that is just the classic worklist + graph traversal pattern.
  3. Making the constraint space a lattice will let you test it in the lattice fuzzer, which can do a better job than just unit tests alone of making sure it has all the properties we want, including that we do not unnecessarily lose precision in the merge operation.

Comment thread src/ir/constraint.h

// We limit constraints to a low number to ensure good performance even with
// simple brute-force solving.
// TODO: use a generic constraint solver..?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did have that POC for pulling in Z3. In the limit I guess that's what we'd want. 5c2bbb7

Comment thread src/ir/constraint.h Outdated
// { this } => { condition }
//
// https://en.wikipedia.org/wiki/Material_conditional#Truth_table
Result check(const Constraint& condition) const;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps proves or implies?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, yeah. Another option is eval as @MaxGraey suggests?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking more on this, I think that x.implies(y) is not quite right, as it reads as 'x implies y' i.e. x is not checking if it implies y, but sounds like a new constraint, that makes x somehow imply y. Ditto for x.proves(y).

So checkImplies might work, but is longer than check? I am leaning towards check or eval.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would strongly prefer some variant of proves or implies. I don't think those names sound like they're adding new constraints, but something like checkImplication would also be fine with me. eval does not suggest the correct operation to me.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(FWIW, the tests use "proves" in their comments)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I don't feel strongly. Renamed to proves.

@MaxGraey

MaxGraey commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

That's awesome!

Have you considered more academic and conventional naming for lattice-like stuff?

Value -> Term
Result -> KnownTruth
ConstraintSet -> Conjunction

check(conj) -> eval(conj)
and_(conj) -> meet(conj) / meetWith(conj)
fuzzyOr(conj) -> join(conj) / joinWith(conj)

or something like this?

@kripken

kripken commented Jun 9, 2026

Copy link
Copy Markdown
Member Author

@tlively Definitely making this a lattice would have benefits, but it would add overhead and complexity, I worry. Specifically, having a limited capacity (number of constraints in a set), as in the current design, is really nice for efficiency, but makes it not a lattice. Here is a concrete example. For a lattice we need this absorption law: (a ^ b) v b == b. Take

a = { x >= 10 && x <= 20 }  ;; span of numbers: 10, 11, .., to 20
b = { x & 1 }               ;; all odd numbers

a ^ b should be the set of odd numbers in that range, i.e., 11, 13, .., 19. However, that can't be written if the capacity is 2. So a ^ b loses something. That doesn't mean it isn't useful! We can define a ^ b to contain any 2 of the 3 constraints being combined (this can prove fewer things, but more than nothing). E.g. a ^ b = a (just ignore b). But then

(a ^ b) v b == a v b != b

which breaks the absorption rule.

(This is sort of parallel to the issue with multiple constants in possible-constants - we only support one constant, not an arbitrary number. An arbitrary number is necessary for all the nice mathematical properties we want, but the overhead isn't worth it in GUFA.)

@kripken

kripken commented Jun 9, 2026

Copy link
Copy Markdown
Member Author

@MaxGraey

Value -> Term

Good idea, I think that makes sense.

Result -> KnownTruth

I think this is clear enough already, and shorter?

ConstraintSet -> Conjunction

I left this intentionally vague as this may expand in the future. A set of constraints is, atm, a conjunction, but if we find a nice way to allow OR and not just AND, we should add it. The idea is, conceptually, a set of constraints that can prove things.

@MaxGraey

MaxGraey commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

Btw binaryen already has some basic semi and full lattices: https://github.com/WebAssembly/binaryen/blob/main/src/analysis/lattice.h and https://github.com/WebAssembly/binaryen/blob/main/src/analysis/lattices/abstraction.h infra. So how about this?

class LowerBound : Lattice { ... }
class UpperBound : FullLattice { ... }
class RangeBound : FullLattice { ... }

@tlively

tlively commented Jun 10, 2026

Copy link
Copy Markdown
Member

Ok, the draft PR adding a general Constraint lattice (which I ended up calling Bound) is #8827 . I unfortunately got if through my skull that there's no way to support EQ, and NE constraints and maintain a lattice, though. The draft PR that adds the final utility that approximates the one in this PR is #8828.

@tlively

tlively commented Jun 10, 2026

Copy link
Copy Markdown
Member

FWIW, I'm thoroughly convinced that there's no need to use a proper lattice here. I'd even be fine not using a join semilattice if we really wanted to support EQ and NE constraints. The main thing that would make me happy would be if we could compose this utility out of general parts, e.g. separating BoundedConjunction and its join/meet logic from Constraint/Bound and its join/meet logic. Being able to read, understand, and test each part in isolation is the benefit I'm after.

@kripken

kripken commented Jun 12, 2026

Copy link
Copy Markdown
Member Author

Ok, thanks for that code, that definitely helps me understand your point of view.

Yeah, it's unfortunate we can't use a proper lattice here. But limited capacity is just necessary, and goes against the point of a lattice.

And I think we do need !=, == here. Supporting simple math like that will get us null checks and other stuff very easily.

So what is left is your point about

Being able to read, understand, and test each part in isolation is the benefit I'm after.

I agree to that in general. But does this PR not do that already? Look at how short and sweet constraint.h is. It defines a Term, a Constraint, and an eval operation. No more.

constraint.cpp is, similarly, very clear and obvious: evalPair evals a pair, just implementing the math basically. Which part is difficult to follow or in need of improvement?

Put another way, the current code in this PR is really, a minimal mathematical constraint system. It just does that, and in a simple and modular way.

I am not opposed to a larger amount of code, as in your PRs, if there is a benefit. I still don't see what it is, though? The key eval/implies code is not improved, in particular.

And, look: in the end, a mathematical constraint system must do math. Math is not perfectly separable into composed lattices. If we want to do x >= 0 && x != 0 => x > 0 then we need to handle interactions between >=, !=. We can definitely do that in a nice way using modular code, but lattices don't help afaict (other things in math might, though - though I doubt we will need to get into such complexity).

@MaxGraey

MaxGraey commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

The main benefit of a lattice idiom is it provides a well-defined merge/join operation for dataflow analysis, guarantees convergence and makes it easier to compose. It also gives a common API and semantically defined behaviour that can be reused for other lattices. On upstream's lattices folder alread implemented inverted lattice which can use for invert conditions such as if (!(x >= 5 && x <= 10)). That's a good case how lattices can be composed and stacked. Also it seems we already have machinery for abstract eval such compositions of sub-lattices: https://github.com/WebAssembly/binaryen/blob/main/src/analysis/lattices/abstraction.h#L34

@MaxGraey

Copy link
Copy Markdown
Contributor

@juj btw it would be interesting to hear what you think about all this

Comment thread src/ir/constraint.cpp Outdated
Comment thread src/ir/constraint.cpp
Comment on lines +115 to +124
// If this is already implied by current constraints, then it is redundant.
// E.g. if we are { x = 10 } and other is { x >= 0 } then all we need is
// { x >= 0 } as the result of the OR.
if (eval(other) == True) {
*this = other;
return;
}
if (other.eval(*this) == True) {
return;
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't handle the case where the constraints can be relaxed in both directions separately. For example:

{ x >= 2 /\ x <= 4 } \/ { x >= 1 /\ x <= 3 }

This should give { x >= 1 /\ x <= 4 }, but right now it just gives up.

This might be included in the // TODO smarts below, but I think it's important to see the full complexity here so we can get it factored as nicely as possible.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by "see the full complexity"? I'm not sure what you are asking this PR to do aside from have the existing TODO.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm suggesting we resolve the TODO :)

But I guess I can just say in advance that the simplest way to do this will be in terms of a fuzzyOr operation on a pair of Constraints, so I guess we don't need to do it now.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I am trying to keep this PR small.

I added to the comment.

Comment thread src/ir/constraint.h Outdated
// A constraint: some operation and some value, like "is equal to 17" or "is
// less than local 6".
struct Constraint {
Abstract::Op op = Abstract::Invalid;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there's much value in making invalid constraints representable (nor in reusing the Abstract enum). How about using a new enum that can be specific to this use case that does not have an Invalid member?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reusing Abstract is useful because we have code to parse IR into it. E.g. we need to parse AddInt32 into Abstract::Add (the next PR does this).

Without this reuse, we'd need to duplicate that code, or add a mapping of Abstract into a new enum.

I think this is exactly what Abstract is meant for: an abstract operation, without the details of a Type. This is precisely the right level of abstraction for a mathematical constraint system, mirroring the == etc. notation in math.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to reuse the parsing code 👍

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can still avoid adding Abstract::Invalid, though.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What should the default value be, then? (0 seems risky, as it is a valid code)

Note that we already have such invalids everywhere else, so this change just brings Abstract into parity:

InvalidUnary

InvalidBinary

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not have a default value. Users that need some placeholder can use std::optional. I'd argue that having the invalids everywhere else is an anti-pattern that we should be trying to eliminate and we never would have introduced the invalids if we had had std::optional from the beginning.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point now, though I'm not sure I agree - the invalid values are also nice for debugging, when you use an uninitialized value by mistake for example.

Anyhow, this PR is consistent with the current state of the codebase - how about leaving it as is, and considering your change later, separately?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though, since it is so small a change here, I guess I can remove it and keep the entire discussion for later. Done.

Comment thread src/ir/constraint.h Outdated
Comment thread src/ir/constraint.h
Comment thread src/ir/constraint.h Outdated
// { this } => { condition }
//
// https://en.wikipedia.org/wiki/Material_conditional#Truth_table
Result check(const Constraint& condition) const;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would strongly prefer some variant of proves or implies. I don't think those names sound like they're adding new constraints, but something like checkImplication would also be fine with me. eval does not suggest the correct operation to me.

Comment thread src/ir/constraint.h Outdated
Comment thread src/ir/constraint.h Outdated
Comment on lines +100 to +102
// Add a constraint to the set, ANDed with the others. The caller must make
// sure not to add too many (i.e. it is invalid to call this when full()).
void and_(const Constraint& c) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to avoid burdening the caller with making this decision. We could just arbitrarily drop extra constraints, or allow the user to supply a heuristic for determining which to keep. The end result will be the same if we remove this burden from the caller, since the caller will otherwise have to make the same decisions.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, yeah, maybe that is better. Should it then be fuzzyAnd..?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. Maybe "bounded" or "approximate" instead of "fuzzy?" Or alternatively we could use "join" and "meet" language to differentiate these operations from the precise logical connectives.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's wrong with "fuzzy", as a short and clear way to put it?

I prefer something with and/or (over meet/join) as it makes the connection to the logical operator clear. I.e. mathematical logic is the right math here, not algebraic groups or set theory 😉

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Fuzzy logic" is a completely different thing: https://en.wikipedia.org/wiki/Fuzzy_logic.

This is more about soundly over- or under-approximating our knowledge as we analyze the program.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, fair point. Ok, renamed to approximate*

Comment thread src/ir/constraint.cpp
Comment thread src/ir/constraint.cpp Outdated
Comment on lines +102 to +105
if (currResult == Unknown) {
// If something is unknown, it all is.
return Unknown;
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, proving one of the other constraints False can take precedence over Unknown.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, fixed.

Comment thread src/ir/constraint.cpp
Comment on lines +115 to +124
// If this is already implied by current constraints, then it is redundant.
// E.g. if we are { x = 10 } and other is { x >= 0 } then all we need is
// { x >= 0 } as the result of the OR.
if (eval(other) == True) {
*this = other;
return;
}
if (other.eval(*this) == True) {
return;
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm suggesting we resolve the TODO :)

But I guess I can just say in advance that the simplest way to do this will be in terms of a fuzzyOr operation on a pair of Constraints, so I guess we don't need to do it now.

Comment thread src/ir/constraint.h Outdated
// A constraint: some operation and some value, like "is equal to 17" or "is
// less than local 6".
struct Constraint {
Abstract::Op op = Abstract::Invalid;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can still avoid adding Abstract::Invalid, though.

@tlively tlively left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM % comment nits and other notes.

Comment thread src/ir/constraint.cpp
Comment on lines +122 to +124
// Otherwise, just do not add this one.
// TODO: We could try to be clever and see if one of the existing ones makes
// more sense to drop.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Defining a total order on contraints that determines which one to drop would also make the analysis result less dependent on visitation order.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I was considering doing that. It would allow "canonicalizing" the order of constraints too, for simpler matching. I'll see how that works in later PRs.

Comment thread src/ir/constraint.h Outdated
Comment thread src/ir/constraint.h Outdated
Comment thread src/ir/constraint.h Outdated
Comment thread src/ir/constraint.h Outdated
Comment thread test/gtest/constraint.cpp
Comment on lines +18 to +19
// We can't infer anything using an empty set.
EXPECT_EQ(s.proves(c), Unknown);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be a great place to use property-based testing with FuzzTest (but I wouldn't block landing on that).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will get fuzzed with the pass later, but yes, interesting idea.

Comment thread src/ir/constraint.cpp
Comment on lines +116 to +117
void AndedConstraintSet::approximateAnd(const Constraint& c) {
if (size() < MaxConstraints) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also check the new constraint to see if it is contradicted by any of the existing constraints. In that case we could replace the whole constraint set with a bottom sentinel. This could be useful for detecting code that cannot possibly be executed. But I guess you're expecting the user code to check proves(c) == False before it gets joined in?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually think we could assert on that here - a contradiction means something has gone horribly wrong. At least that is the case in the pass I'll be using this from. You can prove anything from a contradiction...

But I'm not sure it's worth the overhead to check exhaustively for a contradiction here?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I would say "horribly wrong." For example there could be two br_ifs in a row, one branching if x != 5 and the other branching if x != 10. I could imagine the analysis pass being structured such that the analysis continues after the second branch with bottom constraints. Surely we want to recognize that all the following code is unreachable in such cases, and I fear that if we don't do it here in the constraint system, we'll just have to do it separately in the analysis instead.

Part of the benefit of having a bounded conjunction is that the exhaustive checking for a contradiction has bounded expense, so I'm not too worried about that part.

But this can all be considered as part of a follow-up, of course. I wouldn't want to complicate the representation to include a bottom element if it won't be used.

kripken and others added 5 commits June 16, 2026 16:48
Co-authored-by: Thomas Lively <tlively123@gmail.com>
Co-authored-by: Thomas Lively <tlively123@gmail.com>
Co-authored-by: Thomas Lively <tlively123@gmail.com>
Co-authored-by: Thomas Lively <tlively123@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants