Skip to content

Share UCS splits using new constructs#457

Closed
chengluyu wants to merge 25 commits into
hkust-taco:hkmc2from
chengluyu:ucs/let-split
Closed

Share UCS splits using new constructs#457
chengluyu wants to merge 25 commits into
hkust-taco:hkmc2from
chengluyu:ucs/let-split

Conversation

@chengluyu
Copy link
Copy Markdown
Member

@chengluyu chengluyu commented Apr 13, 2026

During UCS normalization, when the same alternative split appears on both the positively (+) and negatively (-) specialized splits of a match, the old implementation duplicated the code. This PR introduces a way to let-bind splits to share the common splits instead.

New Split constructs

Two new Split variants were introduced:

  • LetSplit(sym, tail): declares a named split. The symbol's body holds the shared split.
  • UseSplit(sym): references a previously declared named split.

A new symbol class SplitSymbol holds the shared body and an optional LabelSymbol for lowering.

Changes to Normalization

Method normalize

normalizeImpl returns (Split, Set[SplitSymbol]): the normalized split plus any SplitSymbols whose UseSplit references survive but whose LetSplit hasn't been created yet. This lets LetSplit be placed at the lowest common ancestor of all UseSplit references.

A LetSplit candidate is created only when:

  • The alternative is not trivial (End or UseSplit), and
  • The alternative doesn't reference the same scrutinee (which would make positive/negative specialization produce different results on each side).

After normalization, the existing patMatConsequentSharingThreshold policy decides whether the candidate is kept as LetSplit or inlined. The decision scales with the surviving UseSplit count, so small alternatives can still be shared when they are reused enough.

Method specialize

specialize returns Opt[Split] (N = unchanged, S(...) = changed), so UseSplit references are preserved when specialization doesn't change the split.

Changes to lowering

lowerSplit translates LetSplit to a Label block and UseSplit to Break. When the continuation is Ret/Thrw, the label is used directly to preserve tail-call position. Otherwise, an exit label with a temp variable is used.

What's removed?

  • createLabelsForDuplicatedBranches and the Labels class (old label-based deduplication) are gone!

What remains configurable?

  • patMatConsequentSharingThreshold and the :patMatConsequentSharingThreshold test command are still supported. They now control whether a normalized join-point candidate survives or is inlined.

# Conflicts:
#	hkmc2/shared/src/test/mlscript/block-staging/Functions.mls
#	hkmc2/shared/src/test/mlscript/codegen/MergeMatchArms.mls
#	hkmc2/shared/src/test/mlscript/ucs/general/LogicalConnectives.mls
#	hkmc2/shared/src/test/mlscript/ucs/normalization/Deduplication.mls
Copy link
Copy Markdown
Contributor

@LPTK LPTK left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice changes, thanks!

Please avoid duplicating specialized splits when they're used several times with the same specialization. (Later, we might also try to preserve more sharing between specializations.)

Also, please add tests for these cases (sharing of specialized splits and lack thereof).

//│ set arg$Some$0$ = s⁰.value⁰;
//│ match arg$Some$0$
//│ 1 =>
//│ begin
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FIXME why was there a begin left?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The stale begin is no longer present in the current golden output; this block now lowers directly to the match/end shape shown in ClassMatching.mls.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not asking you what's currently the case. I'm asking you, LUYU @chengluyu (not ChatGPT) why we used to have a begin there. I just want to understand that, as it's surprising to me.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure, but it looks like it's related to the previous consequent sharing logic.

At that time, how it handled pure expression ifs was to always first place the results into a tmp var, and then pass that tmp into subsequent tails.

But now we will check whether the consequents are actually used. In this case, since nothing uses 0 or 1, it simply won't create a block and place them into tmp var

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm ok.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So do we still observe begins in such positions in some other cases?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Asking because these should probably have been flattened.

Copy link
Copy Markdown
Member Author

@chengluyu chengluyu May 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Asking because these should probably have been flattened.

I don't know why they were not flattened.

So do we still observe begins in such positions in some other cases?

The begin here looks weird indeed. But it's fixed by this commit anyway. Do we really need to dig into this?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not.

Comment thread hkmc2/shared/src/main/scala/hkmc2/semantics/ucs/Normalization.scala Outdated
Comment thread hkmc2/shared/src/main/scala/hkmc2/semantics/ucs/Normalization.scala Outdated
Comment thread hkmc2/shared/src/test/mlscript/block-staging/Functions.mls Outdated
Comment thread hkmc2/shared/src/main/scala/hkmc2/semantics/ucs/Normalization.scala Outdated
Comment thread hkmc2/shared/src/test/mlscript/opt/AbortivePrefix.mls Outdated
Comment thread hkmc2/shared/src/test/mlscript/ucs/normalization/DeduplicationWhile.mls Outdated
Comment thread hkmc2/shared/src/test/mlscript/ucs/normalization/ExcessiveDeduplication.mls Outdated
@LPTK LPTK mentioned this pull request Apr 15, 2026
9 tasks
chengluyu and others added 11 commits April 18, 2026 09:53
In `normalizeImpl`, when the alternative references the same scrutinee
as the current branch, the alternative is duplicated and specialized
separately in `+` and `-` modes. When both specializations agree, the
two duplicates could in principle be shared via a single `LetSplit`.
That detection is not yet implemented; mark the bail-out site with a
TODO and add `SpecializedSplitSharing.mls` covering both:

- cases where the duplication is wasteful (disjoint sibling classes
  like `Cat()`/`Dog()` — both specializations agree, sharing would
  suffice);
- cases where the duplication is genuine (refining patterns like
  `S(0)`/`S(_)` — positive and negative specialization yield
  different splits, sharing would be incorrect).

The latter group guards against a future over-eager sharing rule.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The `Split.Else` branch of the `++` extension was annotated as
impossible but silently discarded `those`. Replace the comment with a
`softAssert(false)` so that the invariant is enforced at runtime and
visible in stack traces if it ever breaks.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
`JoinPointCtx` tracks the pending set of `SplitSymbol`s awaiting a
`LetSplit` placement — a piece of state genuinely local to the walk.
Stashing the sharing threshold in the same case class forced a
plumbing trick where an inner `normalize` call had to reconstruct the
context with `JoinPointCtx.withThreshold` just to carry a config
option across an invocation boundary.

The threshold is the only `Config` field `normalize` needs, and the
caller already has `Config` in scope. Just add `Config` to the class
constructor's `using` clause so all methods can read it via the
global `config` helper, drop the parameter from `apply`, and simplify
`JoinPointCtx` back to the symbol set it was originally.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
`lowerSplit` used to wrap every `LetSplit` whose continuation was not
`Ret`/`Thrw` in an exit-label + temp-var shim, so that a fall-through
value would be assigned and the original `cont` run once. When every
leaf of both the body and the tail already transfers control away
(explicit `return`/`throw`, `UseSplit` → `Break`, or `End` → `throw`),
no such fall-through value exists, and the wrapper only contributes
noise to the lowered IR.

Add a `Split.alwaysTerminates` helper and extend the tail-position
fast path to reuse the caller's `cont` whenever both sides are
abortive. The `opt/AbortivePrefix.mls` golden output drops its unused
outer `σ$x` label accordingly.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
After specialization the consequent often retains only one `UseSplit`
for the shared alternative. Emitting a `LetSplit` in that case costs a
label + break that buys nothing: the final split has one goto site in
the consequent and one fall-through at the tail, so inlining produces
exactly two copies of the body with no label overhead.

Count the surviving references with a new `Split.countUseSplit`
helper and gate the `shouldShare` decision on `refCount > 1` in
addition to the existing size-threshold check. When the guard fires
we inline via `inlineUseSplit`, matching the simpler output the
advisor flagged on `DeduplicationWhile.mls:20`.

Side effects on golden outputs:
* `TailRecFormerFailure.mls`: threshold-1 case now inlines, since
  `refCount` is 1 after specializing `Cons(_, Nil)`. Comment updated
  to explain why the low threshold no longer forces sharing.
* `ExcessiveDeduplication.mls`: the small duplicated-tail test is now
  inlined; the comment block is updated separately.
* `AbortivePrefix.mls`: the inner `σ` disappears for the same reason.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The "what we want is a way of let-binding split tails" paragraph
described an aspiration that `LetSplit`/`UseSplit` now implement. The
"Currently, dedup' is implemented by hash-consing" preface is also
obsolete. Rewrite both to describe the current behavior: sharing
happens at normalization time, and refCount-1 cases (like this one)
are inlined rather than hoisted behind a label.

While here, replace the mis-leading "horrendous code (before
optimization)" label on the `if false do ()` case with a note that
the redundant `set scrut = false` survives lowering and is removed by
the IR optimizer.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
`ReflectionInstrumenter` does not yet support `Label` blocks, which
`LetSplit` lowers to. The existing `match2` test in this file no
longer triggers that path: the `refCount > 1` guard introduced with
the `LetSplit`-inlining rewrite keeps its single-use shared tail
inlined under any threshold.

Add a dedicated block that forces the path with
`:patMatConsequentSharingThreshold 1` and a pattern whose LetSplit
body is genuinely reachable from two places, so the known
staged-module limitation stays visible in the test suite.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
# Conflicts:
#	hkmc2/shared/src/test/mlscript/block-staging/Functions.mls
#	hkmc2/shared/src/test/mlscript/deforest/basic.mls
#	hkmc2/shared/src/test/mlscript/deforest/determinism.mls
#	hkmc2/shared/src/test/mlscript/ucs/normalization/ExcessiveDeduplication.mls
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors UCS normalization’s consequent-sharing logic by introducing explicit join points in the Split IR (LetSplit / UseSplit) and a new SplitSymbol, replacing the prior “label-based deduplication after the fact” approach. It updates normalization/specialization and lowering accordingly, and refreshes many golden tests to match the new control-flow shape.

Changes:

  • Add join-point constructs to Split (LetSplit, UseSplit) plus SplitSymbol to represent shared split bodies.
  • Rework UCS normalization/specialization to preserve and share repeated specialized alternatives via join points.
  • Update lowering and a large set of snapshot tests to reflect the new Label/Break-based control flow.

Reviewed changes

Copilot reviewed 23 out of 23 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
hkmc2/shared/src/main/scala/hkmc2/semantics/ucs/Normalization.scala Implements join-point creation, specialization preservation, and new lowering rules for LetSplit/UseSplit.
hkmc2/shared/src/main/scala/hkmc2/semantics/Symbol.scala Introduces SplitSymbol to carry shared split bodies and (during lowering) an optional LabelSymbol.
hkmc2/shared/src/main/scala/hkmc2/semantics/Split.scala Adds LetSplit/UseSplit variants and updates basic split utilities (clone/duplicate/fullness/printing/etc.).
hkmc2/shared/src/main/scala/hkmc2/semantics/Resolver.scala Extends term-gathering traversal to account for LetSplit/UseSplit.
hkmc2/shared/src/main/scala/hkmc2/invalml/InvalML.scala Updates split typing to handle join-point constructs.
hkmc2/shared/src/main/scala/hkmc2/codegen/Lowering.scala Extends split quasiquote construction (quoteSplit) with new split variants.
hkmc2/shared/src/test/mlscript/ucs/normalization/SpecializedSplitSharing.mls New test covering sharing of repeated specialized split bodies.
hkmc2/shared/src/test/mlscript/ucs/normalization/LetSplitJoinPoints.mls New test suite covering when join points should/shouldn’t be introduced and preserved.
hkmc2/shared/src/test/mlscript/ucs/normalization/SimplePairMatches.mls Snapshot update (JS output formatting / match-error throw formatting).
hkmc2/shared/src/test/mlscript/ucs/normalization/ExcessiveDeduplication.mls Snapshot update and updated commentary reflecting join-point-based sharing.
hkmc2/shared/src/test/mlscript/ucs/normalization/DeduplicationWhile.mls Snapshot update for while-loop lowering shape under new join-point handling.
hkmc2/shared/src/test/mlscript/ucs/normalization/Deduplication.mls Snapshot updates for deduplication code shape and temp variable naming.
hkmc2/shared/src/test/mlscript/ucs/general/JoinPoints.mls Snapshot update (JS formatting).
hkmc2/shared/src/test/mlscript/opt/AbortivePrefix.mls Snapshot update reflecting removal of old split-root/default label structure.
hkmc2/shared/src/test/mlscript/llir/ControlFlow.mls Removes prior failing expectation and replaces with explicit LLIR output snapshot.
hkmc2/shared/src/test/mlscript/deforest/todos.mls Snapshot update reflecting new lowering shape (no split_root/default label blocks).
hkmc2/shared/src/test/mlscript/deforest/determinism.mls Snapshot update reflecting join-point-related lowering changes and symbol renaming.
hkmc2/shared/src/test/mlscript/deforest/basic.mls Snapshot update reflecting join-point-related lowering changes and symbol renaming.
hkmc2/shared/src/test/mlscript/codegen/TailRecFormerFailure.mls Snapshot + commentary update about tail-call behavior with new sharing rules.
hkmc2/shared/src/test/mlscript/codegen/MergeMatchArms.mls Snapshot updates to JS output shape and temp naming.
hkmc2/shared/src/test/mlscript/codegen/ClassMatching.mls Snapshot updates for lowered/optimized IR and JS output temp naming.
hkmc2/shared/src/test/mlscript/block-staging/Functions.mls Adds a :fixme documenting staged-module incompatibility with Label/Break emitted by join points.
hkmc2/shared/src/test/mlscript-compile/Predef.mjs Snapshot update reflecting different compilation output (temp/lambda naming and early returns).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread hkmc2/shared/src/main/scala/hkmc2/semantics/ucs/Normalization.scala Outdated
Comment thread hkmc2/shared/src/main/scala/hkmc2/semantics/ucs/Normalization.scala Outdated
Comment thread hkmc2/shared/src/main/scala/hkmc2/semantics/ucs/Normalization.scala Outdated
Comment thread hkmc2/shared/src/main/scala/hkmc2/semantics/Split.scala Outdated
Comment thread hkmc2/shared/src/main/scala/hkmc2/codegen/Lowering.scala Outdated
Comment thread hkmc2/shared/src/main/scala/hkmc2/semantics/ucs/Normalization.scala
@LPTK LPTK added the slopfest label May 6, 2026
Copy link
Copy Markdown
Member Author

@chengluyu chengluyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Next, I will implement the deduplication for the consequent and alternative splits of disjunctive patterns. There is some logic over there that might be shared.

Comment thread hkmc2/shared/src/test/mlscript/codegen/CurriedClasses.mls
Comment thread hkmc2/shared/src/test/mlscript/codegen/CurriedFunctions.mls
Comment thread hkmc2/shared/src/main/scala/hkmc2/semantics/Split.scala Outdated
Comment thread hkmc2/shared/src/main/scala/hkmc2/semantics/Split.scala Outdated
/** Whether every leaf of `split` unconditionally transfers control away
* (no fall-through). `End` compiles to `throw`, explicit `return`/`throw`
* terms are abortive, and `UseSplit` transfers to its referenced body. */
private def alwaysTerminates: Bool = split match
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This name is inadequate. Termination obviously cannot be determined syntactically.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Termination" is not appropriate here. It means either return or throw.

// `return`/`throw`, `UseSplit` → break, or `End` → throw), no value
// can flow into `cont`, so the exit-label wrapper is unnecessary even
// when `cont` is not a TailOp.
if (cont eq Ret) || (cont eq Thrw) || (form is IfLikeForm.While) ||
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this using brittle eq tests instead of isInstanceOf[TailOp]?

Comment on lines +457 to +460
// Other continuations (including ImplctRet, which generates `expr;`
// without `return`) can fall through the Label body into the rest.
// Wrap with an exit label and temp variable so every path stores its
// result, breaks to exitLabel, then the original cont runs once.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are all the code and comments compressed to 80 columns? 100 is more appropriate.

dfltRewritten.fold(restRewritten)(Begin(_, restRewritten)) |> some, rest)
case _ => m
case b => b

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👎

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do a newline check in the end.

case Split.UseSplit(s) => if s eq sym then 1 else 0

/** Whether every leaf of `split` unconditionally transfers control away
* (no fall-through). `End` compiles to `throw`, explicit `return`/`throw`
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this comment say that "End compiles to throw"? It's clearly not true. Does it want to say something more specific in this special context?

// sharing unsound). Duplicate and specialize separately.
// TODO: detect when both specializations agree and share even here.
val positiveSplit = consequent ++ alternative.duplicate
val (whenTrue, trueRefs) = normalize(specialize(positiveSplit, +, scrutinee, pattern).getOrElse(positiveSplit))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

specialize is now specified to return N when no chnage was made. But this info is simply discarded here. Wasn't it supposed to be used to potentially share the split here? (I guess it would be possible when both positive and negative specialization return N.)

else if split.isFallback then
log(s"Case 1.1.3: $pattern is unrelated with $thatPattern")
rec(tail)
S(rec(tail).getOrElse(tail))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really the correct logic? Why do we discard the lack of change and replace it wirh S, here? This is an open question; I just want to understand.

// === Join point cases ===


// * Join point: non-trivial Else alternative on different scrutinee
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This testcase comment says there is a join point, but I don't see one. What's going on?

//│ —————————————————| Output |—————————————————————————————————————————————————————————————————————————
//│ = "neither"

// * Join point: multi-branch alternative, different scrutinee
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same.

// * Join point: deeply nested class patterns share the default branch
data class S(value)

:sir
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same.

//│ —————————————————| Output |—————————————————————————————————————————————————————————————————————————
//│ = "default"

// * Join point: default branch with a computation should not be duplicated
Copy link
Copy Markdown
Contributor

@LPTK LPTK May 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It has nothing to do with the presence of a computation; what matters is the term size.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this test file should use bigger consequents to share or a different sharing threshold, as they don't demonstrate what they purport to.

Comment on lines +101 to +104
//│ return [101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112]
//│ end
//│ else
//│ return [101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really supposed to be duplicated?

(This might be related with the future improvement what you told me before; if so, pls document it.)

@LPTK
Copy link
Copy Markdown
Contributor

LPTK commented May 12, 2026

Subsumed by #486.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants