Share UCS splits using new constructs#457
Conversation
# Conflicts: # hkmc2/shared/src/test/mlscript/block-staging/Functions.mls # hkmc2/shared/src/test/mlscript/codegen/MergeMatchArms.mls # hkmc2/shared/src/test/mlscript/ucs/general/LogicalConnectives.mls # hkmc2/shared/src/test/mlscript/ucs/normalization/Deduplication.mls
LPTK
left a comment
There was a problem hiding this comment.
Nice changes, thanks!
Please avoid duplicating specialized splits when they're used several times with the same specialization. (Later, we might also try to preserve more sharing between specializations.)
Also, please add tests for these cases (sharing of specialized splits and lack thereof).
| //│ set arg$Some$0$ = s⁰.value⁰; | ||
| //│ match arg$Some$0$ | ||
| //│ 1 => | ||
| //│ begin |
There was a problem hiding this comment.
FIXME why was there a begin left?
There was a problem hiding this comment.
The stale begin is no longer present in the current golden output; this block now lowers directly to the match/end shape shown in ClassMatching.mls.
There was a problem hiding this comment.
I'm not asking you what's currently the case. I'm asking you, LUYU @chengluyu (not ChatGPT) why we used to have a begin there. I just want to understand that, as it's surprising to me.
There was a problem hiding this comment.
I'm not sure, but it looks like it's related to the previous consequent sharing logic.
At that time, how it handled pure expression ifs was to always first place the results into a tmp var, and then pass that tmp into subsequent tails.
But now we will check whether the consequents are actually used. In this case, since nothing uses 0 or 1, it simply won't create a block and place them into tmp var
There was a problem hiding this comment.
So do we still observe begins in such positions in some other cases?
There was a problem hiding this comment.
Asking because these should probably have been flattened.
There was a problem hiding this comment.
Asking because these should probably have been flattened.
I don't know why they were not flattened.
So do we still observe
begins in such positions in some other cases?
The begin here looks weird indeed. But it's fixed by this commit anyway. Do we really need to dig into this?
In `normalizeImpl`, when the alternative references the same scrutinee as the current branch, the alternative is duplicated and specialized separately in `+` and `-` modes. When both specializations agree, the two duplicates could in principle be shared via a single `LetSplit`. That detection is not yet implemented; mark the bail-out site with a TODO and add `SpecializedSplitSharing.mls` covering both: - cases where the duplication is wasteful (disjoint sibling classes like `Cat()`/`Dog()` — both specializations agree, sharing would suffice); - cases where the duplication is genuine (refining patterns like `S(0)`/`S(_)` — positive and negative specialization yield different splits, sharing would be incorrect). The latter group guards against a future over-eager sharing rule. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The `Split.Else` branch of the `++` extension was annotated as impossible but silently discarded `those`. Replace the comment with a `softAssert(false)` so that the invariant is enforced at runtime and visible in stack traces if it ever breaks. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
`JoinPointCtx` tracks the pending set of `SplitSymbol`s awaiting a `LetSplit` placement — a piece of state genuinely local to the walk. Stashing the sharing threshold in the same case class forced a plumbing trick where an inner `normalize` call had to reconstruct the context with `JoinPointCtx.withThreshold` just to carry a config option across an invocation boundary. The threshold is the only `Config` field `normalize` needs, and the caller already has `Config` in scope. Just add `Config` to the class constructor's `using` clause so all methods can read it via the global `config` helper, drop the parameter from `apply`, and simplify `JoinPointCtx` back to the symbol set it was originally. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
`lowerSplit` used to wrap every `LetSplit` whose continuation was not `Ret`/`Thrw` in an exit-label + temp-var shim, so that a fall-through value would be assigned and the original `cont` run once. When every leaf of both the body and the tail already transfers control away (explicit `return`/`throw`, `UseSplit` → `Break`, or `End` → `throw`), no such fall-through value exists, and the wrapper only contributes noise to the lowered IR. Add a `Split.alwaysTerminates` helper and extend the tail-position fast path to reuse the caller's `cont` whenever both sides are abortive. The `opt/AbortivePrefix.mls` golden output drops its unused outer `σ$x` label accordingly. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
After specialization the consequent often retains only one `UseSplit` for the shared alternative. Emitting a `LetSplit` in that case costs a label + break that buys nothing: the final split has one goto site in the consequent and one fall-through at the tail, so inlining produces exactly two copies of the body with no label overhead. Count the surviving references with a new `Split.countUseSplit` helper and gate the `shouldShare` decision on `refCount > 1` in addition to the existing size-threshold check. When the guard fires we inline via `inlineUseSplit`, matching the simpler output the advisor flagged on `DeduplicationWhile.mls:20`. Side effects on golden outputs: * `TailRecFormerFailure.mls`: threshold-1 case now inlines, since `refCount` is 1 after specializing `Cons(_, Nil)`. Comment updated to explain why the low threshold no longer forces sharing. * `ExcessiveDeduplication.mls`: the small duplicated-tail test is now inlined; the comment block is updated separately. * `AbortivePrefix.mls`: the inner `σ` disappears for the same reason. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The "what we want is a way of let-binding split tails" paragraph described an aspiration that `LetSplit`/`UseSplit` now implement. The "Currently, dedup' is implemented by hash-consing" preface is also obsolete. Rewrite both to describe the current behavior: sharing happens at normalization time, and refCount-1 cases (like this one) are inlined rather than hoisted behind a label. While here, replace the mis-leading "horrendous code (before optimization)" label on the `if false do ()` case with a note that the redundant `set scrut = false` survives lowering and is removed by the IR optimizer. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
`ReflectionInstrumenter` does not yet support `Label` blocks, which `LetSplit` lowers to. The existing `match2` test in this file no longer triggers that path: the `refCount > 1` guard introduced with the `LetSplit`-inlining rewrite keeps its single-use shared tail inlined under any threshold. Add a dedicated block that forces the path with `:patMatConsequentSharingThreshold 1` and a pattern whose LetSplit body is genuinely reachable from two places, so the known staged-module limitation stays visible in the test suite. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
# Conflicts: # hkmc2/shared/src/test/mlscript/block-staging/Functions.mls # hkmc2/shared/src/test/mlscript/deforest/basic.mls # hkmc2/shared/src/test/mlscript/deforest/determinism.mls # hkmc2/shared/src/test/mlscript/ucs/normalization/ExcessiveDeduplication.mls
There was a problem hiding this comment.
Pull request overview
This PR refactors UCS normalization’s consequent-sharing logic by introducing explicit join points in the Split IR (LetSplit / UseSplit) and a new SplitSymbol, replacing the prior “label-based deduplication after the fact” approach. It updates normalization/specialization and lowering accordingly, and refreshes many golden tests to match the new control-flow shape.
Changes:
- Add join-point constructs to
Split(LetSplit,UseSplit) plusSplitSymbolto represent shared split bodies. - Rework UCS normalization/specialization to preserve and share repeated specialized alternatives via join points.
- Update lowering and a large set of snapshot tests to reflect the new Label/Break-based control flow.
Reviewed changes
Copilot reviewed 23 out of 23 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| hkmc2/shared/src/main/scala/hkmc2/semantics/ucs/Normalization.scala | Implements join-point creation, specialization preservation, and new lowering rules for LetSplit/UseSplit. |
| hkmc2/shared/src/main/scala/hkmc2/semantics/Symbol.scala | Introduces SplitSymbol to carry shared split bodies and (during lowering) an optional LabelSymbol. |
| hkmc2/shared/src/main/scala/hkmc2/semantics/Split.scala | Adds LetSplit/UseSplit variants and updates basic split utilities (clone/duplicate/fullness/printing/etc.). |
| hkmc2/shared/src/main/scala/hkmc2/semantics/Resolver.scala | Extends term-gathering traversal to account for LetSplit/UseSplit. |
| hkmc2/shared/src/main/scala/hkmc2/invalml/InvalML.scala | Updates split typing to handle join-point constructs. |
| hkmc2/shared/src/main/scala/hkmc2/codegen/Lowering.scala | Extends split quasiquote construction (quoteSplit) with new split variants. |
| hkmc2/shared/src/test/mlscript/ucs/normalization/SpecializedSplitSharing.mls | New test covering sharing of repeated specialized split bodies. |
| hkmc2/shared/src/test/mlscript/ucs/normalization/LetSplitJoinPoints.mls | New test suite covering when join points should/shouldn’t be introduced and preserved. |
| hkmc2/shared/src/test/mlscript/ucs/normalization/SimplePairMatches.mls | Snapshot update (JS output formatting / match-error throw formatting). |
| hkmc2/shared/src/test/mlscript/ucs/normalization/ExcessiveDeduplication.mls | Snapshot update and updated commentary reflecting join-point-based sharing. |
| hkmc2/shared/src/test/mlscript/ucs/normalization/DeduplicationWhile.mls | Snapshot update for while-loop lowering shape under new join-point handling. |
| hkmc2/shared/src/test/mlscript/ucs/normalization/Deduplication.mls | Snapshot updates for deduplication code shape and temp variable naming. |
| hkmc2/shared/src/test/mlscript/ucs/general/JoinPoints.mls | Snapshot update (JS formatting). |
| hkmc2/shared/src/test/mlscript/opt/AbortivePrefix.mls | Snapshot update reflecting removal of old split-root/default label structure. |
| hkmc2/shared/src/test/mlscript/llir/ControlFlow.mls | Removes prior failing expectation and replaces with explicit LLIR output snapshot. |
| hkmc2/shared/src/test/mlscript/deforest/todos.mls | Snapshot update reflecting new lowering shape (no split_root/default label blocks). |
| hkmc2/shared/src/test/mlscript/deforest/determinism.mls | Snapshot update reflecting join-point-related lowering changes and symbol renaming. |
| hkmc2/shared/src/test/mlscript/deforest/basic.mls | Snapshot update reflecting join-point-related lowering changes and symbol renaming. |
| hkmc2/shared/src/test/mlscript/codegen/TailRecFormerFailure.mls | Snapshot + commentary update about tail-call behavior with new sharing rules. |
| hkmc2/shared/src/test/mlscript/codegen/MergeMatchArms.mls | Snapshot updates to JS output shape and temp naming. |
| hkmc2/shared/src/test/mlscript/codegen/ClassMatching.mls | Snapshot updates for lowered/optimized IR and JS output temp naming. |
| hkmc2/shared/src/test/mlscript/block-staging/Functions.mls | Adds a :fixme documenting staged-module incompatibility with Label/Break emitted by join points. |
| hkmc2/shared/src/test/mlscript-compile/Predef.mjs | Snapshot update reflecting different compilation output (temp/lambda naming and early returns). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
chengluyu
left a comment
There was a problem hiding this comment.
LGTM.
Next, I will implement the deduplication for the consequent and alternative splits of disjunctive patterns. There is some logic over there that might be shared.
| /** Whether every leaf of `split` unconditionally transfers control away | ||
| * (no fall-through). `End` compiles to `throw`, explicit `return`/`throw` | ||
| * terms are abortive, and `UseSplit` transfers to its referenced body. */ | ||
| private def alwaysTerminates: Bool = split match |
There was a problem hiding this comment.
This name is inadequate. Termination obviously cannot be determined syntactically.
There was a problem hiding this comment.
"Termination" is not appropriate here. It means either return or throw.
| // `return`/`throw`, `UseSplit` → break, or `End` → throw), no value | ||
| // can flow into `cont`, so the exit-label wrapper is unnecessary even | ||
| // when `cont` is not a TailOp. | ||
| if (cont eq Ret) || (cont eq Thrw) || (form is IfLikeForm.While) || |
There was a problem hiding this comment.
Why is this using brittle eq tests instead of isInstanceOf[TailOp]?
| // Other continuations (including ImplctRet, which generates `expr;` | ||
| // without `return`) can fall through the Label body into the rest. | ||
| // Wrap with an exit label and temp variable so every path stores its | ||
| // result, breaks to exitLabel, then the original cont runs once. |
There was a problem hiding this comment.
Why are all the code and comments compressed to 80 columns? 100 is more appropriate.
| dfltRewritten.fold(restRewritten)(Begin(_, restRewritten)) |> some, rest) | ||
| case _ => m | ||
| case b => b | ||
|
|
There was a problem hiding this comment.
I do a newline check in the end.
| case Split.UseSplit(s) => if s eq sym then 1 else 0 | ||
|
|
||
| /** Whether every leaf of `split` unconditionally transfers control away | ||
| * (no fall-through). `End` compiles to `throw`, explicit `return`/`throw` |
There was a problem hiding this comment.
Why does this comment say that "End compiles to throw"? It's clearly not true. Does it want to say something more specific in this special context?
| // sharing unsound). Duplicate and specialize separately. | ||
| // TODO: detect when both specializations agree and share even here. | ||
| val positiveSplit = consequent ++ alternative.duplicate | ||
| val (whenTrue, trueRefs) = normalize(specialize(positiveSplit, +, scrutinee, pattern).getOrElse(positiveSplit)) |
There was a problem hiding this comment.
specialize is now specified to return N when no chnage was made. But this info is simply discarded here. Wasn't it supposed to be used to potentially share the split here? (I guess it would be possible when both positive and negative specialization return N.)
| else if split.isFallback then | ||
| log(s"Case 1.1.3: $pattern is unrelated with $thatPattern") | ||
| rec(tail) | ||
| S(rec(tail).getOrElse(tail)) |
There was a problem hiding this comment.
Is this really the correct logic? Why do we discard the lack of change and replace it wirh S, here? This is an open question; I just want to understand.
| // === Join point cases === | ||
|
|
||
|
|
||
| // * Join point: non-trivial Else alternative on different scrutinee |
There was a problem hiding this comment.
This testcase comment says there is a join point, but I don't see one. What's going on?
| //│ —————————————————| Output |————————————————————————————————————————————————————————————————————————— | ||
| //│ = "neither" | ||
|
|
||
| // * Join point: multi-branch alternative, different scrutinee |
| // * Join point: deeply nested class patterns share the default branch | ||
| data class S(value) | ||
|
|
||
| :sir |
| //│ —————————————————| Output |————————————————————————————————————————————————————————————————————————— | ||
| //│ = "default" | ||
|
|
||
| // * Join point: default branch with a computation should not be duplicated |
There was a problem hiding this comment.
It has nothing to do with the presence of a computation; what matters is the term size.
There was a problem hiding this comment.
Looks like this test file should use bigger consequents to share or a different sharing threshold, as they don't demonstrate what they purport to.
| //│ return [101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112] | ||
| //│ end | ||
| //│ else | ||
| //│ return [101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112] |
There was a problem hiding this comment.
Is this really supposed to be duplicated?
(This might be related with the future improvement what you told me before; if so, pls document it.)
|
Subsumed by #486. |
During UCS normalization, when the same alternative split appears on both the positively (
+) and negatively (-) specialized splits of a match, the old implementation duplicated the code. This PR introduces a way to let-bind splits to share the common splits instead.New
SplitconstructsTwo new
Splitvariants were introduced:LetSplit(sym, tail): declares a named split. The symbol'sbodyholds the shared split.UseSplit(sym): references a previously declared named split.A new symbol class
SplitSymbolholds the shared body and an optionalLabelSymbolfor lowering.Changes to
NormalizationMethod
normalizenormalizeImplreturns(Split, Set[SplitSymbol]): the normalized split plus anySplitSymbols whoseUseSplitreferences survive but whoseLetSplithasn't been created yet. This letsLetSplitbe placed at the lowest common ancestor of allUseSplitreferences.A
LetSplitcandidate is created only when:EndorUseSplit), andAfter normalization, the existing
patMatConsequentSharingThresholdpolicy decides whether the candidate is kept asLetSplitor inlined. The decision scales with the survivingUseSplitcount, so small alternatives can still be shared when they are reused enough.Method
specializespecializereturnsOpt[Split](N= unchanged,S(...)= changed), soUseSplitreferences are preserved when specialization doesn't change the split.Changes to lowering
lowerSplittranslatesLetSplitto aLabelblock andUseSplittoBreak. When the continuation isRet/Thrw, the label is used directly to preserve tail-call position. Otherwise, an exit label with a temp variable is used.What's removed?
createLabelsForDuplicatedBranchesand theLabelsclass (old label-based deduplication) are gone!What remains configurable?
patMatConsequentSharingThresholdand the:patMatConsequentSharingThresholdtest command are still supported. They now control whether a normalized join-point candidate survives or is inlined.