
Add per-node attribution to q2-debug AST view #122

Draft
shikokuchuo wants to merge 24 commits into main from feat/node-attribution


@shikokuchuo shikokuchuo commented Apr 15, 2026

Closes #115.

Summary

Adds per-node authorship to the q2-debug AST view. Each node is coloured by the author who last edited the source bytes behind it, with a hover badge showing the author and a relative timestamp. Off by default — toggle "Authorship" in the Settings sidebar (q2-debug only).

How Automerge attribution meets the AST

Every AST node carries a sourceInfoId (the s field) that points into the parse's sourceInfoPool. SourceInfoReconstructor (from @quarto/annotated-qmd) resolves that id to a byte range { fileId, start, end } in the original source.

Separately, an attribution producer replays Automerge history and records, per character, who wrote it and when. Producers expose only a minimal query interface:

interface AttributionSource {
  queryByteRange(fileId, byteStart, byteEnd): { actor, time } | null;
}

Attributing a node is then: resolve its byte range via the reconstructor, hand that range to the source, and (for multi-byte ranges) take the most recent writer — "last touch wins". The interface is byte-indexed (UTF-8, as emitted by WASM); the Automerge-backed source keeps an internal byte→char map since Automerge text is UTF-16.
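As a rough illustration of the lookup described above, here is a minimal sketch of a per-character source with a byte→char map and "last touch wins". Only `queryByteRange` and the name `makeCharArraySource` appear in this PR; the signature given to `makeCharArraySource`, the parameter types, `buildByteToCharMap`, and `attributeNode` are assumptions made for the example, not the code in this branch.

```typescript
// Assumed shapes: the PR only specifies queryByteRange's name and argument order.
interface Attribution { actor: string; time: number }
interface ByteRange { fileId: number; start: number; end: number }

interface AttributionSource {
  queryByteRange(fileId: number, byteStart: number, byteEnd: number): Attribution | null;
}

// Hypothetical helper: maps every UTF-8 byte offset to the UTF-16 index
// of the character that contains it.
function buildByteToCharMap(text: string): number[] {
  const encoder = new TextEncoder();
  const map: number[] = [];
  let charIndex = 0;
  for (const ch of text) {                 // iterates by code point
    const byteLen = encoder.encode(ch).length;
    for (let i = 0; i < byteLen; i++) map.push(charIndex);
    charIndex += ch.length;                // 1 or 2 UTF-16 code units
  }
  return map;
}

// Assumed signature: wraps a per-character array (indexed by UTF-16 offset)
// behind the byte-indexed query interface.
function makeCharArraySource(
  entries: ReadonlyArray<Attribution | null>,
  text: string,
): AttributionSource {
  const byteToChar = buildByteToCharMap(text);
  return {
    queryByteRange(_fileId, byteStart, byteEnd) {
      let latest: Attribution | null = null;
      const end = Math.min(byteEnd, byteToChar.length);
      for (let b = Math.max(0, byteStart); b < end; b++) {
        const charIndex = byteToChar[b];
        if (charIndex === undefined) continue;
        const entry = entries[charIndex];
        // "Last touch wins": keep the most recent writer in the range.
        if (entry && (latest === null || entry.time > latest.time)) latest = entry;
      }
      return latest;
    },
  };
}

// Hypothetical glue: `range` stands in for whatever the reconstructor resolves.
function attributeNode(range: ByteRange | null, source: AttributionSource): Attribution | null {
  if (range === null) return null;
  return source.queryByteRange(range.fileId, range.start, range.end);
}
```

The half-open `[byteStart, byteEnd)` convention is also an assumption; the PR text does not say whether `end` is inclusive.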

Cold start vs incremental updates

useAttribution(filePath, sourceText) owns the build lifecycle:

  • Cold start: buildRunListAttribution replays full Automerge history, 500 entries per chunk, yielding via requestIdleCallback before every chunk. The hook returns null while the build is pending, so the document paints un-attributed first.
  • Incremental: on each sourceText change (debounced 500 ms), updateRunListAttribution applies only the new history entries to the existing runs synchronously. On HistoryCompactedError, we fall back to a full rebuild.
  • Switching files aborts any in-flight build and clears state before the next cold start.
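The cold-start loop above could look roughly like the following sketch. The chunk size of 500 comes from the description and the 100 ms idle timeout from one of the commit notes; `waitForIdle` is named in a commit message but its body here is a guess, and `chunk`, `replayInChunks`, and the `AbortLike` type are hypothetical names, not the real `buildRunListAttribution`.

```typescript
// Structural stand-in for AbortSignal so the sketch has no DOM dependency.
interface AbortLike { readonly aborted: boolean }

// Resolve once the host is idle; fall back to a timer where
// requestIdleCallback is unavailable (e.g. outside a browser).
function waitForIdle(timeoutMs = 100): Promise<void> {
  return new Promise<void>((resolve) => {
    const ric = (globalThis as any).requestIdleCallback;
    if (typeof ric === "function") {
      ric(() => resolve(), { timeout: timeoutMs });
    } else {
      setTimeout(() => resolve(), 0);
    }
  });
}

// Split a history array into fixed-size chunks.
function chunk<T>(items: readonly T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

// Replay history chunk by chunk, yielding before every chunk (including the
// first) so the document can paint un-attributed. Returns false if aborted.
async function replayInChunks<T>(
  history: readonly T[],
  apply: (entry: T) => void,
  signal: AbortLike,
  chunkSize = 500,
): Promise<boolean> {
  for (const part of chunk(history, chunkSize)) {
    await waitForIdle();
    if (signal.aborted) return false;   // e.g. the user switched files
    for (const entry of part) apply(entry);
  }
  return true;
}
```

Checking the abort flag after each yield, rather than before it, means a file switch that happens during the idle wait never applies stale entries.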

useNodeAttributionResolver(astContext, attributionCtx) then wraps the source + reconstructor into a memoised, cached getNodeAttribution(sid) → { actor, time, color, name } | null. color and name come from an identities map supplied by the Automerge sync layer; unmapped actors fall back to a short hex of the actor ID.
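To make the caching and identity fallback concrete, here is a sketch of the resolver without the React wrapper. `makeNodeAttributionResolver`, the numeric `sid`, the 8-character hex length, and the grey fallback colour are all assumptions for illustration; the PR only states that unmapped actors fall back to a short hex of the actor ID.

```typescript
interface Attribution { actor: string; time: number }
interface ActorIdentity { name: string; color: string }
interface NodeAttribution extends Attribution { name: string; color: string }

// `lookup` stands in for "reconstructor + AttributionSource": given a
// sourceInfoId it returns the raw attribution or null.
function makeNodeAttributionResolver(
  lookup: (sid: number) => Attribution | null,
  identities: ReadonlyMap<string, ActorIdentity>,
  fallbackColor = "#888888",             // assumed neutral colour
): (sid: number) => NodeAttribution | null {
  const cache = new Map<number, NodeAttribution | null>();
  return (sid) => {
    // Null results are cached too, so unattributed nodes are not re-queried.
    if (cache.has(sid)) return cache.get(sid) ?? null;
    const raw = lookup(sid);
    let resolved: NodeAttribution | null = null;
    if (raw !== null) {
      const identity = identities.get(raw.actor);
      resolved = {
        actor: raw.actor,
        time: raw.time,
        name: identity?.name ?? raw.actor.slice(0, 8),  // short hex fallback
        color: identity?.color ?? fallbackColor,
      };
    }
    cache.set(sid, resolved);
    return resolved;
  };
}
```

In the hook, the cache would be rebuilt whenever the source or identities change, which is what the memoisation in `useNodeAttributionResolver` is for.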

Rendering: text colour and hover tooltip

A node with a resolved attribution is wrapped in <span|div class="q2-attr-wrap" data-sid="…" style="color: <actor-color>">. The colour is plain inline style — no registry magic.

Hover is event-delegated: a single onMouseOver/onMouseOut pair on the container finds the nearest .q2-attr-wrap, reads data-sid, calls getNodeAttribution, and positions one floating <AttributionBadge> via getBoundingClientRect. One handler, one badge — not N of each.
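The delegated lookup can be sketched as a pure function over the event target, which keeps it testable without a DOM. The structural types, `badgeStateForHover`, the numeric `sid`, and the 4 px gap are assumptions; placing the badge below the node follows the commit notes.

```typescript
// Minimal structural types; a real DOM Element satisfies all three.
interface RectLike { left: number; bottom: number }
interface WrapLike {
  getAttribute(name: string): string | null;
  getBoundingClientRect(): RectLike;
}
interface TargetLike { closest(selector: string): WrapLike | null }

interface BadgeState { sid: number; left: number; top: number }

// From a mouseover target, find the nearest attributed wrapper and compute
// where the single floating badge should go (just below the node).
function badgeStateForHover(target: TargetLike | null, gapPx = 4): BadgeState | null {
  const wrap = target?.closest(".q2-attr-wrap") ?? null;
  if (wrap === null) return null;
  const rawSid = wrap.getAttribute("data-sid");
  if (rawSid === null) return null;
  const sid = Number.parseInt(rawSid, 10);
  if (Number.isNaN(sid)) return null;
  const rect = wrap.getBoundingClientRect();
  return { sid, left: rect.left, top: rect.bottom + gapPx };
}
```

A container's `onMouseOver` would pass `event.target` to this function and store the result in state; `onMouseOut` would clear it. The badge content itself comes from `getNodeAttribution(sid)`.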

Producer-swappable design (git-blame adapter)

AttributionSource is the only boundary that matters: anything above it (storage shape, producer) can change without touching anything below (hook, resolver, renderer).

The default producer is Automerge-backed and run-length encoded (attribution-runs.ts) — 4× faster updates, 5× faster queries, 20× smaller than the per-char prototype (attribution.ts) on realistic batched workloads.
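For intuition about why the run-length form is smaller, here is a sketch that collapses a per-character array into runs. `Run` and `encodeRuns` are hypothetical names; the actual storage in `attribution-runs.ts` may differ, and this example makes no claim about the benchmark numbers above.

```typescript
interface Attribution { actor: string; time: number }
// Half-open [start, end) range of characters sharing one writer and timestamp.
interface Run extends Attribution { start: number; end: number }

// Collapse adjacent characters with the same actor and time into one run.
// Null entries (no known writer) simply leave a gap between runs.
function encodeRuns(chars: ReadonlyArray<Attribution | null>): Run[] {
  const runs: Run[] = [];
  chars.forEach((entry, index) => {
    if (entry === null) return;
    const last = runs[runs.length - 1];
    if (last && last.end === index && last.actor === entry.actor && last.time === entry.time) {
      last.end = index + 1;              // extend the current run
    } else {
      runs.push({ start: index, end: index + 1, actor: entry.actor, time: entry.time });
    }
  });
  return runs;
}
```

A paste of 1,000 characters by one author is one run instead of 1,000 entries, so storage tracks the number of edits rather than the document length.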

attribution-gitblame.ts is a second, independent producer that parses git blame --porcelain, converts lines to byte-ranged runs using TextEncoder (so multi-byte UTF-8 lands on correct byte offsets), and returns an AttributionSource with binary-search queries. It touches no consumer code — concrete proof the boundary holds and Automerge is swappable.
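A condensed sketch of that pipeline follows. The three function names match the ones this PR moves into `attribution-gitblame.ts`, but the bodies, signatures, and the `BlameLine`/`BlameRun` types are written from scratch for illustration. It assumes every line ends in `\n`, uses the author name as the actor, and keeps `author-time` in seconds, as `git blame --porcelain` reports it.

```typescript
interface Attribution { actor: string; time: number }
interface AttributionSource {
  queryByteRange(fileId: number, byteStart: number, byteEnd: number): Attribution | null;
}
interface BlameLine { sha: string; author: string; time: number; text: string }
interface BlameRun extends Attribution { start: number; end: number }  // bytes, [start, end)

// Porcelain output: a "<sha> <orig> <final> [<count>]" header per line, commit
// metadata the first time a sha appears, then the content prefixed by a TAB.
function parseBlamePorcelain(output: string): BlameLine[] {
  const commits = new Map<string, { author: string; time: number }>();
  const lines: BlameLine[] = [];
  let current = "";
  for (const raw of output.split("\n")) {
    const header = /^([0-9a-f]{40,64}) \d+ \d+(?: \d+)?$/.exec(raw);
    if (header) {
      current = header[1] ?? "";
      if (!commits.has(current)) commits.set(current, { author: "", time: 0 });
      continue;
    }
    const meta = commits.get(current);
    if (!meta) continue;
    if (raw.startsWith("\t")) {
      lines.push({ sha: current, author: meta.author, time: meta.time, text: raw.slice(1) });
    } else if (raw.startsWith("author ")) {
      meta.author = raw.slice("author ".length);
    } else if (raw.startsWith("author-time ")) {
      meta.time = Number(raw.slice("author-time ".length));
    }
  }
  return lines;
}

// Convert lines to byte-ranged runs; TextEncoder gives UTF-8 lengths, and
// adjacent lines from the same commit are merged into one run.
function buildBlameRuns(lines: readonly BlameLine[]): BlameRun[] {
  const encoder = new TextEncoder();
  const runs: Array<BlameRun & { sha: string }> = [];
  let offset = 0;
  for (const line of lines) {
    const end = offset + encoder.encode(line.text).length + 1;  // +1 for "\n"
    const last = runs[runs.length - 1];
    if (last && last.sha === line.sha) last.end = end;
    else runs.push({ sha: line.sha, start: offset, end, actor: line.author, time: line.time });
    offset = end;
  }
  return runs;
}

function makeGitBlameSource(runs: readonly BlameRun[]): AttributionSource {
  return {
    queryByteRange(_fileId, byteStart, byteEnd) {
      // Binary search for the first run that ends after byteStart.
      let lo = 0;
      let hi = runs.length;
      while (lo < hi) {
        const mid = (lo + hi) >> 1;
        const run = runs[mid];
        if (run !== undefined && run.end <= byteStart) lo = mid + 1;
        else hi = mid;
      }
      let latest: Attribution | null = null;
      for (let i = lo; i < runs.length; i++) {
        const run = runs[i];
        if (run === undefined || run.start >= byteEnd) break;
        if (latest === null || run.time > latest.time) latest = { actor: run.actor, time: run.time };
      }
      return latest;
    },
  };
}
```

Because blame data is already range-based, nothing here expands to one entry per character, which is the expansion the `AttributionSource` boundary was introduced to avoid.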

Access from render components

Editor.tsx publishes AttributionContext ({ source, identities, sourceText }) to the render tree whenever attribution is on. Any renderer can opt in:

  1. useContext(AttributionContext) to get the source.
  2. useNodeAttributionResolver(ast.astContext, attributionCtx) for a getNodeAttribution(sid) function.
  3. Read sid from a node's s field and look up { actor, time, color, name }.

ReactAstDebugRenderer is currently the only consumer; any other renderer picks this up in three lines.

Enabling the feature

A persisted attributionEnabled preference (default false) is exposed as an "Authorship" toggle in the Settings sidebar, shown only when the current format is q2-debug.

Editor.tsx passes filePath = null to useAttribution when the toggle is off, which short-circuits the history replay — zero cost when disabled, and flipping off mid-session aborts any in-flight build.
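The gating logic amounts to something like the sketch below. The storage key, `StorageLike`, and all three function names are hypothetical; the commit notes say the preference lives in localStorage, but not how it is keyed or serialised.

```typescript
// Structural subset of the Web Storage API, so a fake can be used in tests.
interface StorageLike {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const ATTRIBUTION_KEY = "attributionEnabled";  // hypothetical storage key

// Default is false: anything other than the literal "true" reads as off.
function readAttributionEnabled(storage: StorageLike): boolean {
  return storage.getItem(ATTRIBUTION_KEY) === "true";
}

function writeAttributionEnabled(storage: StorageLike, enabled: boolean): void {
  storage.setItem(ATTRIBUTION_KEY, String(enabled));
}

// The value handed to useAttribution: null short-circuits the history replay.
function attributionFilePath(
  filePath: string,
  format: string,
  enabled: boolean,
): string | null {
  return enabled && format === "q2-debug" ? filePath : null;
}
```

Gating on the format as well as the toggle is an assumption here; the description only says the toggle is shown for q2-debug.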

Other notes

  • Minor type fixes in ts-packages/annotated-qmd so it compiles cleanly under hub-client's strict tsconfig.
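One of those fixes is converting constructor parameter properties to explicit fields, because `erasableSyntaxOnly` rejects parameter properties as non-erasable syntax. A sketch of the pattern follows; `SourceRange` is a made-up class for illustration, not one from annotated-qmd.

```typescript
// Before (rejected under erasableSyntaxOnly, since the `public` modifiers
// on constructor parameters generate runtime assignments):
//
//   class SourceRange {
//     constructor(public readonly start: number, public readonly end: number) {}
//   }

// After: fields are declared explicitly and assigned in the constructor, so
// removing the type annotations leaves valid JavaScript.
class SourceRange {
  readonly start: number;
  readonly end: number;

  constructor(start: number, end: number) {
    this.start = start;
    this.end = end;
  }

  get length(): number {
    return this.end - this.start;
  }
}
```

The related `verbatimModuleSyntax` fix is the same in spirit: imports used only as types move to `import type { … }` so the emitted module keeps exactly the imports that are written.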

@shikokuchuo shikokuchuo force-pushed the feat/node-attribution branch from 56b11b9 to 1a54428 Compare April 21, 2026 08:57
@shikokuchuo shikokuchuo marked this pull request as ready for review April 21, 2026 14:30
@shikokuchuo shikokuchuo force-pushed the feat/node-attribution branch 6 times, most recently from fd422b3 to 38390d8 Compare April 23, 2026 09:03

Replays Automerge history to build a per-character attribution map,
then colors each AST node by author with hover tooltips showing name
and relative time. Supports chunked async builds, incremental updates,
and graceful degradation when offline or without history.

The hub-client tsconfig enables erasableSyntaxOnly, verbatimModuleSyntax,
and noUnusedLocals which surface errors in annotated-qmd sources resolved
through the types field. Convert parameter properties to explicit field
declarations, split type-only imports, and remove unused variables.

Real Automerge diffs use splice/put/del actions (not insert/del from
the old Text API). Handle splice with value:string, put for field-level
init, and add @automerge/automerge as a direct hub-client dependency.
Also fix relative time display for second-precision timestamps.

The useAttribution hook now only runs when the document has
format: q2-debug AND attribution: true in its YAML frontmatter.
Without the flag, no Automerge history traversal occurs.
…ke test

Remove console.log/warn/error statements used during development,
drop unused _identities parameter from useAttribution() (identities
flow through AttributionContext), and delete the stale spike test
that documented the old Automerge Text API patch format.

Show a colored badge (author dot + name + relative time) on hover
instead of the browser's plain title tooltip. Badge uses a solid
white background with the author's color on border and text, and
appears below the hovered node. CSS is injected once from AstRenderer
rather than per-Node.

Cache node attribution lookups in a Map keyed by sourceInfoId so
re-renders that don't change attribution data make zero WASM calls.
Use byteToCharMap from AttributionContext instead of recomputing it
in the renderer. Replace per-node hidden AttributionBadge elements
with a single floating badge shown on hover via event delegation.

Attribution was previously opt-in via `attribution: true` in the qmd
YAML frontmatter. This is a viewing concern, not a document property,
so move it to a persistent UI toggle in the Settings sidebar instead.

The toggle appears only when the preview format is q2-debug and is
labeled "Authorship" for clarity. The preference persists across
sessions via localStorage.

Consumer-facing interfaces now expose CharAttribution[] entries
instead of the full AttributionMap (which carries Automerge-specific
processedHeads/processedHistoryIndex bookkeeping). This enables
swapping the attribution data source (e.g. to git blame) without
changing any consumer code.

Commit 1a54428 changed the consumer-facing type from AttributionMap to
CharAttribution[], but the flat per-character array was still the
boundary. Non-Automerge producers (git blame, LSP) had to flatten their
native range-based data into per-character entries — an expansion that
scales with document size rather than edit count.

Introduce AttributionSource, a single-method query interface:

    queryByteRange(fileId, byteStart, byteEnd) -> {actor, time} | null

The Automerge path keeps its existing entries + byteToCharMap internally
and wraps them via makeCharArraySource — behaviorally identical to the
prior scan. Consumers (Editor, ReactAstDebugRenderer, Node) depend only
on the query function; AttributionContext now carries { source,
identities, sourceText }.

The new attribution.gitblame.test.ts uses a run-based AttributionSource
that binary-searches byte-ranged records — no per-character array,
natively byte-indexed. It imports only getNodeAttribution,
AttributionSource, NodeAttribution, and ActorIdentity, demonstrating the
boundary is now representation-agnostic in practice.

Spread of >~118K elements into entries.splice overflowed V8's argument
stack. Chunk large patches into 10K-element splices.
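A sketch of the chunking workaround described in this commit: instead of spreading the whole patch into one `splice` call, insert it in fixed-size slices so no single call receives more arguments than the engine allows. `spliceChunked` is a hypothetical name; the 10K chunk size is the one stated above.

```typescript
// Insert `items` into `target` at `start` (after removing `deleteCount`
// elements) without ever spreading more than `chunkSize` arguments at once.
function spliceChunked<T>(
  target: T[],
  start: number,
  deleteCount: number,
  items: readonly T[],
  chunkSize = 10000,
): void {
  target.splice(start, deleteCount);
  for (let offset = 0; offset < items.length; offset += chunkSize) {
    // Each slice lands directly after the previously inserted one.
    target.splice(start + offset, 0, ...items.slice(offset, offset + chunkSize));
  }
}
```

Each chunked insert still shifts the tail of the array, so very large patches pay for the shift once per chunk.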
Adds attribution-runs.ts behind the same AttributionSource boundary and
wires useAttribution to it. Adds a bench harness (npm run bench) that
A/B's char vs rle; RLE wins on realistic batched workloads (4× faster
updates, 5× faster queries, 20× smaller storage) and handles arbitrary
bulk-insert sizes without the splice-chunking workaround.

Moves parseBlamePorcelain / buildBlameRuns / makeGitBlameSource out
of the test file into attribution-gitblame.ts so the adapter can be
wired into production. Synthetic unit tests added alongside the
existing end-to-end.

Skip the first waitForIdle before processing any history, and pass
timeout: 100 to requestIdleCallback so the build isn't starved while
React is mounting.

Yield before the first chunk so cold start doesn't block the initial
paint, and clear stale source on file switch so prior attribution
doesn't flash against the new file's AST on re-navigation.

Move the SourceInfoReconstructor + getNodeAttribution cache logic out of
AstRenderer and into hooks/useAttribution.ts, alongside AttributionContext.
NodeAttributionContext moves too, since nothing about its shape is
debug-specific. Any renderer with an astContext can now opt in with one
hook call and a provider wrap instead of re-deriving the plumbing.

@shikokuchuo shikokuchuo force-pushed the feat/node-attribution branch from d1258b9 to aa7ac38 Compare April 24, 2026 14:14
@shikokuchuo shikokuchuo marked this pull request as draft April 24, 2026 15:02
@shikokuchuo

After discussion with Carlos we're going to move this to a transform step of the render pipeline, so attribution will be available to all consumers - cli as well as hub-client.

Development

Successfully merging this pull request may close these issues.

Replay UI: show attribution of highlighted text
