feat(appkit): reference agent-app, dev-playground chat UI, docs, and template#306
Merged
Conversation
This was referenced Apr 21, 2026
162e970 to
29e3534
Compare
57cc1e4 to
4a441d2
Compare
7 tasks
29e3534 to
b462716
Compare
d16cdd5 to
e81d8bb
Compare
539487e to
dac73b5
Compare
e81d8bb to
829ad13
Compare
dac73b5 to
624f2a0
Compare
3386200 to
263f587
Compare
Removes the generated Interface.RegisteredPlugins.md page and updates TypeAlias.Plugins.md / index / sidebar to reflect the simplified Record<string, PluginToolkitProvider> shape. Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>
…t/beta
The agents manifest had "hidden": true, which excluded it from the
template plugins.json — but the same flag is also used by
generate-plugin-entries.ts to gate the auto-generated barrel
beta-exports.generated.ts. Result: import { agents } from
"@databricks/appkit/beta" failed at type level, agents() resolved to
any in createApp's plugins tuple, and the entire PluginMap inference
collapsed (appkit.<TAB> showed nothing in onPluginsReady).
Drop "hidden": true. The codegen now emits export { agents } from
"./agents" into beta-exports.generated.ts, the playground import
resolves, and PluginMap inference is restored end-to-end.
Side effect: agents now also appears in template/appkit.plugins.json
so projects scaffolded via npx @databricks/appkit init pick it up
out of the box. That matches v6's intent of shipping the agents
plugin as part of the public beta surface.
Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>
Picks up the rewritten runAgent JSDoc with the new trust-boundary language and standalone-init contract notes. Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>
Collapse the two-field `toolkits:` + `tools:` split in markdown agent
frontmatter into a single `tools:` list that mirrors the TS function
form (`tools(plugins) => { ...plugins.foo.toolkit(), get_weather: ... }`).
Reviewer feedback on the v2/4 PR called out the asymmetry: YAML had
two fields for what the TS surface treats as one record.
Each entry in `tools:` is now one of:
tools:
- plugin:analytics # all of analytics
- plugin:files: [uploads.read, uploads.list] # only filter
- plugin:genie: { except: [getConversation] } # full ToolkitOptions
- get_weather # ambient tool
Strings starting with `plugin:` are plugin references; strings without
the prefix are ambient tool names looked up in the agents() plugin's
config. Object entries are single-key mappings keyed by `plugin:NAME`
whose value is either an array (sugar for `{ only: [...] }`) or a full
ToolkitOptions record.
Same downstream resolution: plugin entries call `provider.toolkit(opts)`
and merge in the resulting ToolkitEntries; ambient entries look up
`ctx.availableTools[name]`. The unified list lets a single auto-inherit
gate ("declaring `tools:` opts out") cover both cases without the prior
double check.
The legacy `toolkits:` key is no longer accepted — `parseFrontmatter`
throws a focused migration error pointing users at the new prefix
syntax.
Migration:
- apps/dev-playground/config/agents/query/agent.md — moved its
`toolkits: - files: [...]` line into `tools:\n - plugin:files: [...]`
- docs/docs/plugins/agents.md Level 2 example rewritten with the
unified list and the four-shape rubric
Tests:
- parseFrontmatter "parses nested arrays" updated for the new shape;
added a "rejects legacy toolkits: with migration error" test
- loadAgentsFromDir tests covering: mixed plugin + ambient, `[only]`
shorthand, full ToolkitOptions object, rejection of non-`plugin:`
object keys, rejection of empty `plugin:` (which YAML parses as a
`{ plugin: null }` mapping)
- agents-plugin file-loaded auto-inherit test migrated to the new
syntax
Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>
…rate Address two P0 findings from the PR #306 agentic review. 1. tool() runtime crash on the template `npx @databricks/appkit init` produces a helper agent whose tools call `tool({ description, schema, execute })` with no `name` field (because the record key — `current_time`, `count_words` — already names the tool). The previous shape required `name: string`, so `isFunctionTool` rejected the returned FunctionTool and agent registration failed with "Agent 'helper' tool 'current_time' has an unrecognized shape" — the very first `pnpm dev` after init. Make `name` optional on both ToolConfig and FunctionTool, drop the `typeof obj.name === "string"` check from `isFunctionTool`, and widen `execute` from `Promise<string> | string` to `unknown | Promise<unknown>` since the template's tools naturally return objects (`{ now: ... }`). The agents plugin already overrides `name` with the record key at index-build time, so the original field was functionally redundant in keyed-record contexts. Zod error messages fall back to a generic "tool" label when the name is omitted. Downstream: agents plugin already normalises non-string results via `normalizeToolResult`; DatabricksAdapter already `JSON.stringify`s non-string returns at line 438-439. No further changes needed. 2. Level 1 docs lied about auto-inherit `docs/docs/plugins/agents.md` Level 1 said "auto-inherits every registered ToolProvider plugin's tools" but the Configuration Reference correctly states `autoInheritTools` defaults to `{ file: false, code: false }`. New users following Level 1 hit zero tools and confusion. Rewrite Level 1 narrative to be truthful: "The agent starts with no tools. Tools are opt-in — declare them in frontmatter (Level 2) or opt into auto-inherit explicitly with `agents({ autoInheritTools: { file: true } })`. See 'Auto-inherit posture' below." Same doc still had stale `fromPlugin(...)` examples in Level 3 + the standalone runAgent section. Rewritten to use the `tools(plugins) => ...` function form that v5 actually ships, matching the rest of the codebase. The runAgent section also gets a sentence on the new init contract (eager attachContext + setup, shared cache across sub-agents) and the trust-boundary caveat (no OBO, no approval gate). Tests: +3 covering the new shape — name optional, non-string returns, zod-error fallback label. Updated the "returns false when name is missing" guard test to assert the new "returns true" since the record key wins. Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>
Address four agentic-review findings on PR #306. All four were P1 risks under the MCP path or the markdown agent reload flow. 1. mcp callTool joined undefined text into 'undefined' literal McpToolCallResult.content[].text is optional per the spec; the old filter only matched `type === "text"`, so an entry like { type: "text" } (text undefined) flowed through and `Array.join('\n')` emitted the literal string "undefined" — visible to the agent and to the user. Tighten the filter on both the success and error paths with a type predicate that narrows on `typeof c.text === "string"`. The error path falls back to a generic "MCP tool call failed" message when no text entries survive. 2. mcp sendNotification ignored HTTP error status `notifications/initialized` and other fire-and-forget notifications completed `connect()` even when the server returned 4xx/5xx — a silent connect followed by mysterious tool-call failures. MCP spec says notifications don't acknowledge, so we keep the fire-and-forget contract but log a warning when the server clearly rejected the request. Wrap the fetch in try/catch so network failures surface in the logs without breaking the protocol. 3. mcp response body had no size cap `response.text()` and `response.json()` both buffer the entire body into memory. A misconfigured or malicious MCP server could stream unbounded data to exhaust client memory. New `readResponseTextCapped()` helper streams the body via the ReadableStream API and aborts once cumulative bytes cross MCP_RESPONSE_BODY_LIMIT_BYTES (1 MiB — far above any normal `initialize` / `tools/list` / `tools/call` JSON-RPC response). Used on both the SSE and plain-JSON branches in sendRpc. 4. agents.reload() closed the live mcpClient mid-stream Tool dispatch reads `this.mcpClient` at call time, not via capture. The old `reload()` called `await this.mcpClient.close()` and nulled the field; any in-flight stream that resolved the field afterwards crashed with "MCP client is closed" mid-conversation. Drop the synchronous close. The client owns only short-lived fetch handles, so a reload doesn't need to tear down state — new endpoints append to the existing connection map via `connectHostedTools`, stale connections from removed configs become unreachable through the new agent tool indexes (small memory cost, no correctness hazard). The shutdown path still closes — that's process teardown, where in-flight streams have already been aborted via `abortActiveOperations`. Tests: 4 new MCP cases (undefined-text filter on success + error paths, generic error fallback, 1 MiB body cap), 1 reload-doesn't-close regression, plus a connect-still-succeeds-on-4xx-notification case. 2298 tests passing across the workspace. Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>
Reviewer agentic finding #7 flagged ~20 lines of SSE parsing boilerplate in `template/client/src/pages/agents/AgentChat.tsx` that every scaffolded app would copy verbatim. The reviewer suggested extracting it; the cleaner shape is a thin React hook around the existing `connectSSE` utility in `@databricks/appkit-ui/js`, which already owns the buffer cap, abort signal composition, frame parsing, and retry/backoff logic. useAgentChat (new, in @databricks/appkit-ui/react) Single-stream chat primitive for agents-plugin chat endpoints. Wraps connectSSE and adds the stateful glue every chat UI ends up writing: - `content`: accumulates `response.output_text.delta` deltas - `events`: chronological log of every parsed event for replay / inspection - `threadId`: captured from the first `appkit.metadata` event and automatically forwarded on the next `send()` so the server reuses the same thread - `isStreaming` / `error` - `send(message)`: POSTs to `/api/agents/chat` (configurable via `endpoint`), streams the SSE response, parses each JSON-encoded data line, aborts any prior in-flight stream - `reset()`: drops state and aborts the active stream The hook stays narrow on purpose: one stream at a time, no multi-turn message-history ownership (callers compose that). Tool calls, approval gates, and any non-text event are surfaced through `onEvent` for the caller to render however it wants. `maxRetries: 0` is hard-wired into the connectSSE call — chat turns aren't idempotent and re-sending the payload on transient failure would either duplicate the user message or depend on server-side Last-Event-ID resumption (the agents plugin's StreamManager supports it, but failure-mode auditing is easier with retries explicitly off and the responsibility on the caller to re-send). Template migration `template/client/src/pages/agents/AgentChat.tsx` drops the inline fetch/decoder/SSE-buffer/parser block (~22 lines) and uses `useAgentChat({ agent, onEvent })` instead. Tool-call rows are pushed into the local `messages` array via the `onEvent` callback; the streaming assistant text mirrors from the hook's `content` field into the pending assistant row. Dev-playground's `useAgentStream` is left in place for now — it carries inspector + action-dispatcher wiring that doesn't fit on this hook. Migrating it onto `useAgentChat` is a follow-up that also covers the related Agentic finding #21 (duplicate SSE parsing across consumers). Tests: 12 new cases on useAgentChat covering posting payload shape, custom endpoint, delta accumulation, threadId capture + reuse, onEvent dispatch (including handlers that throw), malformed payload skipping, isStreaming lifecycle, reset(), send-while-streaming abort, and onError surfacing. 2310 tests passing across the workspace. Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>
…nfo fetch
Reviewer caught the redundancy: the agents plugin already exposes
`{ agents, defaultAgent }` via `clientConfig()` at startup, inlined
into the boot HTML via `<script id="__appkit__">`. The template was
doing an extra `GET /api/agents/info` round-trip on mount to fetch
the same data that's already sitting in `window.__APPKIT_CONFIG__`
and readable via `usePluginClientConfig`.
Replace the useEffect + fetch + error-handling block with a
synchronous `usePluginClientConfig<AgentsClientConfig>('agents')`.
Net savings: -20 lines of effect/error/loading-state plumbing, one
fewer HTTP request on page load, no flicker between empty-then-
populated agent buttons because the data is available on first render.
`selectedAgent` is now initialised directly from
`defaultAgent ?? agents[0] ?? null` in `useState`, so there's no
intermediate "loading agents…" state. Input placeholder updates to
"No agents registered" when the registry is empty (which only happens
if the agents plugin loaded without any agents — config bug, worth
showing).
The `/api/agents/info` route is left in place on the server — it's
still useful for programmatic callers and debugging — just no longer
consumed by the template.
Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>
Reviewer agentic finding #21: dev-playground reimplemented the same SSE parsing the template did, creating two divergent implementations of one concern. The new `useAgentChat` hook in @databricks/appkit-ui already owns the fetch + frame parsing + state plumbing; the playground hook is now a thin wrapper that adds the two playground- specific concerns and re-exports the legacy API surface. The wrapper keeps two layers on top of the shared hook: 1. `contextPrefix` on send() — the Smart Dashboard injects active filter / highlight state into the user message so the agent sees the UI state on every turn. Composed in this wrapper rather than polluting the shared hook surface. 2. Stream-inspector wiring — every send opens a `StreamRecord` via `beginStreamRun` and forwards every event to `recordStreamEvent` so the inspector drawer can render the timeline. The inspector is a dev-playground feature, not a shared concern. API stays drop-in compatible: `agentName` (not `agent`), `isLoading` (not `isStreaming`), `send(message, { contextPrefix })`. `SSEEvent` is re-exported as a type alias for `AgentChatEvent` so `use-stream-inspector.ts` and other consumers don't need to be touched. `routes/smart-dashboard.route.tsx`, `agent-sidebar.tsx`, and `quick-actions-bar.tsx` work unchanged. Error handling preserves the prior UX: fetch-level failures now project from `useAgentChat.error` into the streamed `content` as `"Error: ..."`, matching the dashboard's assistant-message rendering of the previous hook's `setContent('Error: ...')` path. Net: 159 -> ~100 lines on the playground hook, zero duplicated SSE parsing across consumers, single source of truth in `@databricks/appkit-ui/react`. Workspace tests: 2310 pass unchanged. Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>
Reviewer comment 3200867453: CHANGELOG.md shouldn't be touched by
feature branches. The Stage 2 publish in the secure release repo runs
release-it (per `.release-it.json`) and applies the changelog,
version bump, and tag on its own. A merged PR that hand-edits the
file just makes life harder for the bot.
The two issues called out in the comment:
1. An "Unreleased" entry labelled "Breaking change" was added for
the markdown agent folder layout. The agents plugin hasn't
shipped yet — there's no released contract to break, so this
entry is misleading and the breaking-change marker is wrong.
2. The diff also re-inserted historical release entries (0.25.1,
0.25.0) inside the file, which release-it would have to
reconcile.
This commit reverts CHANGELOG.md to whatever sits on main. The
pre-existing "many `# Changelog` headers" pile-up the reviewer
laughed about is a separate generation bug on the release pipeline,
not in scope here.
Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>
…ate timeouts, forward signals
Six P2 findings from the agentic review on agent/v2/6-apps-docs.
Batched as one commit because the rename in finding 15 overlaps with
the annotation migration in finding 12 on every plugin file.
Finding 15 — align tool() and defineTool() callback names.
tool({ execute }) is the public agent-author surface; defineTool({
handler }) is the plugin-author surface. A reviewer pointed out that
switching between them forced re-learning the same field. They now
both spell it execute. ToolEntry.execute carries the same signature
(args, signal?) => unknown | Promise<unknown>.
Mechanical rename across the four core plugins (analytics, lakebase,
files, genie) and all unit tests. HTTP RouteEntry.handler is
unrelated and was left alone.
Findings 12 + 13 — migrate plugins to the effect enum, drop stale
destructive: true references.
ToolAnnotations.effect ("read" | "write" | "update" | "destructive")
has been the preferred field for some time; the legacy readOnly and
destructive booleans are deprecated. The plugins were still
authoring the old fields:
analytics.query readOnly: true -> effect: "read"
files.list/.read/... readOnly: true -> effect: "read"
files.upload/.delete destructive: true -> effect: "destructive"
genie.getConversation readOnly: true -> effect: "read"
genie.sendMessage (unset) -> effect: "read"
lakebase.query readOnly + destructive (redundant pair)
-> effect:
readOnly ? "read"
: "destructive"
The approval gate already honours both forms (see requiresApproval
in agents.ts) so behaviour is unchanged. The redundancy lakebase
had — setting destructive: !readOnly alongside the readOnly bool —
is gone.
Stale comments referring to "tools annotated destructive: true" in
shared/src/agent.ts, core/agent/types.ts, and
docs/docs/plugins/agents.md now lead with effect and document the
legacy boolean as a backward-compat path.
Finding 11 — validate approval.timeoutMs.
cfg.timeoutMs from user config was used verbatim. A misconfiguration
(0, negative, NaN, Infinity, or any value below ~1s) made every
mutating tool call auto-deny instantly because the gate's wait
resolved before any UI could render. The new resolvedApprovalPolicy
getter:
- clamps to a 1 000 ms floor (anything below it falls back to the
60 000 ms default)
- rejects non-finite values (NaN, ±Infinity) by the same path
- logs a single warning so the misconfig is visible
- memoises the result so the warning never repeats across the
thousands of stream lifecycle reads that this getter sees
New test file: approval-config.test.ts (8 tests covering the floor,
NaN, Infinity, negative, default fallback, valid pass-through, and
the memo invariant).
Finding 14 — document AgentDefinition.name precedence.
The field's old doc string ("Filled in from the enclosing key when
used in agents: { foo: def }") was misleading: the registry key
always wins, the explicit name is ignored when registered via the
agents: {...} record or via the markdown folder layout. Expanded
the JSDoc to say so plainly, list the legitimate use cases
(standalone runAgent and consistent fallback labelling), and warn
that setting name to a value that differs from the registry key is
harmless but confusing.
Finding 8 — forward AbortSignal in files and genie tool handlers.
Analytics and lakebase already forward the signal arg to their
underlying I/O. Files and genie ignored it entirely.
Genie now plumbs signal end-to-end. sendMessage takes options.signal
and forwards to streamSendMessage, which already accepts signal at
the connector boundary. getConversation takes a signal parameter
and throws on entry if it's already aborted (the connector's
pagination loop is signal-agnostic today; this catches the common
case where the tool was queued after the user cancelled, without
churning the deeper connector surface).
Files honours signal?.throwIfAborted() at every handler's entry.
The VolumeAPI and the four volume connectors below it don't accept
signal, so deeper mid-call cancellation is still out of scope — but
a queued tool call that's already cancelled by the time dispatch
reaches the handler now unwinds cleanly instead of running to
completion and emitting a wasted result.
Tests: 2310 -> 2318 (added the 8-case approval-config suite). Build
+ docs:build + knip + check:fix clean. Typedoc regenerated to pick
up the rename and the new JSDoc.
Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>
…required + cleanup
Two P3 findings plus a consistency sweep across the new agent surface.
Finding 18 — surface partial failures from MCP connectAll.
Used to swallow per-endpoint failures into log.error and return void,
so callers couldn't distinguish "all connected" from "one of three
failed". The structured return McpConnectAllResult exposes
{ connected: string[], failed: Array<{ name, error }> }; the agents
plugin reads it and emits one aggregate startup warning naming the
failed endpoints alongside the per-endpoint diagnostics. Boot still
succeeds — one broken hosted-tool endpoint shouldn't take down every
agent — but operators now see "agent X registered without N hosted
tool endpoints" instead of a silent gap.
McpConnectAllResult is re-exported from beta.ts so user code that
drives connectAll directly can type its result.
4 new unit tests cover the four shapes: all-success, partial,
all-failed, and a non-Error rejection wrapping path so the public
contract always hands callers a real Error.
Finding 19 — tool() description is now required.
description was optional and silently fell back to config.name when
omitted. A tool named "get_weather" would then show up to the LLM
with description="get_weather" — a cryptic identifier that gave the
model no signal about expected use. Switched description: string at
the type level and dropped the ?? config.name fallback, so the
failure mode shifts from "agent silently behaves oddly in
production" to "tsc rejects the call at authoring time". Updated
the test that asserted the old fallback and added a description to
the three other tests that omitted it. No production callsite was
affected (template, dev-playground, and docs already passed a
description).
Consistency sweep across the new agent surface.
After the handler→execute rename and the toolkits:→tools: unification
in earlier PRs in this stack, several JSDoc strings still referred
to dead concepts and one runtime path had drifted from a sibling.
Cleanups in this commit:
- tool-approval-gate.ts header still said "destructive: true agent
tool calls"; rewritten to lead with the effect enum and document
the legacy boolean as a backward-compat path, matching the
requiresApproval gate and the docs.
- AutoInheritToolsConfig and ToolEntry.autoInheritable JSDoc
referred to "markdown toolkits:"; updated to "plugin:NAME entries
in the unified tools: list" so the prose tracks the parser.
- LoadContext.plugins and AgentsPlugin.pluginProviderIndex used the
same dead toolkits: phrasing — same fix.
- lakebase.query previously ignored signal, even though the agentic
review claimed it forwarded. Aligned with the files plugin's
pattern: signal?.throwIfAborted() at handler entry. Deeper
plumbing into the pg connection still wants its own pass, but the
common cancel-before-dispatch case now unwinds cleanly instead of
running to completion against Postgres.
- core/agent/load-agents.ts had a module-private `type ToolEntry`
that shadowed the exported public-API ToolEntry from
tools/define-tool.ts — different layers, same name. Renamed the
local type to FrontmatterToolEntry; ToolEntry now refers to the
plugin-author API everywhere it appears.
- Stale biome-ignore suppression on approval-config.test.ts removed
after the underlying noExplicitAny rule stopped firing on that
pattern.
Tests: 2322 -> 2322 (replaced the description-fallback test with the
new required-description shape, added 4 connectAll tests). Build +
docs:build + knip + typecheck + check:fix clean. Typedoc regenerated;
new McpConnectAllResult page published.
Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>
Closes the four remaining reviewer items on the agents stack and the
CI gap that has been red since the helper agent was rebuilt.
CI — playground integration tests.
The playground's `helper` agent has no explicit `model:` and falls
back to DatabricksAdapter.fromModelServing, which reads
DATABRICKS_SERVING_ENDPOINT_NAME at setup() time. The integration
test job never sets that env, so the agents plugin throws while
resolving the adapter and the webServer health check times out
before any test runs. APPKIT_E2E_TEST swaps in the mock
WorkspaceClient at request time so no real serving traffic happens —
the only thing the job needed was a non-empty endpoint name.
Added DATABRICKS_SERVING_ENDPOINT_NAME=e2e-mock alongside the
existing DATABRICKS_WAREHOUSE_ID / DATABRICKS_WORKSPACE_ID mocks.
Reviewer comment 3201574964 — agents serving-endpoint manifest copy.
The optional `serving_endpoint` resource in the agents plugin's
manifest leaked internal API names (`fromModelServing`) into the
scaffold-time description and used the awkward alias "Model Serving
(agents)". A user running `apps init` saw both. Rewrote the entry:
- alias: "Model Serving (agents)" → "Default LLM for agents"
- description leads with intent ("the LLM your agents will call
when they don't pin their own model"), spells out when it's
required (every agent that omits `model:` or `endpoint:`), names
the runtime env var, notes the shared-with-serving-plugin
ergonomics, and warns up front that only streaming-capable
endpoints work.
- field description matches.
`template/appkit.plugins.json` regenerated through `sync:template`.
Reviewer comment 3207721724 — streaming-endpoint requirement in docs.
Custom MLflow / sklearn endpoints don't implement the chat-
completions streaming protocol; pointing an agent at one fails with
"Response body is null — streaming not supported" on the first turn.
This was nowhere in the agents plugin docs. Added a Requirements
section near the top of docs/docs/plugins/agents.md with an `:::info`
callout that names which endpoint families work (Foundation Model
APIs, external chat models), which don't (typical sklearn/MLflow
pyfunc), and points users at the `serving` plugin's `/invoke` route
+ `useServingInvoke` hook for the non-streaming path.
Reviewer comment 3201691529 — drop the `assistant` agent.
Both the scaffolded template and the dev-playground shipped a
markdown `assistant` agent that did nothing the code-defined
`helper` couldn't (helper's prompt already says "For anything else,
answer briefly in plain text"). It added a second tab in the chat
UI that scaffolded users had to read past, with no functional
payoff. Dropped:
- template/config/agents/assistant/agent.md
- apps/dev-playground/config/agents/assistant/agent.md
Followups in the same change:
- The playground's `assistant` was the only markdown agent with
`default: true`; without it the agents plugin would fall back to
first-registered, which alphabetically lands on the `anomaly`
ephemeral. Pinned `defaultAgent: "helper"` on the playground's
agents() config so the bare /agent route lands on a
conversational agent.
- Updated `template/server/agents/helper.ts` and
`template/client/src/pages/agents/AgentChat.tsx` so the prose
docs talk about a single starter agent and a drop-in markdown
pattern, rather than referencing the now-gone assistant.
Markdown-form coverage is still demonstrated by every other
markdown agent in the playground (`anomaly`, `autocomplete`,
`insights`, `query`) and by the docs Level 1 / Level 2 walkthrough.
Scaffolded users get a working tool-calling demo immediately
instead of two tabs that look the same.
Tests: 2322 pass unchanged. Typecheck + build + docs:build + knip +
check:fix clean. Template manifest regenerated via sync:template.
Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>
Earlier in this stack I dropped `assistant` from the template to
address reviewer comment 3201691529. The reviewer's concern was real
(assistant did nothing helper couldn't), but the fix went too far —
the template lost its pedagogical contrast between the imperative
code form and the declarative markdown form, which is one of the
agents plugin's selling points.
Replaced assistant with `planner`: a markdown agent that's genuinely
distinct from helper, not a near-duplicate.
- **helper** (code, server/agents/helper.ts) uses `tool()` to expose
`current_time` and `count_words`. Demonstrates the imperative form
for agents whose value is in what they *do*.
- **planner** (markdown, config/agents/planner/agent.md) is pure
prose with no tools. Its system prompt steers behaviour: restate
the goal, surface open questions, propose a small ordered plan,
call out risks. Demonstrates the declarative form for agents whose
value is in *how they think*.
The contrast lines up cleanly: tools-vs-prose, executes-vs-reasons,
compiled-vs-edit-and-go. Switching tabs in the chat UI now shows two
genuinely different conversations, which is what the reviewer's
"why is there a second tab" question was really probing.
The planner prompt avoids two failure modes the prior `assistant`
hit:
- It's not a pale-imitation generalist. The prompt is specific to a
planning role (restate, ask before assuming, ordered output) so
the agent has visible behaviour the user can recognise.
- It explicitly tells the model it has no tools and points at
helper when the user wants an action — so a scaffolded user
naturally discovers the second tab through the conversation
itself.
Wiring follow-ups:
- planner carries `default: true` so the template's default tab is
conversation-first, matching the old behaviour.
- helper's docstring and AgentChat.tsx prose now describe the dual
demo and the file paths behind each tab.
- The empty-state hint suggests one prompt per tab so the contrast
is the first thing a new user experiences.
Playground intentionally untouched — it already has four markdown
agents (anomaly, autocomplete, insights, query) demonstrating the
form, so assistant was redundant there and stays removed.
Tests: 2322 pass. Typecheck clean. No code paths beyond the new
markdown file and the prose updates around it.
Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>
…gent
The previous template change exposed `planner` (markdown) and `helper`
(code) as two top-level chat tabs. The dual-form story was right, but
the UX was confusing — switching tabs felt like switching apps even
though both agents handled the same conversations.
Restructured around the agents plugin's sub-agent feature instead:
- `planner` (markdown, `config/agents/planner/agent.md`) is now the
only user-facing chat. Its frontmatter declares
`agents: [helper]`, so the agents plugin exposes `helper` to the
planner prompt as an `agent-helper` tool. Planner's body tells the
model to delegate to `agent-helper` for computational actions
(clock, word count) and stay in prose for planning conversations.
- `helper` (code, `server/agents/helper.ts`) stays registered at the
top level so the markdown loader's two-pass resolver can find it
by name when wiring planner's sub-agents. The chat UI just doesn't
surface it — the template renders one chat against the default
agent and drops the tab picker entirely.
- `AgentChat.tsx` lost the agent-tab list; it talks straight to the
default agent (planner) and renders sub-agent calls inline as
`tool · agent-helper` rows, same shape it already uses for
function-tool calls. Empty-state hint nudges the user toward both
planning prompts ("help me plan a feature") and the delegation
path ("what time is it?" — watch planner call agent-helper).
- `helper.ts` docstring rewritten so a scaffolded reader sees the
coordinator-plus-worker pattern, why each side lives in its own
file (markdown for prose, code for typed tools), and that the
template is meant to be a copy-paste starting point for the
pattern.
No plugin/runtime changes. Both agents stay top-level registered;
visibility is a UI-side choice, not a plugin feature. If a user
later wants two chats they can re-add a tab list reading `agents[]`
from `usePluginClientConfig('agents')` — the data is still there.
Tests: 2322 pass. Typecheck/build clean.
Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>
The landing page had a special hero treatment for `/smart-dashboard`
plus an opacity-muted copy of the same card in its category grid
labelled "Featured above". With the agents stack expanding, this was
adding noise — every category now has agents demos and visually
elevating one of them implied a hierarchy that doesn't really exist.
Stripped the special-case:
- Removed the FeaturedCard component and the `featured` prop on
DemoCard. The grid is now uniform.
- Removed the "Featured above" muted treatment in DemoCard. Every
card renders identically; the user picks by category and label.
- The landing-page comment block now describes the simpler shape
(hero + grouped category grid) so future readers don't go hunting
for a featured slot.
Smoke test updated.
The smoke test asserted on the old hero subtitle copy ("Explore the
capabilities of the AppKit") which the hero hasn't shipped for a
while — it's been "A living catalog of what AppKit can do — ..."
through this stack. The test was passing in CI before because the
webServer health check timed out earlier (the
DATABRICKS_SERVING_ENDPOINT_NAME env miss masked it), and now that
the env is set, the smoke step actually runs and trips on the stale
assertion. Updated to match a stable substring of the current copy
with `exact: false`, so future prose tweaks don't keep breaking the
smoke test — the goal here is "the page rendered", not "the exact
marketing line".
No other behaviour change; the catalog and nav data in @/lib/nav.ts
are untouched. /smart-dashboard is still in its category grid as a
normal demo card.
Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>
The collapsible header in saved-views-panel.tsx wrapped the title row,
the spinner/refresh action, and the chevron in one outer `<button>`,
with the refresh action implemented as a nested `<button>` inside it.
That's invalid HTML — `<button>` cannot contain `<button>` — and
React's dev build was flagging it as a hydration warning on every
render of the Smart Dashboard:
validateDOMNesting: <button> cannot contain a nested <button>.
...
saved-views-panel.tsx:91
saved-views-panel.tsx:72
The fix is purely structural; no behaviour or visual change:
- The header row is now a flex `<div>` rather than a flex `<button>`,
so its children can be sibling buttons instead of nested ones.
- The title cluster (icon + "Saved views" + count) is one button that
toggles open/close.
- The chevron is its own toggle button (same `setOpen` flip); making
it a real button gives keyboard users a second focusable target and
bakes the `aria-expanded` state onto something that announces itself.
- Refresh is a third sibling button, no longer wrapped in the toggle.
Drops the `e.stopPropagation` it needed when it was a child of the
outer toggle button.
Added `aria-expanded` and proper `aria-label`s on both toggle buttons
since the chevron is now keyboard-reachable.
Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>
Reported by the user testing the template's helper agent against
gemini-flash-lite. After the first tool call (planner -> agent-helper
sub-agent), Vertex returned:
INVALID_ARGUMENT: function call \`agent-helper\` in the 6. content
block is missing a \`thought_signature\`
Vertex AI's OpenAI-compatible surface for Gemini 2.x models (Flash,
Flash-Lite, Pro) attaches an opaque \`thought_signature\` blob on every
function call the model emits. The next assistant-with-tool_calls
message in the conversation history must echo it back verbatim or
Vertex rejects with that 400. The DatabricksAdapter was building
OpenAIToolCall objects from streamed deltas without preserving the
field, so the follow-up request from inside the same turn (the one
that pushes tool results back to the model) failed before we ever
got a final assistant turn.
The fix has two halves — one for the live single-turn case (the
reported bug) and one for resumed threads loaded from ThreadStore.
Single-turn (the reported bug):
- Extended OpenAIToolCall + DeltaToolCall with thought_signature.
- The streaming accumulator now captures the field from either the
top level of the tool_call delta or nested inside \`function\` —
Vertex's proxy has been observed to use both locations.
- When constructing final OpenAIToolCall objects from the
accumulator, the signature is attached at the top level (matches
Vertex's OpenAI-compat docs).
- The existing messages.push(...) flow already preserves the full
tool_calls array verbatim, so once the field is on the object it
reaches the wire on the next streamCompletion call within the same
run() invocation.
Resumed threads:
- Extended the shared ToolCall type with an optional
thoughtSignature so it can survive ThreadStore round-trips.
- buildMessages now reads tc.thoughtSignature off the persisted
ToolCall and emits it as thought_signature on the wire shape so
the first request of a resumed turn doesn't fail Vertex's
signature check before any tool call even fires.
Non-Gemini endpoints (Claude on Databricks, OpenAI-compat external
models, Llama, etc.) leave the field undefined; the spread on the
emit-side omits the key entirely, so we don't pass a meaningless
field to models with stricter shape validators.
Tests (4 new in a Vertex/Gemini describe block, total 2322 -> 2326):
- top-level thought_signature captured from a streamed delta and
echoed verbatim on the second request
- nested function.thought_signature captured from a streamed delta
and canonicalised to top level on the second request
- non-Gemini path (no signature in the delta) does NOT emit a
thought_signature key — defends against accidentally inventing
one when talking to Claude/OpenAI
- resumed thread: persisted Message.toolCalls[*].thoughtSignature
flows through buildMessages to the wire's thought_signature on
the very first request of the new turn
Plain typecheck + workspace test + build clean. No public-API
breakage: thoughtSignature is optional everywhere it appears.
Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>
User reported tsc errors in their scaffolded app on every tool({...})
callsite:
ZodObject<{}, $strip> is missing the following properties from type
ZodType: toJSONSchema, with, exactOptional, apply
Root cause: zod version skew between AppKit and the template.
packages/appkit pinned zod@4.3.6; template/package.json pinned
zod@4.1.13. The methods AppKit's published `.d.ts` advertises on
`ZodType` (`toJSONSchema`, `with`, `exactOptional`, `apply`) were
added in 4.2 — the user's scaffolded `z.object({})` was a different
structural type than the one AppKit's `tool()` constraint expects,
even though both are major version 4.
Two changes:
1. Template now pins zod@4.3.6, matching AppKit. Scaffolded apps see
exactly the ZodType AppKit's published types reference, so
`tool({ schema: z.object(...) })` type-checks without surprises.
2. tools/check-template-deps.ts now cross-checks an allowlist of
version-sensitive runtime deps (`zod` for now) against
packages/appkit/package.json. Future bumps of zod in AppKit
without a matching bump in the template fail this check at CI
time with a clear message ("Version skew on zod: template pins X,
packages/appkit pins Y") instead of surfacing as a confusing
ZodObject mismatch in user-land tsc output.
Header doc on the check tool now explains both guards: the
supply-chain pin check that was already there, and the new
cross-version guard for deps whose types leak through AppKit's public
API surface. The list is intentionally small (just zod today); add to
SYNCED_DEPS as other public-typed deps appear.
Workspace tests still 2326 pass. Typecheck clean across all packages
and the docs site. No lockfile churn; pnpm resolves zod@4.3.6 the
same way for both packages now.
Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>
pkosiec
reviewed
May 12, 2026
Member
pkosiec
left a comment
There was a problem hiding this comment.
Code LGTM overall, I'm about to test the app and will gave a final ✅ if everything is alright 👍
Live capture against `gemini-3.1-flash-lite-preview` showed Vertex's OpenAI-compat proxy emits the signature only as `thoughtSignature` (camelCase) at the top level of the tool_call delta, and accepts the same spelling back on outbound. The dual-spelling capture/emit added in the previous fix was speculative defense without runtime evidence. Simplifies: - `OpenAIToolCall` and `DeltaToolCall` carry only `thoughtSignature` - Inbound capture is `tc.thoughtSignature` (no `??` fallback) - Outbound emit on fresh tool_calls and on `buildMessages` resume is single-key - Test helper unified to `sig`, snake_case-only test removed, negative test asserts both spellings stay absent as a regression guard - Drops two pre-existing dead `biome-ignore noExplicitAny` suppressions in the test file (rule is disabled in tests via overrides) Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>
pkosiec
approved these changes
May 12, 2026
The agents plugin's default-LLM resource description had grown into a paragraph covering when it's needed, the env var, the cross-plugin sharing, and which endpoints qualify. That copy is rendered by the plugin CLI, where it pushes everything else off-screen and reads as docs-in-a-tooltip. Match the sibling plugins' style (serving, analytics, lakebase): one short clause for the resource, a noun phrase for the field. The edge-case detail belongs in docs/plugins/agents.md, not in the CLI. Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>
Fixes the actionable findings from the second-pass agentic review on PR #306, all small and confidence-validated against the actual code. - agents.ts: untrackStream fallback aligned with trackStream (?? 0) so the asymmetry stops masking a drift the comment claims cannot happen. - agents.ts: dispatchToolCall guards function-tool args before casting to Record<string, unknown>; a non-object value now throws a clear error instead of silently passing a wrong shape to the tool. - mcp/client.ts: new coerceToolParameters() validates the MCP server's reported inputSchema before exposing it as an AgentToolDefinition. Malformed schemas fall back to an empty parameters object so the tool still registers but cannot accept arguments — defense against a misbehaving (or hostile) MCP server. - tool.ts: drop redundant `as unknown as Record<string, unknown>` double cast; toToolJSONSchema already returns that type. - template AgentChat.tsx: scroll effect now depends on [messages, content] so the viewport actually follows new replies instead of freezing at the top. - dispatch-tool-call.test.ts: new test for runSubAgent depth-limit rejection — exercises the maxSubAgentDepth backstop that the cycle-detection comment relies on. Tool-call budget exhaustion was flagged as untested but the existing "rejects + aborts when the budget is exhausted" test already covers it; no new test needed there. Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>
Express's stock 100kb default rejected the dev-playground smart-dashboard
"save view" upload with a 413 PayloadTooLargeError before the route
handler ever ran. The request body was ~105KB — a base64-encoded
screenshot of the dashboard, barely over the cap. Same shape will hit
any other AppKit consumer that posts modest base64 uploads, light file
metadata, or just large agent chat payloads.
Add `bodyLimit?: string` to ServerConfig and default to 1mb. Covers the
common cases without opening a wide DoS surface; consumers that legit-
imately accept larger uploads can opt in explicitly via
`server({ bodyLimit: "10mb" })`.
Runtime evidence (pre-fix vs post-fix logs on the same flow):
- Pre-fix: content-length 105139, body-parser 413'd before the handler
was reached.
- Post-fix: content-length 104791, handler reached with bodyKeys
[name, filters, highlights, pngBase64].
Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>
Mirrors the tightening in 91c9166, which shortened the agents plugin's default-LLM resource description in the canonical manifest but didn't re-run the template sync. The template's appkit.plugins.json is the copy users see at `npx @databricks/appkit init` time, so it needs to match what the CLI renders for the installed package. Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Final layer of the agents feature stack. Everything needed to
exercise, demonstrate, and learn the feature.
Reference application: agent-app
apps/agent-app/— a standalone app purpose-built around the agentsfeature. Demonstrates every major capability in one place:
config/agents/assistant.md, default)with destructive file tools (upload, delete) for HITL demo, and
agents: [support, researcher]delegating to both a markdownsibling and a code-defined specialist.
config/agents/support.md) — full analyticsget_weather.researcherinserver.ts) — definedin code specifically because its MCP tool set is conditional on
runtime env vars, which markdown frontmatter can't express.
Referenced from
assistant.mdvia the markdown → code cross-reference.server.ts— concise: ambienttool()factories, conditional MCPwiring, zero-trust host allowlist derived from the same env vars,
agents()plugin config withautoInheritToolsandmcp.trustedHosts.streaming tokens, tool calls, and an approval card that approves or
denies destructive tool requests over
/api/agents/approve.databricks.yml,app.yaml) and.env.examplefor local dev.dev-playground chat UI + autocomplete agent
apps/dev-playground/client/src/routes/agent.route.tsx— chat UIwith inline autocomplete (hits the
autocompletemarkdown agentconfigured with
ephemeral: true) and a full threaded conversationpanel (hits the default agent).
apps/dev-playground/server/index.ts— code-definedhelperagentusing
fromPlugin(analytics)alongside the markdown-drivenautocompleteagent inconfig/agents/, demonstrating the mixedsetup against the same plugin list. Route tree (
routeTree.gen.ts)regenerated to include the new
/agentroute.Docs
docs/docs/plugins/agents.md— progressive guide covering:toolkits:/tools:frontmatter.fromPlugin().runAgent()(nocreateAppor HTTP).Plus configuration reference (including
approval,limits,mcpkeys), runtime API reference, and a full frontmatter schema table.
docs/docs/api/appkit/— typedoc regenerated for the full agentspublic surface including
AgentDefinition.ephemeral,AgentsPluginConfig.{approval, limits, mcp}, updatedloadAgentFromFile/loadAgentsFromDirsignatures, expandedAgentEventunion, andToolkitEntryannotations.Template
template/appkit.plugins.json— adds theagentsplugin entry sonpx @databricks/appkit init --features agentsscaffolds the plugincorrectly.
Test plan
pnpm docs:buildclean (no broken links)pnpm --filter=@databricks/appkit build:packageclean, publint cleanSigned-off-by: MarioCadenas MarioCadenas@users.noreply.github.com
PR Stack
agents()plugin +createAgent(def)+ markdown-driven agents — feat(appkit): agents() plugin, createAgent(def), and markdown-driven agents #304fromPlugin()DX +runAgentplugins arg + toolkit-resolver — feat(appkit): tools(plugins) DX, runAgent plugins arg, shared toolkit-resolver #305Demo
agent-demo.mp4