feat(appkit): reference agent-app, dev-playground chat UI, docs, and template by MarioCadenas · Pull Request #306 · databricks/appkit

MarioCadenas · 2026-04-21T17:58:48Z

Final layer of the agents feature stack. Everything needed to
exercise, demonstrate, and learn the feature.

Reference application: agent-app

apps/agent-app/ — a standalone app purpose-built around the agents
feature. Demonstrates every major capability in one place:

Markdown orchestrator (config/agents/assistant.md, default)
with destructive file tools (upload, delete) for HITL demo, and
agents: [support, researcher] delegating to both a markdown
sibling and a code-defined specialist.
Markdown specialist (config/agents/support.md) — full analytics
- files toolkits, plus ambient get_weather.
Code-defined specialist (researcher in server.ts) — defined
in code specifically because its MCP tool set is conditional on
runtime env vars, which markdown frontmatter can't express.
Referenced from assistant.md via the markdown → code cross-reference.
server.ts — concise: ambient tool() factories, conditional MCP
wiring, zero-trust host allowlist derived from the same env vars,
agents() plugin config with autoInheritTools and mcp.trustedHosts.
Vite + React 19 + TailwindCSS frontend with a chat UI showing
streaming tokens, tool calls, and an approval card that approves or
denies destructive tool requests over /api/agents/approve.
Databricks deployment config (databricks.yml, app.yaml) and
.env.example for local dev.

dev-playground chat UI + autocomplete agent

apps/dev-playground/client/src/routes/agent.route.tsx — chat UI
with inline autocomplete (hits the autocomplete markdown agent
configured with ephemeral: true) and a full threaded conversation
panel (hits the default agent).

apps/dev-playground/server/index.ts — code-defined helper agent
using fromPlugin(analytics) alongside the markdown-driven
autocomplete agent in config/agents/, demonstrating the mixed
setup against the same plugin list. Route tree (routeTree.gen.ts)
regenerated to include the new /agent route.

Docs

docs/docs/plugins/agents.md — progressive guide covering:

Drop a markdown file → it just works.
Scope tools via toolkits: / tools: frontmatter.
Code-defined agents with fromPlugin().
Sub-agents (markdown → markdown, markdown → code, code → code).
Standalone runAgent() (no createApp or HTTP).

Plus configuration reference (including approval, limits, mcp
keys), runtime API reference, and a full frontmatter schema table.

docs/docs/api/appkit/ — typedoc regenerated for the full agents
public surface including AgentDefinition.ephemeral,
AgentsPluginConfig.{approval, limits, mcp}, updated
loadAgentFromFile / loadAgentsFromDir signatures, expanded
AgentEvent union, and ToolkitEntry annotations.

Template

template/appkit.plugins.json — adds the agents plugin entry so
npx @databricks/appkit init --features agents scaffolds the plugin
correctly.

Test plan

Full appkit vitest suite: 1552 tests passing at stack tip
Typecheck clean across all 8 workspace projects
pnpm docs:build clean (no broken links)
pnpm --filter=@databricks/appkit build:package clean, publint clean

Signed-off-by: MarioCadenas MarioCadenas@users.noreply.github.com

PR Stack

Shared agent types + LLM adapters — feat(appkit): shared agent types and LLM adapter implementations #301
Tool primitives + ToolProvider surfaces — feat(appkit): tool primitives and ToolProvider surfaces on core plugins #302
Plugin infrastructure (attachContext + PluginContext) — feat(appkit): plugin infrastructure — attachContext + PluginContext mediator #303
agents() plugin + createAgent(def) + markdown-driven agents — feat(appkit): agents() plugin, createAgent(def), and markdown-driven agents #304
fromPlugin() DX + runAgent plugins arg + toolkit-resolver — feat(appkit): tools(plugins) DX, runAgent plugins arg, shared toolkit-resolver #305
Reference app + dev-playground + docs (this PR)

Demo

agent-demo.mp4

Removes the generated Interface.RegisteredPlugins.md page and updates TypeAlias.Plugins.md / index / sidebar to reflect the simplified Record<string, PluginToolkitProvider> shape. Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>

…t/beta The agents manifest had "hidden": true, which excluded it from the template plugins.json — but the same flag is also used by generate-plugin-entries.ts to gate the auto-generated barrel beta-exports.generated.ts. Result: import { agents } from "@databricks/appkit/beta" failed at type level, agents() resolved to any in createApp's plugins tuple, and the entire PluginMap inference collapsed (appkit.<TAB> showed nothing in onPluginsReady). Drop "hidden": true. The codegen now emits export { agents } from "./agents" into beta-exports.generated.ts, the playground import resolves, and PluginMap inference is restored end-to-end. Side effect: agents now also appears in template/appkit.plugins.json so projects scaffolded via npx @databricks/appkit init pick it up out of the box. That matches v6's intent of shipping the agents plugin as part of the public beta surface. Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>

Picks up the rewritten runAgent JSDoc with the new trust-boundary language and standalone-init contract notes. Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>

Collapse the two-field `toolkits:` + `tools:` split in markdown agent frontmatter into a single `tools:` list that mirrors the TS function form (`tools(plugins) => { ...plugins.foo.toolkit(), get_weather: ... }`). Reviewer feedback on the v2/4 PR called out the asymmetry: YAML had two fields for what the TS surface treats as one record. Each entry in `tools:` is now one of: tools: - plugin:analytics # all of analytics - plugin:files: [uploads.read, uploads.list] # only filter - plugin:genie: { except: [getConversation] } # full ToolkitOptions - get_weather # ambient tool Strings starting with `plugin:` are plugin references; strings without the prefix are ambient tool names looked up in the agents() plugin's config. Object entries are single-key mappings keyed by `plugin:NAME` whose value is either an array (sugar for `{ only: [...] }`) or a full ToolkitOptions record. Same downstream resolution: plugin entries call `provider.toolkit(opts)` and merge in the resulting ToolkitEntries; ambient entries look up `ctx.availableTools[name]`. The unified list lets a single auto-inherit gate ("declaring `tools:` opts out") cover both cases without the prior double check. The legacy `toolkits:` key is no longer accepted — `parseFrontmatter` throws a focused migration error pointing users at the new prefix syntax. Migration: - apps/dev-playground/config/agents/query/agent.md — moved its `toolkits: - files: [...]` line into `tools:\n - plugin:files: [...]` - docs/docs/plugins/agents.md Level 2 example rewritten with the unified list and the four-shape rubric Tests: - parseFrontmatter "parses nested arrays" updated for the new shape; added a "rejects legacy toolkits: with migration error" test - loadAgentsFromDir tests covering: mixed plugin + ambient, `[only]` shorthand, full ToolkitOptions object, rejection of non-`plugin:` object keys, rejection of empty `plugin:` (which YAML parses as a `{ plugin: null }` mapping) - agents-plugin file-loaded auto-inherit test migrated to the new syntax Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>

…rate Address two P0 findings from the PR #306 agentic review. 1. tool() runtime crash on the template `npx @databricks/appkit init` produces a helper agent whose tools call `tool({ description, schema, execute })` with no `name` field (because the record key — `current_time`, `count_words` — already names the tool). The previous shape required `name: string`, so `isFunctionTool` rejected the returned FunctionTool and agent registration failed with "Agent 'helper' tool 'current_time' has an unrecognized shape" — the very first `pnpm dev` after init. Make `name` optional on both ToolConfig and FunctionTool, drop the `typeof obj.name === "string"` check from `isFunctionTool`, and widen `execute` from `Promise<string> | string` to `unknown | Promise<unknown>` since the template's tools naturally return objects (`{ now: ... }`). The agents plugin already overrides `name` with the record key at index-build time, so the original field was functionally redundant in keyed-record contexts. Zod error messages fall back to a generic "tool" label when the name is omitted. Downstream: agents plugin already normalises non-string results via `normalizeToolResult`; DatabricksAdapter already `JSON.stringify`s non-string returns at line 438-439. No further changes needed. 2. Level 1 docs lied about auto-inherit `docs/docs/plugins/agents.md` Level 1 said "auto-inherits every registered ToolProvider plugin's tools" but the Configuration Reference correctly states `autoInheritTools` defaults to `{ file: false, code: false }`. New users following Level 1 hit zero tools and confusion. Rewrite Level 1 narrative to be truthful: "The agent starts with no tools. Tools are opt-in — declare them in frontmatter (Level 2) or opt into auto-inherit explicitly with `agents({ autoInheritTools: { file: true } })`. See 'Auto-inherit posture' below." Same doc still had stale `fromPlugin(...)` examples in Level 3 + the standalone runAgent section. Rewritten to use the `tools(plugins) => ...` function form that v5 actually ships, matching the rest of the codebase. The runAgent section also gets a sentence on the new init contract (eager attachContext + setup, shared cache across sub-agents) and the trust-boundary caveat (no OBO, no approval gate). Tests: +3 covering the new shape — name optional, non-string returns, zod-error fallback label. Updated the "returns false when name is missing" guard test to assert the new "returns true" since the record key wins. Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>

Address four agentic-review findings on PR #306. All four were P1 risks under the MCP path or the markdown agent reload flow. 1. mcp callTool joined undefined text into 'undefined' literal McpToolCallResult.content[].text is optional per the spec; the old filter only matched `type === "text"`, so an entry like { type: "text" } (text undefined) flowed through and `Array.join('\n')` emitted the literal string "undefined" — visible to the agent and to the user. Tighten the filter on both the success and error paths with a type predicate that narrows on `typeof c.text === "string"`. The error path falls back to a generic "MCP tool call failed" message when no text entries survive. 2. mcp sendNotification ignored HTTP error status `notifications/initialized` and other fire-and-forget notifications completed `connect()` even when the server returned 4xx/5xx — a silent connect followed by mysterious tool-call failures. MCP spec says notifications don't acknowledge, so we keep the fire-and-forget contract but log a warning when the server clearly rejected the request. Wrap the fetch in try/catch so network failures surface in the logs without breaking the protocol. 3. mcp response body had no size cap `response.text()` and `response.json()` both buffer the entire body into memory. A misconfigured or malicious MCP server could stream unbounded data to exhaust client memory. New `readResponseTextCapped()` helper streams the body via the ReadableStream API and aborts once cumulative bytes cross MCP_RESPONSE_BODY_LIMIT_BYTES (1 MiB — far above any normal `initialize` / `tools/list` / `tools/call` JSON-RPC response). Used on both the SSE and plain-JSON branches in sendRpc. 4. agents.reload() closed the live mcpClient mid-stream Tool dispatch reads `this.mcpClient` at call time, not via capture. The old `reload()` called `await this.mcpClient.close()` and nulled the field; any in-flight stream that resolved the field afterwards crashed with "MCP client is closed" mid-conversation. Drop the synchronous close. The client owns only short-lived fetch handles, so a reload doesn't need to tear down state — new endpoints append to the existing connection map via `connectHostedTools`, stale connections from removed configs become unreachable through the new agent tool indexes (small memory cost, no correctness hazard). The shutdown path still closes — that's process teardown, where in-flight streams have already been aborted via `abortActiveOperations`. Tests: 4 new MCP cases (undefined-text filter on success + error paths, generic error fallback, 1 MiB body cap), 1 reload-doesn't-close regression, plus a connect-still-succeeds-on-4xx-notification case. 2298 tests passing across the workspace. Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>

Reviewer agentic finding #7 flagged ~20 lines of SSE parsing boilerplate in `template/client/src/pages/agents/AgentChat.tsx` that every scaffolded app would copy verbatim. The reviewer suggested extracting it; the cleaner shape is a thin React hook around the existing `connectSSE` utility in `@databricks/appkit-ui/js`, which already owns the buffer cap, abort signal composition, frame parsing, and retry/backoff logic. useAgentChat (new, in @databricks/appkit-ui/react) Single-stream chat primitive for agents-plugin chat endpoints. Wraps connectSSE and adds the stateful glue every chat UI ends up writing: - `content`: accumulates `response.output_text.delta` deltas - `events`: chronological log of every parsed event for replay / inspection - `threadId`: captured from the first `appkit.metadata` event and automatically forwarded on the next `send()` so the server reuses the same thread - `isStreaming` / `error` - `send(message)`: POSTs to `/api/agents/chat` (configurable via `endpoint`), streams the SSE response, parses each JSON-encoded data line, aborts any prior in-flight stream - `reset()`: drops state and aborts the active stream The hook stays narrow on purpose: one stream at a time, no multi-turn message-history ownership (callers compose that). Tool calls, approval gates, and any non-text event are surfaced through `onEvent` for the caller to render however it wants. `maxRetries: 0` is hard-wired into the connectSSE call — chat turns aren't idempotent and re-sending the payload on transient failure would either duplicate the user message or depend on server-side Last-Event-ID resumption (the agents plugin's StreamManager supports it, but failure-mode auditing is easier with retries explicitly off and the responsibility on the caller to re-send). Template migration `template/client/src/pages/agents/AgentChat.tsx` drops the inline fetch/decoder/SSE-buffer/parser block (~22 lines) and uses `useAgentChat({ agent, onEvent })` instead. Tool-call rows are pushed into the local `messages` array via the `onEvent` callback; the streaming assistant text mirrors from the hook's `content` field into the pending assistant row. Dev-playground's `useAgentStream` is left in place for now — it carries inspector + action-dispatcher wiring that doesn't fit on this hook. Migrating it onto `useAgentChat` is a follow-up that also covers the related Agentic finding #21 (duplicate SSE parsing across consumers). Tests: 12 new cases on useAgentChat covering posting payload shape, custom endpoint, delta accumulation, threadId capture + reuse, onEvent dispatch (including handlers that throw), malformed payload skipping, isStreaming lifecycle, reset(), send-while-streaming abort, and onError surfacing. 2310 tests passing across the workspace. Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>

…nfo fetch Reviewer caught the redundancy: the agents plugin already exposes `{ agents, defaultAgent }` via `clientConfig()` at startup, inlined into the boot HTML via `<script id="__appkit__">`. The template was doing an extra `GET /api/agents/info` round-trip on mount to fetch the same data that's already sitting in `window.__APPKIT_CONFIG__` and readable via `usePluginClientConfig`. Replace the useEffect + fetch + error-handling block with a synchronous `usePluginClientConfig<AgentsClientConfig>('agents')`. Net savings: -20 lines of effect/error/loading-state plumbing, one fewer HTTP request on page load, no flicker between empty-then- populated agent buttons because the data is available on first render. `selectedAgent` is now initialised directly from `defaultAgent ?? agents[0] ?? null` in `useState`, so there's no intermediate "loading agents…" state. Input placeholder updates to "No agents registered" when the registry is empty (which only happens if the agents plugin loaded without any agents — config bug, worth showing). The `/api/agents/info` route is left in place on the server — it's still useful for programmatic callers and debugging — just no longer consumed by the template. Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>

Reviewer agentic finding #21: dev-playground reimplemented the same SSE parsing the template did, creating two divergent implementations of one concern. The new `useAgentChat` hook in @databricks/appkit-ui already owns the fetch + frame parsing + state plumbing; the playground hook is now a thin wrapper that adds the two playground- specific concerns and re-exports the legacy API surface. The wrapper keeps two layers on top of the shared hook: 1. `contextPrefix` on send() — the Smart Dashboard injects active filter / highlight state into the user message so the agent sees the UI state on every turn. Composed in this wrapper rather than polluting the shared hook surface. 2. Stream-inspector wiring — every send opens a `StreamRecord` via `beginStreamRun` and forwards every event to `recordStreamEvent` so the inspector drawer can render the timeline. The inspector is a dev-playground feature, not a shared concern. API stays drop-in compatible: `agentName` (not `agent`), `isLoading` (not `isStreaming`), `send(message, { contextPrefix })`. `SSEEvent` is re-exported as a type alias for `AgentChatEvent` so `use-stream-inspector.ts` and other consumers don't need to be touched. `routes/smart-dashboard.route.tsx`, `agent-sidebar.tsx`, and `quick-actions-bar.tsx` work unchanged. Error handling preserves the prior UX: fetch-level failures now project from `useAgentChat.error` into the streamed `content` as `"Error: ..."`, matching the dashboard's assistant-message rendering of the previous hook's `setContent('Error: ...')` path. Net: 159 -> ~100 lines on the playground hook, zero duplicated SSE parsing across consumers, single source of truth in `@databricks/appkit-ui/react`. Workspace tests: 2310 pass unchanged. Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>

Reviewer comment 3200867453: CHANGELOG.md shouldn't be touched by feature branches. The Stage 2 publish in the secure release repo runs release-it (per `.release-it.json`) and applies the changelog, version bump, and tag on its own. A merged PR that hand-edits the file just makes life harder for the bot. The two issues called out in the comment: 1. An "Unreleased" entry labelled "Breaking change" was added for the markdown agent folder layout. The agents plugin hasn't shipped yet — there's no released contract to break, so this entry is misleading and the breaking-change marker is wrong. 2. The diff also re-inserted historical release entries (0.25.1, 0.25.0) inside the file, which release-it would have to reconcile. This commit reverts CHANGELOG.md to whatever sits on main. The pre-existing "many `# Changelog` headers" pile-up the reviewer laughed about is a separate generation bug on the release pipeline, not in scope here. Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>

…ate timeouts, forward signals Six P2 findings from the agentic review on agent/v2/6-apps-docs. Batched as one commit because the rename in finding 15 overlaps with the annotation migration in finding 12 on every plugin file. Finding 15 — align tool() and defineTool() callback names. tool({ execute }) is the public agent-author surface; defineTool({ handler }) is the plugin-author surface. A reviewer pointed out that switching between them forced re-learning the same field. They now both spell it execute. ToolEntry.execute carries the same signature (args, signal?) => unknown | Promise<unknown>. Mechanical rename across the four core plugins (analytics, lakebase, files, genie) and all unit tests. HTTP RouteEntry.handler is unrelated and was left alone. Findings 12 + 13 — migrate plugins to the effect enum, drop stale destructive: true references. ToolAnnotations.effect ("read" | "write" | "update" | "destructive") has been the preferred field for some time; the legacy readOnly and destructive booleans are deprecated. The plugins were still authoring the old fields: analytics.query readOnly: true -> effect: "read" files.list/.read/... readOnly: true -> effect: "read" files.upload/.delete destructive: true -> effect: "destructive" genie.getConversation readOnly: true -> effect: "read" genie.sendMessage (unset) -> effect: "read" lakebase.query readOnly + destructive (redundant pair) -> effect: readOnly ? "read" : "destructive" The approval gate already honours both forms (see requiresApproval in agents.ts) so behaviour is unchanged. The redundancy lakebase had — setting destructive: !readOnly alongside the readOnly bool — is gone. Stale comments referring to "tools annotated destructive: true" in shared/src/agent.ts, core/agent/types.ts, and docs/docs/plugins/agents.md now lead with effect and document the legacy boolean as a backward-compat path. Finding 11 — validate approval.timeoutMs. cfg.timeoutMs from user config was used verbatim. A misconfiguration (0, negative, NaN, Infinity, or any value below ~1s) made every mutating tool call auto-deny instantly because the gate's wait resolved before any UI could render. The new resolvedApprovalPolicy getter: - clamps to a 1 000 ms floor (anything below it falls back to the 60 000 ms default) - rejects non-finite values (NaN, ±Infinity) by the same path - logs a single warning so the misconfig is visible - memoises the result so the warning never repeats across the thousands of stream lifecycle reads that this getter sees New test file: approval-config.test.ts (8 tests covering the floor, NaN, Infinity, negative, default fallback, valid pass-through, and the memo invariant). Finding 14 — document AgentDefinition.name precedence. The field's old doc string ("Filled in from the enclosing key when used in agents: { foo: def }") was misleading: the registry key always wins, the explicit name is ignored when registered via the agents: {...} record or via the markdown folder layout. Expanded the JSDoc to say so plainly, list the legitimate use cases (standalone runAgent and consistent fallback labelling), and warn that setting name to a value that differs from the registry key is harmless but confusing. Finding 8 — forward AbortSignal in files and genie tool handlers. Analytics and lakebase already forward the signal arg to their underlying I/O. Files and genie ignored it entirely. Genie now plumbs signal end-to-end. sendMessage takes options.signal and forwards to streamSendMessage, which already accepts signal at the connector boundary. getConversation takes a signal parameter and throws on entry if it's already aborted (the connector's pagination loop is signal-agnostic today; this catches the common case where the tool was queued after the user cancelled, without churning the deeper connector surface). Files honours signal?.throwIfAborted() at every handler's entry. The VolumeAPI and the four volume connectors below it don't accept signal, so deeper mid-call cancellation is still out of scope — but a queued tool call that's already cancelled by the time dispatch reaches the handler now unwinds cleanly instead of running to completion and emitting a wasted result. Tests: 2310 -> 2318 (added the 8-case approval-config suite). Build + docs:build + knip + check:fix clean. Typedoc regenerated to pick up the rename and the new JSDoc. Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>

…required + cleanup Two P3 findings plus a consistency sweep across the new agent surface. Finding 18 — surface partial failures from MCP connectAll. Used to swallow per-endpoint failures into log.error and return void, so callers couldn't distinguish "all connected" from "one of three failed". The structured return McpConnectAllResult exposes { connected: string[], failed: Array<{ name, error }> }; the agents plugin reads it and emits one aggregate startup warning naming the failed endpoints alongside the per-endpoint diagnostics. Boot still succeeds — one broken hosted-tool endpoint shouldn't take down every agent — but operators now see "agent X registered without N hosted tool endpoints" instead of a silent gap. McpConnectAllResult is re-exported from beta.ts so user code that drives connectAll directly can type its result. 4 new unit tests cover the four shapes: all-success, partial, all-failed, and a non-Error rejection wrapping path so the public contract always hands callers a real Error. Finding 19 — tool() description is now required. description was optional and silently fell back to config.name when omitted. A tool named "get_weather" would then show up to the LLM with description="get_weather" — a cryptic identifier that gave the model no signal about expected use. Switched description: string at the type level and dropped the ?? config.name fallback, so the failure mode shifts from "agent silently behaves oddly in production" to "tsc rejects the call at authoring time". Updated the test that asserted the old fallback and added a description to the three other tests that omitted it. No production callsite was affected (template, dev-playground, and docs already passed a description). Consistency sweep across the new agent surface. After the handler→execute rename and the toolkits:→tools: unification in earlier PRs in this stack, several JSDoc strings still referred to dead concepts and one runtime path had drifted from a sibling. Cleanups in this commit: - tool-approval-gate.ts header still said "destructive: true agent tool calls"; rewritten to lead with the effect enum and document the legacy boolean as a backward-compat path, matching the requiresApproval gate and the docs. - AutoInheritToolsConfig and ToolEntry.autoInheritable JSDoc referred to "markdown toolkits:"; updated to "plugin:NAME entries in the unified tools: list" so the prose tracks the parser. - LoadContext.plugins and AgentsPlugin.pluginProviderIndex used the same dead toolkits: phrasing — same fix. - lakebase.query previously ignored signal, even though the agentic review claimed it forwarded. Aligned with the files plugin's pattern: signal?.throwIfAborted() at handler entry. Deeper plumbing into the pg connection still wants its own pass, but the common cancel-before-dispatch case now unwinds cleanly instead of running to completion against Postgres. - core/agent/load-agents.ts had a module-private `type ToolEntry` that shadowed the exported public-API ToolEntry from tools/define-tool.ts — different layers, same name. Renamed the local type to FrontmatterToolEntry; ToolEntry now refers to the plugin-author API everywhere it appears. - Stale biome-ignore suppression on approval-config.test.ts removed after the underlying noExplicitAny rule stopped firing on that pattern. Tests: 2322 -> 2322 (replaced the description-fallback test with the new required-description shape, added 4 connectAll tests). Build + docs:build + knip + typecheck + check:fix clean. Typedoc regenerated; new McpConnectAllResult page published. Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>

Closes the four remaining reviewer items on the agents stack and the CI gap that has been red since the helper agent was rebuilt. CI — playground integration tests. The playground's `helper` agent has no explicit `model:` and falls back to DatabricksAdapter.fromModelServing, which reads DATABRICKS_SERVING_ENDPOINT_NAME at setup() time. The integration test job never sets that env, so the agents plugin throws while resolving the adapter and the webServer health check times out before any test runs. APPKIT_E2E_TEST swaps in the mock WorkspaceClient at request time so no real serving traffic happens — the only thing the job needed was a non-empty endpoint name. Added DATABRICKS_SERVING_ENDPOINT_NAME=e2e-mock alongside the existing DATABRICKS_WAREHOUSE_ID / DATABRICKS_WORKSPACE_ID mocks. Reviewer comment 3201574964 — agents serving-endpoint manifest copy. The optional `serving_endpoint` resource in the agents plugin's manifest leaked internal API names (`fromModelServing`) into the scaffold-time description and used the awkward alias "Model Serving (agents)". A user running `apps init` saw both. Rewrote the entry: - alias: "Model Serving (agents)" → "Default LLM for agents" - description leads with intent ("the LLM your agents will call when they don't pin their own model"), spells out when it's required (every agent that omits `model:` or `endpoint:`), names the runtime env var, notes the shared-with-serving-plugin ergonomics, and warns up front that only streaming-capable endpoints work. - field description matches. `template/appkit.plugins.json` regenerated through `sync:template`. Reviewer comment 3207721724 — streaming-endpoint requirement in docs. Custom MLflow / sklearn endpoints don't implement the chat- completions streaming protocol; pointing an agent at one fails with "Response body is null — streaming not supported" on the first turn. This was nowhere in the agents plugin docs. Added a Requirements section near the top of docs/docs/plugins/agents.md with an `:::info` callout that names which endpoint families work (Foundation Model APIs, external chat models), which don't (typical sklearn/MLflow pyfunc), and points users at the `serving` plugin's `/invoke` route + `useServingInvoke` hook for the non-streaming path. Reviewer comment 3201691529 — drop the `assistant` agent. Both the scaffolded template and the dev-playground shipped a markdown `assistant` agent that did nothing the code-defined `helper` couldn't (helper's prompt already says "For anything else, answer briefly in plain text"). It added a second tab in the chat UI that scaffolded users had to read past, with no functional payoff. Dropped: - template/config/agents/assistant/agent.md - apps/dev-playground/config/agents/assistant/agent.md Followups in the same change: - The playground's `assistant` was the only markdown agent with `default: true`; without it the agents plugin would fall back to first-registered, which alphabetically lands on the `anomaly` ephemeral. Pinned `defaultAgent: "helper"` on the playground's agents() config so the bare /agent route lands on a conversational agent. - Updated `template/server/agents/helper.ts` and `template/client/src/pages/agents/AgentChat.tsx` so the prose docs talk about a single starter agent and a drop-in markdown pattern, rather than referencing the now-gone assistant. Markdown-form coverage is still demonstrated by every other markdown agent in the playground (`anomaly`, `autocomplete`, `insights`, `query`) and by the docs Level 1 / Level 2 walkthrough. Scaffolded users get a working tool-calling demo immediately instead of two tabs that look the same. Tests: 2322 pass unchanged. Typecheck + build + docs:build + knip + check:fix clean. Template manifest regenerated via sync:template. Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>

Earlier in this stack I dropped `assistant` from the template to address reviewer comment 3201691529. The reviewer's concern was real (assistant did nothing helper couldn't), but the fix went too far — the template lost its pedagogical contrast between the imperative code form and the declarative markdown form, which is one of the agents plugin's selling points. Replaced assistant with `planner`: a markdown agent that's genuinely distinct from helper, not a near-duplicate. - **helper** (code, server/agents/helper.ts) uses `tool()` to expose `current_time` and `count_words`. Demonstrates the imperative form for agents whose value is in what they *do*. - **planner** (markdown, config/agents/planner/agent.md) is pure prose with no tools. Its system prompt steers behaviour: restate the goal, surface open questions, propose a small ordered plan, call out risks. Demonstrates the declarative form for agents whose value is in *how they think*. The contrast lines up cleanly: tools-vs-prose, executes-vs-reasons, compiled-vs-edit-and-go. Switching tabs in the chat UI now shows two genuinely different conversations, which is what the reviewer's "why is there a second tab" question was really probing. The planner prompt avoids two failure modes the prior `assistant` hit: - It's not a pale-imitation generalist. The prompt is specific to a planning role (restate, ask before assuming, ordered output) so the agent has visible behaviour the user can recognise. - It explicitly tells the model it has no tools and points at helper when the user wants an action — so a scaffolded user naturally discovers the second tab through the conversation itself. Wiring follow-ups: - planner carries `default: true` so the template's default tab is conversation-first, matching the old behaviour. - helper's docstring and AgentChat.tsx prose now describe the dual demo and the file paths behind each tab. - The empty-state hint suggests one prompt per tab so the contrast is the first thing a new user experiences. Playground intentionally untouched — it already has four markdown agents (anomaly, autocomplete, insights, query) demonstrating the form, so assistant was redundant there and stays removed. Tests: 2322 pass. Typecheck clean. No code paths beyond the new markdown file and the prose updates around it. Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>

…gent The previous template change exposed `planner` (markdown) and `helper` (code) as two top-level chat tabs. The dual-form story was right, but the UX was confusing — switching tabs felt like switching apps even though both agents handled the same conversations. Restructured around the agents plugin's sub-agent feature instead: - `planner` (markdown, `config/agents/planner/agent.md`) is now the only user-facing chat. Its frontmatter declares `agents: [helper]`, so the agents plugin exposes `helper` to the planner prompt as an `agent-helper` tool. Planner's body tells the model to delegate to `agent-helper` for computational actions (clock, word count) and stay in prose for planning conversations. - `helper` (code, `server/agents/helper.ts`) stays registered at the top level so the markdown loader's two-pass resolver can find it by name when wiring planner's sub-agents. The chat UI just doesn't surface it — the template renders one chat against the default agent and drops the tab picker entirely. - `AgentChat.tsx` lost the agent-tab list; it talks straight to the default agent (planner) and renders sub-agent calls inline as `tool · agent-helper` rows, same shape it already uses for function-tool calls. Empty-state hint nudges the user toward both planning prompts ("help me plan a feature") and the delegation path ("what time is it?" — watch planner call agent-helper). - `helper.ts` docstring rewritten so a scaffolded reader sees the coordinator-plus-worker pattern, why each side lives in its own file (markdown for prose, code for typed tools), and that the template is meant to be a copy-paste starting point for the pattern. No plugin/runtime changes. Both agents stay top-level registered; visibility is a UI-side choice, not a plugin feature. If a user later wants two chats they can re-add a tab list reading `agents[]` from `usePluginClientConfig('agents')` — the data is still there. Tests: 2322 pass. Typecheck/build clean. Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>

The landing page had a special hero treatment for `/smart-dashboard` plus an opacity-muted copy of the same card in its category grid labelled "Featured above". With the agents stack expanding, this was adding noise — every category now has agents demos and visually elevating one of them implied a hierarchy that doesn't really exist. Stripped the special-case: - Removed the FeaturedCard component and the `featured` prop on DemoCard. The grid is now uniform. - Removed the "Featured above" muted treatment in DemoCard. Every card renders identically; the user picks by category and label. - The landing-page comment block now describes the simpler shape (hero + grouped category grid) so future readers don't go hunting for a featured slot. Smoke test updated. The smoke test asserted on the old hero subtitle copy ("Explore the capabilities of the AppKit") which the hero hasn't shipped for a while — it's been "A living catalog of what AppKit can do — ..." through this stack. The test was passing in CI before because the webServer health check timed out earlier (the DATABRICKS_SERVING_ENDPOINT_NAME env miss masked it), and now that the env is set, the smoke step actually runs and trips on the stale assertion. Updated to match a stable substring of the current copy with `exact: false`, so future prose tweaks don't keep breaking the smoke test — the goal here is "the page rendered", not "the exact marketing line". No other behaviour change; the catalog and nav data in @/lib/nav.ts are untouched. /smart-dashboard is still in its category grid as a normal demo card. Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>

The collapsible header in saved-views-panel.tsx wrapped the title row, the spinner/refresh action, and the chevron in one outer `<button>`, with the refresh action implemented as a nested `<button>` inside it. That's invalid HTML — `<button>` cannot contain `<button>` — and React's dev build was flagging it as a hydration warning on every render of the Smart Dashboard: validateDOMNesting: <button> cannot contain a nested <button>. ... saved-views-panel.tsx:91 saved-views-panel.tsx:72 The fix is purely structural; no behaviour or visual change: - The header row is now a flex `<div>` rather than a flex `<button>`, so its children can be sibling buttons instead of nested ones. - The title cluster (icon + "Saved views" + count) is one button that toggles open/close. - The chevron is its own toggle button (same `setOpen` flip); making it a real button gives keyboard users a second focusable target and bakes the `aria-expanded` state onto something that announces itself. - Refresh is a third sibling button, no longer wrapped in the toggle. Drops the `e.stopPropagation` it needed when it was a child of the outer toggle button. Added `aria-expanded` and proper `aria-label`s on both toggle buttons since the chevron is now keyboard-reachable. Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>

Reported by the user testing the template's helper agent against gemini-flash-lite. After the first tool call (planner -> agent-helper sub-agent), Vertex returned: INVALID_ARGUMENT: function call \`agent-helper\` in the 6. content block is missing a \`thought_signature\` Vertex AI's OpenAI-compatible surface for Gemini 2.x models (Flash, Flash-Lite, Pro) attaches an opaque \`thought_signature\` blob on every function call the model emits. The next assistant-with-tool_calls message in the conversation history must echo it back verbatim or Vertex rejects with that 400. The DatabricksAdapter was building OpenAIToolCall objects from streamed deltas without preserving the field, so the follow-up request from inside the same turn (the one that pushes tool results back to the model) failed before we ever got a final assistant turn. The fix has two halves — one for the live single-turn case (the reported bug) and one for resumed threads loaded from ThreadStore. Single-turn (the reported bug): - Extended OpenAIToolCall + DeltaToolCall with thought_signature. - The streaming accumulator now captures the field from either the top level of the tool_call delta or nested inside \`function\` — Vertex's proxy has been observed to use both locations. - When constructing final OpenAIToolCall objects from the accumulator, the signature is attached at the top level (matches Vertex's OpenAI-compat docs). - The existing messages.push(...) flow already preserves the full tool_calls array verbatim, so once the field is on the object it reaches the wire on the next streamCompletion call within the same run() invocation. Resumed threads: - Extended the shared ToolCall type with an optional thoughtSignature so it can survive ThreadStore round-trips. - buildMessages now reads tc.thoughtSignature off the persisted ToolCall and emits it as thought_signature on the wire shape so the first request of a resumed turn doesn't fail Vertex's signature check before any tool call even fires. Non-Gemini endpoints (Claude on Databricks, OpenAI-compat external models, Llama, etc.) leave the field undefined; the spread on the emit-side omits the key entirely, so we don't pass a meaningless field to models with stricter shape validators. Tests (4 new in a Vertex/Gemini describe block, total 2322 -> 2326): - top-level thought_signature captured from a streamed delta and echoed verbatim on the second request - nested function.thought_signature captured from a streamed delta and canonicalised to top level on the second request - non-Gemini path (no signature in the delta) does NOT emit a thought_signature key — defends against accidentally inventing one when talking to Claude/OpenAI - resumed thread: persisted Message.toolCalls[*].thoughtSignature flows through buildMessages to the wire's thought_signature on the very first request of the new turn Plain typecheck + workspace test + build clean. No public-API breakage: thoughtSignature is optional everywhere it appears. Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>

User reported tsc errors in their scaffolded app on every tool({...}) callsite: ZodObject<{}, $strip> is missing the following properties from type ZodType: toJSONSchema, with, exactOptional, apply Root cause: zod version skew between AppKit and the template. packages/appkit pinned zod@4.3.6; template/package.json pinned zod@4.1.13. The methods AppKit's published `.d.ts` advertises on `ZodType` (`toJSONSchema`, `with`, `exactOptional`, `apply`) were added in 4.2 — the user's scaffolded `z.object({})` was a different structural type than the one AppKit's `tool()` constraint expects, even though both are major version 4. Two changes: 1. Template now pins zod@4.3.6, matching AppKit. Scaffolded apps see exactly the ZodType AppKit's published types reference, so `tool({ schema: z.object(...) })` type-checks without surprises. 2. tools/check-template-deps.ts now cross-checks an allowlist of version-sensitive runtime deps (`zod` for now) against packages/appkit/package.json. Future bumps of zod in AppKit without a matching bump in the template fail this check at CI time with a clear message ("Version skew on zod: template pins X, packages/appkit pins Y") instead of surfacing as a confusing ZodObject mismatch in user-land tsc output. Header doc on the check tool now explains both guards: the supply-chain pin check that was already there, and the new cross-version guard for deps whose types leak through AppKit's public API surface. The list is intentionally small (just zod today); add to SYNCED_DEPS as other public-typed deps appear. Workspace tests still 2326 pass. Typecheck clean across all packages and the docs site. No lockfile churn; pnpm resolves zod@4.3.6 the same way for both packages now. Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>

pkosiec

Code LGTM overall, I'm about to test the app and will gave a final ✅ if everything is alright 👍

Live capture against `gemini-3.1-flash-lite-preview` showed Vertex's OpenAI-compat proxy emits the signature only as `thoughtSignature` (camelCase) at the top level of the tool_call delta, and accepts the same spelling back on outbound. The dual-spelling capture/emit added in the previous fix was speculative defense without runtime evidence. Simplifies: - `OpenAIToolCall` and `DeltaToolCall` carry only `thoughtSignature` - Inbound capture is `tc.thoughtSignature` (no `??` fallback) - Outbound emit on fresh tool_calls and on `buildMessages` resume is single-key - Test helper unified to `sig`, snake_case-only test removed, negative test asserts both spellings stay absent as a regression guard - Drops two pre-existing dead `biome-ignore noExplicitAny` suppressions in the test file (rule is disabled in tests via overrides) Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>

pkosiec

LGTM, just a minor comment

The agents plugin's default-LLM resource description had grown into a paragraph covering when it's needed, the env var, the cross-plugin sharing, and which endpoints qualify. That copy is rendered by the plugin CLI, where it pushes everything else off-screen and reads as docs-in-a-tooltip. Match the sibling plugins' style (serving, analytics, lakebase): one short clause for the resource, a noun phrase for the field. The edge-case detail belongs in docs/plugins/agents.md, not in the CLI. Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>

Fixes the actionable findings from the second-pass agentic review on PR #306, all small and confidence-validated against the actual code. - agents.ts: untrackStream fallback aligned with trackStream (?? 0) so the asymmetry stops masking a drift the comment claims cannot happen. - agents.ts: dispatchToolCall guards function-tool args before casting to Record<string, unknown>; a non-object value now throws a clear error instead of silently passing a wrong shape to the tool. - mcp/client.ts: new coerceToolParameters() validates the MCP server's reported inputSchema before exposing it as an AgentToolDefinition. Malformed schemas fall back to an empty parameters object so the tool still registers but cannot accept arguments — defense against a misbehaving (or hostile) MCP server. - tool.ts: drop redundant `as unknown as Record<string, unknown>` double cast; toToolJSONSchema already returns that type. - template AgentChat.tsx: scroll effect now depends on [messages, content] so the viewport actually follows new replies instead of freezing at the top. - dispatch-tool-call.test.ts: new test for runSubAgent depth-limit rejection — exercises the maxSubAgentDepth backstop that the cycle-detection comment relies on. Tool-call budget exhaustion was flagged as untested but the existing "rejects + aborts when the budget is exhausted" test already covers it; no new test needed there. Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>

Express's stock 100kb default rejected the dev-playground smart-dashboard "save view" upload with a 413 PayloadTooLargeError before the route handler ever ran. The request body was ~105KB — a base64-encoded screenshot of the dashboard, barely over the cap. Same shape will hit any other AppKit consumer that posts modest base64 uploads, light file metadata, or just large agent chat payloads. Add `bodyLimit?: string` to ServerConfig and default to 1mb. Covers the common cases without opening a wide DoS surface; consumers that legit- imately accept larger uploads can opt in explicitly via `server({ bodyLimit: "10mb" })`. Runtime evidence (pre-fix vs post-fix logs on the same flow): - Pre-fix: content-length 105139, body-parser 413'd before the handler was reached. - Post-fix: content-length 104791, handler reached with bodyKeys [name, filters, highlights, pngBase64]. Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>

Mirrors the tightening in 91c9166, which shortened the agents plugin's default-LLM resource description in the canonical manifest but didn't re-run the template sync. The template's appkit.plugins.json is the copy users see at `npx @databricks/appkit init` time, so it needs to match what the CLI renders for the installed package. Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>

MarioCadenas force-pushed the agent/v2/5-fromplugin-runagent branch from 162e970 to 29e3534 Compare April 21, 2026 20:41

MarioCadenas force-pushed the agent/v2/6-apps-docs branch from 57cc1e4 to 4a441d2 Compare April 21, 2026 20:41

MarioCadenas mentioned this pull request Apr 22, 2026

feat(appkit): zero-trust MCP host policy with URL allowlist and scoped auth #307

Closed

7 tasks

MarioCadenas force-pushed the agent/v2/5-fromplugin-runagent branch from 29e3534 to b462716 Compare April 22, 2026 08:45

MarioCadenas force-pushed the agent/v2/6-apps-docs branch 2 times, most recently from d16cdd5 to e81d8bb Compare April 22, 2026 09:24

MarioCadenas force-pushed the agent/v2/5-fromplugin-runagent branch 2 times, most recently from 539487e to dac73b5 Compare April 22, 2026 09:46

MarioCadenas force-pushed the agent/v2/6-apps-docs branch from e81d8bb to 829ad13 Compare April 22, 2026 09:46

MarioCadenas force-pushed the agent/v2/5-fromplugin-runagent branch from dac73b5 to 624f2a0 Compare April 22, 2026 09:59

MarioCadenas force-pushed the agent/v2/6-apps-docs branch 2 times, most recently from 3386200 to 263f587 Compare April 22, 2026 10:21

MarioCadenas added 21 commits May 8, 2026 18:14

chore(appkit): regenerate typedoc after runAgent docstring update

00e5702

Picks up the rewritten runAgent JSDoc with the new trust-boundary language and standalone-init contract notes. Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>

chore: package-lock

b933498

chore: package-lock

aa58ff4

pkosiec reviewed May 12, 2026

View reviewed changes

Comment thread .github/workflows/ci.yml Outdated

Comment thread .github/workflows/ci.yml

pkosiec approved these changes May 12, 2026

View reviewed changes

Comment thread packages/appkit/src/plugins/agents/manifest.json Outdated

MarioCadenas added 6 commits May 12, 2026 12:11

chore: fixup

e1481c3

chore: fixup

abe81d4

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(appkit): reference agent-app, dev-playground chat UI, docs, and template#306

feat(appkit): reference agent-app, dev-playground chat UI, docs, and template#306
MarioCadenas merged 56 commits into
mainfrom
agent/v2/6-apps-docs

MarioCadenas commented Apr 21, 2026 •

edited

Loading

Uh oh!

pkosiec left a comment

Uh oh!

Uh oh!

Uh oh!

pkosiec left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

MarioCadenas commented Apr 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reference application: agent-app

dev-playground chat UI + autocomplete agent

Docs

Template

Test plan

PR Stack

Demo

Uh oh!

pkosiec left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

pkosiec left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

MarioCadenas commented Apr 21, 2026 •

edited

Loading