Skip to content

feat(appkit): reference agent-app, dev-playground chat UI, docs, and template#306

Merged
MarioCadenas merged 56 commits into
mainfrom
agent/v2/6-apps-docs
May 12, 2026
Merged

feat(appkit): reference agent-app, dev-playground chat UI, docs, and template#306
MarioCadenas merged 56 commits into
mainfrom
agent/v2/6-apps-docs

Conversation

@MarioCadenas
Copy link
Copy Markdown
Collaborator

@MarioCadenas MarioCadenas commented Apr 21, 2026

Final layer of the agents feature stack. Everything needed to
exercise, demonstrate, and learn the feature.

Reference application: agent-app

apps/agent-app/ — a standalone app purpose-built around the agents
feature. Demonstrates every major capability in one place:

  • Markdown orchestrator (config/agents/assistant.md, default)
    with destructive file tools (upload, delete) for HITL demo, and
    agents: [support, researcher] delegating to both a markdown
    sibling and a code-defined specialist.
  • Markdown specialist (config/agents/support.md) — full analytics
    • files toolkits, plus ambient get_weather.
  • Code-defined specialist (researcher in server.ts) — defined
    in code specifically because its MCP tool set is conditional on
    runtime env vars, which markdown frontmatter can't express.
    Referenced from assistant.md via the markdown → code cross-reference.
  • server.ts — concise: ambient tool() factories, conditional MCP
    wiring, zero-trust host allowlist derived from the same env vars,
    agents() plugin config with autoInheritTools and mcp.trustedHosts.
  • Vite + React 19 + TailwindCSS frontend with a chat UI showing
    streaming tokens, tool calls, and an approval card that approves or
    denies destructive tool requests over /api/agents/approve.
  • Databricks deployment config (databricks.yml, app.yaml) and
    .env.example for local dev.

dev-playground chat UI + autocomplete agent

apps/dev-playground/client/src/routes/agent.route.tsx — chat UI
with inline autocomplete (hits the autocomplete markdown agent
configured with ephemeral: true) and a full threaded conversation
panel (hits the default agent).

apps/dev-playground/server/index.ts — code-defined helper agent
using fromPlugin(analytics) alongside the markdown-driven
autocomplete agent in config/agents/, demonstrating the mixed
setup against the same plugin list. Route tree (routeTree.gen.ts)
regenerated to include the new /agent route.

Docs

docs/docs/plugins/agents.md — progressive guide covering:

  1. Drop a markdown file → it just works.
  2. Scope tools via toolkits: / tools: frontmatter.
  3. Code-defined agents with fromPlugin().
  4. Sub-agents (markdown → markdown, markdown → code, code → code).
  5. Standalone runAgent() (no createApp or HTTP).

Plus configuration reference (including approval, limits, mcp
keys), runtime API reference, and a full frontmatter schema table.

docs/docs/api/appkit/ — typedoc regenerated for the full agents
public surface including AgentDefinition.ephemeral,
AgentsPluginConfig.{approval, limits, mcp}, updated
loadAgentFromFile / loadAgentsFromDir signatures, expanded
AgentEvent union, and ToolkitEntry annotations.

Template

template/appkit.plugins.json — adds the agents plugin entry so
npx @databricks/appkit init --features agents scaffolds the plugin
correctly.

Test plan

  • Full appkit vitest suite: 1552 tests passing at stack tip
  • Typecheck clean across all 8 workspace projects
  • pnpm docs:build clean (no broken links)
  • pnpm --filter=@databricks/appkit build:package clean, publint clean

Signed-off-by: MarioCadenas MarioCadenas@users.noreply.github.com

PR Stack

  1. Shared agent types + LLM adapters — feat(appkit): shared agent types and LLM adapter implementations #301
  2. Tool primitives + ToolProvider surfaces — feat(appkit): tool primitives and ToolProvider surfaces on core plugins #302
  3. Plugin infrastructure (attachContext + PluginContext) — feat(appkit): plugin infrastructure — attachContext + PluginContext mediator #303
  4. agents() plugin + createAgent(def) + markdown-driven agents — feat(appkit): agents() plugin, createAgent(def), and markdown-driven agents #304
  5. fromPlugin() DX + runAgent plugins arg + toolkit-resolver — feat(appkit): tools(plugins) DX, runAgent plugins arg, shared toolkit-resolver #305
  6. Reference app + dev-playground + docs (this PR)

Demo

agent-demo.mp4

This was referenced Apr 21, 2026
@MarioCadenas MarioCadenas force-pushed the agent/v2/5-fromplugin-runagent branch from 162e970 to 29e3534 Compare April 21, 2026 20:41
@MarioCadenas MarioCadenas force-pushed the agent/v2/6-apps-docs branch from 57cc1e4 to 4a441d2 Compare April 21, 2026 20:41
@MarioCadenas MarioCadenas force-pushed the agent/v2/5-fromplugin-runagent branch from 29e3534 to b462716 Compare April 22, 2026 08:45
@MarioCadenas MarioCadenas force-pushed the agent/v2/6-apps-docs branch 2 times, most recently from d16cdd5 to e81d8bb Compare April 22, 2026 09:24
@MarioCadenas MarioCadenas force-pushed the agent/v2/5-fromplugin-runagent branch 2 times, most recently from 539487e to dac73b5 Compare April 22, 2026 09:46
@MarioCadenas MarioCadenas force-pushed the agent/v2/6-apps-docs branch from e81d8bb to 829ad13 Compare April 22, 2026 09:46
@MarioCadenas MarioCadenas force-pushed the agent/v2/5-fromplugin-runagent branch from dac73b5 to 624f2a0 Compare April 22, 2026 09:59
@MarioCadenas MarioCadenas force-pushed the agent/v2/6-apps-docs branch 2 times, most recently from 3386200 to 263f587 Compare April 22, 2026 10:21
Removes the generated Interface.RegisteredPlugins.md page and updates
TypeAlias.Plugins.md / index / sidebar to reflect the simplified
Record<string, PluginToolkitProvider> shape.

Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>
…t/beta

The agents manifest had "hidden": true, which excluded it from the
template plugins.json — but the same flag is also used by
generate-plugin-entries.ts to gate the auto-generated barrel
beta-exports.generated.ts. Result: import { agents } from
"@databricks/appkit/beta" failed at type level, agents() resolved to
any in createApp's plugins tuple, and the entire PluginMap inference
collapsed (appkit.<TAB> showed nothing in onPluginsReady).

Drop "hidden": true. The codegen now emits export { agents } from
"./agents" into beta-exports.generated.ts, the playground import
resolves, and PluginMap inference is restored end-to-end.

Side effect: agents now also appears in template/appkit.plugins.json
so projects scaffolded via npx @databricks/appkit init pick it up
out of the box. That matches v6's intent of shipping the agents
plugin as part of the public beta surface.

Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>
Picks up the rewritten runAgent JSDoc with the new trust-boundary
language and standalone-init contract notes.

Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>
Collapse the two-field `toolkits:` + `tools:` split in markdown agent
frontmatter into a single `tools:` list that mirrors the TS function
form (`tools(plugins) => { ...plugins.foo.toolkit(), get_weather: ... }`).
Reviewer feedback on the v2/4 PR called out the asymmetry: YAML had
two fields for what the TS surface treats as one record.

Each entry in `tools:` is now one of:

  tools:
    - plugin:analytics                              # all of analytics
    - plugin:files: [uploads.read, uploads.list]    # only filter
    - plugin:genie: { except: [getConversation] }   # full ToolkitOptions
    - get_weather                                   # ambient tool

Strings starting with `plugin:` are plugin references; strings without
the prefix are ambient tool names looked up in the agents() plugin's
config. Object entries are single-key mappings keyed by `plugin:NAME`
whose value is either an array (sugar for `{ only: [...] }`) or a full
ToolkitOptions record.

Same downstream resolution: plugin entries call `provider.toolkit(opts)`
and merge in the resulting ToolkitEntries; ambient entries look up
`ctx.availableTools[name]`. The unified list lets a single auto-inherit
gate ("declaring `tools:` opts out") cover both cases without the prior
double check.

The legacy `toolkits:` key is no longer accepted — `parseFrontmatter`
throws a focused migration error pointing users at the new prefix
syntax.

Migration:
- apps/dev-playground/config/agents/query/agent.md — moved its
  `toolkits: - files: [...]` line into `tools:\n  - plugin:files: [...]`
- docs/docs/plugins/agents.md Level 2 example rewritten with the
  unified list and the four-shape rubric

Tests:
- parseFrontmatter "parses nested arrays" updated for the new shape;
  added a "rejects legacy toolkits: with migration error" test
- loadAgentsFromDir tests covering: mixed plugin + ambient, `[only]`
  shorthand, full ToolkitOptions object, rejection of non-`plugin:`
  object keys, rejection of empty `plugin:` (which YAML parses as a
  `{ plugin: null }` mapping)
- agents-plugin file-loaded auto-inherit test migrated to the new
  syntax

Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>
…rate

Address two P0 findings from the PR #306 agentic review.

1. tool() runtime crash on the template

   `npx @databricks/appkit init` produces a helper agent whose tools
   call `tool({ description, schema, execute })` with no `name` field
   (because the record key — `current_time`, `count_words` — already
   names the tool). The previous shape required `name: string`, so
   `isFunctionTool` rejected the returned FunctionTool and agent
   registration failed with "Agent 'helper' tool 'current_time' has
   an unrecognized shape" — the very first `pnpm dev` after init.

   Make `name` optional on both ToolConfig and FunctionTool, drop the
   `typeof obj.name === "string"` check from `isFunctionTool`, and
   widen `execute` from `Promise<string> | string` to `unknown |
   Promise<unknown>` since the template's tools naturally return
   objects (`{ now: ... }`). The agents plugin already overrides
   `name` with the record key at index-build time, so the original
   field was functionally redundant in keyed-record contexts. Zod
   error messages fall back to a generic "tool" label when the name
   is omitted.

   Downstream: agents plugin already normalises non-string results
   via `normalizeToolResult`; DatabricksAdapter already `JSON.stringify`s
   non-string returns at line 438-439. No further changes needed.

2. Level 1 docs lied about auto-inherit

   `docs/docs/plugins/agents.md` Level 1 said "auto-inherits every
   registered ToolProvider plugin's tools" but the Configuration
   Reference correctly states `autoInheritTools` defaults to
   `{ file: false, code: false }`. New users following Level 1 hit
   zero tools and confusion.

   Rewrite Level 1 narrative to be truthful: "The agent starts with
   no tools. Tools are opt-in — declare them in frontmatter (Level 2)
   or opt into auto-inherit explicitly with `agents({
   autoInheritTools: { file: true } })`. See 'Auto-inherit posture'
   below."

   Same doc still had stale `fromPlugin(...)` examples in Level 3 +
   the standalone runAgent section. Rewritten to use the
   `tools(plugins) => ...` function form that v5 actually ships,
   matching the rest of the codebase. The runAgent section also gets
   a sentence on the new init contract (eager attachContext + setup,
   shared cache across sub-agents) and the trust-boundary caveat (no
   OBO, no approval gate).

Tests: +3 covering the new shape — name optional, non-string
returns, zod-error fallback label. Updated the "returns false when
name is missing" guard test to assert the new "returns true" since
the record key wins.

Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>
Address four agentic-review findings on PR #306. All four were P1 risks
under the MCP path or the markdown agent reload flow.

1. mcp callTool joined undefined text into 'undefined' literal

   McpToolCallResult.content[].text is optional per the spec; the old
   filter only matched `type === "text"`, so an entry like
   { type: "text" } (text undefined) flowed through and
   `Array.join('\n')` emitted the literal string "undefined" — visible
   to the agent and to the user.

   Tighten the filter on both the success and error paths with a type
   predicate that narrows on `typeof c.text === "string"`. The error
   path falls back to a generic "MCP tool call failed" message when no
   text entries survive.

2. mcp sendNotification ignored HTTP error status

   `notifications/initialized` and other fire-and-forget notifications
   completed `connect()` even when the server returned 4xx/5xx — a
   silent connect followed by mysterious tool-call failures. MCP spec
   says notifications don't acknowledge, so we keep the fire-and-forget
   contract but log a warning when the server clearly rejected the
   request. Wrap the fetch in try/catch so network failures surface in
   the logs without breaking the protocol.

3. mcp response body had no size cap

   `response.text()` and `response.json()` both buffer the entire body
   into memory. A misconfigured or malicious MCP server could stream
   unbounded data to exhaust client memory.

   New `readResponseTextCapped()` helper streams the body via the
   ReadableStream API and aborts once cumulative bytes cross
   MCP_RESPONSE_BODY_LIMIT_BYTES (1 MiB — far above any normal
   `initialize` / `tools/list` / `tools/call` JSON-RPC response). Used
   on both the SSE and plain-JSON branches in sendRpc.

4. agents.reload() closed the live mcpClient mid-stream

   Tool dispatch reads `this.mcpClient` at call time, not via capture.
   The old `reload()` called `await this.mcpClient.close()` and
   nulled the field; any in-flight stream that resolved the field
   afterwards crashed with "MCP client is closed" mid-conversation.

   Drop the synchronous close. The client owns only short-lived
   fetch handles, so a reload doesn't need to tear down state — new
   endpoints append to the existing connection map via
   `connectHostedTools`, stale connections from removed configs
   become unreachable through the new agent tool indexes (small
   memory cost, no correctness hazard). The shutdown path still
   closes — that's process teardown, where in-flight streams have
   already been aborted via `abortActiveOperations`.

Tests: 4 new MCP cases (undefined-text filter on success + error
paths, generic error fallback, 1 MiB body cap), 1 reload-doesn't-close
regression, plus a connect-still-succeeds-on-4xx-notification case.
2298 tests passing across the workspace.

Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>
Reviewer agentic finding #7 flagged ~20 lines of SSE parsing
boilerplate in `template/client/src/pages/agents/AgentChat.tsx` that
every scaffolded app would copy verbatim. The reviewer suggested
extracting it; the cleaner shape is a thin React hook around the
existing `connectSSE` utility in `@databricks/appkit-ui/js`, which
already owns the buffer cap, abort signal composition, frame parsing,
and retry/backoff logic.

useAgentChat (new, in @databricks/appkit-ui/react)

Single-stream chat primitive for agents-plugin chat endpoints. Wraps
connectSSE and adds the stateful glue every chat UI ends up writing:

- `content`: accumulates `response.output_text.delta` deltas
- `events`: chronological log of every parsed event for replay /
  inspection
- `threadId`: captured from the first `appkit.metadata` event and
  automatically forwarded on the next `send()` so the server reuses
  the same thread
- `isStreaming` / `error`
- `send(message)`: POSTs to `/api/agents/chat` (configurable via
  `endpoint`), streams the SSE response, parses each JSON-encoded
  data line, aborts any prior in-flight stream
- `reset()`: drops state and aborts the active stream

The hook stays narrow on purpose: one stream at a time, no multi-turn
message-history ownership (callers compose that). Tool calls, approval
gates, and any non-text event are surfaced through `onEvent` for the
caller to render however it wants.

`maxRetries: 0` is hard-wired into the connectSSE call — chat turns
aren't idempotent and re-sending the payload on transient failure
would either duplicate the user message or depend on server-side
Last-Event-ID resumption (the agents plugin's StreamManager supports
it, but failure-mode auditing is easier with retries explicitly off
and the responsibility on the caller to re-send).

Template migration

`template/client/src/pages/agents/AgentChat.tsx` drops the inline
fetch/decoder/SSE-buffer/parser block (~22 lines) and uses
`useAgentChat({ agent, onEvent })` instead. Tool-call rows are pushed
into the local `messages` array via the `onEvent` callback; the
streaming assistant text mirrors from the hook's `content` field into
the pending assistant row.

Dev-playground's `useAgentStream` is left in place for now — it
carries inspector + action-dispatcher wiring that doesn't fit on this
hook. Migrating it onto `useAgentChat` is a follow-up that also covers
the related Agentic finding #21 (duplicate SSE parsing across consumers).

Tests: 12 new cases on useAgentChat covering posting payload shape,
custom endpoint, delta accumulation, threadId capture + reuse,
onEvent dispatch (including handlers that throw), malformed payload
skipping, isStreaming lifecycle, reset(), send-while-streaming abort,
and onError surfacing. 2310 tests passing across the workspace.

Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>
…nfo fetch

Reviewer caught the redundancy: the agents plugin already exposes
`{ agents, defaultAgent }` via `clientConfig()` at startup, inlined
into the boot HTML via `<script id="__appkit__">`. The template was
doing an extra `GET /api/agents/info` round-trip on mount to fetch
the same data that's already sitting in `window.__APPKIT_CONFIG__`
and readable via `usePluginClientConfig`.

Replace the useEffect + fetch + error-handling block with a
synchronous `usePluginClientConfig<AgentsClientConfig>('agents')`.
Net savings: -20 lines of effect/error/loading-state plumbing, one
fewer HTTP request on page load, no flicker between empty-then-
populated agent buttons because the data is available on first render.

`selectedAgent` is now initialised directly from
`defaultAgent ?? agents[0] ?? null` in `useState`, so there's no
intermediate "loading agents…" state. Input placeholder updates to
"No agents registered" when the registry is empty (which only happens
if the agents plugin loaded without any agents — config bug, worth
showing).

The `/api/agents/info` route is left in place on the server — it's
still useful for programmatic callers and debugging — just no longer
consumed by the template.

Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>
Reviewer agentic finding #21: dev-playground reimplemented the same
SSE parsing the template did, creating two divergent implementations
of one concern. The new `useAgentChat` hook in @databricks/appkit-ui
already owns the fetch + frame parsing + state plumbing; the
playground hook is now a thin wrapper that adds the two playground-
specific concerns and re-exports the legacy API surface.

The wrapper keeps two layers on top of the shared hook:

  1. `contextPrefix` on send() — the Smart Dashboard injects active
     filter / highlight state into the user message so the agent sees
     the UI state on every turn. Composed in this wrapper rather than
     polluting the shared hook surface.
  2. Stream-inspector wiring — every send opens a `StreamRecord` via
     `beginStreamRun` and forwards every event to `recordStreamEvent`
     so the inspector drawer can render the timeline. The inspector
     is a dev-playground feature, not a shared concern.

API stays drop-in compatible: `agentName` (not `agent`), `isLoading`
(not `isStreaming`), `send(message, { contextPrefix })`. `SSEEvent` is
re-exported as a type alias for `AgentChatEvent` so
`use-stream-inspector.ts` and other consumers don't need to be
touched. `routes/smart-dashboard.route.tsx`, `agent-sidebar.tsx`, and
`quick-actions-bar.tsx` work unchanged.

Error handling preserves the prior UX: fetch-level failures now
project from `useAgentChat.error` into the streamed `content` as
`"Error: ..."`, matching the dashboard's assistant-message rendering
of the previous hook's `setContent('Error: ...')` path.

Net: 159 -> ~100 lines on the playground hook, zero duplicated SSE
parsing across consumers, single source of truth in
`@databricks/appkit-ui/react`. Workspace tests: 2310 pass unchanged.

Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>
Reviewer comment 3200867453: CHANGELOG.md shouldn't be touched by
feature branches. The Stage 2 publish in the secure release repo runs
release-it (per `.release-it.json`) and applies the changelog,
version bump, and tag on its own. A merged PR that hand-edits the
file just makes life harder for the bot.

The two issues called out in the comment:

  1. An "Unreleased" entry labelled "Breaking change" was added for
     the markdown agent folder layout. The agents plugin hasn't
     shipped yet — there's no released contract to break, so this
     entry is misleading and the breaking-change marker is wrong.
  2. The diff also re-inserted historical release entries (0.25.1,
     0.25.0) inside the file, which release-it would have to
     reconcile.

This commit reverts CHANGELOG.md to whatever sits on main. The
pre-existing "many `# Changelog` headers" pile-up the reviewer
laughed about is a separate generation bug on the release pipeline,
not in scope here.

Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>
…ate timeouts, forward signals

Six P2 findings from the agentic review on agent/v2/6-apps-docs.
Batched as one commit because the rename in finding 15 overlaps with
the annotation migration in finding 12 on every plugin file.

Finding 15 — align tool() and defineTool() callback names.

  tool({ execute }) is the public agent-author surface; defineTool({
  handler }) is the plugin-author surface. A reviewer pointed out that
  switching between them forced re-learning the same field. They now
  both spell it execute. ToolEntry.execute carries the same signature
  (args, signal?) => unknown | Promise<unknown>.

  Mechanical rename across the four core plugins (analytics, lakebase,
  files, genie) and all unit tests. HTTP RouteEntry.handler is
  unrelated and was left alone.

Findings 12 + 13 — migrate plugins to the effect enum, drop stale
destructive: true references.

  ToolAnnotations.effect ("read" | "write" | "update" | "destructive")
  has been the preferred field for some time; the legacy readOnly and
  destructive booleans are deprecated. The plugins were still
  authoring the old fields:

    analytics.query        readOnly: true        -> effect: "read"
    files.list/.read/...   readOnly: true        -> effect: "read"
    files.upload/.delete   destructive: true     -> effect: "destructive"
    genie.getConversation  readOnly: true        -> effect: "read"
    genie.sendMessage      (unset)               -> effect: "read"
    lakebase.query         readOnly + destructive (redundant pair)
                                                 -> effect:
                                                    readOnly ? "read"
                                                             : "destructive"

  The approval gate already honours both forms (see requiresApproval
  in agents.ts) so behaviour is unchanged. The redundancy lakebase
  had — setting destructive: !readOnly alongside the readOnly bool —
  is gone.

  Stale comments referring to "tools annotated destructive: true" in
  shared/src/agent.ts, core/agent/types.ts, and
  docs/docs/plugins/agents.md now lead with effect and document the
  legacy boolean as a backward-compat path.

Finding 11 — validate approval.timeoutMs.

  cfg.timeoutMs from user config was used verbatim. A misconfiguration
  (0, negative, NaN, Infinity, or any value below ~1s) made every
  mutating tool call auto-deny instantly because the gate's wait
  resolved before any UI could render. The new resolvedApprovalPolicy
  getter:

    - clamps to a 1 000 ms floor (anything below it falls back to the
      60 000 ms default)
    - rejects non-finite values (NaN, ±Infinity) by the same path
    - logs a single warning so the misconfig is visible
    - memoises the result so the warning never repeats across the
      thousands of stream lifecycle reads that this getter sees

  New test file: approval-config.test.ts (8 tests covering the floor,
  NaN, Infinity, negative, default fallback, valid pass-through, and
  the memo invariant).

Finding 14 — document AgentDefinition.name precedence.

  The field's old doc string ("Filled in from the enclosing key when
  used in agents: { foo: def }") was misleading: the registry key
  always wins, the explicit name is ignored when registered via the
  agents: {...} record or via the markdown folder layout. Expanded
  the JSDoc to say so plainly, list the legitimate use cases
  (standalone runAgent and consistent fallback labelling), and warn
  that setting name to a value that differs from the registry key is
  harmless but confusing.

Finding 8 — forward AbortSignal in files and genie tool handlers.

  Analytics and lakebase already forward the signal arg to their
  underlying I/O. Files and genie ignored it entirely.

  Genie now plumbs signal end-to-end. sendMessage takes options.signal
  and forwards to streamSendMessage, which already accepts signal at
  the connector boundary. getConversation takes a signal parameter
  and throws on entry if it's already aborted (the connector's
  pagination loop is signal-agnostic today; this catches the common
  case where the tool was queued after the user cancelled, without
  churning the deeper connector surface).

  Files honours signal?.throwIfAborted() at every handler's entry.
  The VolumeAPI and the four volume connectors below it don't accept
  signal, so deeper mid-call cancellation is still out of scope — but
  a queued tool call that's already cancelled by the time dispatch
  reaches the handler now unwinds cleanly instead of running to
  completion and emitting a wasted result.

Tests: 2310 -> 2318 (added the 8-case approval-config suite). Build
+ docs:build + knip + check:fix clean. Typedoc regenerated to pick
up the rename and the new JSDoc.

Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>
…required + cleanup

Two P3 findings plus a consistency sweep across the new agent surface.

Finding 18 — surface partial failures from MCP connectAll.

  Used to swallow per-endpoint failures into log.error and return void,
  so callers couldn't distinguish "all connected" from "one of three
  failed". The structured return McpConnectAllResult exposes
  { connected: string[], failed: Array<{ name, error }> }; the agents
  plugin reads it and emits one aggregate startup warning naming the
  failed endpoints alongside the per-endpoint diagnostics. Boot still
  succeeds — one broken hosted-tool endpoint shouldn't take down every
  agent — but operators now see "agent X registered without N hosted
  tool endpoints" instead of a silent gap.

  McpConnectAllResult is re-exported from beta.ts so user code that
  drives connectAll directly can type its result.

  4 new unit tests cover the four shapes: all-success, partial,
  all-failed, and a non-Error rejection wrapping path so the public
  contract always hands callers a real Error.

Finding 19 — tool() description is now required.

  description was optional and silently fell back to config.name when
  omitted. A tool named "get_weather" would then show up to the LLM
  with description="get_weather" — a cryptic identifier that gave the
  model no signal about expected use. Switched description: string at
  the type level and dropped the ?? config.name fallback, so the
  failure mode shifts from "agent silently behaves oddly in
  production" to "tsc rejects the call at authoring time". Updated
  the test that asserted the old fallback and added a description to
  the three other tests that omitted it. No production callsite was
  affected (template, dev-playground, and docs already passed a
  description).

Consistency sweep across the new agent surface.

  After the handler→execute rename and the toolkits:→tools: unification
  in earlier PRs in this stack, several JSDoc strings still referred
  to dead concepts and one runtime path had drifted from a sibling.
  Cleanups in this commit:

  - tool-approval-gate.ts header still said "destructive: true agent
    tool calls"; rewritten to lead with the effect enum and document
    the legacy boolean as a backward-compat path, matching the
    requiresApproval gate and the docs.
  - AutoInheritToolsConfig and ToolEntry.autoInheritable JSDoc
    referred to "markdown toolkits:"; updated to "plugin:NAME entries
    in the unified tools: list" so the prose tracks the parser.
  - LoadContext.plugins and AgentsPlugin.pluginProviderIndex used the
    same dead toolkits: phrasing — same fix.
  - lakebase.query previously ignored signal, even though the agentic
    review claimed it forwarded. Aligned with the files plugin's
    pattern: signal?.throwIfAborted() at handler entry. Deeper
    plumbing into the pg connection still wants its own pass, but the
    common cancel-before-dispatch case now unwinds cleanly instead of
    running to completion against Postgres.
  - core/agent/load-agents.ts had a module-private `type ToolEntry`
    that shadowed the exported public-API ToolEntry from
    tools/define-tool.ts — different layers, same name. Renamed the
    local type to FrontmatterToolEntry; ToolEntry now refers to the
    plugin-author API everywhere it appears.
  - Stale biome-ignore suppression on approval-config.test.ts removed
    after the underlying noExplicitAny rule stopped firing on that
    pattern.

Tests: 2322 -> 2322 (replaced the description-fallback test with the
new required-description shape, added 4 connectAll tests). Build +
docs:build + knip + typecheck + check:fix clean. Typedoc regenerated;
new McpConnectAllResult page published.

Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>
Closes the four remaining reviewer items on the agents stack and the
CI gap that has been red since the helper agent was rebuilt.

CI — playground integration tests.

  The playground's `helper` agent has no explicit `model:` and falls
  back to DatabricksAdapter.fromModelServing, which reads
  DATABRICKS_SERVING_ENDPOINT_NAME at setup() time. The integration
  test job never sets that env, so the agents plugin throws while
  resolving the adapter and the webServer health check times out
  before any test runs. APPKIT_E2E_TEST swaps in the mock
  WorkspaceClient at request time so no real serving traffic happens —
  the only thing the job needed was a non-empty endpoint name.
  Added DATABRICKS_SERVING_ENDPOINT_NAME=e2e-mock alongside the
  existing DATABRICKS_WAREHOUSE_ID / DATABRICKS_WORKSPACE_ID mocks.

Reviewer comment 3201574964 — agents serving-endpoint manifest copy.

  The optional `serving_endpoint` resource in the agents plugin's
  manifest leaked internal API names (`fromModelServing`) into the
  scaffold-time description and used the awkward alias "Model Serving
  (agents)". A user running `apps init` saw both. Rewrote the entry:

    - alias: "Model Serving (agents)" → "Default LLM for agents"
    - description leads with intent ("the LLM your agents will call
      when they don't pin their own model"), spells out when it's
      required (every agent that omits `model:` or `endpoint:`), names
      the runtime env var, notes the shared-with-serving-plugin
      ergonomics, and warns up front that only streaming-capable
      endpoints work.
    - field description matches.

  `template/appkit.plugins.json` regenerated through `sync:template`.

Reviewer comment 3207721724 — streaming-endpoint requirement in docs.

  Custom MLflow / sklearn endpoints don't implement the chat-
  completions streaming protocol; pointing an agent at one fails with
  "Response body is null — streaming not supported" on the first turn.
  This was nowhere in the agents plugin docs. Added a Requirements
  section near the top of docs/docs/plugins/agents.md with an `:::info`
  callout that names which endpoint families work (Foundation Model
  APIs, external chat models), which don't (typical sklearn/MLflow
  pyfunc), and points users at the `serving` plugin's `/invoke` route
  + `useServingInvoke` hook for the non-streaming path.

Reviewer comment 3201691529 — drop the `assistant` agent.

  Both the scaffolded template and the dev-playground shipped a
  markdown `assistant` agent that did nothing the code-defined
  `helper` couldn't (helper's prompt already says "For anything else,
  answer briefly in plain text"). It added a second tab in the chat
  UI that scaffolded users had to read past, with no functional
  payoff. Dropped:

    - template/config/agents/assistant/agent.md
    - apps/dev-playground/config/agents/assistant/agent.md

  Followups in the same change:

    - The playground's `assistant` was the only markdown agent with
      `default: true`; without it the agents plugin would fall back to
      first-registered, which alphabetically lands on the `anomaly`
      ephemeral. Pinned `defaultAgent: "helper"` on the playground's
      agents() config so the bare /agent route lands on a
      conversational agent.
    - Updated `template/server/agents/helper.ts` and
      `template/client/src/pages/agents/AgentChat.tsx` so the prose
      docs talk about a single starter agent and a drop-in markdown
      pattern, rather than referencing the now-gone assistant.

  Markdown-form coverage is still demonstrated by every other
  markdown agent in the playground (`anomaly`, `autocomplete`,
  `insights`, `query`) and by the docs Level 1 / Level 2 walkthrough.
  Scaffolded users get a working tool-calling demo immediately
  instead of two tabs that look the same.

Tests: 2322 pass unchanged. Typecheck + build + docs:build + knip +
check:fix clean. Template manifest regenerated via sync:template.
Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>
Earlier in this stack I dropped `assistant` from the template to
address reviewer comment 3201691529. The reviewer's concern was real
(assistant did nothing helper couldn't), but the fix went too far —
the template lost its pedagogical contrast between the imperative
code form and the declarative markdown form, which is one of the
agents plugin's selling points.

Replaced assistant with `planner`: a markdown agent that's genuinely
distinct from helper, not a near-duplicate.

  - **helper** (code, server/agents/helper.ts) uses `tool()` to expose
    `current_time` and `count_words`. Demonstrates the imperative form
    for agents whose value is in what they *do*.
  - **planner** (markdown, config/agents/planner/agent.md) is pure
    prose with no tools. Its system prompt steers behaviour: restate
    the goal, surface open questions, propose a small ordered plan,
    call out risks. Demonstrates the declarative form for agents whose
    value is in *how they think*.

The contrast lines up cleanly: tools-vs-prose, executes-vs-reasons,
compiled-vs-edit-and-go. Switching tabs in the chat UI now shows two
genuinely different conversations, which is what the reviewer's
"why is there a second tab" question was really probing.

The planner prompt avoids two failure modes the prior `assistant`
hit:

  - It's not a pale-imitation generalist. The prompt is specific to a
    planning role (restate, ask before assuming, ordered output) so
    the agent has visible behaviour the user can recognise.
  - It explicitly tells the model it has no tools and points at
    helper when the user wants an action — so a scaffolded user
    naturally discovers the second tab through the conversation
    itself.

Wiring follow-ups:

  - planner carries `default: true` so the template's default tab is
    conversation-first, matching the old behaviour.
  - helper's docstring and AgentChat.tsx prose now describe the dual
    demo and the file paths behind each tab.
  - The empty-state hint suggests one prompt per tab so the contrast
    is the first thing a new user experiences.

Playground intentionally untouched — it already has four markdown
agents (anomaly, autocomplete, insights, query) demonstrating the
form, so assistant was redundant there and stays removed.

Tests: 2322 pass. Typecheck clean. No code paths beyond the new
markdown file and the prose updates around it.

Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>
…gent

The previous template change exposed `planner` (markdown) and `helper`
(code) as two top-level chat tabs. The dual-form story was right, but
the UX was confusing — switching tabs felt like switching apps even
though both agents handled the same conversations.

Restructured around the agents plugin's sub-agent feature instead:

  - `planner` (markdown, `config/agents/planner/agent.md`) is now the
    only user-facing chat. Its frontmatter declares
    `agents: [helper]`, so the agents plugin exposes `helper` to the
    planner prompt as an `agent-helper` tool. Planner's body tells the
    model to delegate to `agent-helper` for computational actions
    (clock, word count) and stay in prose for planning conversations.
  - `helper` (code, `server/agents/helper.ts`) stays registered at the
    top level so the markdown loader's two-pass resolver can find it
    by name when wiring planner's sub-agents. The chat UI just doesn't
    surface it — the template renders one chat against the default
    agent and drops the tab picker entirely.
  - `AgentChat.tsx` lost the agent-tab list; it talks straight to the
    default agent (planner) and renders sub-agent calls inline as
    `tool · agent-helper` rows, same shape it already uses for
    function-tool calls. Empty-state hint nudges the user toward both
    planning prompts ("help me plan a feature") and the delegation
    path ("what time is it?" — watch planner call agent-helper).
  - `helper.ts` docstring rewritten so a scaffolded reader sees the
    coordinator-plus-worker pattern, why each side lives in its own
    file (markdown for prose, code for typed tools), and that the
    template is meant to be a copy-paste starting point for the
    pattern.

No plugin/runtime changes. Both agents stay top-level registered;
visibility is a UI-side choice, not a plugin feature. If a user
later wants two chats they can re-add a tab list reading `agents[]`
from `usePluginClientConfig('agents')` — the data is still there.

Tests: 2322 pass. Typecheck/build clean.
Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>
The landing page had a special hero treatment for `/smart-dashboard`
plus an opacity-muted copy of the same card in its category grid
labelled "Featured above". With the agents stack expanding, this was
adding noise — every category now has agents demos and visually
elevating one of them implied a hierarchy that doesn't really exist.

Stripped the special-case:

  - Removed the FeaturedCard component and the `featured` prop on
    DemoCard. The grid is now uniform.
  - Removed the "Featured above" muted treatment in DemoCard. Every
    card renders identically; the user picks by category and label.
  - The landing-page comment block now describes the simpler shape
    (hero + grouped category grid) so future readers don't go hunting
    for a featured slot.

Smoke test updated.

  The smoke test asserted on the old hero subtitle copy ("Explore the
  capabilities of the AppKit") which the hero hasn't shipped for a
  while — it's been "A living catalog of what AppKit can do — ..."
  through this stack. The test was passing in CI before because the
  webServer health check timed out earlier (the
  DATABRICKS_SERVING_ENDPOINT_NAME env miss masked it), and now that
  the env is set, the smoke step actually runs and trips on the stale
  assertion. Updated to match a stable substring of the current copy
  with `exact: false`, so future prose tweaks don't keep breaking the
  smoke test — the goal here is "the page rendered", not "the exact
  marketing line".

No other behaviour change; the catalog and nav data in @/lib/nav.ts
are untouched. /smart-dashboard is still in its category grid as a
normal demo card.

Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>
The collapsible header in saved-views-panel.tsx wrapped the title row,
the spinner/refresh action, and the chevron in one outer `<button>`,
with the refresh action implemented as a nested `<button>` inside it.
That's invalid HTML — `<button>` cannot contain `<button>` — and
React's dev build was flagging it as a hydration warning on every
render of the Smart Dashboard:

  validateDOMNesting: <button> cannot contain a nested <button>.
  ...
  saved-views-panel.tsx:91
  saved-views-panel.tsx:72

The fix is purely structural; no behaviour or visual change:

  - The header row is now a flex `<div>` rather than a flex `<button>`,
    so its children can be sibling buttons instead of nested ones.
  - The title cluster (icon + "Saved views" + count) is one button that
    toggles open/close.
  - The chevron is its own toggle button (same `setOpen` flip); making
    it a real button gives keyboard users a second focusable target and
    bakes the `aria-expanded` state onto something that announces itself.
  - Refresh is a third sibling button, no longer wrapped in the toggle.
    Drops the `e.stopPropagation` it needed when it was a child of the
    outer toggle button.

Added `aria-expanded` and proper `aria-label`s on both toggle buttons
since the chevron is now keyboard-reachable.

Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>
Reported by the user testing the template's helper agent against
gemini-flash-lite. After the first tool call (planner -> agent-helper
sub-agent), Vertex returned:

  INVALID_ARGUMENT: function call \`agent-helper\` in the 6. content
  block is missing a \`thought_signature\`

Vertex AI's OpenAI-compatible surface for Gemini 2.x models (Flash,
Flash-Lite, Pro) attaches an opaque \`thought_signature\` blob on every
function call the model emits. The next assistant-with-tool_calls
message in the conversation history must echo it back verbatim or
Vertex rejects with that 400. The DatabricksAdapter was building
OpenAIToolCall objects from streamed deltas without preserving the
field, so the follow-up request from inside the same turn (the one
that pushes tool results back to the model) failed before we ever
got a final assistant turn.

The fix has two halves — one for the live single-turn case (the
reported bug) and one for resumed threads loaded from ThreadStore.

Single-turn (the reported bug):

  - Extended OpenAIToolCall + DeltaToolCall with thought_signature.
  - The streaming accumulator now captures the field from either the
    top level of the tool_call delta or nested inside \`function\` —
    Vertex's proxy has been observed to use both locations.
  - When constructing final OpenAIToolCall objects from the
    accumulator, the signature is attached at the top level (matches
    Vertex's OpenAI-compat docs).
  - The existing messages.push(...) flow already preserves the full
    tool_calls array verbatim, so once the field is on the object it
    reaches the wire on the next streamCompletion call within the same
    run() invocation.

Resumed threads:

  - Extended the shared ToolCall type with an optional
    thoughtSignature so it can survive ThreadStore round-trips.
  - buildMessages now reads tc.thoughtSignature off the persisted
    ToolCall and emits it as thought_signature on the wire shape so
    the first request of a resumed turn doesn't fail Vertex's
    signature check before any tool call even fires.

Non-Gemini endpoints (Claude on Databricks, OpenAI-compat external
models, Llama, etc.) leave the field undefined; the spread on the
emit-side omits the key entirely, so we don't pass a meaningless
field to models with stricter shape validators.

Tests (4 new in a Vertex/Gemini describe block, total 2322 -> 2326):

  - top-level thought_signature captured from a streamed delta and
    echoed verbatim on the second request
  - nested function.thought_signature captured from a streamed delta
    and canonicalised to top level on the second request
  - non-Gemini path (no signature in the delta) does NOT emit a
    thought_signature key — defends against accidentally inventing
    one when talking to Claude/OpenAI
  - resumed thread: persisted Message.toolCalls[*].thoughtSignature
    flows through buildMessages to the wire's thought_signature on
    the very first request of the new turn

Plain typecheck + workspace test + build clean. No public-API
breakage: thoughtSignature is optional everywhere it appears.

Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>
User reported tsc errors in their scaffolded app on every tool({...})
callsite:

  ZodObject<{}, $strip> is missing the following properties from type
  ZodType: toJSONSchema, with, exactOptional, apply

Root cause: zod version skew between AppKit and the template.
packages/appkit pinned zod@4.3.6; template/package.json pinned
zod@4.1.13. The methods AppKit's published `.d.ts` advertises on
`ZodType` (`toJSONSchema`, `with`, `exactOptional`, `apply`) were
added in 4.2 — the user's scaffolded `z.object({})` was a different
structural type than the one AppKit's `tool()` constraint expects,
even though both are major version 4.

Two changes:

1. Template now pins zod@4.3.6, matching AppKit. Scaffolded apps see
   exactly the ZodType AppKit's published types reference, so
   `tool({ schema: z.object(...) })` type-checks without surprises.

2. tools/check-template-deps.ts now cross-checks an allowlist of
   version-sensitive runtime deps (`zod` for now) against
   packages/appkit/package.json. Future bumps of zod in AppKit
   without a matching bump in the template fail this check at CI
   time with a clear message ("Version skew on zod: template pins X,
   packages/appkit pins Y") instead of surfacing as a confusing
   ZodObject mismatch in user-land tsc output.

Header doc on the check tool now explains both guards: the
supply-chain pin check that was already there, and the new
cross-version guard for deps whose types leak through AppKit's public
API surface. The list is intentionally small (just zod today); add to
SYNCED_DEPS as other public-typed deps appear.

Workspace tests still 2326 pass. Typecheck clean across all packages
and the docs site. No lockfile churn; pnpm resolves zod@4.3.6 the
same way for both packages now.

Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>
Copy link
Copy Markdown
Member

@pkosiec pkosiec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code LGTM overall, I'm about to test the app and will gave a final ✅ if everything is alright 👍

Comment thread .github/workflows/ci.yml Outdated
Comment thread .github/workflows/ci.yml
Live capture against `gemini-3.1-flash-lite-preview` showed Vertex's
OpenAI-compat proxy emits the signature only as `thoughtSignature`
(camelCase) at the top level of the tool_call delta, and accepts the
same spelling back on outbound. The dual-spelling capture/emit added in
the previous fix was speculative defense without runtime evidence.

Simplifies:
- `OpenAIToolCall` and `DeltaToolCall` carry only `thoughtSignature`
- Inbound capture is `tc.thoughtSignature` (no `??` fallback)
- Outbound emit on fresh tool_calls and on `buildMessages` resume is
  single-key
- Test helper unified to `sig`, snake_case-only test removed, negative
  test asserts both spellings stay absent as a regression guard
- Drops two pre-existing dead `biome-ignore noExplicitAny` suppressions
  in the test file (rule is disabled in tests via overrides)

Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>
Copy link
Copy Markdown
Member

@pkosiec pkosiec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just a minor comment

Comment thread packages/appkit/src/plugins/agents/manifest.json Outdated
The agents plugin's default-LLM resource description had grown into a
paragraph covering when it's needed, the env var, the cross-plugin
sharing, and which endpoints qualify. That copy is rendered by the
plugin CLI, where it pushes everything else off-screen and reads as
docs-in-a-tooltip.

Match the sibling plugins' style (serving, analytics, lakebase): one
short clause for the resource, a noun phrase for the field. The
edge-case detail belongs in docs/plugins/agents.md, not in the CLI.

Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>
Fixes the actionable findings from the second-pass agentic review on
PR #306, all small and confidence-validated against the actual code.

- agents.ts: untrackStream fallback aligned with trackStream (?? 0)
  so the asymmetry stops masking a drift the comment claims cannot
  happen.
- agents.ts: dispatchToolCall guards function-tool args before
  casting to Record<string, unknown>; a non-object value now throws
  a clear error instead of silently passing a wrong shape to the
  tool.
- mcp/client.ts: new coerceToolParameters() validates the MCP
  server's reported inputSchema before exposing it as an
  AgentToolDefinition. Malformed schemas fall back to an empty
  parameters object so the tool still registers but cannot accept
  arguments — defense against a misbehaving (or hostile) MCP server.
- tool.ts: drop redundant `as unknown as Record<string, unknown>`
  double cast; toToolJSONSchema already returns that type.
- template AgentChat.tsx: scroll effect now depends on [messages,
  content] so the viewport actually follows new replies instead of
  freezing at the top.
- dispatch-tool-call.test.ts: new test for runSubAgent depth-limit
  rejection — exercises the maxSubAgentDepth backstop that the
  cycle-detection comment relies on.

Tool-call budget exhaustion was flagged as untested but the existing
"rejects + aborts when the budget is exhausted" test already covers
it; no new test needed there.

Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>
Express's stock 100kb default rejected the dev-playground smart-dashboard
"save view" upload with a 413 PayloadTooLargeError before the route
handler ever ran. The request body was ~105KB — a base64-encoded
screenshot of the dashboard, barely over the cap. Same shape will hit
any other AppKit consumer that posts modest base64 uploads, light file
metadata, or just large agent chat payloads.

Add `bodyLimit?: string` to ServerConfig and default to 1mb. Covers the
common cases without opening a wide DoS surface; consumers that legit-
imately accept larger uploads can opt in explicitly via
`server({ bodyLimit: "10mb" })`.

Runtime evidence (pre-fix vs post-fix logs on the same flow):
- Pre-fix: content-length 105139, body-parser 413'd before the handler
  was reached.
- Post-fix: content-length 104791, handler reached with bodyKeys
  [name, filters, highlights, pngBase64].

Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>
Mirrors the tightening in 91c9166, which shortened the agents plugin's
default-LLM resource description in the canonical manifest but didn't
re-run the template sync. The template's appkit.plugins.json is the
copy users see at `npx @databricks/appkit init` time, so it needs to
match what the CLI renders for the installed package.

Signed-off-by: MarioCadenas <MarioCadenas@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants