[Frontend Feat] Enable running evaluations on evaluators by ardaerzin · Pull Request #4237 · Agenta-AI/agenta

ardaerzin · 2026-04-28T13:02:12Z

Summary

can now select an evaluator workflow and run an evaluation on it.

Testing

Verified locally

ran an evaluation on an evaluator

Added or updated tests

QA follow-up

need to test more evaluations on evaluators

Checklist

I have included a video or screen recording for UI changes, or marked Demo as N/A
Relevant tests pass locally
Relevant linting and formatting pass locally
I have signed the CLA, or I will sign it when the bot prompts me

Contributor Resources

…0 handler

…-evaluations

…tent component

…nd unify workflow store for apps and evaluators - Add `transformAgDataForTestset` to unwrap nested subject data from evaluator annotation spans, stripping internal bookkeeping keys and reshaping to `{inputs, outputs, score}` for testset compatibility - Detect evaluator annotation spans via `trace_type === "annotation"` and `is_evaluator` flag or evaluator references - Apply transformation to both live and original data for consistent

…cture - Remove `transformAgDataForTestset` and evaluator annotation span detection logic that unwrapped nested subject data - Replace `leafTracePathsAtom` and `traceDataPathsAtom` with `canonicalTracePathsAtom` that returns only `data.inputs` and `data.outputs` when present - Update auto-mapping to seed from canonical envelope paths instead of all leaf paths - Preserve evaluator annotation spans' nested structure in testcases to maintain replay

…chat handlers - Expose inputs both flattened at root (e.g., `{{country}}`) and nested under `inputs` key (e.g., `{{$.inputs.country}}`) for consistent JSONPath resolution across completion, chat, and evaluator prompts - Clear `input_keys` after validation to prevent `PromptTemplate.format()` from rejecting the injected envelope meta-key as extra - Exclude `messages` from nested view in chat handler since it is handled separately

…yground - Wrap `PlaygroundConfigSection` in `PlaygroundNodeTokenPathProvider` to scope suggestions to the node's chain context - Split token path provider into global (inputs-only) and scoped (adds outputs when upstream exists) variants - Update `useInputsSource` to accept `scopedEntityId` and read entity-specific input ports when provided - Update `useOutputsSource` to accept `upstreamEntityId` and read upstream output ports directly

vercel · 2026-04-28T13:02:17Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
agenta-documentation	Ready	Preview, Comment	May 4, 2026 3:21pm

…-evaluations

…traction to prevent ReDoS - Replace regex-based extraction with prefix/suffix checks and string slicing - Prevents polynomial backtracking on adversarial inputs (addresses CodeQL js/redos) - Handle Jinja whitespace control (`{%-` / `-%}`) via explicit slice trimming - Maintain identical behavior for `{{...}}`, `{%...%}`, and `{#...#}` wrappers

Replace legacy fallback mode with first-class flat token support. Both `{{country}}` and `{{$.inputs.country}}` now use the same suggestion pipeline, with flat tokens implicitly drilling into `$.inputs.*` to surface testcase columns and port schemas.

Wrap type pills in tooltips to show full labels on hover and apply max-width truncation to prevent long evaluator names (e.g., `__main__.MyEvaluator`) from breaking column layouts.

Add `is_managed` flag checks to exclude user-deployed Python evaluators (e.g., `user:custom:__main__.MyEval:latest`) from the Evaluators page. These SDK-registered evaluators aren't first-class catalog entries and should only appear in workflow execution contexts. Apply filter at both workflow-id cache level and revision query level for defense-in-depth.

… casing

…ndered nested config

Implement multi-tier language detection for CodeEditorControl: explicit schema override → sibling field lookup via `languageFromField` → heuristic on `runtime` field → python fallback. Add `useOptionalDrillIn` hook to safely access drill-in context from shared controls that may render outside the provider tree.

github-actions · 2026-04-30T08:46:01Z

Railway Preview Environment


Preview URL	https://gateway-production-4a40.up.railway.app/w
Image tag	`pr-4237-4af29ca`
Status	Failed
Railway logs	Open logs
Logs	View workflow run
Updated at 2026-05-04T15:37:37.851Z

mmabrouk · 2026-04-30T11:38:51Z

Thanks @ardaerzin

I think the filtering view could be improved. The number of tabs is very large and always overflow.

My comments:

Per default we should show only the prompts (completion and chat) and not the evaluators.
-- One option is to simplify it by adding a switch "include/show evaluators" that is per default switched off. So the default is that the user only see their prompts (current state), and then if they switch this on they see all apps including evaluators. We could have a tooltip (you can evaluate your evaluators and compare them to human feedback for instance).
-- Another option is to have a drop down for filtering

I have created some designs that might be helpful

CleanShot.2026-04-30.at.13.37.20.mp4

CleanShot.2026-04-30.at.13.38.13.mp4

Data:
Filtering applicatoins to evaluate.zip

ardaerzin · 2026-04-30T17:35:42Z

One option is to simplify it by adding a switch "include/show evaluators" that is per default switched off. So the default is that the user only see their prompts (current state), and then if they switch this on they see all apps including evaluators. We could have a tooltip (you can evaluate your evaluators and compare them to human feedback for instance)

@mmabrouk here's the approach I took

mmabrouk · 2026-04-30T17:37:57Z

Thanks @ardaerzin looks fine to me (although redundant imo).
However, I am not sure where "All types" went in the dropdown. When you opened the Drop down the "All Types option" was not there.

ardaerzin · 2026-04-30T17:39:47Z

@mmabrouk there's a remove option after a selection is made

mmabrouk · 2026-04-30T17:41:01Z

Why not have it simply as an option like in the design? Feels like less obvious. Sorry too nitpicky

Replace `null` placeholder + clear button pattern with an explicit "All types" first option in the workflow type filter Select. Simplifies state handling by eliminating nullable semantics and ensures the reset path is always visible as a clickable option rather than requiring hover to reveal a clear button.

ardaerzin · 2026-04-30T17:49:08Z

Why not have it simply as an option like in the design? Feels like less obvious. Sorry too nitpicky

@mmabrouk fair enough. pushed this change

mmabrouk · 2026-04-30T17:56:39Z

Can I be nitpiky again and say that we should remove the padding for the subitems otherwise it looks weird

…orkflow type filter Remove extra left indent from Ant Design's grouped select options to ensure visual alignment with the ungrouped "All types" entry. The default `.ant-select-item-option-grouped` padding creates unintended hierarchy when only one or two groups are visible.

coderabbitai · 2026-05-04T15:18:55Z

📝 Walkthrough

Summary by CodeRabbit

New Features
- Added scope-aware token path typeahead suggestions for template variables with envelope slot completion.
- Introduced template variable validation with error tooltips and correction suggestions.
- Enhanced testcase editor with flat/nested view toggle, property addition, and column delete actions.
- Added workflow type filtering to application management.
Bug Fixes
- Improved evaluator filtering to display only platform-managed evaluators.
- Enhanced code editor language detection based on runtime hints.
- Fixed nested span handling in trace data display.
UI/UX Updates
- Renamed "Variant" to "Revision" in evaluation modals.
- Refactored evaluator table columns with unified type tag rendering.
- Improved JSON default value computation for testcase editor fields.

Walkthrough

This pull request introduces a comprehensive token path suggestion system for the playground editor, refactors evaluator component display using shared WorkflowTypeTag components, enhances workflow type resolution and template variable validation, updates the evaluation modal flow to use "Revision" terminology and new workflow selection logic, improves testcase editor capabilities with nested field support, and adjusts SDK resolver context injection for prompt templating. The changes span multiple layers: data shape updates (atoms, types), core logic implementations (resolvers, filters, sources), UI component refactoring (removing custom cells, wiring new providers), and supporting utilities (template validation, port helpers).

Changes

SDK Resolver Context & Evaluator Envelope Handling

Layer / File(s)	Summary
Core Handler Updates `sdk/agenta/sdk/engines/running/handlers.py`	`auto_ai_critique_v0` now injects `inputs` before `outputs` with dual-access (flattened + nested). `completion_v0` and `chat_v0` clear `input_keys` after validation and add `_variables["inputs"]` envelope for consistent template resolution.

Evaluator Component Display Consolidation

Layer / File(s)	Summary
Custom Cell Removal `web/oss/src/components/Evaluators/assets/cells/*`	Deleted `EvaluatorTagsCell`, `EvaluatorTypePill`, `TableDropdownMenu` (and types), and `getColumns` function that previously rendered evaluator table UI.
Column Refactoring `web/oss/src/components/Evaluators/Table/assets/evaluatorColumns.tsx`	`EvaluatorTypeCell` now derives `evaluatorKey` from `workflowKeyAtomFamily` and renders via shared `WorkflowTypeTag`. Automatic evaluator tags switched from `capitalize` to `truncate` styling.
Managed Evaluator Filtering `web/oss/src/components/Evaluators/store/evaluatorsPaginatedStore.ts`	Added `isManagedEvaluator` helper and filtering logic to exclude non-managed evaluators from cache and paginated store queries (server-side filter `is_managed: true`).

Token Path Suggestion System

Layer / File(s)	Summary
Atom State `web/oss/src/components/Playground/PlaygroundTokenPath/atoms.ts`	New atoms: `observedTestcasesAtom` (derives testcase entities from execution rows), `aggregatedParametersSchemaAtom` (unions parameter schemas across nodes).
Chain Context Derivation `web/oss/src/components/Playground/PlaygroundTokenPath/chainContext.ts`	`NodeChainContext` describes allowed envelope slots and upstream entity; `nodeChainContextAtomFamily` resolves node input/output availability based on DAG position.
Envelope Source Hooks `web/oss/src/components/Playground/PlaygroundTokenPath/sources/*`	`useInputsSource`, `useOutputsSource`, `useParametersSource`, `useTestcaseSource` implement suggestion generation per envelope slot; shared `getSubPathsFromSchema`, `collectNextKeysAtPath`, `aggregateObservedKeys` helpers extract/merge observable keys.
Provider Implementation `web/oss/src/components/Playground/PlaygroundTokenPath/index.tsx`, `web/oss/src/components/Playground/PlaygroundTokenPath/types.ts`	`PlaygroundTokenPathProvider` (global, inputs-only), `PlaygroundNodeTokenPathProvider` (scoped to node's chain context), `EnvelopeSource` interface for slot-based suggestion dispatch.
Shell Integration `web/oss/src/components/Playground/OSSPlaygroundShell.tsx`, `web/oss/src/components/Playground/Components/PlaygroundVariantConfig/index.tsx`	Wraps playground UI with `PlaygroundTokenPathProvider` and scopes `PlaygroundNodeTokenPathProvider` to config section for depth-aware JSONPath typeahead.
Editor Token Support `web/packages/agenta-ui/src/Editor/plugins/token/*`	`TokenPathSuggestionsContext` provides consumer hook for path-aware suggestions; `TokenNode` validates expressions and applies styling/tooltips; `TokenTypeaheadPlugin` uses unified `Suggestion` model with path vs flat mode handling; `TokenTooltipPlugin` renders validation tooltips for invalid tokens.
Editor Tests/Docs `web/packages/agenta-ui/src/Editor/index.ts`, `web/packages/agenta-ui/src/Editor/plugins/code/index.tsx`	Exports suggestion provider/hook and types; refactors `InsertInitialCodeBlockPlugin` to compare semantic content (JSON/YAML) before dispatching initialization.

Template Variable Validation & Port Grouping

Layer / File(s)	Summary
Validation Utilities `web/packages/agenta-shared/src/utils/templateVariable.ts`	New `KNOWN_ENVELOPE_SLOTS` constant; `validateTemplateVariable`, `isValidTemplateVariable`, `extractTemplateExpression` check JSONPath/JSON Pointer roots against known envelopes and detect malformed paths.
Port Helpers & Grouping `web/packages/agenta-entities/src/runnable/portHelpers.ts`	`extractLastPathSegment` for display labels; `GroupedTemplateVariable` type and `groupTemplateVariables` function parse/validate/group template placeholders by envelope/key with sub-path aggregation.
Molecule Port Derivation `web/packages/agenta-entities/src/workflow/state/molecule.ts`	Input port generation now uses `groupTemplateVariables` to derive grouped ports with optional schemas from sub-paths; trace-derived ephemeral inputs use path-segment-based naming.
Evaluator Configuration `web/packages/agenta-entities/src/runnable/evaluatorTransforms.ts`	`nestEvaluatorConfiguration` now enforces schema-driven allowlist (drops undeclared/hidden keys), properly routes advanced fields into `advanced_settings`.
Shared Exports `web/packages/agenta-entities/src/runnable/index.ts`, `web/packages/agenta-shared/src/utils/index.ts`	Re-export new template/port validation and grouping utilities for downstream consumption.

Workflow Type Resolution Enhancement

Layer / File(s)	Summary
Type Detection Helpers `web/packages/agenta-entities/src/workflow/state/helpers.ts`	New `resolveEvaluatorWorkflowType` uses revision flags then legacy evaluator-key map; `deriveWorkflowTypeFromRevision` accepts optional `{isEvaluator}` override and routes classification through evaluator resolver when detected.
App Type Color `web/packages/agenta-entities/src/workflow/core/schema.ts`	New `APP_TYPE_PRESET` map and `getAppTypeColor` function for deterministic app workflow type coloring.
Evaluator Utils `web/packages/agenta-entities/src/workflow/state/evaluatorUtils.ts`	`createEvaluatorFromTemplate` removes hidden fields (`x-ag-type: "hidden"`) from final parameters for both JSON-Schema and legacy formats.
Store Re-nesting `web/packages/agenta-entities/src/workflow/state/store.ts`	Evaluator workflows preserve original flat parameter schema (`flatSchemaForReNest`) through draft merges for consistent re-nesting that properly excludes hidden/legacy fields.
Exports `web/packages/agenta-entities/src/workflow/*`	Re-export `deriveWorkflowTypeFromRevision`, `getAppTypeColor`, and related type utilities for UI consumption.

Evaluation Modal Flow Refactoring

Layer / File(s)	Summary
Terminology Update `web/oss/tests/playwright/acceptance/*`	Tab/step navigation changed from "Variant" to "Revision" in auto-evaluation and human-annotation test flows.
Workflow Selection Component `web/oss/src/components/pages/evaluations/NewEvaluation/Components/SelectWorkflowSection.tsx`	New component replaces `SelectAppSection`; manages searchable workflow table with `showEvaluators` toggle, type filter, and `invokableOnly` constraint; emits `onSelectWorkflow(id, {label, isEvaluator})`.
Modal Content & Inner `web/oss/src/components/pages/evaluations/NewEvaluation/Components/NewEvaluationModalContent.tsx`, `web/oss/src/components/pages/evaluations/NewEvaluation/Components/NewEvaluationModalInner.tsx`	Content now renders `SelectWorkflowSection` (renamed tab to "Revision"), Inner captures `selectedWorkflowMeta` for readable display when workflow not in `availableApps`, updates validation messaging (variant → revision).
Type Updates `web/oss/src/components/pages/evaluations/NewEvaluation/types.ts`	`onSelectApp` callback extended with optional `meta: {label?, isEvaluator?}` parameter.
Online Evaluation Filtering `web/oss/src/components/pages/evaluations/onlineEvaluation/OnlineEvaluationDrawer.tsx`	`previewEvaluators` now requires both `is_feedback === false` and `has_url === true` (added `has_url` filter).
Variant Section Text `web/oss/src/components/pages/evaluations/NewEvaluation/Components/SelectVariantSection.tsx`	Empty state text updated to "No available revisions found to display".

App/Workflow Management Filter Model

Layer / File(s)	Summary
Filter Atoms `web/oss/src/components/pages/app-management/store/appWorkflowFilterAtoms.ts`	New `WorkflowTypeFilter` union and `workflowTypeFilterAtom` (default `"app"`), `workflowInvokableOnlyAtom` (default `false`) for filtering listings.
Paginated Store Logic `web/oss/src/components/pages/app-management/store/appWorkflowStore.ts`	Store expanded to support dynamic workflow type filtering and invokable-only constraints; `EnrichedWorkflow` carries derived `_derivedType`/`_workflowKey`; client-side pagination when `invokableOnly` is enabled; row shape includes optional `workflowType`/`workflowKey`.
Management Section `web/oss/src/components/pages/app-management/components/ApplicationManagementSection.tsx`	Pins `workflowTypeFilterAtom` to `"app"` and `workflowInvokableOnlyAtom` to `false` on mount via `useEffect`.
Column Rendering `web/oss/src/components/pages/app-management/components/appWorkflowColumns.tsx`	App "Type" cell now renders via `WorkflowTypeTag` instead of plain `Tag`.
Store Exports `web/oss/src/components/pages/app-management/store/index.ts`	Re-exports `workflowPaginatedStore`, `workflowTypeFilterAtom`, `workflowInvokableOnlyAtom`, and `WorkflowTypeFilter` type.

Workflow Display Components

Layer / File(s)	Summary
Type Tag Component `web/packages/agenta-entity-ui/src/workflow/WorkflowTypeTag.tsx`	New component renders evaluator or app type badges with colors and tooltips; reads evaluator templates via Jotai store; uses `getEvaluatorColor`/`getAppTypeColor` for styling.
Kind Tag Component `web/packages/agenta-entity-ui/src/workflow/WorkflowKindTag.tsx`	New component renders `Evaluator`/`App` kind badge (purple/blue) with optional `className`.
Module Exports `web/packages/agenta-entity-ui/src/workflow/index.ts`, `web/packages/agenta-entity-ui/package.json`, `web/packages/agenta-entity-ui/src/index.ts`	New `./workflow` subpath export; barrel files re-export `WorkflowTypeTag`/`WorkflowKindTag` and props types.

Testcase Editor & Variable Control Enhancement

Layer / File(s)	Summary
Testcase Editor Rewrite `web/oss/src/components/Playground/Components/PlaygroundTestcaseEditor.tsx`	Replaces `EntityDrillInView` with bespoke layout: tracks `editMode`/`fieldView`, derives `existingColumns`/`suggestedColumns` from schema/testcase, introduces `NestedFieldEditor` for sub-path editing, renders per-column delete actions, supports "Flat" view decomposition via `getPortSubPaths`, and adds property promotion UI.
Variable Control `web/packages/agenta-playground-ui/src/components/adapters/VariableControlAdapter.tsx`	Derives control label/schema from `schemaMap` (now includes optional `name`), computes typed JSON defaults from port schema (`properties`), uses `jsonDefault` for visual hint only (no store seeding), remounts Lexical editor when `schemaKey` changes.
Execution Selectors `web/packages/agenta-playground/src/state/execution/selectors.ts`, `web/packages/agenta-playground/src/state/execution/index.ts`, `web/packages/agenta-playground/src/state/controllers/executionItemController.ts`	New `outputPortSchemaMapAtom` alongside input schema map; controller exposes both input/output port schema via selectors for editor typeahead and variable control derivation.

Value Extraction & Code Editor Enhancement

Layer / File(s)	Summary
Value Coercion `web/packages/agenta-entities/src/shared/execution/valueExtraction.ts`	New `coerceCellValue` helper intelligently JSON-parses strings (heuristic: starts with `{` or `[`); `extractInputValues` return type now `Record<string, unknown>` with smart unwrapping (preserves typed non-string values instead of stringifying).
Code Editor Language Detection `web/packages/agenta-entity-ui/src/DrillInView/SchemaControls/CodeEditorControl.tsx`	Language detection now prioritizes schema `x-parameters.language` → sibling field reference → `rootData.runtime` heuristic → fallback `"python"`; requires drill-in context via `useOptionalDrillIn`.
Optional Drill-In Hook `web/packages/agenta-entity-ui/src/DrillInView/components/MoleculeDrillInContext.tsx`, `web/packages/agenta-entity-ui/src/DrillInView/components/index.ts`	New `useOptionalDrillIn` hook returns context value or `null` (safe usage outside provider) alongside existing `useDrillIn`.

Trace Drawer & Workflow Revision Drawer

Layer / File(s)	Summary
Trace Payload Handling `web/oss/src/components/SharedDrawers/TraceDrawer/components/TraceContent/index.tsx`, `web/oss/src/components/SharedDrawers/TraceDrawer/components/TraceContent/utils/index.ts`	Removed `stripNestedSpans` sanitization; trace content now uses raw active trace directly for "Raw Data" display and span drill-in.
Trace Type Header `web/oss/src/components/SharedDrawers/TraceDrawer/components/TraceContent/components/TraceTypeHeader/index.tsx`	"Open in Playground" now passes `stacked: true` to workflow revision drawer (prevents trace drawer focus stealing).
Revision Drawer Stacking `web/packages/agenta-playground-ui/src/components/WorkflowRevisionDrawer/store.ts`, `web/packages/agenta-playground-ui/src/components/WorkflowRevisionDrawer/WorkflowRevisionDrawer.tsx`	New `workflowRevisionDrawerStackedAtom` and `OpenDrawerParams.stacked` flag enable mask blur/focus lock when drawer opens atop another drawer.

Testset Auto-Mapping Simplification

Layer / File(s)	Summary
Canonical Paths `web/oss/src/components/SharedDrawers/AddToTestsetDrawer/atoms/drawerState.ts`	Introduced `canonicalTracePathsAtom` returning only `"data.inputs"` and/or `"data.outputs"` (envelope-aware, top-level paths); removed leaf-based `leafTracePathsAtom` and `traceDataPathsAtom`.
Auto-Mapping Effect `web/oss/src/components/SharedDrawers/AddToTestsetDrawer/hooks/useTestsetDrawer.ts`	New effect seeds `mappingData` with one `"create"` mapping per canonical path (instead of single sanitized path from first available); uses path's last segment as column name.

Observability Filter Enhancement

Layer / File(s)	Summary
Filter Constants `web/oss/src/components/pages/observability/assets/constants.ts`	Added "Evaluator" reference filter group with `evaluator.id` and `evaluator.slug` leaf filters supporting `in` / `not_in` operators.

Sequence Diagram(s)

sequenceDiagram
    participant Editor as Token Editor
    participant PathProvider as PlaygroundTokenPathProvider
    participant NodeProvider as PlaygroundNodeTokenPathProvider
    participant InputSource as useInputsSource
    participant OutputSource as useOutputsSource
    participant ParametersSource as useParametersSource
    participant Testcase as useTestcaseSource
    participant Atom as Jotai Atoms

    Editor->>PathProvider: mount with {{$...}} token input
    PathProvider->>NodeProvider: scope to entity in DAG
    NodeProvider->>Atom: read nodeChainContextAtomFamily
    Atom-->>NodeProvider: {allowedSlots, upstreamEntityId}
    
    NodeProvider->>InputSource: create with scoped inputs schema
    NodeProvider->>OutputSource: create with upstream entity (if exists)
    NodeProvider->>ParametersSource: create with aggregated params
    NodeProvider->>Testcase: create with observed testcases
    
    Editor->>NodeProvider: request suggestions for $.inpu[query]
    NodeProvider->>InputSource: getSuggestions(["inpu"], query)
    InputSource->>Atom: read schemaMap + observedTestcases
    Atom-->>InputSource: merge schema keys + testcase data keys
    InputSource-->>NodeProvider: [label, hint] suggestions
    NodeProvider-->>Editor: render dropdown with suggestions
    
    Editor->>Editor: select suggestion, write token text
    Editor->>PathProvider: token committed to document

sequenceDiagram
    participant User as User
    participant Modal as NewEvaluationModal
    participant WorkflowSelector as SelectWorkflowSection
    participant Store as appWorkflowStore
    participant Filter as workflowFilterAtoms

    User->>Modal: open evaluation wizard
    Modal->>Filter: set workflowTypeFilterAtom='all'
    Modal->>Filter: set workflowInvokableOnlyAtom=true
    
    Modal->>WorkflowSelector: render workflow picker
    WorkflowSelector->>Store: query workflows with typeFilter + invokableOnly
    Store->>Store: fetch + filter (non-human, has_url=true)
    Store-->>WorkflowSelector: paginated workflow list
    
    User->>WorkflowSelector: toggle 'Show evaluators' on
    WorkflowSelector->>Filter: set workflowTypeFilterAtom='all'
    Store->>Store: refetch including evaluator subtypes
    
    User->>WorkflowSelector: select workflow + capture metadata
    WorkflowSelector->>Modal: onSelectWorkflow(id, {label, isEvaluator})
    Modal->>Modal: store selectedWorkflowMeta for display
    Modal->>Modal: advance to 'Revision' tab

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

[feat] Extend prompt templates #4171 — Both PRs modify SDK resolver context (handlers.py) and how inputs/outputs/prompt envelopes are assembled for LLM execution, including context injection ordering and dual-access variable wiring.
feat: frontend archive UI #4229 — Both PRs modify evaluators store (evaluatorsPaginatedStore.ts) including managed evaluator filtering, cache logic, and paginated query parameter updates.
[release] v0.98.0 #4251 — Both PRs enhance workflow type resolution helpers (helpers.ts) and evaluator-type classification via legacy key maps and revision flag analysis.

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feature/eval-evaluations

coderabbitai

Actionable comments posted: 15

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)

web/packages/agenta-entities/src/workflow/state/store.ts (1)

1082-1110: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

flatSchemaForReNest is still nested for local clones.

These branches read flatSchemaForReNest from localData, but local drafts created by createLocalDraftFromWorkflowRevision() are cloned from workflowEntityAtomFamily(...), so their evaluator params/schema are already normalized. In that case this variable never captures the original flat schema with the hidden markers you want to preserve, and the post-draft re-nest can still drop hidden/advanced fields on cloned evaluators after presets are applied. Consider cloning from the flat source (getFlatSourceData) or persisting the original flat schema alongside the local draft before normalizing it.

Also applies to: 1155-1168, 1270-1298, 1343-1356
web/packages/agenta-entities/src/workflow/state/molecule.ts (1)
792-805: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Filter system fields after grouping to avoid leaking reserved nested inputs.

At Line 793, system fields are filtered on raw variable paths before grouping. With dotted paths (which this new grouped flow explicitly supports), a reserved top-level field can slip through (e.g. inputs.context.user_id won’t match context directly). Filter on grouped keys instead.
Suggested fix
-            const vars = extractVariablesFromConfig(params as Record<string, unknown>).filter(
-                (key) => !systemFields.has(key),
-            )
+            const vars = extractVariablesFromConfig(params as Record<string, unknown>)
             if (vars.length > 0) {
                 return groupTemplateVariables(vars)
-                    .filter((group) => group.envelope === "inputs")
+                    .filter(
+                        (group) =>
+                            group.envelope === "inputs" &&
+                            !systemFields.has(group.key),
+                    )
                     .map((group) => ({
                         key: group.key,
                         name: group.name,
                         type: group.type,
                         required: true,
                         ...(group.subPaths ? {schema: buildSubPathSchema(group.subPaths)} : {}),
                     }))
             }
web/oss/src/components/pages/evaluations/NewEvaluation/Components/NewEvaluationModalContent.tsx (1)
122-125: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update gating copy to “workflow” instead of “application.”

These messages now sit behind a workflow selector that can return apps or evaluators, so “application” is misleading.
✏️ Suggested wording tweak
- Select an application first to load this section.
+ Select a workflow first to load this section.

- Please select an application to continue configuring the evaluation.
+ Please select a workflow to continue configuring the evaluation.
Also applies to: 153-156

🧹 Nitpick comments (5)

web/oss/src/components/SharedDrawers/AddToTestsetDrawer/hooks/useTestsetDrawer.ts (1)
183-194: 💤 Low value

Minor: Redundant condition check.

When mappingData.length === 0, isMapColumnExist is guaranteed to be false (since some() on an empty array always returns false). The !isMapColumnExist check is therefore redundant in this condition.

This isn't a bug—the logic works correctly—but simplifying to just canonicalPaths.length > 0 && mappingData.length === 0 makes the intent clearer: seed mappings only when none exist and canonical paths are available.
♻️ Optional simplification
 useEffect(() => {
-    if (!isMapColumnExist && canonicalPaths.length > 0 && mappingData.length === 0) {
+    if (canonicalPaths.length > 0 && mappingData.length === 0) {
         setMappingData(
             canonicalPaths.map((path) => ({
                 id: createMappingId(),
                 data: path,
                 column: "create",
                 newColumn: path.split(".").pop() || path,
             })),
         )
     }
-}, [isMapColumnExist, canonicalPaths, mappingData.length, setMappingData])
+}, [canonicalPaths, mappingData.length, setMappingData])
web/oss/src/components/SharedDrawers/TraceDrawer/components/TraceContent/components/TraceTypeHeader/index.tsx (1)

185-190: ⚡ Quick win

Add a focused regression test for the stacked-open path.

This path now depends on stacked: true to avoid focus-trap conflicts; a small UI/state test around this call would help prevent subtle regressions.

web/packages/agenta-entities/src/workflow/state/molecule.ts (1)

753-761: ⚡ Quick win

Extract grouped-variable → port mapping into a shared helper.

The same mapping logic appears in two branches (Line 753-761 and Line 797-805). Pulling this into one helper will prevent branch drift and keep future tweaks (e.g., schema hint behavior) consistent.

Also applies to: 797-805

web/oss/src/components/pages/app-management/store/appWorkflowStore.ts (1)

341-371: 💤 Low value

Consider caching filterInvokableWorkflows results to avoid redundant filtering.

Both appWorkflowTotalCountQueryAtom and appWorkflowCountQueryAtom call filterInvokableWorkflows independently when invokableOnly is true, and so does fetchPage. While TanStack Query caches the underlying queryWorkflows response, the bulk revision fetch and filtering logic runs separately each time.

If this becomes a performance concern, consider memoizing the invokable entries or using a shared query atom for the filtered results.

web/oss/src/components/pages/evaluations/NewEvaluation/Components/SelectWorkflowSection.tsx (1)

155-160: ⚡ Quick win

Debounce search before writing into shared workflow filters.

Right now each keypress updates store state and table search deps immediately; this is noisy and causes avoidable refresh churn.

As per coding guidelines, "Debounce search inputs and filters; throttle scroll and resize handlers."

Also applies to: 259-264

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: aa239025-fd5f-42b1-8bea-33c3d3c11434

📥 Commits

Reviewing files that changed from the base of the PR and between bd447c0 and 2ad51fd.

📒 Files selected for processing (78)

sdk/agenta/sdk/engines/running/handlers.py
web/oss/src/components/Evaluators/Table/assets/evaluatorColumns.tsx
web/oss/src/components/Evaluators/assets/cells/EvaluatorTagsCell.tsx
web/oss/src/components/Evaluators/assets/cells/EvaluatorTypePill.tsx
web/oss/src/components/Evaluators/assets/cells/TableDropdownMenu/index.tsx
web/oss/src/components/Evaluators/assets/cells/TableDropdownMenu/types.ts
web/oss/src/components/Evaluators/assets/getColumns.tsx
web/oss/src/components/Evaluators/assets/types.ts
web/oss/src/components/Evaluators/store/evaluatorsPaginatedStore.ts
web/oss/src/components/Playground/Components/PlaygroundTestcaseEditor.tsx
web/oss/src/components/Playground/Components/PlaygroundVariantConfig/index.tsx
web/oss/src/components/Playground/OSSPlaygroundShell.tsx
web/oss/src/components/Playground/PlaygroundTokenPath/atoms.ts
web/oss/src/components/Playground/PlaygroundTokenPath/chainContext.ts
web/oss/src/components/Playground/PlaygroundTokenPath/index.tsx
web/oss/src/components/Playground/PlaygroundTokenPath/sources/inputs.ts
web/oss/src/components/Playground/PlaygroundTokenPath/sources/outputs.ts
web/oss/src/components/Playground/PlaygroundTokenPath/sources/parameters.ts
web/oss/src/components/Playground/PlaygroundTokenPath/sources/shared.ts
web/oss/src/components/Playground/PlaygroundTokenPath/sources/testcase.ts
web/oss/src/components/Playground/PlaygroundTokenPath/types.ts
web/oss/src/components/SharedDrawers/AddToTestsetDrawer/atoms/drawerState.ts
web/oss/src/components/SharedDrawers/AddToTestsetDrawer/hooks/useTestsetDrawer.ts
web/oss/src/components/SharedDrawers/TraceDrawer/components/TraceContent/components/TraceTypeHeader/index.tsx
web/oss/src/components/SharedDrawers/TraceDrawer/components/TraceContent/index.tsx
web/oss/src/components/SharedDrawers/TraceDrawer/components/TraceContent/utils/index.ts
web/oss/src/components/pages/app-management/components/ApplicationManagementSection.tsx
web/oss/src/components/pages/app-management/components/appWorkflowColumns.tsx
web/oss/src/components/pages/app-management/store/appWorkflowFilterAtoms.ts
web/oss/src/components/pages/app-management/store/appWorkflowStore.ts
web/oss/src/components/pages/app-management/store/index.ts
web/oss/src/components/pages/evaluations/NewEvaluation/Components/NewEvaluationModalContent.tsx
web/oss/src/components/pages/evaluations/NewEvaluation/Components/NewEvaluationModalInner.tsx
web/oss/src/components/pages/evaluations/NewEvaluation/Components/SelectAppSection.tsx
web/oss/src/components/pages/evaluations/NewEvaluation/Components/SelectVariantSection.tsx
web/oss/src/components/pages/evaluations/NewEvaluation/Components/SelectWorkflowSection.tsx
web/oss/src/components/pages/evaluations/NewEvaluation/types.ts
web/oss/src/components/pages/evaluations/onlineEvaluation/OnlineEvaluationDrawer.tsx
web/oss/src/components/pages/observability/assets/constants.ts
web/oss/tests/playwright/acceptance/auto-evaluation/index.ts
web/oss/tests/playwright/acceptance/auto-evaluation/tests.ts
web/oss/tests/playwright/acceptance/human-annotation/index.ts
web/oss/tests/playwright/acceptance/human-annotation/tests.ts
web/packages/agenta-entities/src/runnable/evaluatorTransforms.ts
web/packages/agenta-entities/src/runnable/index.ts
web/packages/agenta-entities/src/runnable/portHelpers.ts
web/packages/agenta-entities/src/shared/execution/valueExtraction.ts
web/packages/agenta-entities/src/workflow/core/index.ts
web/packages/agenta-entities/src/workflow/core/schema.ts
web/packages/agenta-entities/src/workflow/index.ts
web/packages/agenta-entities/src/workflow/state/evaluatorUtils.ts
web/packages/agenta-entities/src/workflow/state/helpers.ts
web/packages/agenta-entities/src/workflow/state/index.ts
web/packages/agenta-entities/src/workflow/state/molecule.ts
web/packages/agenta-entities/src/workflow/state/store.ts
web/packages/agenta-entity-ui/package.json
web/packages/agenta-entity-ui/src/DrillInView/SchemaControls/CodeEditorControl.tsx
web/packages/agenta-entity-ui/src/DrillInView/components/MoleculeDrillInContext.tsx
web/packages/agenta-entity-ui/src/DrillInView/components/index.ts
web/packages/agenta-entity-ui/src/index.ts
web/packages/agenta-entity-ui/src/workflow/WorkflowKindTag.tsx
web/packages/agenta-entity-ui/src/workflow/WorkflowTypeTag.tsx
web/packages/agenta-entity-ui/src/workflow/index.ts
web/packages/agenta-playground-ui/src/components/WorkflowRevisionDrawer/WorkflowRevisionDrawer.tsx
web/packages/agenta-playground-ui/src/components/WorkflowRevisionDrawer/store.ts
web/packages/agenta-playground-ui/src/components/adapters/VariableControlAdapter.tsx
web/packages/agenta-playground/src/state/controllers/executionItemController.ts
web/packages/agenta-playground/src/state/execution/index.ts
web/packages/agenta-playground/src/state/execution/selectors.ts
web/packages/agenta-shared/src/utils/index.ts
web/packages/agenta-shared/src/utils/templateVariable.ts
web/packages/agenta-ui/src/Editor/index.ts
web/packages/agenta-ui/src/Editor/plugins/code/index.tsx
web/packages/agenta-ui/src/Editor/plugins/token/TokenNode.ts
web/packages/agenta-ui/src/Editor/plugins/token/TokenPathSuggestionsContext.tsx
web/packages/agenta-ui/src/Editor/plugins/token/TokenTooltipPlugin.tsx
web/packages/agenta-ui/src/Editor/plugins/token/TokenTypeaheadPlugin.tsx
web/packages/agenta-ui/src/Editor/plugins/token/extensions/tokenBehavior.tsx

💤 Files with no reviewable changes (8)

web/oss/src/components/Evaluators/assets/cells/TableDropdownMenu/types.ts
web/oss/src/components/Evaluators/assets/cells/EvaluatorTagsCell.tsx
web/oss/src/components/Evaluators/assets/cells/EvaluatorTypePill.tsx
web/oss/src/components/Evaluators/assets/cells/TableDropdownMenu/index.tsx
web/oss/src/components/pages/evaluations/NewEvaluation/Components/SelectAppSection.tsx
web/oss/src/components/Evaluators/assets/types.ts
web/oss/src/components/SharedDrawers/TraceDrawer/components/TraceContent/utils/index.ts
web/oss/src/components/Evaluators/assets/getColumns.tsx

coderabbitai · 2026-05-04T15:31:34Z

+    if inputs is not None:
+        context.update(**inputs)
        context.update(
            **{
-                "prediction": outputs,
-                "outputs": outputs,
+                "inputs": inputs,
            }


⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Reserve or protect the synthetic inputs envelope key.

These changes overwrite any real caller-supplied field named inputs with the synthetic envelope object. After that, root-level prompt references to inputs no longer see the user value, which is a breaking behavior change unless "inputs" is now a reserved field name everywhere. Please either reject reserved names up front or preserve the original field before injecting the envelope.

Also applies to: 2211-2218, 2283-2290

coderabbitai · 2026-05-04T15:31:34Z

+        (value: string, meta?: {label?: string; isEvaluator?: boolean}) => {
            if (value === selectedAppId) return
            setSelectedAppId(value)
+            setSelectedWorkflowMeta(value ? (meta ?? null) : null)
            setSelectedTestsetId("")


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Allow metadata updates even when the selected workflow ID is unchanged.

The early return on same value skips setSelectedWorkflowMeta, so label/kind fallback can remain stale.

🔧 Minimal fix

- (value: string, meta?: {label?: string; isEvaluator?: boolean}) => { - if (value === selectedAppId) return + (value: string, meta?: {label?: string; isEvaluator?: boolean}) => { + if (value === selectedAppId) { + if (value && meta) setSelectedWorkflowMeta(meta) + return + } setSelectedAppId(value) setSelectedWorkflowMeta(value ? (meta ?? null) : null)

coderabbitai · 2026-05-04T15:31:34Z

+    const onSelectRow = useCallback(
+        (selectedRowKeys: React.Key[]) => {
+            if (disabled) return
+            const selectedId = selectedRowKeys[0] as string | undefined
+            if (!selectedId) {
+                onSelectWorkflow("")


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fix single-selection behavior: current checkbox model can keep the old selection.

This picker stores a single selectedWorkflowId, but checkbox mode sends multi-key arrays and you always take the first key. Selecting another row can keep the previous ID selected.

✅ Suggested fix (radio selection)

- const onSelectRow = useCallback( - (selectedRowKeys: React.Key[]) => { + const onSelectRow = useCallback( + (selectedRowKeys: React.Key[]) => { if (disabled) return - const selectedId = selectedRowKeys[0] as string | undefined + const selectedId = selectedRowKeys.at(-1) as string | undefined if (!selectedId) { onSelectWorkflow("") return } const row = tableRows.find((r) => r.workflowId === selectedId) onSelectWorkflow(selectedId, { label: row?.name, isEvaluator: row?.isEvaluator, }) }, [disabled, onSelectWorkflow, tableRows], ) const rowSelection = useMemo( () => ({ - type: "checkbox" as const, + type: "radio" as const, selectedRowKeys: selectedWorkflowId ? [selectedWorkflowId] : [], onChange: (keys: React.Key[]) => onSelectRow(keys), getCheckboxProps: () => ({disabled}), selectOnRowClick: !disabled, }), [selectedWorkflowId, onSelectRow, disabled], )

Also applies to: 191-199

coderabbitai · 2026-05-04T15:31:34Z

    const previewEvaluators = useMemo(
-        () => (evaluators || []).filter((e) => e.flags?.is_feedback !== true),
+        () =>
+            (evaluators || []).filter(
+                (e) => e.flags?.is_feedback !== true && e.flags?.has_url === true,
+            ),


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

has_url-only filtering can hide valid online evaluators

On Line 78, requiring e.flags?.has_url === true is narrower than the platform’s online-capable evaluator logic (e.g., code/LLM/handler-backed evaluators). This can remove valid options and block live-eval creation.

Suggested fix

-import { +import { evaluatorConfigRevisionsListDataAtom, evaluatorConfigRevisionsQueryStateAtom, evaluatorTemplatesDataAtom, evaluatorTemplatesQueryAtom, + isOnlineCapableEvaluator, } from "@agenta/entities/workflow" @@ - const previewEvaluators = useMemo( - () => - (evaluators || []).filter( - (e) => e.flags?.is_feedback !== true && e.flags?.has_url === true, - ), - [evaluators], - ) + const previewEvaluators = useMemo( + () => + (evaluators || []).filter( + (e) => e.flags?.is_feedback !== true && isOnlineCapableEvaluator(e), + ), + [evaluators], + )

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

const previewEvaluators = useMemo(

() => (evaluators || []).filter((e) => e.flags?.is_feedback !== true),

() =>

(evaluators || []).filter(

(e) => e.flags?.is_feedback !== true && e.flags?.has_url === true,

),

import {

evaluatorConfigRevisionsListDataAtom,

evaluatorConfigRevisionsQueryStateAtom,

evaluatorTemplatesDataAtom,

evaluatorTemplatesQueryAtom,

isOnlineCapableEvaluator,

} from "@agenta/entities/workflow"

const previewEvaluators = useMemo(

() =>

(evaluators || []).filter(

(e) => e.flags?.is_feedback !== true && isOnlineCapableEvaluator(e),

),

[evaluators],

)

coderabbitai · 2026-05-04T15:31:34Z

+    const {value, isParsable} = useMemo(() => {
+        if (!parentRaw) return {value: "", isParsable: true}
+        try {
+            const parsed = JSON.parse(parentRaw) as unknown
+            if (parsed && typeof parsed === "object" && !Array.isArray(parsed)) {
+                const raw = (parsed as Record<string, unknown>)[subPath]
+                return {value: raw == null ? "" : String(raw), isParsable: true}
+            }
+            return {value: "", isParsable: false}
+        } catch {
+            return {value: "", isParsable: false}
+        }
+    }, [parentRaw, subPath])
+
+    const handleChange = useCallback(
+        (nextVal: string) => {
+            let parsed: Record<string, unknown> = {}
+            if (parentRaw) {
+                try {
+                    const p = JSON.parse(parentRaw) as unknown
+                    if (p && typeof p === "object" && !Array.isArray(p)) {
+                        parsed = {...(p as Record<string, unknown>)}
+                    }
+                } catch {
+                    // non-JSON parent — start fresh; overwrite handled by isParsable gate
+                }
+            }
+            parsed[subPath] = nextVal
+            setCellValue({
+                testcaseId,
+                column: parentKey,
+                value: JSON.stringify(parsed),
+            })


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Walk dotted sub-paths instead of indexing them literally.

Line 90 and Line 112 treat subPath as a single property name, but getPortSubPaths() is intentionally allowed to return multi-segment hints like a.b.c. In flat view that reads/writes {"a.b.c": ...} instead of traversing a -> b -> c, so nested fields are edited incorrectly.

Suggested fix

+const splitPath = (path: string) => path.split(".").filter(Boolean) + +function getValueAtPath(obj: Record<string, unknown>, path: string): unknown { + return splitPath(path).reduce<unknown>((acc, segment) => { + return acc && typeof acc === "object" && !Array.isArray(acc) + ? (acc as Record<string, unknown>)[segment] + : undefined + }, obj) +} + +function setValueAtPath(obj: Record<string, unknown>, path: string, value: string) { + const segments = splitPath(path) + let cursor = obj + + for (const segment of segments.slice(0, -1)) { + const next = cursor[segment] + cursor[segment] = + next && typeof next === "object" && !Array.isArray(next) + ? {...(next as Record<string, unknown>)} + : {} + cursor = cursor[segment] as Record<string, unknown> + } + + cursor[segments[segments.length - 1]!] = value +} + const {value, isParsable} = useMemo(() => { if (!parentRaw) return {value: "", isParsable: true} try { const parsed = JSON.parse(parentRaw) as unknown if (parsed && typeof parsed === "object" && !Array.isArray(parsed)) { - const raw = (parsed as Record<string, unknown>)[subPath] + const raw = getValueAtPath(parsed as Record<string, unknown>, subPath) return {value: raw == null ? "" : String(raw), isParsable: true} } return {value: "", isParsable: false} @@ - parsed[subPath] = nextVal + setValueAtPath(parsed, subPath, nextVal) setCellValue({ testcaseId, column: parentKey, value: JSON.stringify(parsed), })

coderabbitai · 2026-05-04T15:31:35Z

+        const explicit = xParams?.language as string | undefined
+        if (explicit) return explicit


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Validate the explicit schema language before returning it.

Only the dynamic candidates are normalized and checked. An explicit value like "Python" or a typo bypasses the guard and gets cast into SharedEditor, which skips the fallback and can leave syntax highlighting in an undefined state.

Suggested fix

- const explicit = xParams?.language as string | undefined - if (explicit) return explicit + const explicit = + typeof xParams?.language === "string" ? xParams.language.toLowerCase() : undefined + if (explicit && isSupportedLanguage(explicit)) return explicit

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

const explicit = xParams?.language as string | undefined

if (explicit) return explicit

const explicit =

typeof xParams?.language === "string" ? xParams.language.toLowerCase() : undefined

if (explicit && isSupportedLanguage(explicit)) return explicit

coderabbitai · 2026-05-04T15:31:35Z

+    const jsonDefault = useMemo(() => {
+        if (portType === "array") return "[]"
+        const props =
+            portSchema && typeof portSchema === "object"
+                ? (portSchema as {properties?: Record<string, unknown>}).properties
+                : null
+        if (!props || typeof props !== "object") return "{}"
+        const keys = Object.keys(props)
+        if (keys.length === 0) return "{}"
+        const obj: Record<string, string> = {}
+        for (const k of keys) obj[k] = ""
+        return JSON.stringify(obj, null, 2)
+    }, [portType, portSchema])


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don’t render a JSON value that won’t be submitted.

These lines make empty object/array cells look populated, but the underlying testcase cell stays "" and the comment below confirms the run payload remains empty. In this flow that is pretty misleading: evaluator inputs can appear present in the editor and still resolve as missing at execution time. Either persist the seed before execution, or surface the schema-derived shape as placeholder/help text instead of value.

Also applies to: 221-223, 239-249

coderabbitai · 2026-05-04T15:31:35Z

+            if (language === "json" || language === "yaml") {
+                const currentParsed = safeJson5Parse(currentEditorContent)
+                const incomingParsed = safeJson5Parse(initialValue)
+                if (!isEqual(currentParsed, incomingParsed)) {
+                    needsDispatch = true
+                    // Force only when we're replacing real content; on first
+                    // populate, the handler should run normally.
+                    forceUpdate = hasExistingContent
                }


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash set -euo pipefail echo "Inspect safeJson5Parse implementation and callers:" rg -n -C4 '\bsafeJson5Parse\b' --glob '*.{ts,tsx,js,jsx}' echo echo "Inspect the YAML equality gate in the editor plugin:" sed -n '895,903p' web/packages/agenta-ui/src/Editor/plugins/code/index.tsx

Repository: Agenta-AI/agenta

Length of output: 8839

🏁 Script executed:

# Check the implementation of tryParsePartialJson rg -n -A 10 'export function tryParsePartialJson' web/packages/agenta-shared/src/utils/jsonParsing.ts

Repository: Agenta-AI/agenta

Length of output: 540

🏁 Script executed:

# Check the imports in the editor plugin file head -n 30 web/packages/agenta-ui/src/Editor/plugins/code/index.tsx

Repository: Agenta-AI/agenta

Length of output: 923

🏁 Script executed:

# Get the full implementation of tryParsePartialJson sed -n '32,80p' web/packages/agenta-shared/src/utils/jsonParsing.ts

Repository: Agenta-AI/agenta

Length of output: 2007

Use a YAML parser in the equality gate.

This branch treats YAML like JSON5. When YAML strings that aren't valid JSON fail safeJson5Parse(), both fallback to tryParsePartialJson(), which applies JSON heuristics and returns null for unparseable content. Two different YAML documents can both return null, making isEqual() treat them as identical and skip needsDispatch. External YAML updates never reach the editor. Parse YAML with yaml.load() instead, and fall back to string comparison when parsing fails.

Suggested fix

- if (language === "json" || language === "yaml") { + if (language === "json") { const currentParsed = safeJson5Parse(currentEditorContent) const incomingParsed = safeJson5Parse(initialValue) if (!isEqual(currentParsed, incomingParsed)) { needsDispatch = true - // Force only when we're replacing real content; on first - // populate, the handler should run normally. forceUpdate = hasExistingContent } return } + + if (language === "yaml") { + let currentParsed: unknown = currentEditorContent?.trim() ?? "" + let incomingParsed: unknown = initialValue?.trim() ?? "" + + try { + currentParsed = yaml.load(currentEditorContent) + } catch {} + + try { + incomingParsed = yaml.load(initialValue) + } catch {} + + if (!isEqual(currentParsed, incomingParsed)) { + needsDispatch = true + forceUpdate = hasExistingContent + } + return + }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

if (language === "json" || language === "yaml") {

const currentParsed = safeJson5Parse(currentEditorContent)

const incomingParsed = safeJson5Parse(initialValue)

if (!isEqual(currentParsed, incomingParsed)) {

needsDispatch = true

// Force only when we're replacing real content; on first

// populate, the handler should run normally.

forceUpdate = hasExistingContent

}

if (language === "json") {

const currentParsed = safeJson5Parse(currentEditorContent)

const incomingParsed = safeJson5Parse(initialValue)

if (!isEqual(currentParsed, incomingParsed)) {

needsDispatch = true

forceUpdate = hasExistingContent

}

return

}

if (language === "yaml") {

let currentParsed: unknown = currentEditorContent?.trim() ?? ""

let incomingParsed: unknown = initialValue?.trim() ?? ""

try {

currentParsed = yaml.load(currentEditorContent)

} catch {}

try {

incomingParsed = yaml.load(initialValue)

} catch {}

if (!isEqual(currentParsed, incomingParsed)) {

needsDispatch = true

forceUpdate = hasExistingContent

}

return

}

coderabbitai · 2026-05-04T15:31:35Z

    updateDOM(_prevNode: TokenNode, dom: HTMLElement): boolean {
        const text = this.getTextContent()
-        if (text !== dom.textContent) {
-            dom.textContent = text
-            return true
-        }
-        return false
+        const textChanged = text !== dom.textContent
+        if (textChanged) dom.textContent = text
+        // Re-apply styles even when text didn't change at this exact node —
+        // edits to a nearby path segment can change validity without
+        // changing this node's own text, but when it DOES change we also
+        // need to re-evaluate because the new text may be valid/invalid.
+        applyTokenStyles(dom, text)
+        return textChanged


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

In Lexical 0.40, what does a custom TextNode.updateDOM(...) returning true mean? If the node already updates textContent/styles in place, should updateDOM return false to avoid DOM replacement?

💡 Result:

In Lexical 0.40, a custom TextNode.updateDOM(...) returning true signals to Lexical that the DOM element needs to be replaced with a new one created by createDOM. Returning false indicates that the existing DOM element was successfully updated in place, avoiding replacement. Yes, if the node already updates textContent/styles in place (e.g., via super.updateDOM and additional style changes), it should return false to prevent unnecessary DOM replacement and preserve performance/DOM state like selection anchors.

Citations:

1: https://lexical.dev/docs/concepts/nodes

2: https://www.mintlify.com/facebook/lexical/concepts/nodes

Return false from updateDOM() after mutating the span in place.

This method already updates textContent and styles directly on the DOM element. Returning textChanged signals Lexical to recreate the DOM node on text edits, which breaks selection/hover continuity and defeats the in-place update optimization. Per Lexical's node contract, return false to indicate the DOM was successfully updated in place.

Suggested fix

updateDOM(_prevNode: TokenNode, dom: HTMLElement): boolean { const text = this.getTextContent() const textChanged = text !== dom.textContent if (textChanged) dom.textContent = text // Re-apply styles even when text didn't change at this exact node — // edits to a nearby path segment can change validity without // changing this node's own text, but when it DOES change we also // need to re-evaluate because the new text may be valid/invalid. applyTokenStyles(dom, text) - return textChanged + return false }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

updateDOM(_prevNode: TokenNode, dom: HTMLElement): boolean {

const text = this.getTextContent()

if (text !== dom.textContent) {

dom.textContent = text

return true

}

return false

const textChanged = text !== dom.textContent

if (textChanged) dom.textContent = text

// Re-apply styles even when text didn't change at this exact node —

// edits to a nearby path segment can change validity without

// changing this node's own text, but when it DOES change we also

// need to re-evaluate because the new text may be valid/invalid.

applyTokenStyles(dom, text)

return textChanged

updateDOM(_prevNode: TokenNode, dom: HTMLElement): boolean {

const text = this.getTextContent()

const textChanged = text !== dom.textContent

if (textChanged) dom.textContent = text

// Re-apply styles even when text didn't change at this exact node —

// edits to a nearby path segment can change validity without

// changing this node's own text, but when it DOES change we also

// need to re-evaluate because the new text may be valid/invalid.

applyTokenStyles(dom, text)

return false

}

coderabbitai · 2026-05-04T15:31:35Z

+                    node.setTextContent(suggestion.tokenText)
+                    // Position cursor just before the closing `}}` so the user
+                    // can keep typing (e.g. drill further into a path).
                    navigateCursor({
                        nodeKey: node.getKey(),
-                        offset: node.getTextContent().length,
+                        offset: suggestion.tokenText.length - 2,
                    })


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Selecting a suggestion in the middle of a token drops the suffix.

Line 285 opens the menu anywhere inside {{...}}, but Line 251 rewrites the entire token to suggestion.tokenText. If the caret sits in the middle of {{$.inputs.country}}, accepting a suggestion replaces everything after the caret as well. The safe fix is to keep autocomplete end-of-token only until the replacement logic becomes caret-aware.

Suggested fix

- if (match && offsetPos >= 2 && offsetPos <= text.length - 2) { + if (match && offsetPos === text.length - 2) { setInputQuery(match[1]) const dom = editor.getElementByKey(node.getKey()) if (dom) {

Also applies to: 285-286

ardaerzin added 12 commits April 23, 2026 15:34

debug: add detailed logging for context updates in auto_ai_critique_v…

51fa900

…0 handler

Merge branch 'improve/evaluator-span-in-playground' into feature/eval…

56099cd

…-evaluations

Merge branch 'improve/evaluator-span-in-playground' into feature/eval…

44a66b8

…-evaluations

refactor: remove stripNestedSpans utility and its usage from TraceCon…

b6d3022

…tent component

refactor: remove debug logging from auto_ai_critique_v0 handler

6c33c89

feat(web): unify workflow type tags and simplify evaluator table

c6577c9

feat(web): improve workflow classification in selection flows

137c082

feat(web): add token-path aware variable editing in playground

918dc9f

github-advanced-security AI found potential problems Apr 28, 2026

View reviewed changes

Comment thread web/packages/agenta-shared/src/utils/templateVariable.ts Fixed

Comment thread web/packages/agenta-shared/src/utils/templateVariable.ts Fixed

Comment thread web/packages/agenta-shared/src/utils/templateVariable.ts Fixed

ardaerzin changed the base branch from main to release/v0.96.10 April 28, 2026 13:16

Merge branch 'improve/evaluator-span-in-playground' into feature/eval…

785a4b1

…-evaluations

vercel Bot deployed to Preview April 28, 2026 13:19 View deployment

vercel Bot deployed to Preview April 28, 2026 13:23 View deployment

Merge branch 'main' into feature/eval-evaluations

a927100

vercel Bot deployed to Preview April 29, 2026 10:57 View deployment

ardaerzin changed the base branch from release/v0.96.10 to main April 29, 2026 12:20

ardaerzin added 6 commits April 29, 2026 18:19

feat: add tooltip and truncation to workflow type tags

7c2f423

Wrap type pills in tooltips to show full labels on hover and apply max-width truncation to prevent long evaluator names (e.g., `__main__.MyEvaluator`) from breaking column layouts.

fix: remove capitalize class from automatic tags to preserve original…

81ded12

… casing

fix(editor): sync initial code content against current Lexical state

237ddc7

fix(evaluator): preserve hidden fields for round-trip and strip in re…

e78b8c9

…ndered nested config

vercel Bot deployed to Preview April 30, 2026 07:53 View deployment

vercel Bot deployed to Preview April 30, 2026 07:58 View deployment

ardaerzin marked this pull request as ready for review April 30, 2026 08:36

dosubot Bot added size:XXL This PR changes 1000+ lines, ignoring generated files. feature Frontend labels Apr 30, 2026

ardaerzin requested a review from ashrafchowdury April 30, 2026 09:30

ardaerzin added 2 commits April 30, 2026 12:19

fix(web): update app tab label copy in evaluation modal

bb32b09

test(web): align evaluation e2e steps with revision tab naming

35330a6

vercel Bot deployed to Preview April 30, 2026 10:22 View deployment

bekossy approved these changes Apr 30, 2026

View reviewed changes

dosubot Bot added the lgtm This PR has been approved by a maintainer label Apr 30, 2026

ardaerzin added 2 commits April 30, 2026 19:36

fix(frontend): support stacked workflow revision drawer focus lock

3aecd9c

feat(frontend): simplify evaluation workflow picker filters

040433b

vercel Bot deployed to Preview April 30, 2026 17:38 View deployment

vercel Bot deployed to Preview April 30, 2026 17:50 View deployment

vercel Bot deployed to Preview May 1, 2026 08:17 View deployment

Merge branch 'main' into feature/eval-evaluations

2ad51fd

vercel Bot deployed to Preview May 4, 2026 15:21 View deployment

coderabbitai Bot reviewed May 4, 2026

View reviewed changes

-    const previewEvaluators = useMemo(
-        () => (evaluators || []).filter((e) => e.flags?.is_feedback !== true),
-        () =>
-            (evaluators || []).filter(
-                (e) => e.flags?.is_feedback !== true && e.flags?.has_url === true,
-            ),
+import {
+    evaluatorConfigRevisionsListDataAtom,
+    evaluatorConfigRevisionsQueryStateAtom,
+    evaluatorTemplatesDataAtom,
+    evaluatorTemplatesQueryAtom,
+    isOnlineCapableEvaluator,
+} from "@agenta/entities/workflow"
+    const previewEvaluators = useMemo(
+        () =>
+            (evaluators || []).filter(
+                (e) => e.flags?.is_feedback !== true && isOnlineCapableEvaluator(e),
+            ),
+        [evaluators],
+    )

		const explicit = xParams?.language as string \| undefined
		if (explicit) return explicit

Conversation

ardaerzin commented Apr 28, 2026

Summary

Testing

Verified locally

Added or updated tests

QA follow-up

Checklist

Contributor Resources

Uh oh!

vercel Bot commented Apr 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions Bot commented Apr 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Railway Preview Environment

Uh oh!

mmabrouk commented Apr 30, 2026

Uh oh!

ardaerzin commented Apr 30, 2026

Uh oh!

mmabrouk commented Apr 30, 2026

Uh oh!

ardaerzin commented Apr 30, 2026

Uh oh!

mmabrouk commented Apr 30, 2026

Uh oh!

ardaerzin commented Apr 30, 2026

Uh oh!

mmabrouk commented Apr 30, 2026

Uh oh!

coderabbitai Bot commented May 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 4, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 4, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 4, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 4, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 4, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 4, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 4, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 4, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 4, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 4, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

vercel Bot commented Apr 28, 2026 •

edited

Loading

github-actions Bot commented Apr 30, 2026 •

edited

Loading

coderabbitai Bot commented May 4, 2026 •

edited

Loading