Skip to content

[Frontend Feat] Enable running evaluations on evaluators#4237

Open
ardaerzin wants to merge 29 commits intomainfrom
feature/eval-evaluations
Open

[Frontend Feat] Enable running evaluations on evaluators#4237
ardaerzin wants to merge 29 commits intomainfrom
feature/eval-evaluations

Conversation

@ardaerzin
Copy link
Copy Markdown
Contributor

Summary

can now select an evaluator workflow and run an evaluation on it.

Testing

Verified locally

ran an evaluation on an evaluator

Added or updated tests

QA follow-up

need to test more evaluations on evaluators

Checklist

  • I have included a video or screen recording for UI changes, or marked Demo as N/A
  • Relevant tests pass locally
  • Relevant linting and formatting pass locally
  • I have signed the CLA, or I will sign it when the bot prompts me

Contributor Resources

…nd unify workflow store for apps and evaluators

- Add `transformAgDataForTestset` to unwrap nested subject data from evaluator annotation spans, stripping internal bookkeeping keys and reshaping to `{inputs, outputs, score}` for testset compatibility
- Detect evaluator annotation spans via `trace_type === "annotation"` and `is_evaluator` flag or evaluator references
- Apply transformation to both live and original data for consistent
…cture

- Remove `transformAgDataForTestset` and evaluator annotation span detection logic that unwrapped nested subject data
- Replace `leafTracePathsAtom` and `traceDataPathsAtom` with `canonicalTracePathsAtom` that returns only `data.inputs` and `data.outputs` when present
- Update auto-mapping to seed from canonical envelope paths instead of all leaf paths
- Preserve evaluator annotation spans' nested structure in testcases to maintain replay
…chat handlers

- Expose inputs both flattened at root (e.g., `{{country}}`) and nested under `inputs` key (e.g., `{{$.inputs.country}}`) for consistent JSONPath resolution across completion, chat, and evaluator prompts
- Clear `input_keys` after validation to prevent `PromptTemplate.format()` from rejecting the injected envelope meta-key as extra
- Exclude `messages` from nested view in chat handler since it is handled separately
…yground

- Wrap `PlaygroundConfigSection` in `PlaygroundNodeTokenPathProvider` to scope suggestions to the node's chain context
- Split token path provider into global (inputs-only) and scoped (adds outputs when upstream exists) variants
- Update `useInputsSource` to accept `scopedEntityId` and read entity-specific input ports when provided
- Update `useOutputsSource` to accept `upstreamEntityId` and read upstream output ports directly
@vercel
Copy link
Copy Markdown

vercel Bot commented Apr 28, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
agenta-documentation Ready Ready Preview, Comment May 4, 2026 3:21pm

Request Review

Comment thread web/packages/agenta-shared/src/utils/templateVariable.ts Fixed
Comment thread web/packages/agenta-shared/src/utils/templateVariable.ts Fixed
Comment thread web/packages/agenta-shared/src/utils/templateVariable.ts Fixed
@ardaerzin ardaerzin changed the base branch from main to release/v0.96.10 April 28, 2026 13:16
…traction to prevent ReDoS

- Replace regex-based extraction with prefix/suffix checks and string slicing
- Prevents polynomial backtracking on adversarial inputs (addresses CodeQL js/redos)
- Handle Jinja whitespace control (`{%-` / `-%}`) via explicit slice trimming
- Maintain identical behavior for `{{...}}`, `{%...%}`, and `{#...#}` wrappers
@ardaerzin ardaerzin changed the base branch from release/v0.96.10 to main April 29, 2026 12:20
Replace legacy fallback mode with first-class flat token support. Both `{{country}}` and `{{$.inputs.country}}` now use the same suggestion pipeline, with flat tokens implicitly drilling into `$.inputs.*` to surface testcase columns and port schemas.
Wrap type pills in tooltips to show full labels on hover and apply max-width truncation to prevent long evaluator names (e.g., `__main__.MyEvaluator`) from breaking column layouts.
Add `is_managed` flag checks to exclude user-deployed Python evaluators (e.g., `user:custom:__main__.MyEval:latest`) from the Evaluators page. These SDK-registered evaluators aren't first-class catalog entries and should only appear in workflow execution contexts. Apply filter at both workflow-id cache level and revision query level for defense-in-depth.
Implement multi-tier language detection for CodeEditorControl: explicit schema override → sibling field lookup via `languageFromField` → heuristic on `runtime` field → python fallback. Add `useOptionalDrillIn` hook to safely access drill-in context from shared controls that may render outside the provider tree.
@ardaerzin ardaerzin marked this pull request as ready for review April 30, 2026 08:36
@dosubot dosubot Bot added size:XXL This PR changes 1000+ lines, ignoring generated files. feature Frontend labels Apr 30, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 30, 2026

Railway Preview Environment

Preview URL https://gateway-production-4a40.up.railway.app/w
Image tag pr-4237-4af29ca
Status Failed
Railway logs Open logs
Logs View workflow run
Updated at 2026-05-04T15:37:37.851Z

@mmabrouk
Copy link
Copy Markdown
Member

Thanks @ardaerzin

I think the filtering view could be improved. The number of tabs is very large and always overflow.

My comments:

  • Per default we should show only the prompts (completion and chat) and not the evaluators.
    -- One option is to simplify it by adding a switch "include/show evaluators" that is per default switched off. So the default is that the user only see their prompts (current state), and then if they switch this on they see all apps including evaluators. We could have a tooltip (you can evaluate your evaluators and compare them to human feedback for instance).
    -- Another option is to have a drop down for filtering
CleanShot 2026-04-30 at 13 18 03@2x

I have created some designs that might be helpful

CleanShot.2026-04-30.at.13.37.20.mp4
CleanShot.2026-04-30.at.13.38.13.mp4

Data:
Filtering applicatoins to evaluate.zip

@dosubot dosubot Bot added the lgtm This PR has been approved by a maintainer label Apr 30, 2026
@ardaerzin
Copy link
Copy Markdown
Contributor Author

  • One option is to simplify it by adding a switch "include/show evaluators" that is per default switched off. So the default is that the user only see their prompts (current state), and then if they switch this on they see all apps including evaluators. We could have a tooltip (you can evaluate your evaluators and compare them to human feedback for instance)

@mmabrouk here's the approach I took

Screenshot 2026-04-30 at 19 34 23 Screenshot 2026-04-30 at 19 34 28 Screenshot 2026-04-30 at 19 34 33

Copy link
Copy Markdown
Member

Thanks @ardaerzin looks fine to me (although redundant imo).
However, I am not sure where "All types" went in the dropdown. When you opened the Drop down the "All Types option" was not there.

@ardaerzin
Copy link
Copy Markdown
Contributor Author

@mmabrouk there's a remove option after a selection is made

Screenshot 2026-04-30 at 19 39 08

Copy link
Copy Markdown
Member

Why not have it simply as an option like in the design? Feels like less obvious. Sorry too nitpicky

Replace `null` placeholder + clear button pattern with an explicit "All types" first option in the workflow type filter Select. Simplifies state handling by eliminating nullable semantics and ensures the reset path is always visible as a clickable option rather than requiring hover to reveal a clear button.
@ardaerzin
Copy link
Copy Markdown
Contributor Author

Why not have it simply as an option like in the design? Feels like less obvious. Sorry too nitpicky

@mmabrouk fair enough. pushed this change

Screenshot 2026-04-30 at 19 47 44

Copy link
Copy Markdown
Member

Can I be nitpiky again and say that we should remove the padding for the subitems otherwise it looks weird

…orkflow type filter

Remove extra left indent from Ant Design's grouped select options to ensure visual alignment with the ungrouped "All types" entry. The default `.ant-select-item-option-grouped` padding creates unintended hierarchy when only one or two groups are visible.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 4, 2026

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Added scope-aware token path typeahead suggestions for template variables with envelope slot completion.
    • Introduced template variable validation with error tooltips and correction suggestions.
    • Enhanced testcase editor with flat/nested view toggle, property addition, and column delete actions.
    • Added workflow type filtering to application management.
  • Bug Fixes

    • Improved evaluator filtering to display only platform-managed evaluators.
    • Enhanced code editor language detection based on runtime hints.
    • Fixed nested span handling in trace data display.
  • UI/UX Updates

    • Renamed "Variant" to "Revision" in evaluation modals.
    • Refactored evaluator table columns with unified type tag rendering.
    • Improved JSON default value computation for testcase editor fields.

Walkthrough

This pull request introduces a comprehensive token path suggestion system for the playground editor, refactors evaluator component display using shared WorkflowTypeTag components, enhances workflow type resolution and template variable validation, updates the evaluation modal flow to use "Revision" terminology and new workflow selection logic, improves testcase editor capabilities with nested field support, and adjusts SDK resolver context injection for prompt templating. The changes span multiple layers: data shape updates (atoms, types), core logic implementations (resolvers, filters, sources), UI component refactoring (removing custom cells, wiring new providers), and supporting utilities (template validation, port helpers).


Changes

SDK Resolver Context & Evaluator Envelope Handling

Layer / File(s) Summary
Core Handler Updates
sdk/agenta/sdk/engines/running/handlers.py
auto_ai_critique_v0 now injects inputs before outputs with dual-access (flattened + nested). completion_v0 and chat_v0 clear input_keys after validation and add _variables["inputs"] envelope for consistent template resolution.

Evaluator Component Display Consolidation

Layer / File(s) Summary
Custom Cell Removal
web/oss/src/components/Evaluators/assets/cells/*
Deleted EvaluatorTagsCell, EvaluatorTypePill, TableDropdownMenu (and types), and getColumns function that previously rendered evaluator table UI.
Column Refactoring
web/oss/src/components/Evaluators/Table/assets/evaluatorColumns.tsx
EvaluatorTypeCell now derives evaluatorKey from workflowKeyAtomFamily and renders via shared WorkflowTypeTag. Automatic evaluator tags switched from capitalize to truncate styling.
Managed Evaluator Filtering
web/oss/src/components/Evaluators/store/evaluatorsPaginatedStore.ts
Added isManagedEvaluator helper and filtering logic to exclude non-managed evaluators from cache and paginated store queries (server-side filter is_managed: true).

Token Path Suggestion System

Layer / File(s) Summary
Atom State
web/oss/src/components/Playground/PlaygroundTokenPath/atoms.ts
New atoms: observedTestcasesAtom (derives testcase entities from execution rows), aggregatedParametersSchemaAtom (unions parameter schemas across nodes).
Chain Context Derivation
web/oss/src/components/Playground/PlaygroundTokenPath/chainContext.ts
NodeChainContext describes allowed envelope slots and upstream entity; nodeChainContextAtomFamily resolves node input/output availability based on DAG position.
Envelope Source Hooks
web/oss/src/components/Playground/PlaygroundTokenPath/sources/*
useInputsSource, useOutputsSource, useParametersSource, useTestcaseSource implement suggestion generation per envelope slot; shared getSubPathsFromSchema, collectNextKeysAtPath, aggregateObservedKeys helpers extract/merge observable keys.
Provider Implementation
web/oss/src/components/Playground/PlaygroundTokenPath/index.tsx, web/oss/src/components/Playground/PlaygroundTokenPath/types.ts
PlaygroundTokenPathProvider (global, inputs-only), PlaygroundNodeTokenPathProvider (scoped to node's chain context), EnvelopeSource interface for slot-based suggestion dispatch.
Shell Integration
web/oss/src/components/Playground/OSSPlaygroundShell.tsx, web/oss/src/components/Playground/Components/PlaygroundVariantConfig/index.tsx
Wraps playground UI with PlaygroundTokenPathProvider and scopes PlaygroundNodeTokenPathProvider to config section for depth-aware JSONPath typeahead.
Editor Token Support
web/packages/agenta-ui/src/Editor/plugins/token/*
TokenPathSuggestionsContext provides consumer hook for path-aware suggestions; TokenNode validates expressions and applies styling/tooltips; TokenTypeaheadPlugin uses unified Suggestion model with path vs flat mode handling; TokenTooltipPlugin renders validation tooltips for invalid tokens.
Editor Tests/Docs
web/packages/agenta-ui/src/Editor/index.ts, web/packages/agenta-ui/src/Editor/plugins/code/index.tsx
Exports suggestion provider/hook and types; refactors InsertInitialCodeBlockPlugin to compare semantic content (JSON/YAML) before dispatching initialization.

Template Variable Validation & Port Grouping

Layer / File(s) Summary
Validation Utilities
web/packages/agenta-shared/src/utils/templateVariable.ts
New KNOWN_ENVELOPE_SLOTS constant; validateTemplateVariable, isValidTemplateVariable, extractTemplateExpression check JSONPath/JSON Pointer roots against known envelopes and detect malformed paths.
Port Helpers & Grouping
web/packages/agenta-entities/src/runnable/portHelpers.ts
extractLastPathSegment for display labels; GroupedTemplateVariable type and groupTemplateVariables function parse/validate/group template placeholders by envelope/key with sub-path aggregation.
Molecule Port Derivation
web/packages/agenta-entities/src/workflow/state/molecule.ts
Input port generation now uses groupTemplateVariables to derive grouped ports with optional schemas from sub-paths; trace-derived ephemeral inputs use path-segment-based naming.
Evaluator Configuration
web/packages/agenta-entities/src/runnable/evaluatorTransforms.ts
nestEvaluatorConfiguration now enforces schema-driven allowlist (drops undeclared/hidden keys), properly routes advanced fields into advanced_settings.
Shared Exports
web/packages/agenta-entities/src/runnable/index.ts, web/packages/agenta-shared/src/utils/index.ts
Re-export new template/port validation and grouping utilities for downstream consumption.

Workflow Type Resolution Enhancement

Layer / File(s) Summary
Type Detection Helpers
web/packages/agenta-entities/src/workflow/state/helpers.ts
New resolveEvaluatorWorkflowType uses revision flags then legacy evaluator-key map; deriveWorkflowTypeFromRevision accepts optional {isEvaluator} override and routes classification through evaluator resolver when detected.
App Type Color
web/packages/agenta-entities/src/workflow/core/schema.ts
New APP_TYPE_PRESET map and getAppTypeColor function for deterministic app workflow type coloring.
Evaluator Utils
web/packages/agenta-entities/src/workflow/state/evaluatorUtils.ts
createEvaluatorFromTemplate removes hidden fields (x-ag-type: "hidden") from final parameters for both JSON-Schema and legacy formats.
Store Re-nesting
web/packages/agenta-entities/src/workflow/state/store.ts
Evaluator workflows preserve original flat parameter schema (flatSchemaForReNest) through draft merges for consistent re-nesting that properly excludes hidden/legacy fields.
Exports
web/packages/agenta-entities/src/workflow/*
Re-export deriveWorkflowTypeFromRevision, getAppTypeColor, and related type utilities for UI consumption.

Evaluation Modal Flow Refactoring

Layer / File(s) Summary
Terminology Update
web/oss/tests/playwright/acceptance/*
Tab/step navigation changed from "Variant" to "Revision" in auto-evaluation and human-annotation test flows.
Workflow Selection Component
web/oss/src/components/pages/evaluations/NewEvaluation/Components/SelectWorkflowSection.tsx
New component replaces SelectAppSection; manages searchable workflow table with showEvaluators toggle, type filter, and invokableOnly constraint; emits onSelectWorkflow(id, {label, isEvaluator}).
Modal Content & Inner
web/oss/src/components/pages/evaluations/NewEvaluation/Components/NewEvaluationModalContent.tsx, web/oss/src/components/pages/evaluations/NewEvaluation/Components/NewEvaluationModalInner.tsx
Content now renders SelectWorkflowSection (renamed tab to "Revision"), Inner captures selectedWorkflowMeta for readable display when workflow not in availableApps, updates validation messaging (variant → revision).
Type Updates
web/oss/src/components/pages/evaluations/NewEvaluation/types.ts
onSelectApp callback extended with optional meta: {label?, isEvaluator?} parameter.
Online Evaluation Filtering
web/oss/src/components/pages/evaluations/onlineEvaluation/OnlineEvaluationDrawer.tsx
previewEvaluators now requires both is_feedback === false and has_url === true (added has_url filter).
Variant Section Text
web/oss/src/components/pages/evaluations/NewEvaluation/Components/SelectVariantSection.tsx
Empty state text updated to "No available revisions found to display".

App/Workflow Management Filter Model

Layer / File(s) Summary
Filter Atoms
web/oss/src/components/pages/app-management/store/appWorkflowFilterAtoms.ts
New WorkflowTypeFilter union and workflowTypeFilterAtom (default "app"), workflowInvokableOnlyAtom (default false) for filtering listings.
Paginated Store Logic
web/oss/src/components/pages/app-management/store/appWorkflowStore.ts
Store expanded to support dynamic workflow type filtering and invokable-only constraints; EnrichedWorkflow carries derived _derivedType/_workflowKey; client-side pagination when invokableOnly is enabled; row shape includes optional workflowType/workflowKey.
Management Section
web/oss/src/components/pages/app-management/components/ApplicationManagementSection.tsx
Pins workflowTypeFilterAtom to "app" and workflowInvokableOnlyAtom to false on mount via useEffect.
Column Rendering
web/oss/src/components/pages/app-management/components/appWorkflowColumns.tsx
App "Type" cell now renders via WorkflowTypeTag instead of plain Tag.
Store Exports
web/oss/src/components/pages/app-management/store/index.ts
Re-exports workflowPaginatedStore, workflowTypeFilterAtom, workflowInvokableOnlyAtom, and WorkflowTypeFilter type.

Workflow Display Components

Layer / File(s) Summary
Type Tag Component
web/packages/agenta-entity-ui/src/workflow/WorkflowTypeTag.tsx
New component renders evaluator or app type badges with colors and tooltips; reads evaluator templates via Jotai store; uses getEvaluatorColor/getAppTypeColor for styling.
Kind Tag Component
web/packages/agenta-entity-ui/src/workflow/WorkflowKindTag.tsx
New component renders Evaluator/App kind badge (purple/blue) with optional className.
Module Exports
web/packages/agenta-entity-ui/src/workflow/index.ts, web/packages/agenta-entity-ui/package.json, web/packages/agenta-entity-ui/src/index.ts
New ./workflow subpath export; barrel files re-export WorkflowTypeTag/WorkflowKindTag and props types.

Testcase Editor & Variable Control Enhancement

Layer / File(s) Summary
Testcase Editor Rewrite
web/oss/src/components/Playground/Components/PlaygroundTestcaseEditor.tsx
Replaces EntityDrillInView with bespoke layout: tracks editMode/fieldView, derives existingColumns/suggestedColumns from schema/testcase, introduces NestedFieldEditor for sub-path editing, renders per-column delete actions, supports "Flat" view decomposition via getPortSubPaths, and adds property promotion UI.
Variable Control
web/packages/agenta-playground-ui/src/components/adapters/VariableControlAdapter.tsx
Derives control label/schema from schemaMap (now includes optional name), computes typed JSON defaults from port schema (properties), uses jsonDefault for visual hint only (no store seeding), remounts Lexical editor when schemaKey changes.
Execution Selectors
web/packages/agenta-playground/src/state/execution/selectors.ts, web/packages/agenta-playground/src/state/execution/index.ts, web/packages/agenta-playground/src/state/controllers/executionItemController.ts
New outputPortSchemaMapAtom alongside input schema map; controller exposes both input/output port schema via selectors for editor typeahead and variable control derivation.

Value Extraction & Code Editor Enhancement

Layer / File(s) Summary
Value Coercion
web/packages/agenta-entities/src/shared/execution/valueExtraction.ts
New coerceCellValue helper intelligently JSON-parses strings (heuristic: starts with { or [); extractInputValues return type now Record<string, unknown> with smart unwrapping (preserves typed non-string values instead of stringifying).
Code Editor Language Detection
web/packages/agenta-entity-ui/src/DrillInView/SchemaControls/CodeEditorControl.tsx
Language detection now prioritizes schema x-parameters.language → sibling field reference → rootData.runtime heuristic → fallback "python"; requires drill-in context via useOptionalDrillIn.
Optional Drill-In Hook
web/packages/agenta-entity-ui/src/DrillInView/components/MoleculeDrillInContext.tsx, web/packages/agenta-entity-ui/src/DrillInView/components/index.ts
New useOptionalDrillIn hook returns context value or null (safe usage outside provider) alongside existing useDrillIn.

Trace Drawer & Workflow Revision Drawer

Layer / File(s) Summary
Trace Payload Handling
web/oss/src/components/SharedDrawers/TraceDrawer/components/TraceContent/index.tsx, web/oss/src/components/SharedDrawers/TraceDrawer/components/TraceContent/utils/index.ts
Removed stripNestedSpans sanitization; trace content now uses raw active trace directly for "Raw Data" display and span drill-in.
Trace Type Header
web/oss/src/components/SharedDrawers/TraceDrawer/components/TraceContent/components/TraceTypeHeader/index.tsx
"Open in Playground" now passes stacked: true to workflow revision drawer (prevents trace drawer focus stealing).
Revision Drawer Stacking
web/packages/agenta-playground-ui/src/components/WorkflowRevisionDrawer/store.ts, web/packages/agenta-playground-ui/src/components/WorkflowRevisionDrawer/WorkflowRevisionDrawer.tsx
New workflowRevisionDrawerStackedAtom and OpenDrawerParams.stacked flag enable mask blur/focus lock when drawer opens atop another drawer.

Testset Auto-Mapping Simplification

Layer / File(s) Summary
Canonical Paths
web/oss/src/components/SharedDrawers/AddToTestsetDrawer/atoms/drawerState.ts
Introduced canonicalTracePathsAtom returning only "data.inputs" and/or "data.outputs" (envelope-aware, top-level paths); removed leaf-based leafTracePathsAtom and traceDataPathsAtom.
Auto-Mapping Effect
web/oss/src/components/SharedDrawers/AddToTestsetDrawer/hooks/useTestsetDrawer.ts
New effect seeds mappingData with one "create" mapping per canonical path (instead of single sanitized path from first available); uses path's last segment as column name.

Observability Filter Enhancement

Layer / File(s) Summary
Filter Constants
web/oss/src/components/pages/observability/assets/constants.ts
Added "Evaluator" reference filter group with evaluator.id and evaluator.slug leaf filters supporting in / not_in operators.

Sequence Diagram(s)

sequenceDiagram
    participant Editor as Token Editor
    participant PathProvider as PlaygroundTokenPathProvider
    participant NodeProvider as PlaygroundNodeTokenPathProvider
    participant InputSource as useInputsSource
    participant OutputSource as useOutputsSource
    participant ParametersSource as useParametersSource
    participant Testcase as useTestcaseSource
    participant Atom as Jotai Atoms

    Editor->>PathProvider: mount with {{$...}} token input
    PathProvider->>NodeProvider: scope to entity in DAG
    NodeProvider->>Atom: read nodeChainContextAtomFamily
    Atom-->>NodeProvider: {allowedSlots, upstreamEntityId}
    
    NodeProvider->>InputSource: create with scoped inputs schema
    NodeProvider->>OutputSource: create with upstream entity (if exists)
    NodeProvider->>ParametersSource: create with aggregated params
    NodeProvider->>Testcase: create with observed testcases
    
    Editor->>NodeProvider: request suggestions for $.inpu[query]
    NodeProvider->>InputSource: getSuggestions(["inpu"], query)
    InputSource->>Atom: read schemaMap + observedTestcases
    Atom-->>InputSource: merge schema keys + testcase data keys
    InputSource-->>NodeProvider: [label, hint] suggestions
    NodeProvider-->>Editor: render dropdown with suggestions
    
    Editor->>Editor: select suggestion, write token text
    Editor->>PathProvider: token committed to document
Loading
sequenceDiagram
    participant User as User
    participant Modal as NewEvaluationModal
    participant WorkflowSelector as SelectWorkflowSection
    participant Store as appWorkflowStore
    participant Filter as workflowFilterAtoms

    User->>Modal: open evaluation wizard
    Modal->>Filter: set workflowTypeFilterAtom='all'
    Modal->>Filter: set workflowInvokableOnlyAtom=true
    
    Modal->>WorkflowSelector: render workflow picker
    WorkflowSelector->>Store: query workflows with typeFilter + invokableOnly
    Store->>Store: fetch + filter (non-human, has_url=true)
    Store-->>WorkflowSelector: paginated workflow list
    
    User->>WorkflowSelector: toggle 'Show evaluators' on
    WorkflowSelector->>Filter: set workflowTypeFilterAtom='all'
    Store->>Store: refetch including evaluator subtypes
    
    User->>WorkflowSelector: select workflow + capture metadata
    WorkflowSelector->>Modal: onSelectWorkflow(id, {label, isEvaluator})
    Modal->>Modal: store selectedWorkflowMeta for display
    Modal->>Modal: advance to 'Revision' tab
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes


Possibly related PRs

  • [feat] Extend prompt templates #4171 — Both PRs modify SDK resolver context (handlers.py) and how inputs/outputs/prompt envelopes are assembled for LLM execution, including context injection ordering and dual-access variable wiring.
  • feat: frontend archive UI #4229 — Both PRs modify evaluators store (evaluatorsPaginatedStore.ts) including managed evaluator filtering, cache logic, and paginated query parameter updates.
  • [release] v0.98.0 #4251 — Both PRs enhance workflow type resolution helpers (helpers.ts) and evaluator-type classification via legacy key maps and revision flag analysis.
✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feature/eval-evaluations

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 15

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
web/packages/agenta-entities/src/workflow/state/store.ts (1)

1082-1110: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

flatSchemaForReNest is still nested for local clones.

These branches read flatSchemaForReNest from localData, but local drafts created by createLocalDraftFromWorkflowRevision() are cloned from workflowEntityAtomFamily(...), so their evaluator params/schema are already normalized. In that case this variable never captures the original flat schema with the hidden markers you want to preserve, and the post-draft re-nest can still drop hidden/advanced fields on cloned evaluators after presets are applied. Consider cloning from the flat source (getFlatSourceData) or persisting the original flat schema alongside the local draft before normalizing it.

Also applies to: 1155-1168, 1270-1298, 1343-1356

web/packages/agenta-entities/src/workflow/state/molecule.ts (1)

792-805: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Filter system fields after grouping to avoid leaking reserved nested inputs.

At Line 793, system fields are filtered on raw variable paths before grouping. With dotted paths (which this new grouped flow explicitly supports), a reserved top-level field can slip through (e.g. inputs.context.user_id won’t match context directly). Filter on grouped keys instead.

Suggested fix
-            const vars = extractVariablesFromConfig(params as Record<string, unknown>).filter(
-                (key) => !systemFields.has(key),
-            )
+            const vars = extractVariablesFromConfig(params as Record<string, unknown>)
             if (vars.length > 0) {
                 return groupTemplateVariables(vars)
-                    .filter((group) => group.envelope === "inputs")
+                    .filter(
+                        (group) =>
+                            group.envelope === "inputs" &&
+                            !systemFields.has(group.key),
+                    )
                     .map((group) => ({
                         key: group.key,
                         name: group.name,
                         type: group.type,
                         required: true,
                         ...(group.subPaths ? {schema: buildSubPathSchema(group.subPaths)} : {}),
                     }))
             }
web/oss/src/components/pages/evaluations/NewEvaluation/Components/NewEvaluationModalContent.tsx (1)

122-125: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Update gating copy to “workflow” instead of “application.”

These messages now sit behind a workflow selector that can return apps or evaluators, so “application” is misleading.

✏️ Suggested wording tweak
- Select an application first to load this section.
+ Select a workflow first to load this section.

- Please select an application to continue configuring the evaluation.
+ Please select a workflow to continue configuring the evaluation.

Also applies to: 153-156

🧹 Nitpick comments (5)
web/oss/src/components/SharedDrawers/AddToTestsetDrawer/hooks/useTestsetDrawer.ts (1)

183-194: 💤 Low value

Minor: Redundant condition check.

When mappingData.length === 0, isMapColumnExist is guaranteed to be false (since some() on an empty array always returns false). The !isMapColumnExist check is therefore redundant in this condition.

This isn't a bug—the logic works correctly—but simplifying to just canonicalPaths.length > 0 && mappingData.length === 0 makes the intent clearer: seed mappings only when none exist and canonical paths are available.

♻️ Optional simplification
 useEffect(() => {
-    if (!isMapColumnExist && canonicalPaths.length > 0 && mappingData.length === 0) {
+    if (canonicalPaths.length > 0 && mappingData.length === 0) {
         setMappingData(
             canonicalPaths.map((path) => ({
                 id: createMappingId(),
                 data: path,
                 column: "create",
                 newColumn: path.split(".").pop() || path,
             })),
         )
     }
-}, [isMapColumnExist, canonicalPaths, mappingData.length, setMappingData])
+}, [canonicalPaths, mappingData.length, setMappingData])
web/oss/src/components/SharedDrawers/TraceDrawer/components/TraceContent/components/TraceTypeHeader/index.tsx (1)

185-190: ⚡ Quick win

Add a focused regression test for the stacked-open path.

This path now depends on stacked: true to avoid focus-trap conflicts; a small UI/state test around this call would help prevent subtle regressions.

web/packages/agenta-entities/src/workflow/state/molecule.ts (1)

753-761: ⚡ Quick win

Extract grouped-variable → port mapping into a shared helper.

The same mapping logic appears in two branches (Line 753-761 and Line 797-805). Pulling this into one helper will prevent branch drift and keep future tweaks (e.g., schema hint behavior) consistent.

Also applies to: 797-805

web/oss/src/components/pages/app-management/store/appWorkflowStore.ts (1)

341-371: 💤 Low value

Consider caching filterInvokableWorkflows results to avoid redundant filtering.

Both appWorkflowTotalCountQueryAtom and appWorkflowCountQueryAtom call filterInvokableWorkflows independently when invokableOnly is true, and so does fetchPage. While TanStack Query caches the underlying queryWorkflows response, the bulk revision fetch and filtering logic runs separately each time.

If this becomes a performance concern, consider memoizing the invokable entries or using a shared query atom for the filtered results.

web/oss/src/components/pages/evaluations/NewEvaluation/Components/SelectWorkflowSection.tsx (1)

155-160: ⚡ Quick win

Debounce search before writing into shared workflow filters.

Right now each keypress updates store state and table search deps immediately; this is noisy and causes avoidable refresh churn.

As per coding guidelines, "Debounce search inputs and filters; throttle scroll and resize handlers."

Also applies to: 259-264


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: aa239025-fd5f-42b1-8bea-33c3d3c11434

📥 Commits

Reviewing files that changed from the base of the PR and between bd447c0 and 2ad51fd.

📒 Files selected for processing (78)
  • sdk/agenta/sdk/engines/running/handlers.py
  • web/oss/src/components/Evaluators/Table/assets/evaluatorColumns.tsx
  • web/oss/src/components/Evaluators/assets/cells/EvaluatorTagsCell.tsx
  • web/oss/src/components/Evaluators/assets/cells/EvaluatorTypePill.tsx
  • web/oss/src/components/Evaluators/assets/cells/TableDropdownMenu/index.tsx
  • web/oss/src/components/Evaluators/assets/cells/TableDropdownMenu/types.ts
  • web/oss/src/components/Evaluators/assets/getColumns.tsx
  • web/oss/src/components/Evaluators/assets/types.ts
  • web/oss/src/components/Evaluators/store/evaluatorsPaginatedStore.ts
  • web/oss/src/components/Playground/Components/PlaygroundTestcaseEditor.tsx
  • web/oss/src/components/Playground/Components/PlaygroundVariantConfig/index.tsx
  • web/oss/src/components/Playground/OSSPlaygroundShell.tsx
  • web/oss/src/components/Playground/PlaygroundTokenPath/atoms.ts
  • web/oss/src/components/Playground/PlaygroundTokenPath/chainContext.ts
  • web/oss/src/components/Playground/PlaygroundTokenPath/index.tsx
  • web/oss/src/components/Playground/PlaygroundTokenPath/sources/inputs.ts
  • web/oss/src/components/Playground/PlaygroundTokenPath/sources/outputs.ts
  • web/oss/src/components/Playground/PlaygroundTokenPath/sources/parameters.ts
  • web/oss/src/components/Playground/PlaygroundTokenPath/sources/shared.ts
  • web/oss/src/components/Playground/PlaygroundTokenPath/sources/testcase.ts
  • web/oss/src/components/Playground/PlaygroundTokenPath/types.ts
  • web/oss/src/components/SharedDrawers/AddToTestsetDrawer/atoms/drawerState.ts
  • web/oss/src/components/SharedDrawers/AddToTestsetDrawer/hooks/useTestsetDrawer.ts
  • web/oss/src/components/SharedDrawers/TraceDrawer/components/TraceContent/components/TraceTypeHeader/index.tsx
  • web/oss/src/components/SharedDrawers/TraceDrawer/components/TraceContent/index.tsx
  • web/oss/src/components/SharedDrawers/TraceDrawer/components/TraceContent/utils/index.ts
  • web/oss/src/components/pages/app-management/components/ApplicationManagementSection.tsx
  • web/oss/src/components/pages/app-management/components/appWorkflowColumns.tsx
  • web/oss/src/components/pages/app-management/store/appWorkflowFilterAtoms.ts
  • web/oss/src/components/pages/app-management/store/appWorkflowStore.ts
  • web/oss/src/components/pages/app-management/store/index.ts
  • web/oss/src/components/pages/evaluations/NewEvaluation/Components/NewEvaluationModalContent.tsx
  • web/oss/src/components/pages/evaluations/NewEvaluation/Components/NewEvaluationModalInner.tsx
  • web/oss/src/components/pages/evaluations/NewEvaluation/Components/SelectAppSection.tsx
  • web/oss/src/components/pages/evaluations/NewEvaluation/Components/SelectVariantSection.tsx
  • web/oss/src/components/pages/evaluations/NewEvaluation/Components/SelectWorkflowSection.tsx
  • web/oss/src/components/pages/evaluations/NewEvaluation/types.ts
  • web/oss/src/components/pages/evaluations/onlineEvaluation/OnlineEvaluationDrawer.tsx
  • web/oss/src/components/pages/observability/assets/constants.ts
  • web/oss/tests/playwright/acceptance/auto-evaluation/index.ts
  • web/oss/tests/playwright/acceptance/auto-evaluation/tests.ts
  • web/oss/tests/playwright/acceptance/human-annotation/index.ts
  • web/oss/tests/playwright/acceptance/human-annotation/tests.ts
  • web/packages/agenta-entities/src/runnable/evaluatorTransforms.ts
  • web/packages/agenta-entities/src/runnable/index.ts
  • web/packages/agenta-entities/src/runnable/portHelpers.ts
  • web/packages/agenta-entities/src/shared/execution/valueExtraction.ts
  • web/packages/agenta-entities/src/workflow/core/index.ts
  • web/packages/agenta-entities/src/workflow/core/schema.ts
  • web/packages/agenta-entities/src/workflow/index.ts
  • web/packages/agenta-entities/src/workflow/state/evaluatorUtils.ts
  • web/packages/agenta-entities/src/workflow/state/helpers.ts
  • web/packages/agenta-entities/src/workflow/state/index.ts
  • web/packages/agenta-entities/src/workflow/state/molecule.ts
  • web/packages/agenta-entities/src/workflow/state/store.ts
  • web/packages/agenta-entity-ui/package.json
  • web/packages/agenta-entity-ui/src/DrillInView/SchemaControls/CodeEditorControl.tsx
  • web/packages/agenta-entity-ui/src/DrillInView/components/MoleculeDrillInContext.tsx
  • web/packages/agenta-entity-ui/src/DrillInView/components/index.ts
  • web/packages/agenta-entity-ui/src/index.ts
  • web/packages/agenta-entity-ui/src/workflow/WorkflowKindTag.tsx
  • web/packages/agenta-entity-ui/src/workflow/WorkflowTypeTag.tsx
  • web/packages/agenta-entity-ui/src/workflow/index.ts
  • web/packages/agenta-playground-ui/src/components/WorkflowRevisionDrawer/WorkflowRevisionDrawer.tsx
  • web/packages/agenta-playground-ui/src/components/WorkflowRevisionDrawer/store.ts
  • web/packages/agenta-playground-ui/src/components/adapters/VariableControlAdapter.tsx
  • web/packages/agenta-playground/src/state/controllers/executionItemController.ts
  • web/packages/agenta-playground/src/state/execution/index.ts
  • web/packages/agenta-playground/src/state/execution/selectors.ts
  • web/packages/agenta-shared/src/utils/index.ts
  • web/packages/agenta-shared/src/utils/templateVariable.ts
  • web/packages/agenta-ui/src/Editor/index.ts
  • web/packages/agenta-ui/src/Editor/plugins/code/index.tsx
  • web/packages/agenta-ui/src/Editor/plugins/token/TokenNode.ts
  • web/packages/agenta-ui/src/Editor/plugins/token/TokenPathSuggestionsContext.tsx
  • web/packages/agenta-ui/src/Editor/plugins/token/TokenTooltipPlugin.tsx
  • web/packages/agenta-ui/src/Editor/plugins/token/TokenTypeaheadPlugin.tsx
  • web/packages/agenta-ui/src/Editor/plugins/token/extensions/tokenBehavior.tsx
💤 Files with no reviewable changes (8)
  • web/oss/src/components/Evaluators/assets/cells/TableDropdownMenu/types.ts
  • web/oss/src/components/Evaluators/assets/cells/EvaluatorTagsCell.tsx
  • web/oss/src/components/Evaluators/assets/cells/EvaluatorTypePill.tsx
  • web/oss/src/components/Evaluators/assets/cells/TableDropdownMenu/index.tsx
  • web/oss/src/components/pages/evaluations/NewEvaluation/Components/SelectAppSection.tsx
  • web/oss/src/components/Evaluators/assets/types.ts
  • web/oss/src/components/SharedDrawers/TraceDrawer/components/TraceContent/utils/index.ts
  • web/oss/src/components/Evaluators/assets/getColumns.tsx

Comment on lines +1086 to 1091
if inputs is not None:
context.update(**inputs)
context.update(
**{
"prediction": outputs,
"outputs": outputs,
"inputs": inputs,
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Reserve or protect the synthetic inputs envelope key.

These changes overwrite any real caller-supplied field named inputs with the synthetic envelope object. After that, root-level prompt references to inputs no longer see the user value, which is a breaking behavior change unless "inputs" is now a reserved field name everywhere. Please either reject reserved names up front or preserve the original field before injecting the envelope.

Also applies to: 2211-2218, 2283-2290

Comment on lines +262 to 266
(value: string, meta?: {label?: string; isEvaluator?: boolean}) => {
if (value === selectedAppId) return
setSelectedAppId(value)
setSelectedWorkflowMeta(value ? (meta ?? null) : null)
setSelectedTestsetId("")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Allow metadata updates even when the selected workflow ID is unchanged.

The early return on same value skips setSelectedWorkflowMeta, so label/kind fallback can remain stale.

🔧 Minimal fix
-        (value: string, meta?: {label?: string; isEvaluator?: boolean}) => {
-            if (value === selectedAppId) return
+        (value: string, meta?: {label?: string; isEvaluator?: boolean}) => {
+            if (value === selectedAppId) {
+                if (value && meta) setSelectedWorkflowMeta(meta)
+                return
+            }
             setSelectedAppId(value)
             setSelectedWorkflowMeta(value ? (meta ?? null) : null)

Comment on lines +174 to +179
const onSelectRow = useCallback(
(selectedRowKeys: React.Key[]) => {
if (disabled) return
const selectedId = selectedRowKeys[0] as string | undefined
if (!selectedId) {
onSelectWorkflow("")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fix single-selection behavior: current checkbox model can keep the old selection.

This picker stores a single selectedWorkflowId, but checkbox mode sends multi-key arrays and you always take the first key. Selecting another row can keep the previous ID selected.

✅ Suggested fix (radio selection)
-    const onSelectRow = useCallback(
-        (selectedRowKeys: React.Key[]) => {
+    const onSelectRow = useCallback(
+        (selectedRowKeys: React.Key[]) => {
             if (disabled) return
-            const selectedId = selectedRowKeys[0] as string | undefined
+            const selectedId = selectedRowKeys.at(-1) as string | undefined
             if (!selectedId) {
                 onSelectWorkflow("")
                 return
             }
             const row = tableRows.find((r) => r.workflowId === selectedId)
             onSelectWorkflow(selectedId, {
                 label: row?.name,
                 isEvaluator: row?.isEvaluator,
             })
         },
         [disabled, onSelectWorkflow, tableRows],
     )

     const rowSelection = useMemo(
         () => ({
-            type: "checkbox" as const,
+            type: "radio" as const,
             selectedRowKeys: selectedWorkflowId ? [selectedWorkflowId] : [],
             onChange: (keys: React.Key[]) => onSelectRow(keys),
             getCheckboxProps: () => ({disabled}),
             selectOnRowClick: !disabled,
         }),
         [selectedWorkflowId, onSelectRow, disabled],
     )

Also applies to: 191-199

Comment on lines 75 to +79
const previewEvaluators = useMemo(
() => (evaluators || []).filter((e) => e.flags?.is_feedback !== true),
() =>
(evaluators || []).filter(
(e) => e.flags?.is_feedback !== true && e.flags?.has_url === true,
),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

has_url-only filtering can hide valid online evaluators

On Line 78, requiring e.flags?.has_url === true is narrower than the platform’s online-capable evaluator logic (e.g., code/LLM/handler-backed evaluators). This can remove valid options and block live-eval creation.

Suggested fix
-import {
+import {
     evaluatorConfigRevisionsListDataAtom,
     evaluatorConfigRevisionsQueryStateAtom,
     evaluatorTemplatesDataAtom,
     evaluatorTemplatesQueryAtom,
+    isOnlineCapableEvaluator,
 } from "@agenta/entities/workflow"
@@
-    const previewEvaluators = useMemo(
-        () =>
-            (evaluators || []).filter(
-                (e) => e.flags?.is_feedback !== true && e.flags?.has_url === true,
-            ),
-        [evaluators],
-    )
+    const previewEvaluators = useMemo(
+        () =>
+            (evaluators || []).filter(
+                (e) => e.flags?.is_feedback !== true && isOnlineCapableEvaluator(e),
+            ),
+        [evaluators],
+    )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const previewEvaluators = useMemo(
() => (evaluators || []).filter((e) => e.flags?.is_feedback !== true),
() =>
(evaluators || []).filter(
(e) => e.flags?.is_feedback !== true && e.flags?.has_url === true,
),
import {
evaluatorConfigRevisionsListDataAtom,
evaluatorConfigRevisionsQueryStateAtom,
evaluatorTemplatesDataAtom,
evaluatorTemplatesQueryAtom,
isOnlineCapableEvaluator,
} from "@agenta/entities/workflow"
const previewEvaluators = useMemo(
() =>
(evaluators || []).filter(
(e) => e.flags?.is_feedback !== true && isOnlineCapableEvaluator(e),
),
[evaluators],
)

Comment on lines +85 to +117
const {value, isParsable} = useMemo(() => {
if (!parentRaw) return {value: "", isParsable: true}
try {
const parsed = JSON.parse(parentRaw) as unknown
if (parsed && typeof parsed === "object" && !Array.isArray(parsed)) {
const raw = (parsed as Record<string, unknown>)[subPath]
return {value: raw == null ? "" : String(raw), isParsable: true}
}
return {value: "", isParsable: false}
} catch {
return {value: "", isParsable: false}
}
}, [parentRaw, subPath])

const handleChange = useCallback(
(nextVal: string) => {
let parsed: Record<string, unknown> = {}
if (parentRaw) {
try {
const p = JSON.parse(parentRaw) as unknown
if (p && typeof p === "object" && !Array.isArray(p)) {
parsed = {...(p as Record<string, unknown>)}
}
} catch {
// non-JSON parent — start fresh; overwrite handled by isParsable gate
}
}
parsed[subPath] = nextVal
setCellValue({
testcaseId,
column: parentKey,
value: JSON.stringify(parsed),
})
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Walk dotted sub-paths instead of indexing them literally.

Line 90 and Line 112 treat subPath as a single property name, but getPortSubPaths() is intentionally allowed to return multi-segment hints like a.b.c. In flat view that reads/writes {"a.b.c": ...} instead of traversing a -> b -> c, so nested fields are edited incorrectly.

Suggested fix
+const splitPath = (path: string) => path.split(".").filter(Boolean)
+
+function getValueAtPath(obj: Record<string, unknown>, path: string): unknown {
+    return splitPath(path).reduce<unknown>((acc, segment) => {
+        return acc && typeof acc === "object" && !Array.isArray(acc)
+            ? (acc as Record<string, unknown>)[segment]
+            : undefined
+    }, obj)
+}
+
+function setValueAtPath(obj: Record<string, unknown>, path: string, value: string) {
+    const segments = splitPath(path)
+    let cursor = obj
+
+    for (const segment of segments.slice(0, -1)) {
+        const next = cursor[segment]
+        cursor[segment] =
+            next && typeof next === "object" && !Array.isArray(next)
+                ? {...(next as Record<string, unknown>)}
+                : {}
+        cursor = cursor[segment] as Record<string, unknown>
+    }
+
+    cursor[segments[segments.length - 1]!] = value
+}
+
 const {value, isParsable} = useMemo(() => {
     if (!parentRaw) return {value: "", isParsable: true}
     try {
         const parsed = JSON.parse(parentRaw) as unknown
         if (parsed && typeof parsed === "object" && !Array.isArray(parsed)) {
-            const raw = (parsed as Record<string, unknown>)[subPath]
+            const raw = getValueAtPath(parsed as Record<string, unknown>, subPath)
             return {value: raw == null ? "" : String(raw), isParsable: true}
         }
         return {value: "", isParsable: false}
@@
-    parsed[subPath] = nextVal
+    setValueAtPath(parsed, subPath, nextVal)
     setCellValue({
         testcaseId,
         column: parentKey,
         value: JSON.stringify(parsed),
     })

Comment on lines +79 to +80
const explicit = xParams?.language as string | undefined
if (explicit) return explicit
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Validate the explicit schema language before returning it.

Only the dynamic candidates are normalized and checked. An explicit value like "Python" or a typo bypasses the guard and gets cast into SharedEditor, which skips the fallback and can leave syntax highlighting in an undefined state.

Suggested fix
-        const explicit = xParams?.language as string | undefined
-        if (explicit) return explicit
+        const explicit =
+            typeof xParams?.language === "string" ? xParams.language.toLowerCase() : undefined
+        if (explicit && isSupportedLanguage(explicit)) return explicit
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const explicit = xParams?.language as string | undefined
if (explicit) return explicit
const explicit =
typeof xParams?.language === "string" ? xParams.language.toLowerCase() : undefined
if (explicit && isSupportedLanguage(explicit)) return explicit

Comment on lines +174 to +186
const jsonDefault = useMemo(() => {
if (portType === "array") return "[]"
const props =
portSchema && typeof portSchema === "object"
? (portSchema as {properties?: Record<string, unknown>}).properties
: null
if (!props || typeof props !== "object") return "{}"
const keys = Object.keys(props)
if (keys.length === 0) return "{}"
const obj: Record<string, string> = {}
for (const k of keys) obj[k] = ""
return JSON.stringify(obj, null, 2)
}, [portType, portSchema])
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don’t render a JSON value that won’t be submitted.

These lines make empty object/array cells look populated, but the underlying testcase cell stays "" and the comment below confirms the run payload remains empty. In this flow that is pretty misleading: evaluator inputs can appear present in the editor and still resolve as missing at execution time. Either persist the seed before execution, or surface the schema-derived shape as placeholder/help text instead of value.

Also applies to: 221-223, 239-249

Comment on lines +895 to 903
if (language === "json" || language === "yaml") {
const currentParsed = safeJson5Parse(currentEditorContent)
const incomingParsed = safeJson5Parse(initialValue)
if (!isEqual(currentParsed, incomingParsed)) {
needsDispatch = true
// Force only when we're replacing real content; on first
// populate, the handler should run normally.
forceUpdate = hasExistingContent
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "Inspect safeJson5Parse implementation and callers:"
rg -n -C4 '\bsafeJson5Parse\b' --glob '*.{ts,tsx,js,jsx}'

echo
echo "Inspect the YAML equality gate in the editor plugin:"
sed -n '895,903p' web/packages/agenta-ui/src/Editor/plugins/code/index.tsx

Repository: Agenta-AI/agenta

Length of output: 8839


🏁 Script executed:

# Check the implementation of tryParsePartialJson
rg -n -A 10 'export function tryParsePartialJson' web/packages/agenta-shared/src/utils/jsonParsing.ts

Repository: Agenta-AI/agenta

Length of output: 540


🏁 Script executed:

# Check the imports in the editor plugin file
head -n 30 web/packages/agenta-ui/src/Editor/plugins/code/index.tsx

Repository: Agenta-AI/agenta

Length of output: 923


🏁 Script executed:

# Get the full implementation of tryParsePartialJson
sed -n '32,80p' web/packages/agenta-shared/src/utils/jsonParsing.ts

Repository: Agenta-AI/agenta

Length of output: 2007


Use a YAML parser in the equality gate.

This branch treats YAML like JSON5. When YAML strings that aren't valid JSON fail safeJson5Parse(), both fallback to tryParsePartialJson(), which applies JSON heuristics and returns null for unparseable content. Two different YAML documents can both return null, making isEqual() treat them as identical and skip needsDispatch. External YAML updates never reach the editor. Parse YAML with yaml.load() instead, and fall back to string comparison when parsing fails.

Suggested fix
-            if (language === "json" || language === "yaml") {
+            if (language === "json") {
                 const currentParsed = safeJson5Parse(currentEditorContent)
                 const incomingParsed = safeJson5Parse(initialValue)
                 if (!isEqual(currentParsed, incomingParsed)) {
                     needsDispatch = true
-                    // Force only when we're replacing real content; on first
-                    // populate, the handler should run normally.
                     forceUpdate = hasExistingContent
                 }
                 return
             }
+
+            if (language === "yaml") {
+                let currentParsed: unknown = currentEditorContent?.trim() ?? ""
+                let incomingParsed: unknown = initialValue?.trim() ?? ""
+
+                try {
+                    currentParsed = yaml.load(currentEditorContent)
+                } catch {}
+
+                try {
+                    incomingParsed = yaml.load(initialValue)
+                } catch {}
+
+                if (!isEqual(currentParsed, incomingParsed)) {
+                    needsDispatch = true
+                    forceUpdate = hasExistingContent
+                }
+                return
+            }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if (language === "json" || language === "yaml") {
const currentParsed = safeJson5Parse(currentEditorContent)
const incomingParsed = safeJson5Parse(initialValue)
if (!isEqual(currentParsed, incomingParsed)) {
needsDispatch = true
// Force only when we're replacing real content; on first
// populate, the handler should run normally.
forceUpdate = hasExistingContent
}
if (language === "json") {
const currentParsed = safeJson5Parse(currentEditorContent)
const incomingParsed = safeJson5Parse(initialValue)
if (!isEqual(currentParsed, incomingParsed)) {
needsDispatch = true
forceUpdate = hasExistingContent
}
return
}
if (language === "yaml") {
let currentParsed: unknown = currentEditorContent?.trim() ?? ""
let incomingParsed: unknown = initialValue?.trim() ?? ""
try {
currentParsed = yaml.load(currentEditorContent)
} catch {}
try {
incomingParsed = yaml.load(initialValue)
} catch {}
if (!isEqual(currentParsed, incomingParsed)) {
needsDispatch = true
forceUpdate = hasExistingContent
}
return
}

Comment on lines 87 to +96
updateDOM(_prevNode: TokenNode, dom: HTMLElement): boolean {
const text = this.getTextContent()
if (text !== dom.textContent) {
dom.textContent = text
return true
}
return false
const textChanged = text !== dom.textContent
if (textChanged) dom.textContent = text
// Re-apply styles even when text didn't change at this exact node —
// edits to a nearby path segment can change validity without
// changing this node's own text, but when it DOES change we also
// need to re-evaluate because the new text may be valid/invalid.
applyTokenStyles(dom, text)
return textChanged
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

In Lexical 0.40, what does a custom TextNode.updateDOM(...) returning true mean? If the node already updates textContent/styles in place, should updateDOM return false to avoid DOM replacement?

💡 Result:

In Lexical 0.40, a custom TextNode.updateDOM(...) returning true signals to Lexical that the DOM element needs to be replaced with a new one created by createDOM. Returning false indicates that the existing DOM element was successfully updated in place, avoiding replacement. Yes, if the node already updates textContent/styles in place (e.g., via super.updateDOM and additional style changes), it should return false to prevent unnecessary DOM replacement and preserve performance/DOM state like selection anchors.

Citations:


Return false from updateDOM() after mutating the span in place.

This method already updates textContent and styles directly on the DOM element. Returning textChanged signals Lexical to recreate the DOM node on text edits, which breaks selection/hover continuity and defeats the in-place update optimization. Per Lexical's node contract, return false to indicate the DOM was successfully updated in place.

Suggested fix
     updateDOM(_prevNode: TokenNode, dom: HTMLElement): boolean {
         const text = this.getTextContent()
         const textChanged = text !== dom.textContent
         if (textChanged) dom.textContent = text
         // Re-apply styles even when text didn't change at this exact node —
         // edits to a nearby path segment can change validity without
         // changing this node's own text, but when it DOES change we also
         // need to re-evaluate because the new text may be valid/invalid.
         applyTokenStyles(dom, text)
-        return textChanged
+        return false
     }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
updateDOM(_prevNode: TokenNode, dom: HTMLElement): boolean {
const text = this.getTextContent()
if (text !== dom.textContent) {
dom.textContent = text
return true
}
return false
const textChanged = text !== dom.textContent
if (textChanged) dom.textContent = text
// Re-apply styles even when text didn't change at this exact node —
// edits to a nearby path segment can change validity without
// changing this node's own text, but when it DOES change we also
// need to re-evaluate because the new text may be valid/invalid.
applyTokenStyles(dom, text)
return textChanged
updateDOM(_prevNode: TokenNode, dom: HTMLElement): boolean {
const text = this.getTextContent()
const textChanged = text !== dom.textContent
if (textChanged) dom.textContent = text
// Re-apply styles even when text didn't change at this exact node —
// edits to a nearby path segment can change validity without
// changing this node's own text, but when it DOES change we also
// need to re-evaluate because the new text may be valid/invalid.
applyTokenStyles(dom, text)
return false
}

Comment on lines +251 to 257
node.setTextContent(suggestion.tokenText)
// Position cursor just before the closing `}}` so the user
// can keep typing (e.g. drill further into a path).
navigateCursor({
nodeKey: node.getKey(),
offset: node.getTextContent().length,
offset: suggestion.tokenText.length - 2,
})
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Selecting a suggestion in the middle of a token drops the suffix.

Line 285 opens the menu anywhere inside {{...}}, but Line 251 rewrites the entire token to suggestion.tokenText. If the caret sits in the middle of {{$.inputs.country}}, accepting a suggestion replaces everything after the caret as well. The safe fix is to keep autocomplete end-of-token only until the replacement logic becomes caret-aware.

Suggested fix
-                    if (match && offsetPos >= 2 && offsetPos <= text.length - 2) {
+                    if (match && offsetPos === text.length - 2) {
                         setInputQuery(match[1])
                         const dom = editor.getElementByKey(node.getKey())
                         if (dom) {

Also applies to: 285-286

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature Frontend lgtm This PR has been approved by a maintainer size:XXL This PR changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants