[Frontend Feat] Enable running evaluations on evaluators #4237
…nd unify workflow store for apps and evaluators
- Add `transformAgDataForTestset` to unwrap nested subject data from evaluator annotation spans, stripping internal bookkeeping keys and reshaping to `{inputs, outputs, score}` for testset compatibility
- Detect evaluator annotation spans via `trace_type === "annotation"` and `is_evaluator` flag or evaluator references
- Apply transformation to both live and original data for consistent
…cture
- Remove `transformAgDataForTestset` and evaluator annotation span detection logic that unwrapped nested subject data
- Replace `leafTracePathsAtom` and `traceDataPathsAtom` with `canonicalTracePathsAtom` that returns only `data.inputs` and `data.outputs` when present
- Update auto-mapping to seed from canonical envelope paths instead of all leaf paths
- Preserve evaluator annotation spans' nested structure in testcases to maintain replay
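A minimal sketch of the canonical-path idea described in this commit, assuming a span shaped like `{data: {inputs, outputs}}`. The names `canonicalTracePaths`, `SpanLike`, and `seedColumnName` are hypothetical and are not the PR's atoms; only the `path.split(".").pop() || path` column-naming rule comes from the diff quoted later in this review.

```typescript
// Hypothetical span shape: only the `data` envelope matters for this sketch.
interface SpanLike {
    data?: Record<string, unknown> | null
}

const CANONICAL_KEYS = ["inputs", "outputs"] as const

// Return only `data.inputs` / `data.outputs`, and only when they are present.
export function canonicalTracePaths(span: SpanLike | null | undefined): string[] {
    const data = span?.data
    if (!data) return []
    return CANONICAL_KEYS.filter((key) => data[key] != null).map((key) => `data.${key}`)
}

// Seed a testset column name from the last path segment.
export function seedColumnName(path: string): string {
    return path.split(".").pop() || path
}
```

Seeding from these two envelope paths, rather than from every leaf path, keeps nested evaluator data intact in a single column.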
…chat handlers
- Expose inputs both flattened at root (e.g., `{{country}}`) and nested under `inputs` key (e.g., `{{$.inputs.country}}`) for consistent JSONPath resolution across completion, chat, and evaluator prompts
- Clear `input_keys` after validation to prevent `PromptTemplate.format()` from rejecting the injected envelope meta-key as extra
- Exclude `messages` from nested view in chat handler since it is handled separately
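The dual exposure described above can be sketched as follows. This is an illustration in TypeScript rather than the SDK's Python handlers, and `buildPromptContext` is a hypothetical name; it only shows how one object can serve both `{{country}}` and `{{$.inputs.country}}`.

```typescript
type Inputs = Record<string, unknown>

// Build a context where every input is reachable both at the root and under `inputs`.
// `excludeFromNested` models the chat handler leaving `messages` out of the nested view.
export function buildPromptContext(
    inputs: Inputs,
    excludeFromNested: readonly string[] = [],
): Inputs & {inputs: Inputs} {
    const nested: Inputs = {...inputs}
    for (const key of excludeFromNested) delete nested[key]
    // Note: a caller-supplied field literally named "inputs" is overwritten here.
    return {...inputs, inputs: nested}
}
```

For example, `buildPromptContext({country: "France", messages: []}, ["messages"])` keeps `messages` at the root while the nested `inputs` view contains only `country`.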
…yground
- Wrap `PlaygroundConfigSection` in `PlaygroundNodeTokenPathProvider` to scope suggestions to the node's chain context
- Split token path provider into global (inputs-only) and scoped (adds outputs when upstream exists) variants
- Update `useInputsSource` to accept `scopedEntityId` and read entity-specific input ports when provided
- Update `useOutputsSource` to accept `upstreamEntityId` and read upstream output ports directly
…traction to prevent ReDoS
- Replace regex-based extraction with prefix/suffix checks and string slicing
- Prevents polynomial backtracking on adversarial inputs (addresses CodeQL js/redos)
- Handle Jinja whitespace control (`{%-` / `-%}`) via explicit slice trimming
- Maintain identical behavior for `{{...}}`, `{%...%}`, and `{#...#}` wrappers
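A rough sketch of regex-free wrapper extraction along the lines listed above. The function name `extractTokenBody` and the wrapper table are assumptions for illustration, not the PR's code; whitespace-control handling is limited to the `{%-` / `-%}` forms that the commit mentions.

```typescript
// Hypothetical wrapper table covering the three delimiters named in the commit.
const WRAPPERS = [
    {open: "{{", close: "}}"},
    {open: "{%", close: "%}"},
    {open: "{#", close: "#}"},
] as const

// Return the trimmed inner text of a wrapped token, or null when the text is not wrapped.
// Every step is a prefix/suffix check or a slice, so there is no regex backtracking.
export function extractTokenBody(token: string): string | null {
    for (const {open, close} of WRAPPERS) {
        // Guard against inputs too short to hold both delimiters.
        if (token.length < open.length + close.length) continue
        if (!token.startsWith(open) || !token.endsWith(close)) continue

        let start = open.length
        let end = token.length - close.length

        // Jinja whitespace control adds one dash inside each delimiter.
        if (open === "{%") {
            if (token[start] === "-") start += 1
            if (end > start && token[end - 1] === "-") end -= 1
        }
        return token.slice(start, end).trim()
    }
    return null
}
```

Each call does a constant number of linear passes over the string, which is the property the CodeQL finding is concerned with.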
Replace legacy fallback mode with first-class flat token support. Both `{{country}}` and `{{$.inputs.country}}` now use the same suggestion pipeline, with flat tokens implicitly drilling into `$.inputs.*` to surface testcase columns and port schemas.
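One way to picture the "flat tokens drill into `$.inputs.*`" rule is a small normalization step that maps both spellings to the same path segments. The helper below, `toCanonicalSegments`, is a hypothetical sketch and not the suggestion pipeline itself.

```typescript
// Map a token body to path segments under the root envelope.
// "$.inputs.country" is already explicit; "country" is treated as "$.inputs.country".
export function toCanonicalSegments(tokenBody: string): string[] {
    const trimmed = tokenBody.trim()
    if (trimmed === "$") return []
    if (trimmed.startsWith("$.")) return trimmed.slice(2).split(".").filter(Boolean)
    return ["inputs", ...trimmed.split(".").filter(Boolean)]
}
```

With this shape, a single lookup keyed on the segments can serve both `{{country}}` and `{{$.inputs.country}}`.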
Wrap type pills in tooltips to show full labels on hover and apply max-width truncation to prevent long evaluator names (e.g., `__main__.MyEvaluator`) from breaking column layouts.
Add `is_managed` flag checks to exclude user-deployed Python evaluators (e.g., `user:custom:__main__.MyEval:latest`) from the Evaluators page. These SDK-registered evaluators aren't first-class catalog entries and should only appear in workflow execution contexts. Apply filter at both workflow-id cache level and revision query level for defense-in-depth.
…ndered nested config
Implement multi-tier language detection for CodeEditorControl: explicit schema override → sibling field lookup via `languageFromField` → heuristic on `runtime` field → python fallback. Add `useOptionalDrillIn` hook to safely access drill-in context from shared controls that may render outside the provider tree.
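The tiered lookup can be sketched as a chain of fallbacks. Everything here is an assumption made for illustration: the supported-language list, the `detectLanguage` name, and the prefix-based `runtime` heuristic. The sketch also normalizes the explicit value before trusting it, which may differ from the committed behavior.

```typescript
// Hypothetical set of editor languages; the real list lives in the editor package.
const SUPPORTED_LANGUAGES = ["python", "javascript", "typescript", "json", "yaml"] as const
type Language = (typeof SUPPORTED_LANGUAGES)[number]

function normalizeLanguage(value: unknown): Language | undefined {
    if (typeof value !== "string") return undefined
    const lower = value.trim().toLowerCase()
    return SUPPORTED_LANGUAGES.find((lang) => lang === lower)
}

// Heuristic tier: accept runtime strings such as "python3.11" by prefix.
function languageFromRuntime(value: unknown): Language | undefined {
    if (typeof value !== "string") return undefined
    const lower = value.trim().toLowerCase()
    return SUPPORTED_LANGUAGES.find((lang) => lower.startsWith(lang))
}

interface DetectionInput {
    explicit?: unknown
    languageFromField?: string
    siblings?: Record<string, unknown>
}

// explicit schema override -> sibling field -> runtime heuristic -> python fallback
export function detectLanguage({explicit, languageFromField, siblings = {}}: DetectionInput): Language {
    return (
        normalizeLanguage(explicit) ??
        (languageFromField ? normalizeLanguage(siblings[languageFromField]) : undefined) ??
        languageFromRuntime(siblings["runtime"]) ??
        "python"
    )
}
```

Because every tier returns `undefined` on an unusable value, a typo in one tier falls through to the next instead of reaching the editor.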
Thanks @ardaerzin, I think the filtering view could be improved. The number of tabs is very large and always overflows. My comments:
I have created some designs that might be helpful:
CleanShot.2026-04-30.at.13.37.20.mp4
CleanShot.2026-04-30.at.13.38.13.mp4
@mmabrouk here's the approach I took
Thanks @ardaerzin looks fine to me (although redundant imo).
@mmabrouk there's a remove option after a selection is made
Why not simply have it as an option, like in the design? The current approach feels less obvious. Sorry for being too nitpicky.
Replace `null` placeholder + clear button pattern with an explicit "All types" first option in the workflow type filter Select. Simplifies state handling by eliminating nullable semantics and ensures the reset path is always visible as a clickable option rather than requiring hover to reveal a clear button.
@mmabrouk fair enough. pushed this change
Can I be nitpicky again and say that we should remove the padding for the subitems? Otherwise it looks weird.
…orkflow type filter
Remove extra left indent from Ant Design's grouped select options to ensure visual alignment with the ungrouped "All types" entry. The default `.ant-select-item-option-grouped` padding creates unintended hierarchy when only one or two groups are visible.
📝 Walkthrough
Summary by CodeRabbit
Walkthrough
This pull request introduces a comprehensive token path suggestion system for the playground editor, refactors evaluator component display using shared
Changes
SDK Resolver Context & Evaluator Envelope Handling
Evaluator Component Display Consolidation
Token Path Suggestion System
Template Variable Validation & Port Grouping
Workflow Type Resolution Enhancement
Evaluation Modal Flow Refactoring
App/Workflow Management Filter Model
Workflow Display Components
Testcase Editor & Variable Control Enhancement
Value Extraction & Code Editor Enhancement
Trace Drawer & Workflow Revision Drawer
Testset Auto-Mapping Simplification
Observability Filter Enhancement
Sequence Diagram(s)
sequenceDiagram
participant Editor as Token Editor
participant PathProvider as PlaygroundTokenPathProvider
participant NodeProvider as PlaygroundNodeTokenPathProvider
participant InputSource as useInputsSource
participant OutputSource as useOutputsSource
participant ParametersSource as useParametersSource
participant Testcase as useTestcaseSource
participant Atom as Jotai Atoms
Editor->>PathProvider: mount with {{$...}} token input
PathProvider->>NodeProvider: scope to entity in DAG
NodeProvider->>Atom: read nodeChainContextAtomFamily
Atom-->>NodeProvider: {allowedSlots, upstreamEntityId}
NodeProvider->>InputSource: create with scoped inputs schema
NodeProvider->>OutputSource: create with upstream entity (if exists)
NodeProvider->>ParametersSource: create with aggregated params
NodeProvider->>Testcase: create with observed testcases
Editor->>NodeProvider: request suggestions for $.inpu[query]
NodeProvider->>InputSource: getSuggestions(["inpu"], query)
InputSource->>Atom: read schemaMap + observedTestcases
Atom-->>InputSource: merge schema keys + testcase data keys
InputSource-->>NodeProvider: [label, hint] suggestions
NodeProvider-->>Editor: render dropdown with suggestions
Editor->>Editor: select suggestion, write token text
Editor->>PathProvider: token committed to document
sequenceDiagram
participant User as User
participant Modal as NewEvaluationModal
participant WorkflowSelector as SelectWorkflowSection
participant Store as appWorkflowStore
participant Filter as workflowFilterAtoms
User->>Modal: open evaluation wizard
Modal->>Filter: set workflowTypeFilterAtom='all'
Modal->>Filter: set workflowInvokableOnlyAtom=true
Modal->>WorkflowSelector: render workflow picker
WorkflowSelector->>Store: query workflows with typeFilter + invokableOnly
Store->>Store: fetch + filter (non-human, has_url=true)
Store-->>WorkflowSelector: paginated workflow list
User->>WorkflowSelector: toggle 'Show evaluators' on
WorkflowSelector->>Filter: set workflowTypeFilterAtom='all'
Store->>Store: refetch including evaluator subtypes
User->>WorkflowSelector: select workflow + capture metadata
WorkflowSelector->>Modal: onSelectWorkflow(id, {label, isEvaluator})
Modal->>Modal: store selectedWorkflowMeta for display
Modal->>Modal: advance to 'Revision' tab
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Actionable comments posted: 15
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
web/packages/agenta-entities/src/workflow/state/store.ts (1)
1082-1110: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift
`flatSchemaForReNest` is still nested for local clones.
These branches read `flatSchemaForReNest` from `localData`, but local drafts created by `createLocalDraftFromWorkflowRevision()` are cloned from `workflowEntityAtomFamily(...)`, so their evaluator params/schema are already normalized. In that case this variable never captures the original flat schema with the hidden markers you want to preserve, and the post-draft re-nest can still drop hidden/advanced fields on cloned evaluators after presets are applied. Consider cloning from the flat source (`getFlatSourceData`) or persisting the original flat schema alongside the local draft before normalizing it.
Also applies to: 1155-1168, 1270-1298, 1343-1356
web/packages/agenta-entities/src/workflow/state/molecule.ts (1)
792-805: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Filter system fields after grouping to avoid leaking reserved nested inputs.
At Line 793, system fields are filtered on raw variable paths before grouping. With dotted paths (which this new grouped flow explicitly supports), a reserved top-level field can slip through (e.g. `inputs.context.user_id` won’t match `context` directly). Filter on grouped keys instead.
Suggested fix
- const vars = extractVariablesFromConfig(params as Record<string, unknown>).filter(
-     (key) => !systemFields.has(key),
- )
+ const vars = extractVariablesFromConfig(params as Record<string, unknown>)
  if (vars.length > 0) {
      return groupTemplateVariables(vars)
-         .filter((group) => group.envelope === "inputs")
+         .filter(
+             (group) =>
+                 group.envelope === "inputs" &&
+                 !systemFields.has(group.key),
+         )
          .map((group) => ({
              key: group.key,
              name: group.name,
              type: group.type,
              required: true,
              ...(group.subPaths ? {schema: buildSubPathSchema(group.subPaths)} : {}),
          }))
  }
web/oss/src/components/pages/evaluations/NewEvaluation/Components/NewEvaluationModalContent.tsx (1)
122-125: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
Update gating copy to “workflow” instead of “application.”
These messages now sit behind a workflow selector that can return apps or evaluators, so “application” is misleading.
✏️ Suggested wording tweak
- Select an application first to load this section.
+ Select a workflow first to load this section.
- Please select an application to continue configuring the evaluation.
+ Please select a workflow to continue configuring the evaluation.
Also applies to: 153-156
🧹 Nitpick comments (5)
web/oss/src/components/SharedDrawers/AddToTestsetDrawer/hooks/useTestsetDrawer.ts (1)
183-194: 💤 Low value
Minor: Redundant condition check.
When `mappingData.length === 0`, `isMapColumnExist` is guaranteed to be `false` (since `some()` on an empty array always returns `false`). The `!isMapColumnExist` check is therefore redundant in this condition.
This isn't a bug (the logic works correctly), but simplifying to just `canonicalPaths.length > 0 && mappingData.length === 0` makes the intent clearer: seed mappings only when none exist and canonical paths are available.
♻️ Optional simplification
  useEffect(() => {
-     if (!isMapColumnExist && canonicalPaths.length > 0 && mappingData.length === 0) {
+     if (canonicalPaths.length > 0 && mappingData.length === 0) {
          setMappingData(
              canonicalPaths.map((path) => ({
                  id: createMappingId(),
                  data: path,
                  column: "create",
                  newColumn: path.split(".").pop() || path,
              })),
          )
      }
- }, [isMapColumnExist, canonicalPaths, mappingData.length, setMappingData])
+ }, [canonicalPaths, mappingData.length, setMappingData])
web/oss/src/components/SharedDrawers/TraceDrawer/components/TraceContent/components/TraceTypeHeader/index.tsx (1)
185-190: ⚡ Quick win
Add a focused regression test for the stacked-open path.
This path now depends on `stacked: true` to avoid focus-trap conflicts; a small UI/state test around this call would help prevent subtle regressions.
web/packages/agenta-entities/src/workflow/state/molecule.ts (1)
753-761: ⚡ Quick win
Extract grouped-variable → port mapping into a shared helper.
The same mapping logic appears in two branches (Line 753-761 and Line 797-805). Pulling this into one helper will prevent branch drift and keep future tweaks (e.g., schema hint behavior) consistent.
Also applies to: 797-805
web/oss/src/components/pages/app-management/store/appWorkflowStore.ts (1)
341-371: 💤 Low value
Consider caching `filterInvokableWorkflows` results to avoid redundant filtering.
Both `appWorkflowTotalCountQueryAtom` and `appWorkflowCountQueryAtom` call `filterInvokableWorkflows` independently when `invokableOnly` is true, and so does `fetchPage`. While TanStack Query caches the underlying `queryWorkflows` response, the bulk revision fetch and filtering logic runs separately each time.
If this becomes a performance concern, consider memoizing the invokable entries or using a shared query atom for the filtered results.
web/oss/src/components/pages/evaluations/NewEvaluation/Components/SelectWorkflowSection.tsx (1)
155-160: ⚡ Quick win
Debounce search before writing into shared workflow filters.
Right now each keypress updates store state and table search deps immediately; this is noisy and causes avoidable refresh churn.
As per coding guidelines, "Debounce search inputs and filters; throttle scroll and resize handlers."
Also applies to: 259-264
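A generic debounce of the kind this nitpick asks for might look like the sketch below. It is dependency-free and hypothetical: the project may already ship its own helper or use a library, and the `flush`/`cancel` methods are additions made here for testability.

```typescript
// Delay calls to `fn` until `waitMs` has passed without a new call.
export function debounce<A extends unknown[]>(fn: (...args: A) => void, waitMs: number) {
    let timer: ReturnType<typeof setTimeout> | undefined
    let pending: A | undefined

    const run = (): void => {
        timer = undefined
        if (pending === undefined) return
        const args = pending
        pending = undefined
        fn(...args)
    }

    const debounced = (...args: A): void => {
        pending = args // only the latest arguments survive
        if (timer !== undefined) clearTimeout(timer)
        timer = setTimeout(run, waitMs)
    }

    // Run the pending call immediately (useful on blur or submit).
    debounced.flush = (): void => {
        if (timer !== undefined) clearTimeout(timer)
        run()
    }

    // Drop the pending call (useful on unmount).
    debounced.cancel = (): void => {
        if (timer !== undefined) clearTimeout(timer)
        timer = undefined
        pending = undefined
    }

    return debounced
}
```

In the workflow picker, the search input would keep its own local text state and pass only the debounced value to the shared filter store, so the table refetches once per pause in typing rather than once per keypress.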
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: aa239025-fd5f-42b1-8bea-33c3d3c11434
📒 Files selected for processing (78)
- sdk/agenta/sdk/engines/running/handlers.py
- web/oss/src/components/Evaluators/Table/assets/evaluatorColumns.tsx
- web/oss/src/components/Evaluators/assets/cells/EvaluatorTagsCell.tsx
- web/oss/src/components/Evaluators/assets/cells/EvaluatorTypePill.tsx
- web/oss/src/components/Evaluators/assets/cells/TableDropdownMenu/index.tsx
- web/oss/src/components/Evaluators/assets/cells/TableDropdownMenu/types.ts
- web/oss/src/components/Evaluators/assets/getColumns.tsx
- web/oss/src/components/Evaluators/assets/types.ts
- web/oss/src/components/Evaluators/store/evaluatorsPaginatedStore.ts
- web/oss/src/components/Playground/Components/PlaygroundTestcaseEditor.tsx
- web/oss/src/components/Playground/Components/PlaygroundVariantConfig/index.tsx
- web/oss/src/components/Playground/OSSPlaygroundShell.tsx
- web/oss/src/components/Playground/PlaygroundTokenPath/atoms.ts
- web/oss/src/components/Playground/PlaygroundTokenPath/chainContext.ts
- web/oss/src/components/Playground/PlaygroundTokenPath/index.tsx
- web/oss/src/components/Playground/PlaygroundTokenPath/sources/inputs.ts
- web/oss/src/components/Playground/PlaygroundTokenPath/sources/outputs.ts
- web/oss/src/components/Playground/PlaygroundTokenPath/sources/parameters.ts
- web/oss/src/components/Playground/PlaygroundTokenPath/sources/shared.ts
- web/oss/src/components/Playground/PlaygroundTokenPath/sources/testcase.ts
- web/oss/src/components/Playground/PlaygroundTokenPath/types.ts
- web/oss/src/components/SharedDrawers/AddToTestsetDrawer/atoms/drawerState.ts
- web/oss/src/components/SharedDrawers/AddToTestsetDrawer/hooks/useTestsetDrawer.ts
- web/oss/src/components/SharedDrawers/TraceDrawer/components/TraceContent/components/TraceTypeHeader/index.tsx
- web/oss/src/components/SharedDrawers/TraceDrawer/components/TraceContent/index.tsx
- web/oss/src/components/SharedDrawers/TraceDrawer/components/TraceContent/utils/index.ts
- web/oss/src/components/pages/app-management/components/ApplicationManagementSection.tsx
- web/oss/src/components/pages/app-management/components/appWorkflowColumns.tsx
- web/oss/src/components/pages/app-management/store/appWorkflowFilterAtoms.ts
- web/oss/src/components/pages/app-management/store/appWorkflowStore.ts
- web/oss/src/components/pages/app-management/store/index.ts
- web/oss/src/components/pages/evaluations/NewEvaluation/Components/NewEvaluationModalContent.tsx
- web/oss/src/components/pages/evaluations/NewEvaluation/Components/NewEvaluationModalInner.tsx
- web/oss/src/components/pages/evaluations/NewEvaluation/Components/SelectAppSection.tsx
- web/oss/src/components/pages/evaluations/NewEvaluation/Components/SelectVariantSection.tsx
- web/oss/src/components/pages/evaluations/NewEvaluation/Components/SelectWorkflowSection.tsx
- web/oss/src/components/pages/evaluations/NewEvaluation/types.ts
- web/oss/src/components/pages/evaluations/onlineEvaluation/OnlineEvaluationDrawer.tsx
- web/oss/src/components/pages/observability/assets/constants.ts
- web/oss/tests/playwright/acceptance/auto-evaluation/index.ts
- web/oss/tests/playwright/acceptance/auto-evaluation/tests.ts
- web/oss/tests/playwright/acceptance/human-annotation/index.ts
- web/oss/tests/playwright/acceptance/human-annotation/tests.ts
- web/packages/agenta-entities/src/runnable/evaluatorTransforms.ts
- web/packages/agenta-entities/src/runnable/index.ts
- web/packages/agenta-entities/src/runnable/portHelpers.ts
- web/packages/agenta-entities/src/shared/execution/valueExtraction.ts
- web/packages/agenta-entities/src/workflow/core/index.ts
- web/packages/agenta-entities/src/workflow/core/schema.ts
- web/packages/agenta-entities/src/workflow/index.ts
- web/packages/agenta-entities/src/workflow/state/evaluatorUtils.ts
- web/packages/agenta-entities/src/workflow/state/helpers.ts
- web/packages/agenta-entities/src/workflow/state/index.ts
- web/packages/agenta-entities/src/workflow/state/molecule.ts
- web/packages/agenta-entities/src/workflow/state/store.ts
- web/packages/agenta-entity-ui/package.json
- web/packages/agenta-entity-ui/src/DrillInView/SchemaControls/CodeEditorControl.tsx
- web/packages/agenta-entity-ui/src/DrillInView/components/MoleculeDrillInContext.tsx
- web/packages/agenta-entity-ui/src/DrillInView/components/index.ts
- web/packages/agenta-entity-ui/src/index.ts
- web/packages/agenta-entity-ui/src/workflow/WorkflowKindTag.tsx
- web/packages/agenta-entity-ui/src/workflow/WorkflowTypeTag.tsx
- web/packages/agenta-entity-ui/src/workflow/index.ts
- web/packages/agenta-playground-ui/src/components/WorkflowRevisionDrawer/WorkflowRevisionDrawer.tsx
- web/packages/agenta-playground-ui/src/components/WorkflowRevisionDrawer/store.ts
- web/packages/agenta-playground-ui/src/components/adapters/VariableControlAdapter.tsx
- web/packages/agenta-playground/src/state/controllers/executionItemController.ts
- web/packages/agenta-playground/src/state/execution/index.ts
- web/packages/agenta-playground/src/state/execution/selectors.ts
- web/packages/agenta-shared/src/utils/index.ts
- web/packages/agenta-shared/src/utils/templateVariable.ts
- web/packages/agenta-ui/src/Editor/index.ts
- web/packages/agenta-ui/src/Editor/plugins/code/index.tsx
- web/packages/agenta-ui/src/Editor/plugins/token/TokenNode.ts
- web/packages/agenta-ui/src/Editor/plugins/token/TokenPathSuggestionsContext.tsx
- web/packages/agenta-ui/src/Editor/plugins/token/TokenTooltipPlugin.tsx
- web/packages/agenta-ui/src/Editor/plugins/token/TokenTypeaheadPlugin.tsx
- web/packages/agenta-ui/src/Editor/plugins/token/extensions/tokenBehavior.tsx
💤 Files with no reviewable changes (8)
- web/oss/src/components/Evaluators/assets/cells/TableDropdownMenu/types.ts
- web/oss/src/components/Evaluators/assets/cells/EvaluatorTagsCell.tsx
- web/oss/src/components/Evaluators/assets/cells/EvaluatorTypePill.tsx
- web/oss/src/components/Evaluators/assets/cells/TableDropdownMenu/index.tsx
- web/oss/src/components/pages/evaluations/NewEvaluation/Components/SelectAppSection.tsx
- web/oss/src/components/Evaluators/assets/types.ts
- web/oss/src/components/SharedDrawers/TraceDrawer/components/TraceContent/utils/index.ts
- web/oss/src/components/Evaluators/assets/getColumns.tsx
if inputs is not None:
    context.update(**inputs)
context.update(
    **{
        "prediction": outputs,
        "outputs": outputs,
        "inputs": inputs,
    }
Reserve or protect the synthetic inputs envelope key.
These changes overwrite any real caller-supplied field named inputs with the synthetic envelope object. After that, root-level prompt references to inputs no longer see the user value, which is a breaking behavior change unless "inputs" is now a reserved field name everywhere. Please either reject reserved names up front or preserve the original field before injecting the envelope.
Also applies to: 2211-2218, 2283-2290
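The "reject reserved names up front" option could be sketched as a small validation step. This is written in TypeScript for consistency with the rest of the review rather than in the handler's Python, and the reserved-key list is an assumption taken from the three keys injected in the excerpt above.

```typescript
// Assumed reserved keys: the ones the handler injects into the context.
const RESERVED_ENVELOPE_KEYS = ["inputs", "outputs", "prediction"] as const

// Return the caller-supplied field names that would be overwritten by the envelope.
export function findReservedCollisions(inputs: Record<string, unknown>): string[] {
    return RESERVED_ENVELOPE_KEYS.filter((key) =>
        Object.prototype.hasOwnProperty.call(inputs, key),
    )
}

// Throw before building the context so the conflict is visible to the caller.
export function assertNoReservedKeys(inputs: Record<string, unknown>): void {
    const collisions = findReservedCollisions(inputs)
    if (collisions.length > 0) {
        throw new Error(`Input fields use reserved names: ${collisions.join(", ")}`)
    }
}
```

Failing loudly trades backward compatibility for predictability; the alternative the reviewer mentions, preserving the original field under another name, avoids the break but needs an agreed naming rule.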
(value: string, meta?: {label?: string; isEvaluator?: boolean}) => {
    if (value === selectedAppId) return
    setSelectedAppId(value)
    setSelectedWorkflowMeta(value ? (meta ?? null) : null)
    setSelectedTestsetId("")
Allow metadata updates even when the selected workflow ID is unchanged.
The early return on same value skips setSelectedWorkflowMeta, so label/kind fallback can remain stale.
🔧 Minimal fix
- (value: string, meta?: {label?: string; isEvaluator?: boolean}) => {
- if (value === selectedAppId) return
+ (value: string, meta?: {label?: string; isEvaluator?: boolean}) => {
+ if (value === selectedAppId) {
+ if (value && meta) setSelectedWorkflowMeta(meta)
+ return
+ }
setSelectedAppId(value)
setSelectedWorkflowMeta(value ? (meta ?? null) : null)
const onSelectRow = useCallback(
    (selectedRowKeys: React.Key[]) => {
        if (disabled) return
        const selectedId = selectedRowKeys[0] as string | undefined
        if (!selectedId) {
            onSelectWorkflow("")
Fix single-selection behavior: current checkbox model can keep the old selection.
This picker stores a single selectedWorkflowId, but checkbox mode sends multi-key arrays and you always take the first key. Selecting another row can keep the previous ID selected.
✅ Suggested fix (radio selection)
- const onSelectRow = useCallback(
- (selectedRowKeys: React.Key[]) => {
+ const onSelectRow = useCallback(
+ (selectedRowKeys: React.Key[]) => {
if (disabled) return
- const selectedId = selectedRowKeys[0] as string | undefined
+ const selectedId = selectedRowKeys.at(-1) as string | undefined
if (!selectedId) {
onSelectWorkflow("")
return
}
const row = tableRows.find((r) => r.workflowId === selectedId)
onSelectWorkflow(selectedId, {
label: row?.name,
isEvaluator: row?.isEvaluator,
})
},
[disabled, onSelectWorkflow, tableRows],
)
const rowSelection = useMemo(
() => ({
- type: "checkbox" as const,
+ type: "radio" as const,
selectedRowKeys: selectedWorkflowId ? [selectedWorkflowId] : [],
onChange: (keys: React.Key[]) => onSelectRow(keys),
getCheckboxProps: () => ({disabled}),
selectOnRowClick: !disabled,
}),
[selectedWorkflowId, onSelectRow, disabled],
)
Also applies to: 191-199
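The stale-selection problem can be reproduced without any UI. The sketch below assumes, as the comment does, that checkbox mode reports every selected key with the newest one last; `pickFirst` and `pickLatest` are illustrative names only.

```typescript
type RowKey = string | number

// Original behavior: always read the first reported key.
export const pickFirst = (keys: readonly RowKey[]): RowKey | undefined => keys[0]

// Suggested behavior: read the most recently added key.
export const pickLatest = (keys: readonly RowKey[]): RowKey | undefined => keys[keys.length - 1]

// With "wf-old" already selected, clicking "wf-new" in checkbox mode reports both keys.
const reported: RowKey[] = ["wf-old", "wf-new"]
console.log(pickFirst(reported)) // the previous selection wins
console.log(pickLatest(reported)) // the clicked row wins
```

Switching the table to radio selection removes the ambiguity at the source, since only one key is ever reported.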
  const previewEvaluators = useMemo(
-     () => (evaluators || []).filter((e) => e.flags?.is_feedback !== true),
+     () =>
+         (evaluators || []).filter(
+             (e) => e.flags?.is_feedback !== true && e.flags?.has_url === true,
+         ),
has_url-only filtering can hide valid online evaluators
On Line 78, requiring e.flags?.has_url === true is narrower than the platform’s online-capable evaluator logic (e.g., code/LLM/handler-backed evaluators). This can remove valid options and block live-eval creation.
Suggested fix
-import {
+import {
evaluatorConfigRevisionsListDataAtom,
evaluatorConfigRevisionsQueryStateAtom,
evaluatorTemplatesDataAtom,
evaluatorTemplatesQueryAtom,
+ isOnlineCapableEvaluator,
} from "@agenta/entities/workflow"
@@
- const previewEvaluators = useMemo(
- () =>
- (evaluators || []).filter(
- (e) => e.flags?.is_feedback !== true && e.flags?.has_url === true,
- ),
- [evaluators],
- )
+ const previewEvaluators = useMemo(
+ () =>
+ (evaluators || []).filter(
+ (e) => e.flags?.is_feedback !== true && isOnlineCapableEvaluator(e),
+ ),
+ [evaluators],
+ )
const {value, isParsable} = useMemo(() => {
    if (!parentRaw) return {value: "", isParsable: true}
    try {
        const parsed = JSON.parse(parentRaw) as unknown
        if (parsed && typeof parsed === "object" && !Array.isArray(parsed)) {
            const raw = (parsed as Record<string, unknown>)[subPath]
            return {value: raw == null ? "" : String(raw), isParsable: true}
        }
        return {value: "", isParsable: false}
    } catch {
        return {value: "", isParsable: false}
    }
}, [parentRaw, subPath])

const handleChange = useCallback(
    (nextVal: string) => {
        let parsed: Record<string, unknown> = {}
        if (parentRaw) {
            try {
                const p = JSON.parse(parentRaw) as unknown
                if (p && typeof p === "object" && !Array.isArray(p)) {
                    parsed = {...(p as Record<string, unknown>)}
                }
            } catch {
                // non-JSON parent: start fresh; overwrite handled by isParsable gate
            }
        }
        parsed[subPath] = nextVal
        setCellValue({
            testcaseId,
            column: parentKey,
            value: JSON.stringify(parsed),
        })
Walk dotted sub-paths instead of indexing them literally.
Line 90 and Line 112 treat subPath as a single property name, but getPortSubPaths() is intentionally allowed to return multi-segment hints like a.b.c. In flat view that reads/writes {"a.b.c": ...} instead of traversing a -> b -> c, so nested fields are edited incorrectly.
Suggested fix
+const splitPath = (path: string) => path.split(".").filter(Boolean)
+
+function getValueAtPath(obj: Record<string, unknown>, path: string): unknown {
+ return splitPath(path).reduce<unknown>((acc, segment) => {
+ return acc && typeof acc === "object" && !Array.isArray(acc)
+ ? (acc as Record<string, unknown>)[segment]
+ : undefined
+ }, obj)
+}
+
+function setValueAtPath(obj: Record<string, unknown>, path: string, value: string) {
+ const segments = splitPath(path)
+ let cursor = obj
+
+ for (const segment of segments.slice(0, -1)) {
+ const next = cursor[segment]
+ cursor[segment] =
+ next && typeof next === "object" && !Array.isArray(next)
+ ? {...(next as Record<string, unknown>)}
+ : {}
+ cursor = cursor[segment] as Record<string, unknown>
+ }
+
+ cursor[segments[segments.length - 1]!] = value
+}
+
const {value, isParsable} = useMemo(() => {
if (!parentRaw) return {value: "", isParsable: true}
try {
const parsed = JSON.parse(parentRaw) as unknown
if (parsed && typeof parsed === "object" && !Array.isArray(parsed)) {
- const raw = (parsed as Record<string, unknown>)[subPath]
+ const raw = getValueAtPath(parsed as Record<string, unknown>, subPath)
return {value: raw == null ? "" : String(raw), isParsable: true}
}
return {value: "", isParsable: false}
@@
- parsed[subPath] = nextVal
+ setValueAtPath(parsed, subPath, nextVal)
setCellValue({
testcaseId,
column: parentKey,
value: JSON.stringify(parsed),
})
const explicit = xParams?.language as string | undefined
if (explicit) return explicit
Validate the explicit schema language before returning it.
Only the dynamic candidates are normalized and checked. An explicit value like "Python" or a typo bypasses the guard and gets cast into SharedEditor, which skips the fallback and can leave syntax highlighting in an undefined state.
Suggested fix
- const explicit = xParams?.language as string | undefined
- if (explicit) return explicit
+ const explicit =
+ typeof xParams?.language === "string" ? xParams.language.toLowerCase() : undefined
+ if (explicit && isSupportedLanguage(explicit)) return explicit
const jsonDefault = useMemo(() => {
    if (portType === "array") return "[]"
    const props =
        portSchema && typeof portSchema === "object"
            ? (portSchema as {properties?: Record<string, unknown>}).properties
            : null
    if (!props || typeof props !== "object") return "{}"
    const keys = Object.keys(props)
    if (keys.length === 0) return "{}"
    const obj: Record<string, string> = {}
    for (const k of keys) obj[k] = ""
    return JSON.stringify(obj, null, 2)
}, [portType, portSchema])
Don’t render a JSON value that won’t be submitted.
These lines make empty object/array cells look populated, but the underlying testcase cell stays "" and the comment below confirms the run payload remains empty. In this flow that is pretty misleading: evaluator inputs can appear present in the editor and still resolve as missing at execution time. Either persist the seed before execution, or surface the schema-derived shape as placeholder/help text instead of value.
Also applies to: 221-223, 239-249
if (language === "json" || language === "yaml") {
    const currentParsed = safeJson5Parse(currentEditorContent)
    const incomingParsed = safeJson5Parse(initialValue)
    if (!isEqual(currentParsed, incomingParsed)) {
        needsDispatch = true
        // Force only when we're replacing real content; on first
        // populate, the handler should run normally.
        forceUpdate = hasExistingContent
    }
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "Inspect safeJson5Parse implementation and callers:"
rg -n -C4 '\bsafeJson5Parse\b' --glob '*.{ts,tsx,js,jsx}'
echo
echo "Inspect the YAML equality gate in the editor plugin:"
sed -n '895,903p' web/packages/agenta-ui/src/Editor/plugins/code/index.tsx
Repository: Agenta-AI/agenta
Length of output: 8839
🏁 Script executed:
# Check the implementation of tryParsePartialJson
rg -n -A 10 'export function tryParsePartialJson' web/packages/agenta-shared/src/utils/jsonParsing.ts
Repository: Agenta-AI/agenta
Length of output: 540
🏁 Script executed:
# Check the imports in the editor plugin file
head -n 30 web/packages/agenta-ui/src/Editor/plugins/code/index.tsx
Repository: Agenta-AI/agenta
Length of output: 923
🏁 Script executed:
# Get the full implementation of tryParsePartialJson
sed -n '32,80p' web/packages/agenta-shared/src/utils/jsonParsing.ts
Repository: Agenta-AI/agenta
Length of output: 2007
Use a YAML parser in the equality gate.
This branch treats YAML like JSON5. When a YAML string that isn't valid JSON fails `safeJson5Parse()`, parsing falls back to `tryParsePartialJson()`, which applies JSON heuristics and returns `null` for unparseable content. Two different YAML documents can therefore both parse to `null`, so `isEqual()` treats them as identical and `needsDispatch` is never set. As a result, external YAML updates never reach the editor. Parse YAML with `yaml.load()` instead, and fall back to string comparison when parsing fails.
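The failure mode described above is easy to reproduce in isolation. In this sketch `JSON.parse` stands in for the project's `safeJson5Parse` / `tryParsePartialJson` pair and `JSON.stringify` comparison stands in for `isEqual`; both substitutions are assumptions made to keep the example dependency-free.

```typescript
// Stand-in parser: returns null for anything that is not valid JSON.
function parseOrNull(text: string): unknown {
    try {
        return JSON.parse(text)
    } catch {
        return null
    }
}

// Naive gate: two unparseable documents both become null and compare equal.
export function sameNaive(a: string, b: string): boolean {
    return JSON.stringify(parseOrNull(a)) === JSON.stringify(parseOrNull(b))
}

// Safer gate: fall back to the trimmed source text when parsing fails.
function parseOrText(text: string): unknown {
    try {
        return JSON.parse(text)
    } catch {
        return text.trim()
    }
}

export function sameWithFallback(a: string, b: string): boolean {
    return JSON.stringify(parseOrText(a)) === JSON.stringify(parseOrText(b))
}

const before = "name: alpha\n"
const after = "name: beta\n"
console.log(sameNaive(before, after)) // true: the update would be skipped
console.log(sameWithFallback(before, after)) // false: the update is dispatched
```

A real YAML parser improves on the string fallback by also treating documents that differ only in formatting as equal.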
Suggested fix
- if (language === "json" || language === "yaml") {
+ if (language === "json") {
const currentParsed = safeJson5Parse(currentEditorContent)
const incomingParsed = safeJson5Parse(initialValue)
if (!isEqual(currentParsed, incomingParsed)) {
needsDispatch = true
- // Force only when we're replacing real content; on first
- // populate, the handler should run normally.
forceUpdate = hasExistingContent
}
return
}
+
+ if (language === "yaml") {
+ let currentParsed: unknown = currentEditorContent?.trim() ?? ""
+ let incomingParsed: unknown = initialValue?.trim() ?? ""
+
+ try {
+ currentParsed = yaml.load(currentEditorContent)
+ } catch {}
+
+ try {
+ incomingParsed = yaml.load(initialValue)
+ } catch {}
+
+ if (!isEqual(currentParsed, incomingParsed)) {
+ needsDispatch = true
+ forceUpdate = hasExistingContent
+ }
+ return
+ }
| updateDOM(_prevNode: TokenNode, dom: HTMLElement): boolean { | ||
| const text = this.getTextContent() | ||
| if (text !== dom.textContent) { | ||
| dom.textContent = text | ||
| return true | ||
| } | ||
| return false | ||
| const textChanged = text !== dom.textContent | ||
| if (textChanged) dom.textContent = text | ||
| // Re-apply styles even when text didn't change at this exact node — | ||
| // edits to a nearby path segment can change validity without | ||
| // changing this node's own text, but when it DOES change we also | ||
| // need to re-evaluate because the new text may be valid/invalid. | ||
| applyTokenStyles(dom, text) | ||
| return textChanged |
🧩 Analysis chain
🌐 Web query:
In Lexical 0.40, what does a custom TextNode.updateDOM(...) returning true mean? If the node already updates textContent/styles in place, should updateDOM return false to avoid DOM replacement?
💡 Result:
In Lexical 0.40, a custom TextNode.updateDOM(...) returning true signals to Lexical that the DOM element needs to be replaced with a new one created by createDOM. Returning false indicates that the existing DOM element was successfully updated in place, avoiding replacement. Yes, if the node already updates textContent/styles in place (e.g., via super.updateDOM and additional style changes), it should return false to prevent unnecessary DOM replacement and preserve performance/DOM state like selection anchors.
Citations:
- 1: https://lexical.dev/docs/concepts/nodes
- 2: https://www.mintlify.com/facebook/lexical/concepts/nodes
Return false from updateDOM() after mutating the span in place.
This method already updates textContent and styles directly on the DOM element. Returning textChanged signals Lexical to recreate the DOM node on text edits, which breaks selection/hover continuity and defeats the in-place update optimization. Per Lexical's node contract, return false to indicate the DOM was successfully updated in place.
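The contract described above can be illustrated with a toy model. This is not Lexical's reconciler: `FakeDom`, `ToyTokenNode`, `styleFor`, and `reconcile` are hypothetical names, and plain objects stand in for DOM elements. It only shows that returning true forces a new element, while returning false keeps the one that was mutated in place.

```typescript
// Plain-object stand-in for an HTMLElement (assumption for this sketch).
interface FakeDom {
    id: number
    textContent: string
    className: string
}

let nextId = 0

// Hypothetical validity styling: a token is "valid" when its braces are closed.
function styleFor(text: string): string {
    return text.startsWith("{{") && text.endsWith("}}") ? "token-valid" : "token-invalid"
}

class ToyTokenNode {
    text: string
    replaceOnTextChange: boolean

    constructor(text: string, replaceOnTextChange: boolean) {
        this.text = text
        this.replaceOnTextChange = replaceOnTextChange
    }

    createDOM(): FakeDom {
        return {id: nextId++, textContent: this.text, className: styleFor(this.text)}
    }

    // true  => the caller must discard `dom` and call createDOM() again
    // false => `dom` was updated in place and can be kept
    updateDOM(dom: FakeDom): boolean {
        const textChanged = dom.textContent !== this.text
        if (textChanged) dom.textContent = this.text
        dom.className = styleFor(this.text)
        return this.replaceOnTextChange ? textChanged : false
    }
}

// Toy reconciler step that honors the return value.
function reconcile(node: ToyTokenNode, dom: FakeDom): FakeDom {
    return node.updateDOM(dom) ? node.createDOM() : dom
}

// Variant that returns false: the same element survives the edit.
const keepNode = new ToyTokenNode("{{country}}", false)
const keepDom = keepNode.createDOM()
keepNode.text = "{{country"
const keptDom = reconcile(keepNode, keepDom)

// Variant that returns textChanged: the element is replaced on every text edit.
const swapNode = new ToyTokenNode("{{country}}", true)
const swapDom = swapNode.createDOM()
swapNode.text = "{{country"
const swappedDom = reconcile(swapNode, swapDom)
```

In both variants the text and style end up correct. The difference is element identity, which is what selection anchors and hover state depend on.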
Suggested fix
updateDOM(_prevNode: TokenNode, dom: HTMLElement): boolean {
const text = this.getTextContent()
const textChanged = text !== dom.textContent
if (textChanged) dom.textContent = text
// Re-apply styles even when text didn't change at this exact node —
// edits to a nearby path segment can change validity without
// changing this node's own text, but when it DOES change we also
// need to re-evaluate because the new text may be valid/invalid.
applyTokenStyles(dom, text)
- return textChanged
+ return false
}
| node.setTextContent(suggestion.tokenText) | ||
| // Position cursor just before the closing `}}` so the user | ||
| // can keep typing (e.g. drill further into a path). | ||
| navigateCursor({ | ||
| nodeKey: node.getKey(), | ||
| offset: node.getTextContent().length, | ||
| offset: suggestion.tokenText.length - 2, | ||
| }) |
Selecting a suggestion in the middle of a token drops the suffix.
Line 285 opens the menu anywhere inside {{...}}, but line 251 rewrites the entire token to suggestion.tokenText. If the caret sits in the middle of {{$.inputs.country}}, accepting a suggestion also replaces everything after the caret. The safe fix is to open autocomplete only at the end of the token until the replacement logic becomes caret-aware.
Suggested fix
- if (match && offsetPos >= 2 && offsetPos <= text.length - 2) {
+ if (match && offsetPos === text.length - 2) {
setInputQuery(match[1])
const dom = editor.getElementByKey(node.getKey())
if (dom) {
Also applies to: 285-286
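The end-of-token gate and the cursor placement after acceptance can be sketched as two pure functions. The regex and both function names are assumptions made for illustration; the plugin's actual match expression is not shown in this thread.

```typescript
// Hypothetical token pattern: `{{`, a body without braces, then `}}`.
const TOKEN_PATTERN = /^\{\{([^{}]*)\}\}$/

// Returns the query to autocomplete, or null when the menu should stay closed.
function queryAtTokenEnd(text: string, caretOffset: number): string | null {
    const match = TOKEN_PATTERN.exec(text)
    if (!match) return null
    // Open only when the caret sits immediately before the closing `}}`,
    // so a whole-token replacement cannot discard text after the caret.
    return caretOffset === text.length - 2 ? (match[1] ?? "") : null
}

// After accepting a suggestion the whole token is replaced and the caret
// lands just before the closing braces, ready for further typing.
function acceptSuggestion(tokenText: string): {text: string; caretOffset: number} {
    return {text: tokenText, caretOffset: tokenText.length - 2}
}

const token = "{{$.inputs.country}}"
const atEnd = queryAtTokenEnd(token, token.length - 2) // caret before `}}`
const inMiddle = queryAtTokenEnd(token, 10) // caret inside the path
const accepted = acceptSuggestion("{{$.inputs.city}}")
```

A caret-aware alternative would splice the suggestion into the text before the caret and keep the suffix. The gate is the smaller change because it leaves the replacement code untouched.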






Summary
Users can now select an evaluator workflow and run an evaluation on it.
Testing
Verified locally
Ran an evaluation on an evaluator.
Added or updated tests
QA follow-up
Need to test more evaluations on evaluators.