…nd unify workflow store for apps and evaluators
- Add `transformAgDataForTestset` to unwrap nested subject data from evaluator annotation spans, stripping internal bookkeeping keys and reshaping to `{inputs, outputs, score}` for testset compatibility
- Detect evaluator annotation spans via `trace_type === "annotation"` and `is_evaluator` flag or evaluator references
- Apply transformation to both live and original data for consistent
…cture
- Remove `transformAgDataForTestset` and evaluator annotation span detection logic that unwrapped nested subject data
- Replace `leafTracePathsAtom` and `traceDataPathsAtom` with `canonicalTracePathsAtom` that returns only `data.inputs` and `data.outputs` when present
- Update auto-mapping to seed from canonical envelope paths instead of all leaf paths
- Preserve evaluator annotation spans' nested structure in testcases to maintain replay
…chat handlers
- Expose inputs both flattened at root (e.g., `{{country}}`) and nested under `inputs` key (e.g., `{{$.inputs.country}}`) for consistent JSONPath resolution across completion, chat, and evaluator prompts
- Clear `input_keys` after validation to prevent `PromptTemplate.format()` from rejecting the injected envelope meta-key as extra
- Exclude `messages` from nested view in chat handler since it is handled separately
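To make the dual exposure concrete, here is a minimal TypeScript sketch of the context shape these bullets describe. The real handlers are Python code in `sdk/agenta/sdk/engines/running/handlers.py` and are not shown in this conversation; `buildTemplateContext` and its `excludeFromNested` parameter are hypothetical names used only for illustration.

```typescript
type Inputs = Record<string, unknown>

// Hypothetical helper: returns a context where every input is available
// at the root (for `{{country}}`) and again under the `inputs` key
// (for `{{$.inputs.country}}`).
function buildTemplateContext(inputs: Inputs, excludeFromNested: string[] = []): Inputs {
    const nested: Inputs = {}
    for (const [key, value] of Object.entries(inputs)) {
        // Keys such as `messages` can be left out of the nested view.
        if (!excludeFromNested.includes(key)) nested[key] = value
    }
    // Flat keys first, then the nested envelope under `inputs`.
    return {...inputs, inputs: nested}
}

// Completion-style context: both views carry `country`.
const completionContext = buildTemplateContext({country: "France"})

// Chat-style context: `messages` stays out of the nested view.
const chatContext = buildTemplateContext({country: "France", messages: []}, ["messages"])
```

In this sketch the nested view occupies the `inputs` key, so a caller input that is itself named `inputs` would be overwritten by it.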
…yground
- Wrap `PlaygroundConfigSection` in `PlaygroundNodeTokenPathProvider` to scope suggestions to the node's chain context
- Split token path provider into global (inputs-only) and scoped (adds outputs when upstream exists) variants
- Update `useInputsSource` to accept `scopedEntityId` and read entity-specific input ports when provided
- Update `useOutputsSource` to accept `upstreamEntityId` and read upstream output ports directly
…traction to prevent ReDoS
- Replace regex-based extraction with prefix/suffix checks and string slicing
- Prevents polynomial backtracking on adversarial inputs (addresses CodeQL js/redos)
- Handle Jinja whitespace control (`{%-` / `-%}`) via explicit slice trimming
- Maintain identical behavior for `{{...}}`, `{%...%}`, and `{#...#}` wrappers
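A minimal sketch of the prefix/suffix approach, assuming the goal is to return the trimmed body of a single wrapped token. `stripTemplateWrapper` and `WRAPPERS` are hypothetical names; the actual helper in this commit is not shown here.

```typescript
// Wrapper pairs named in the commit message.
const WRAPPERS: [string, string][] = [
    ["{{", "}}"],
    ["{%", "%}"],
    ["{#", "#}"],
]

// Hypothetical helper: removes one template wrapper using only
// startsWith/endsWith checks and slicing, so no regex backtracking can occur.
function stripTemplateWrapper(token: string): string {
    for (const [open, close] of WRAPPERS) {
        const longEnough = token.length >= open.length + close.length
        if (longEnough && token.startsWith(open) && token.endsWith(close)) {
            let inner = token.slice(open.length, token.length - close.length)
            if (open === "{%") {
                // Jinja whitespace control: `{%-` and `-%}`.
                if (inner.startsWith("-")) inner = inner.slice(1)
                if (inner.endsWith("-")) inner = inner.slice(0, -1)
            }
            return inner.trim()
        }
    }
    // Not wrapped: return the trimmed input unchanged.
    return token.trim()
}
```

Every step here is a single linear pass over the string, which is the property the CodeQL finding asks for.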
Replace legacy fallback mode with first-class flat token support. Both `{{country}}` and `{{$.inputs.country}}` now use the same suggestion pipeline, with flat tokens implicitly drilling into `$.inputs.*` to surface testcase columns and port schemas.
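One way to picture the implicit drill-in is as a normalization step that maps a flat token body onto the `$.inputs.*` path before suggestions are looked up. The sketch below is an assumption about the idea, not the code in this PR; `toCanonicalPath` is a hypothetical name.

```typescript
// Hypothetical normalizer: flat bodies are treated as children of `$.inputs`,
// while bodies that already start at the `$` root are left as they are.
function toCanonicalPath(tokenBody: string): string {
    const body = tokenBody.trim()
    if (body === "$" || body.startsWith("$.")) return body
    return body === "" ? "$.inputs" : `$.inputs.${body}`
}
```

With this mapping, `country` and `$.inputs.country` resolve to the same path, so a single suggestion pipeline can serve both spellings.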
Wrap type pills in tooltips to show full labels on hover and apply max-width truncation to prevent long evaluator names (e.g., `__main__.MyEvaluator`) from breaking column layouts.
Add `is_managed` flag checks to exclude user-deployed Python evaluators (e.g., `user:custom:__main__.MyEval:latest`) from the Evaluators page. These SDK-registered evaluators aren't first-class catalog entries and should only appear in workflow execution contexts. Apply filter at both workflow-id cache level and revision query level for defense-in-depth.
…ndered nested config
Implement multi-tier language detection for CodeEditorControl: explicit schema override → sibling field lookup via `languageFromField` → heuristic on `runtime` field → python fallback. Add `useOptionalDrillIn` hook to safely access drill-in context from shared controls that may render outside the provider tree.
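The tier order can be sketched as a chain of fallbacks. This is a simplified illustration under stated assumptions: the set of supported languages, the `DetectArgs` shape, and the idea that `runtime` holds a language name directly are all hypothetical, and `detectLanguage` is not the name used in `CodeEditorControl`.

```typescript
type Language = "python" | "javascript" | "typescript"

// Assumed list of editor languages; the real list is not shown in this PR text.
const SUPPORTED: readonly Language[] = ["python", "javascript", "typescript"]

// Validate a hint before using it; anything unrecognized becomes undefined.
function asLanguage(value: unknown): Language | undefined {
    if (typeof value !== "string") return undefined
    const normalized = value.trim().toLowerCase()
    return SUPPORTED.find((lang) => lang === normalized)
}

interface DetectArgs {
    schemaLanguage?: unknown // tier 1: explicit schema override
    languageFromField?: string // tier 2: name of the sibling field to read
    siblings?: Record<string, unknown> // sibling field values
}

function detectLanguage({schemaLanguage, languageFromField, siblings = {}}: DetectArgs): Language {
    return (
        asLanguage(schemaLanguage) ??
        (languageFromField ? asLanguage(siblings[languageFromField]) : undefined) ??
        asLanguage(siblings["runtime"]) ?? // tier 3: heuristic on `runtime`
        "python" // tier 4: fallback
    )
}
```

Each tier only wins when it yields a recognized language, so an invalid hint falls through to the next tier instead of reaching the editor.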
Replace `null` placeholder + clear button pattern with an explicit "All types" first option in the workflow type filter Select. Simplifies state handling by eliminating nullable semantics and ensures the reset path is always visible as a clickable option rather than requiring hover to reveal a clear button.
…orkflow type filter
Remove extra left indent from Ant Design's grouped select options to ensure visual alignment with the ungrouped "All types" entry. The default `.ant-select-item-option-grouped` padding creates unintended hierarchy when only one or two groups are visible.
…handling
- Add type guards for schema.properties access in parameters source
- Replace `||` with `??` for nullish coalescing in trace drawer
- Preserve workflow metadata when re-selecting same app in evaluation modal
- Update test regex to match "variant" or "revision" text
- Validate language hint before passing to code editor
- Fix TokenNode updateDOM to always return false and preserve selection
- Restrict token typeahead to end-of-token
Summary by CodeRabbit
📝 Walkthrough
Bumps many package versions across Python and JS manifests and tightens several dependency constraints; rewrites three check_deps.load_locked_versions() implementations to be marker-aware; and adds a token-path/typeahead system with Playground-scoped sources, token plugins (tooltip/typeahead), workflow/evaluator UI/store changes, and several schema/port extraction and re-nesting adjustments.
Changes
Version & Dependency Metadata
Marker-aware lockfile parsing
Token-path / Typeahead system (Playground + Editor integration)
Playground UX & token scoping integrations
Workflow / Evaluator UI and stores
Evaluations & selection flows
Workflow/entity runtime: template grouping, port extraction, renesting & schemas
Playground / Testcase editor and mapping behavior
Editor / Code editor & Variable Control tweaks
Sequence Diagram(s)
sequenceDiagram
participant Editor as Editor (Lexical)
participant Provider as TokenPathSuggestionsProvider
participant Inputs as useInputsSource
participant Outputs as useOutputsSource
participant Params as useParametersSource
participant Testcases as useTestcaseSource
Editor->>Provider: request getSuggestions(prefix, query)
Provider->>Inputs: if envelope === "inputs" -> getSuggestions(afterSlot, query)
Provider->>Outputs: if envelope === "outputs" -> getSuggestions(afterSlot, query)
Provider->>Params: if envelope === "parameters" -> getSuggestions(afterSlot, query)
Provider->>Testcases: if envelope === "testcase" -> getSuggestions(afterSlot, query)
Inputs->>PlaygroundState: read inputPortSchemaMap / observedTestcases
Outputs->>PlaygroundState: read outputPortSchemaMap
Params->>PlaygroundState: read aggregatedParametersSchemaAtom
Testcases->>PlaygroundState: read observedTestcasesAtom
Inputs->>Provider: return suggestions[]
Outputs->>Provider: return suggestions[]
Params->>Provider: return suggestions[]
Testcases->>Provider: return suggestions[]
Provider->>Editor: merged suggestions[] (label, hint, tokenText)
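The routing in this diagram can be sketched as a lookup from envelope name to source. The sketch assumes that `prefix` is a dotted path such as `$.inputs.user` whose first segment names the envelope and whose remainder is `afterSlot`; `createProvider` and the sample source are hypothetical, and the real provider reads Playground state through hooks rather than plain functions.

```typescript
interface Suggestion {
    label: string
    hint?: string
    tokenText: string
}

type Source = (afterSlot: string, query: string) => Suggestion[]

// Hypothetical provider factory: routes a request to the source registered
// for the envelope named by the first path segment.
function createProvider(sources: Map<string, Source>) {
    return (prefix: string, query: string): Suggestion[] => {
        const [envelope, ...rest] = prefix
            .replace(/^\$\.?/, "")
            .split(".")
            .filter((segment) => segment.length > 0)
        const source = envelope ? sources.get(envelope) : undefined
        return source ? source(rest.join("."), query) : []
    }
}

// Sample source with fixed keys, standing in for `useInputsSource`.
const inputsSource: Source = (afterSlot, query) =>
    ["country", "city"]
        .filter((key) => afterSlot === "" && key.startsWith(query))
        .map((key) => ({label: key, hint: "input", tokenText: `{{$.inputs.${key}}}`}))

const getSuggestions = createProvider(new Map<string, Source>([["inputs", inputsSource]]))
```

Calling `getSuggestions("$.inputs", "co")` reaches only the inputs source; an envelope without a registered source yields an empty list.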
(Note: rectangle colors omitted in sequence text; nodes represent components interacting at runtime.)
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Actionable comments posted: 1
🧹 Nitpick comments (1)
api/check_deps.py (1)
99-100: ⚡ Quick win
Move imports to module level.
`import sys` and `from packaging.markers import Marker, InvalidMarker` should be moved to module scope. These are used in `load_locked_versions()`, a helper function for configuration lookup. As per the coding guideline, avoid local imports inside helper functions unless there is a proven circular dependency; none exists here.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 2f25de5c-be22-42a5-9c3a-e794d3687731
⛔ Files ignored due to path filters (3)
- `api/poetry.lock` is excluded by `!**/*.lock`
- `sdk/poetry.lock` is excluded by `!**/*.lock`
- `services/poetry.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (6)
- api/check_deps.py
- api/pyproject.toml
- sdk/check_deps.py
- sdk/pyproject.toml
- services/check_deps.py
- services/pyproject.toml
🚧 Files skipped from review as they are similar to previous changes (1)
- sdk/pyproject.toml
Railway Preview Environment
Updated at 2026-05-05T16:52:49.483Z |
…AI/agenta into feature/eval-evaluations
[Frontend Feat] Enable running evaluations on evaluators
Actionable comments posted: 9
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
sdk/agenta/sdk/engines/running/handlers.py (1)
2198-2231: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Reject `messages` as a reserved input key in `chat_v0`.
`chat_v0` still strips `inputs["messages"]` out of `_variables` and excludes it from the nested `inputs` envelope, but the new reserved-key check does not block it. A caller-provided business field named `messages` is therefore silently reinterpreted as chat history instead of prompt data.
Suggested fix
    - _reject_reserved_input_keys(inputs)
    + _reject_reserved_input_keys(
    +     None
    +     if inputs is None
    +     else {k: v for k, v in inputs.items() if k in {"inputs", "messages"}}
    + )
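For readers unfamiliar with the pattern, a reserved-key guard can be sketched as follows. This is a generic TypeScript illustration, not the SDK's Python `_reject_reserved_input_keys`, whose signature and behavior are not shown in this review; `rejectReservedInputKeys` and `CHAT_RESERVED` are hypothetical names.

```typescript
// Hypothetical guard: throws when caller-provided inputs use a key that the
// handler reserves for its own purposes.
function rejectReservedInputKeys(
    inputs: Record<string, unknown> | null | undefined,
    reserved: string[],
): void {
    if (!inputs) return
    const clashes = reserved.filter((key) => Object.prototype.hasOwnProperty.call(inputs, key))
    if (clashes.length > 0) {
        throw new Error(`Reserved input keys are not allowed: ${clashes.join(", ")}`)
    }
}

// Assumed reserved set for a chat handler, following the review comment.
const CHAT_RESERVED = ["inputs", "messages"]
```

Failing loudly at validation time is what prevents a field from being silently reinterpreted later in the handler.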
🧹 Nitpick comments (7)
web/packages/agenta-playground-ui/src/components/WorkflowRevisionDrawer/WorkflowRevisionDrawer.tsx (1)
104-104: ⚡ Quick win
Memoize the Drawer mask config instead of creating it inline.
`{blur: true}` is recreated every render. Use a memoized value (or a module-level constant) for a stable prop reference.
♻️ Proposed refactor
    + const drawerMask = useMemo(
    +     () => (isEvaluatorDrawer || isStacked ? {blur: true} : false),
    +     [isEvaluatorDrawer, isStacked],
    + )
      ...
    - mask={isEvaluatorDrawer || isStacked ? {blur: true} : false}
    + mask={drawerMask}
As per coding guidelines, "Avoid inline functions and objects in render; use `useCallback` to create stable callbacks outside of render logic".
web/packages/agenta-entities/src/runnable/evaluatorTransforms.ts (1)
122-139: 💤 Low value
Consider using Sets for `hiddenKeys` and `advancedKeys` for consistency.
`allowedKeys` uses a `Set` for O(1) lookups, but `hiddenKeys` and `advancedKeys` use arrays with `.includes()` (O(n) per call). For small evaluator schemas this is fine, but Sets would be more consistent.
♻️ Optional: Use Sets for consistent O(1) lookups
    -const hiddenKeys = Object.entries(schemaProps)
    -    .filter(([, prop]) => prop["x-ag-type"] === "hidden")
    -    .map(([key]) => key)
    -const advancedKeys = Object.entries(schemaProps)
    -    .filter(
    -        ([, prop]) => prop["x-advanced"] === true || prop["x-ag-ui-advanced"] === true,
    -    )
    -    .map(([key]) => key)
    +const hiddenKeys = new Set(
    +    Object.entries(schemaProps)
    +        .filter(([, prop]) => prop["x-ag-type"] === "hidden")
    +        .map(([key]) => key)
    +)
    +const advancedKeys = new Set(
    +    Object.entries(schemaProps)
    +        .filter(
    +            ([, prop]) => prop["x-advanced"] === true || prop["x-ag-ui-advanced"] === true,
    +        )
    +        .map(([key]) => key)
    +)
     const primaryData: Record<string, unknown> = {}
     const advancedData: Record<string, unknown> = {}
     for (const [key, value] of Object.entries(flat)) {
         if (!allowedKeys.has(key)) continue
    -    if (hiddenKeys.includes(key)) continue
    -    if (advancedKeys.includes(key)) {
    +    if (hiddenKeys.has(key)) continue
    +    if (advancedKeys.has(key)) {
             advancedData[key] = value
         } else {
             primaryData[key] = value
         }
     }
web/packages/agenta-entities/src/workflow/state/store.ts (1)
1265-1363: ⚖️ Poor tradeoff
Consider extracting shared evaluator normalization logic.
`workflowBaseEntityAtomFamily` and `workflowEntityAtomFamily` contain nearly identical code for evaluator normalization and draft re-nesting (~150 duplicated lines). While the separation is intentional (base skips schema resolution to avoid side effects), the shared normalization logic could be extracted to a helper function.
web/packages/agenta-ui/src/Editor/plugins/token/TokenTooltipPlugin.tsx (1)
108-117: 💤 Low value
Consider caching bounding rect to avoid repeated calls.
`getBoundingClientRect()` is called four times. While browsers may cache this within a single frame, extracting to a variable improves readability and ensures a single call.
♻️ Suggested refactor
    + const rect = target.getBoundingClientRect()
      <span
          style={{
              position: "fixed",
    -         left: target.getBoundingClientRect().left,
    -         top: target.getBoundingClientRect().top,
    -         width: target.getBoundingClientRect().width,
    -         height: target.getBoundingClientRect().height,
    +         left: rect.left,
    +         top: rect.top,
    +         width: rect.width,
    +         height: rect.height,
              pointerEvents: "none",
          }}
      />
web/packages/agenta-ui/src/Editor/plugins/token/TokenTypeaheadPlugin.tsx (1)
385-405: 💤 Low value
Consider using a more stable key than index.
The key `${suggestion.label}::${index}` includes the index which helps with duplicate labels, but if the suggestion list order changes frequently, this could cause unnecessary re-renders. Since suggestions can have duplicate labels (e.g., same key from different sources), the current approach is acceptable but could be improved by including the hint in the key.
♻️ Optional improvement
    - key={`${suggestion.label}::${index}`}
    + key={`${suggestion.label}::${suggestion.hint ?? ""}::${index}`}
web/packages/agenta-entity-ui/src/workflow/WorkflowTypeTag.tsx (1)
78-85: 💤 Low value
Consider handling the case when templates haven't loaded yet.
The `getDefaultStore()` pattern correctly solves the isolated Provider issue, but `templates` could be an empty array if the evaluator catalog hasn't loaded yet. Currently, line 91 falls back to `workflowKey` which is reasonable, but you may want to add a loading indicator or ensure the catalog is pre-fetched before rendering table cells.
web/oss/src/components/pages/evaluations/NewEvaluation/Components/SelectWorkflowSection.tsx (1)
155-160: ⚡ Quick win
Debounce the search updates before they hit the store.
Every keypress currently propagates straight into the shared atom, which means this picker can retrigger `queryWorkflows` and the revision batch fetch on each input event. Please debounce the store update, ideally via the shared debounce hook, before querying.
As per coding guidelines, "Debounce search inputs and filters; throttle scroll and resize handlers" and "Use `@agenta/shared/hooks` for shared React hooks like useDebounceInput".
Also applies to: 259-264
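As a framework-free illustration of the requested behavior, the sketch below delays store writes until typing pauses. It is not the `useDebounceInput` hook from `@agenta/shared/hooks`, whose signature is not shown here; `debounce`, `schedule`, `flush`, and the 300 ms delay are assumptions made for the example.

```typescript
// Hypothetical debounce helper: only the latest scheduled value is delivered,
// and only after `waitMs` has passed without another call.
function debounce<T>(fn: (value: T) => void, waitMs: number) {
    let timer: ReturnType<typeof setTimeout> | undefined
    let pending: {value: T} | undefined

    const run = (): void => {
        timer = undefined
        if (pending) {
            const {value} = pending
            pending = undefined
            fn(value)
        }
    }

    const schedule = (value: T): void => {
        pending = {value}
        if (timer !== undefined) clearTimeout(timer)
        timer = setTimeout(run, waitMs)
    }

    // Deliver the pending value immediately (useful on blur or unmount).
    const flush = (): void => {
        if (timer !== undefined) clearTimeout(timer)
        run()
    }

    return {schedule, flush}
}

// Stand-in for the shared search atom: records every write it receives.
const storeWrites: string[] = []
const search = debounce((term: string) => {
    storeWrites.push(term)
}, 300)

// Three quick keystrokes schedule a single pending write.
search.schedule("e")
search.schedule("ev")
search.schedule("eval")
```

With this shape, the query layer sees one update per pause in typing rather than one per keypress.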
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 4ead469f-15d9-49fb-a360-00c66a9c1c9f
📒 Files selected for processing (78)
- sdk/agenta/sdk/engines/running/handlers.py
- web/oss/src/components/Evaluators/Table/assets/evaluatorColumns.tsx
- web/oss/src/components/Evaluators/assets/cells/EvaluatorTagsCell.tsx
- web/oss/src/components/Evaluators/assets/cells/EvaluatorTypePill.tsx
- web/oss/src/components/Evaluators/assets/cells/TableDropdownMenu/index.tsx
- web/oss/src/components/Evaluators/assets/cells/TableDropdownMenu/types.ts
- web/oss/src/components/Evaluators/assets/getColumns.tsx
- web/oss/src/components/Evaluators/assets/types.ts
- web/oss/src/components/Evaluators/store/evaluatorsPaginatedStore.ts
- web/oss/src/components/Playground/Components/PlaygroundTestcaseEditor.tsx
- web/oss/src/components/Playground/Components/PlaygroundVariantConfig/index.tsx
- web/oss/src/components/Playground/OSSPlaygroundShell.tsx
- web/oss/src/components/Playground/PlaygroundTokenPath/atoms.ts
- web/oss/src/components/Playground/PlaygroundTokenPath/chainContext.ts
- web/oss/src/components/Playground/PlaygroundTokenPath/index.tsx
- web/oss/src/components/Playground/PlaygroundTokenPath/sources/inputs.ts
- web/oss/src/components/Playground/PlaygroundTokenPath/sources/outputs.ts
- web/oss/src/components/Playground/PlaygroundTokenPath/sources/parameters.ts
- web/oss/src/components/Playground/PlaygroundTokenPath/sources/shared.ts
- web/oss/src/components/Playground/PlaygroundTokenPath/sources/testcase.ts
- web/oss/src/components/Playground/PlaygroundTokenPath/types.ts
- web/oss/src/components/SharedDrawers/AddToTestsetDrawer/atoms/drawerState.ts
- web/oss/src/components/SharedDrawers/AddToTestsetDrawer/hooks/useTestsetDrawer.ts
- web/oss/src/components/SharedDrawers/TraceDrawer/components/TraceContent/components/TraceTypeHeader/index.tsx
- web/oss/src/components/SharedDrawers/TraceDrawer/components/TraceContent/index.tsx
- web/oss/src/components/SharedDrawers/TraceDrawer/components/TraceContent/utils/index.ts
- web/oss/src/components/pages/app-management/components/ApplicationManagementSection.tsx
- web/oss/src/components/pages/app-management/components/appWorkflowColumns.tsx
- web/oss/src/components/pages/app-management/store/appWorkflowFilterAtoms.ts
- web/oss/src/components/pages/app-management/store/appWorkflowStore.ts
- web/oss/src/components/pages/app-management/store/index.ts
- web/oss/src/components/pages/evaluations/NewEvaluation/Components/NewEvaluationModalContent.tsx
- web/oss/src/components/pages/evaluations/NewEvaluation/Components/NewEvaluationModalInner.tsx
- web/oss/src/components/pages/evaluations/NewEvaluation/Components/SelectAppSection.tsx
- web/oss/src/components/pages/evaluations/NewEvaluation/Components/SelectVariantSection.tsx
- web/oss/src/components/pages/evaluations/NewEvaluation/Components/SelectWorkflowSection.tsx
- web/oss/src/components/pages/evaluations/NewEvaluation/types.ts
- web/oss/src/components/pages/evaluations/onlineEvaluation/OnlineEvaluationDrawer.tsx
- web/oss/src/components/pages/observability/assets/constants.ts
- web/oss/tests/playwright/acceptance/auto-evaluation/index.ts
- web/oss/tests/playwright/acceptance/auto-evaluation/tests.ts
- web/oss/tests/playwright/acceptance/human-annotation/index.ts
- web/oss/tests/playwright/acceptance/human-annotation/tests.ts
- web/packages/agenta-entities/src/runnable/evaluatorTransforms.ts
- web/packages/agenta-entities/src/runnable/index.ts
- web/packages/agenta-entities/src/runnable/portHelpers.ts
- web/packages/agenta-entities/src/shared/execution/valueExtraction.ts
- web/packages/agenta-entities/src/workflow/core/index.ts
- web/packages/agenta-entities/src/workflow/core/schema.ts
- web/packages/agenta-entities/src/workflow/index.ts
- web/packages/agenta-entities/src/workflow/state/evaluatorUtils.ts
- web/packages/agenta-entities/src/workflow/state/helpers.ts
- web/packages/agenta-entities/src/workflow/state/index.ts
- web/packages/agenta-entities/src/workflow/state/molecule.ts
- web/packages/agenta-entities/src/workflow/state/store.ts
- web/packages/agenta-entity-ui/package.json
- web/packages/agenta-entity-ui/src/DrillInView/SchemaControls/CodeEditorControl.tsx
- web/packages/agenta-entity-ui/src/DrillInView/components/MoleculeDrillInContext.tsx
- web/packages/agenta-entity-ui/src/DrillInView/components/index.ts
- web/packages/agenta-entity-ui/src/index.ts
- web/packages/agenta-entity-ui/src/workflow/WorkflowKindTag.tsx
- web/packages/agenta-entity-ui/src/workflow/WorkflowTypeTag.tsx
- web/packages/agenta-entity-ui/src/workflow/index.ts
- web/packages/agenta-playground-ui/src/components/WorkflowRevisionDrawer/WorkflowRevisionDrawer.tsx
- web/packages/agenta-playground-ui/src/components/WorkflowRevisionDrawer/store.ts
- web/packages/agenta-playground-ui/src/components/adapters/VariableControlAdapter.tsx
- web/packages/agenta-playground/src/state/controllers/executionItemController.ts
- web/packages/agenta-playground/src/state/execution/index.ts
- web/packages/agenta-playground/src/state/execution/selectors.ts
- web/packages/agenta-shared/src/utils/index.ts
- web/packages/agenta-shared/src/utils/templateVariable.ts
- web/packages/agenta-ui/src/Editor/index.ts
- web/packages/agenta-ui/src/Editor/plugins/code/index.tsx
- web/packages/agenta-ui/src/Editor/plugins/token/TokenNode.ts
- web/packages/agenta-ui/src/Editor/plugins/token/TokenPathSuggestionsContext.tsx
- web/packages/agenta-ui/src/Editor/plugins/token/TokenTooltipPlugin.tsx
- web/packages/agenta-ui/src/Editor/plugins/token/TokenTypeaheadPlugin.tsx
- web/packages/agenta-ui/src/Editor/plugins/token/extensions/tokenBehavior.tsx
💤 Files with no reviewable changes (8)
- web/oss/src/components/Evaluators/assets/cells/EvaluatorTagsCell.tsx
- web/oss/src/components/Evaluators/assets/cells/TableDropdownMenu/types.ts
- web/oss/src/components/Evaluators/assets/cells/TableDropdownMenu/index.tsx
- web/oss/src/components/Evaluators/assets/types.ts
- web/oss/src/components/SharedDrawers/TraceDrawer/components/TraceContent/utils/index.ts
- web/oss/src/components/Evaluators/assets/cells/EvaluatorTypePill.tsx
- web/oss/src/components/pages/evaluations/NewEvaluation/Components/SelectAppSection.tsx
- web/oss/src/components/Evaluators/assets/getColumns.tsx
✅ Files skipped from review due to trivial changes (8)
- web/oss/src/components/pages/evaluations/NewEvaluation/Components/SelectVariantSection.tsx
- web/packages/agenta-shared/src/utils/index.ts
- web/packages/agenta-entity-ui/src/workflow/index.ts
- web/packages/agenta-entities/src/workflow/state/index.ts
- web/packages/agenta-entities/src/workflow/index.ts
- web/packages/agenta-entity-ui/src/workflow/WorkflowKindTag.tsx
- web/packages/agenta-entities/src/runnable/index.ts
- web/oss/src/components/pages/app-management/store/index.ts
New version v0.99.0 in