Skip to content

Skill Error codes for pick and place manipulation#2010

Open
mustafab0 wants to merge 5 commits intolegacy-dev-dont-mergefrom
mustafa/feature/manipulation-skill-error-codes
Open

Skill Error codes for pick and place manipulation#2010
mustafab0 wants to merge 5 commits intolegacy-dev-dont-mergefrom
mustafa/feature/manipulation-skill-error-codes

Conversation

@mustafab0
Copy link
Copy Markdown
Contributor

Problem

Manipulation skills return prose like "Error: Robot not found". The agent has to parse strings to know what failed — blocks structured retries. Phase 1 of the pick-and-place reliability plan.

Solution

Skills now return a SkillResult dataclass with success, message, error_code, duration_ms. Error codes are domain enums (CommonSkillError, ManipulationError). MCP auto-detects the new agent_encode() method, so the LLM sees structured JSON. Every return logs SKILL <name> result=<code> duration_ms=<n> server-side. All 18 manipulation skills migrated. Drive-by: enabled gripper on xarm7-planner-coordinator.

Breaking Changes

Skills return SkillResult instead of str. Replace "Error:" in result with result.is_success() / result.error_code.

How to Test

  1. uv run pytest dimos/agents/test_skill_result.py dimos/manipulation/
  2. dimos run xarm7-planner-coordinator, then python -i -m dimos.manipulation.planning.examples.manipulation_client`
  3. _client.set_gripper(0.85) returns a SkillResult dataclass; logs show SKILL set_gripper result=OK ...

closes DIM-786

@mustafab0 mustafab0 marked this pull request as ready for review May 7, 2026 20:39
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 7, 2026

Greptile Summary

This PR replaces free-form string returns from all 18 manipulation skills with a typed SkillResult dataclass carrying success, error_code (an Enum), message, duration_ms, and optional metadata. The MCP server's existing agent_encode hook auto-detects the new method on SkillResult and forwards structured JSON to the LLM, requiring no changes to the MCP layer.

  • dimos/agents/skill_result.py (new): defines SkillResult, CommonSkillError, and the skill_timing context manager that stamps wall-clock duration and emits a structured SKILL <name> result=<code> duration_ms=<n> log line on every return path.
  • dimos/manipulation/skill_errors.py (new): adds the ManipulationError enum with 10 domain-specific codes covering grasp failures, planning, gripper errors, and world-monitor availability.
  • manipulation_module.py / pick_and_place_module.py: all public skill methods and the private _lift_if_low / _preview_execute_wait helpers migrated to SkillResult; string-based error checks in callers replaced with result.is_success() / result.error_code.

Confidence Score: 4/5

Safe to merge; the migration is consistent across all 18 skills with correct error-code mappings and a well-tested wire contract

All skill methods are correctly migrated and the structured return type flows cleanly through the existing MCP agent_encode hook without requiring any server-side changes. The main gaps are in skill_timing (exception logging and duration mutation in composed skills) and the look/scan_objects semantic change — all quality concerns rather than correctness bugs.

dimos/agents/skill_result.py (skill_timing exception path and stamp mutation) and dimos/manipulation/pick_and_place_module.py (look/scan_objects empty-result semantics)

Important Files Changed

Filename Overview
dimos/agents/skill_result.py New file: defines SkillResult dataclass, CommonSkillError enum, and skill_timing context manager; skill_timing has a logging gap on unhandled exceptions and stamp() mutates result in-place which causes duration info to be overwritten in composed skills
dimos/manipulation/skill_errors.py New file: clean ManipulationError enum with 10 domain-specific codes; IK_FAILED and COLLISION_AT_START are defined but unused in Phase 1 (reserved for future use)
dimos/manipulation/manipulation_module.py Migrates 9 skill methods from str to SkillResult; _lift_if_low and _preview_execute_wait private helpers also converted; all error paths correctly mapped to structured codes
dimos/manipulation/pick_and_place_module.py Migrates 9 pick-and-place skills; look and scan_objects now return NO_OBJECTS_VISIBLE failure instead of an informational success when empty, which is a notable semantic change for callers
dimos/agents/test_skill_result.py Comprehensive tests for SkillResult factories, agent_encode wire contract, skill_timing duration measurement, and log format; good coverage of cross-domain enum usage
dimos/manipulation/blueprints.py Drive-by: adds add_gripper=True to xarm7 planner coordinator config
dimos/manipulation/test_manipulation_unit.py Updated test_reset_not_during_execution to assert result.is_success() and result.error_code; all other tests operate on non-skill methods and are unaffected by the type change
dimos/manipulation/test_pick_and_place_unit.py Updated test_place_back_no_pick_pose_errors to check result.error_code is ManipulationError.NO_PRIOR_POSE and result.message; other tests target pure-logic helpers unaffected by return type change

Sequence Diagram

sequenceDiagram
    participant Agent as LLM Agent
    participant MCP as MCP Server
    participant PAP as PickAndPlaceModule
    participant MM as ManipulationModule

    Agent->>MCP: tool call: pick_and_place(object, x, y, z)
    MCP->>PAP: pick_and_place(...)
    activate PAP
    Note over PAP: skill_timing("pick_and_place")
    PAP->>PAP: pick(object_name)
    activate PAP
    Note over PAP: skill_timing("pick")
    PAP->>MM: _lift_if_low()
    MM-->>PAP: SkillResult.ok()
    PAP->>MM: plan_to_pose / _preview_execute_wait
    MM-->>PAP: SkillResult.ok/fail
    PAP-->>PAP: stamp(result) logs SKILL pick
    deactivate PAP
    PAP->>PAP: place(x, y, z)
    activate PAP
    Note over PAP: skill_timing("place")
    PAP->>MM: _lift_if_low / plan / execute
    MM-->>PAP: SkillResult
    PAP-->>PAP: stamp(result) logs SKILL place
    deactivate PAP
    PAP-->>PAP: stamp(place_result) logs SKILL pick_and_place
    deactivate PAP
    MCP->>MCP: hasattr(result, agent_encode) True
    MCP-->>Agent: "JSON {success, error_code, message, duration_ms}"
Loading

Reviews (1): Last reviewed commit: "updated manipulation blueprints and test..." | Re-trigger Greptile

Comment on lines +100 to +124
@contextmanager
def skill_timing(name: str | None = None) -> Iterator[Callable[[SkillResult], SkillResult]]:
"""Stamp a ``SkillResult`` with the elapsed wall time at exit, and log it.

If ``name`` is supplied, a structured ``logger.info`` line is emitted on
every ``stamp(...)`` call (i.e. every skill return point) with the skill
name, the outcome code, and the elapsed duration. Pass the skill's own
method name so the log is greppable.

Usage::

with skill_timing("set_gripper") as stamp:
...
return stamp(SkillResult.ok("done"))
"""
t0 = time.monotonic()

def stamp(result: SkillResult) -> SkillResult:
result.duration_ms = (time.monotonic() - t0) * 1000.0
if name is not None:
code = result.error_code.name if result.error_code is not None else "OK"
logger.info(f"SKILL {name} result={code} duration_ms={result.duration_ms:.1f}")
return result

yield stamp
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 skill_timing silently drops structured logging on exception exit

The PR description guarantees "Every return logs SKILL <name> result=<code> duration_ms=<n> server-side." However, stamp() is only called on explicit return paths inside the with block — any unhandled exception raised before stamp() (e.g., an AttributeError from a bad robot state, or a timeout in a downstream RPC call) silently exits the context manager without emitting the SKILL ... result=... log line. The MCP server's outer except Exception will still log a generic error, but the structured SKILL line won't appear. Adding a try/finally to emit a log on exception exit (e.g., result=EXCEPTION) would close this gap and make production debugging easier.

Comment on lines 487 to +501

Args:
robot_name: Robot context (only needed for multi-arm setups).
"""
obstacles = self.refresh_obstacles(0.0)

detections = self._detection_snapshot
if not detections:
return "No objects visible from current position"

lines = [f"Currently see {len(detections)} object(s):"]
for det in detections:
c = det.center
lines.append(
f" - {det.name} [id={det.object_id[:8]}]: ({c.x:.3f}, {c.y:.3f}, {c.z:.3f})"
)
with skill_timing("look") as stamp:
obstacles = self.refresh_obstacles(0.0)

detections = self._detection_snapshot
if not detections:
return stamp(
SkillResult.fail(
ManipulationError.NO_OBJECTS_VISIBLE,
"No objects visible from current position",
)
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 look() now reports a failure when no objects are visible

Previously look() returned an informational success string ("No objects visible from current position") when nothing was detected — a perfectly normal outcome. With this change, a caller that loops on result.is_success() will interpret "no objects" as a transient error and retry indefinitely. The code for scan_objects has the same pattern. Whether "no objects visible" constitutes a failure vs. a valid empty-result depends on the agent's strategy; at a minimum the docstring should note that NO_OBJECTS_VISIBLE is the expected "empty" code so callers know to handle it specially.

Comment on lines +117 to +122
def stamp(result: SkillResult) -> SkillResult:
result.duration_ms = (time.monotonic() - t0) * 1000.0
if name is not None:
code = result.error_code.name if result.error_code is not None else "OK"
logger.info(f"SKILL {name} result={code} duration_ms={result.duration_ms:.1f}")
return result
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 stamp() mutates duration_ms on already-stamped inner SkillResult objects

In composed skills (pick_and_place, drop_on, scan_objects, place_back), the outer stamp() call overwrites duration_ms on a SkillResult that was already stamped by the inner skill's own skill_timing. For example, pick_and_place calls stamp(self.place(...)), so the returned SkillResult.duration_ms reflects the full pick_and_place elapsed time rather than the isolated place time. Logging is still consistent (each inner skill logs its own time), but a programmatic caller reading result.duration_ms on a failed inner result gets the outer skill's elapsed clock rather than the duration of the specific operation that failed. Cloning the result before stamping, or stamping only freshly-created results, would prevent this.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 7, 2026

⚠️ JUnit XML file not found

The CLI was unable to find any JUnit XML files to upload.
For more help, visit our troubleshooting guide.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant