Skill Error codes for pick and place manipulation#2010
Skill Error codes for pick and place manipulation#2010mustafab0 wants to merge 5 commits intolegacy-dev-dont-mergefrom
Conversation
Greptile SummaryThis PR replaces free-form string returns from all 18 manipulation skills with a typed
Confidence Score: 4/5Safe to merge; the migration is consistent across all 18 skills with correct error-code mappings and a well-tested wire contract All skill methods are correctly migrated and the structured return type flows cleanly through the existing MCP dimos/agents/skill_result.py (skill_timing exception path and stamp mutation) and dimos/manipulation/pick_and_place_module.py (look/scan_objects empty-result semantics) Important Files Changed
Sequence DiagramsequenceDiagram
participant Agent as LLM Agent
participant MCP as MCP Server
participant PAP as PickAndPlaceModule
participant MM as ManipulationModule
Agent->>MCP: tool call: pick_and_place(object, x, y, z)
MCP->>PAP: pick_and_place(...)
activate PAP
Note over PAP: skill_timing("pick_and_place")
PAP->>PAP: pick(object_name)
activate PAP
Note over PAP: skill_timing("pick")
PAP->>MM: _lift_if_low()
MM-->>PAP: SkillResult.ok()
PAP->>MM: plan_to_pose / _preview_execute_wait
MM-->>PAP: SkillResult.ok/fail
PAP-->>PAP: stamp(result) logs SKILL pick
deactivate PAP
PAP->>PAP: place(x, y, z)
activate PAP
Note over PAP: skill_timing("place")
PAP->>MM: _lift_if_low / plan / execute
MM-->>PAP: SkillResult
PAP-->>PAP: stamp(result) logs SKILL place
deactivate PAP
PAP-->>PAP: stamp(place_result) logs SKILL pick_and_place
deactivate PAP
MCP->>MCP: hasattr(result, agent_encode) True
MCP-->>Agent: "JSON {success, error_code, message, duration_ms}"
Reviews (1): Last reviewed commit: "updated manipulation blueprints and test..." | Re-trigger Greptile |
| @contextmanager | ||
| def skill_timing(name: str | None = None) -> Iterator[Callable[[SkillResult], SkillResult]]: | ||
| """Stamp a ``SkillResult`` with the elapsed wall time at exit, and log it. | ||
|
|
||
| If ``name`` is supplied, a structured ``logger.info`` line is emitted on | ||
| every ``stamp(...)`` call (i.e. every skill return point) with the skill | ||
| name, the outcome code, and the elapsed duration. Pass the skill's own | ||
| method name so the log is greppable. | ||
|
|
||
| Usage:: | ||
|
|
||
| with skill_timing("set_gripper") as stamp: | ||
| ... | ||
| return stamp(SkillResult.ok("done")) | ||
| """ | ||
| t0 = time.monotonic() | ||
|
|
||
| def stamp(result: SkillResult) -> SkillResult: | ||
| result.duration_ms = (time.monotonic() - t0) * 1000.0 | ||
| if name is not None: | ||
| code = result.error_code.name if result.error_code is not None else "OK" | ||
| logger.info(f"SKILL {name} result={code} duration_ms={result.duration_ms:.1f}") | ||
| return result | ||
|
|
||
| yield stamp |
There was a problem hiding this comment.
skill_timing silently drops structured logging on exception exit
The PR description guarantees "Every return logs SKILL <name> result=<code> duration_ms=<n> server-side." However, stamp() is only called on explicit return paths inside the with block — any unhandled exception raised before stamp() (e.g., an AttributeError from a bad robot state, or a timeout in a downstream RPC call) silently exits the context manager without emitting the SKILL ... result=... log line. The MCP server's outer except Exception will still log a generic error, but the structured SKILL line won't appear. Adding a try/finally to emit a log on exception exit (e.g., result=EXCEPTION) would close this gap and make production debugging easier.
|
|
||
| Args: | ||
| robot_name: Robot context (only needed for multi-arm setups). | ||
| """ | ||
| obstacles = self.refresh_obstacles(0.0) | ||
|
|
||
| detections = self._detection_snapshot | ||
| if not detections: | ||
| return "No objects visible from current position" | ||
|
|
||
| lines = [f"Currently see {len(detections)} object(s):"] | ||
| for det in detections: | ||
| c = det.center | ||
| lines.append( | ||
| f" - {det.name} [id={det.object_id[:8]}]: ({c.x:.3f}, {c.y:.3f}, {c.z:.3f})" | ||
| ) | ||
| with skill_timing("look") as stamp: | ||
| obstacles = self.refresh_obstacles(0.0) | ||
|
|
||
| detections = self._detection_snapshot | ||
| if not detections: | ||
| return stamp( | ||
| SkillResult.fail( | ||
| ManipulationError.NO_OBJECTS_VISIBLE, | ||
| "No objects visible from current position", | ||
| ) | ||
| ) |
There was a problem hiding this comment.
look() now reports a failure when no objects are visible
Previously look() returned an informational success string ("No objects visible from current position") when nothing was detected — a perfectly normal outcome. With this change, a caller that loops on result.is_success() will interpret "no objects" as a transient error and retry indefinitely. The code for scan_objects has the same pattern. Whether "no objects visible" constitutes a failure vs. a valid empty-result depends on the agent's strategy; at a minimum the docstring should note that NO_OBJECTS_VISIBLE is the expected "empty" code so callers know to handle it specially.
| def stamp(result: SkillResult) -> SkillResult: | ||
| result.duration_ms = (time.monotonic() - t0) * 1000.0 | ||
| if name is not None: | ||
| code = result.error_code.name if result.error_code is not None else "OK" | ||
| logger.info(f"SKILL {name} result={code} duration_ms={result.duration_ms:.1f}") | ||
| return result |
There was a problem hiding this comment.
stamp() mutates duration_ms on already-stamped inner SkillResult objects
In composed skills (pick_and_place, drop_on, scan_objects, place_back), the outer stamp() call overwrites duration_ms on a SkillResult that was already stamped by the inner skill's own skill_timing. For example, pick_and_place calls stamp(self.place(...)), so the returned SkillResult.duration_ms reflects the full pick_and_place elapsed time rather than the isolated place time. Logging is still consistent (each inner skill logs its own time), but a programmatic caller reading result.duration_ms on a failed inner result gets the outer skill's elapsed clock rather than the duration of the specific operation that failed. Cloning the result before stamping, or stamping only freshly-created results, would prevent this.
|
Problem
Manipulation skills return prose like
"Error: Robot not found". The agent has to parse strings to know what failed — blocks structured retries. Phase 1 of the pick-and-place reliability plan.Solution
Skills now return a
SkillResultdataclass withsuccess,message,error_code,duration_ms. Error codes are domain enums (CommonSkillError,ManipulationError). MCP auto-detects the newagent_encode()method, so the LLM sees structured JSON. Every return logsSKILL <name> result=<code> duration_ms=<n>server-side. All 18 manipulation skills migrated. Drive-by: enabled gripper onxarm7-planner-coordinator.Breaking Changes
Skills return
SkillResultinstead ofstr. Replace"Error:" in resultwithresult.is_success()/result.error_code.How to Test
uv run pytest dimos/agents/test_skill_result.py dimos/manipulation/, thenpython -i -m dimos.manipulation.planning.examples.manipulation_client`_client.set_gripper(0.85)returns aSkillResultdataclass; logs showSKILL set_gripper result=OK ...closes DIM-786