Skip to content

Pim/feat/g1 groot wbc#2018

Open
Nabla7 wants to merge 22 commits intodevfrom
pim/feat/g1-groot-wbc
Open

Pim/feat/g1 groot wbc#2018
Nabla7 wants to merge 22 commits intodevfrom
pim/feat/g1-groot-wbc

Conversation

@Nabla7
Copy link
Copy Markdown
Collaborator

@Nabla7 Nabla7 commented May 8, 2026

Problem

dimos doesn't have a whole-body locomotion controller for the Unitree G1 humanoid. The ControlCoordinator / WholeBodyAdapter machinery from #1954 is set up to host one — there just isn't a task that drives the legs and waist.

Closes DIM-XXX

Solution

A GrootWBCTask that runs the GR00T balance + walk ONNX policies inside the coordinator tick loop, and two blueprints (unitree-g1-groot-wbc real-hw + unitree-g1-groot-wbc-sim) that compose it. Sim and real run identical task code; only the WholeBodyAdapter underneath differs — sim uses an in-process MuJoCo adapter, real hardware uses Mustafa's transport_lcm bridge to G1WholeBodyConnection (no new G1 low-level adapter).

The coordinator gains activate / dry_run / twist_command ports duck-typed to any task exposing arm() / set_dry_run() / set_velocity_command(). TaskConfig gains hardware_id, default_positions, auto_arm, auto_dry_run, default_ramp_seconds, decimation. WholeBodyConfig carries per-joint kp/kd; ConnectedWholeBody resolves them when building MotorCommands. WebsocketVisModule exposes Arm + Dry-Run buttons that publish on the new ports — the operator activates real-hw runs from the dashboard.

Real-hardware safety profile: comes up unarmed + dry-run + 10-s ramp from current pose to the bent-knee default. Sim auto-arms with no ramp.

G1WholeBodyConnection patched for macOS: passive Init(None, 0) + Read()-per-tick (the callback-based variant doesn't fire reliably under cyclonedds on Darwin), and mode_machine hardcoded to the G1's static value with a one-shot warning if a future firmware reports something else.

Breaking Changes

None. Additive — existing blueprints unchanged. WholeBodyConfig is new (kp/kd were previously direct fields on HardwareComponent; the migration is a constructor rename).

How to Test

Unit:

uv run pytest dimos/control/tasks/test_groot_wbc_task.py \
              dimos/control/test_control.py \
              dimos/robot/unitree/g1/blueprints/basic/test_unitree_g1_groot_wbc.py \
              dimos/robot/test_all_blueprints_generation.py

Sim (no hardware needed):

DIMOS_MUJOCO_VIEW=1 dimos run unitree-g1-groot-wbc-sim

MuJoCo viewer pops, robot stands. Open http://localhost:7779/, WASD walks the robot.

Real hardware (requires a G1 + ethernet + the [unitree-dds] extra installed; CYCLONEDDS_HOME exported in the shell):

ROBOT_INTERFACE=<nic> dimos run unitree-g1-groot-wbc

Robot connects in dev-mode damping; dashboard at http://localhost:7779/. Click Arm (10-s ramp to default pose), inspect the dry-run command stream, then toggle Dry Run OFF to hand control to the policy. WASD = cmd_vel.

Contributor License Agreement

  • I have read and approved the CLA.

Nabla7 added 22 commits May 6, 2026 21:35
…era_pose, _publish_tf refactor, spec.py docstring expansion)
The DDS callback registered via ChannelSubscriber.Init(handler, 10)
doesn't fire reliably on macOS, so the adapter timed out waiting for
the first LowState to capture mode_machine.  Switch to passive-mode
Init(None, 0) and Read() per tick.  mode_machine is now hardcoded to
the static value for the 29-DOF G1 instead of read-back-then-echoed.

Also adds has_motor_states() to satisfy the post-refactor
WholeBodyAdapter Protocol, makes _release_sport_mode null-tolerant
(CheckMode returns (status, None) once nothing is active), drops the
now-unused _on_low_state callback.

WebsocketVisModule: restore the arm / disarm / set_dry_run socket.io
handlers and the matching activate / dry_run Out[Bool] ports that an
earlier cleanup pass dropped — the dashboard buttons now publish on
the LCM topics the coordinator already subscribes to.

TransportWholeBodyAdapter: trivial Protocol fix — implement read_odom
returning None so isinstance(adapter, WholeBodyAdapter) passes the
runtime_checkable type check in ControlCoordinator._setup_hardware.
The earlier cleanup pass over-deleted: openarm manipulator support,
the dimos/utils/workspace.py module, the audio STT node_whisper
edits, CI configs, README, command-center package-lock churn, and
two go2 LFS DBs.  None of those are part of the GR00T WBC story.

Restore each to origin/dev's content.  Drop one path that was added
on this branch but doesn't exist on dev (api/requirements.txt).

PR diff is now 24 files / +2623 / -210, all GR00T-WBC-relevant.
Both Mustafa's #1954 (already on dev) and the GR00T-WBC PR added
this method.  After the merge, both definitions live in
coordinator.py — Python uses the second, the first becomes dead
code, and the transport adapter (which depends on hardware_id
being passed) silently picks up its default ``"wholebody"`` topic
prefix instead of the component's id.

Collapse to one definition that passes the union of kwargs both
adapters need: dof, hardware_id, network_interface, domain_id,
address, plus **adapter_kwargs.  Adapters tolerate extras via
their constructor's **_ catch-all.
Switch unitree-g1-groot-wbc to the architecture Mustafa landed in
#1954 (G1WholeBodyConnection Module + TransportWholeBodyAdapter via
LCM bridge), matching unitree-g1-coordinator.  Single G1 low-level
pattern in the codebase now.

  * Delete dimos/hardware/whole_body/unitree/g1/adapter.py — the
    UnitreeG1LowLevelAdapter (single-process DDS) is gone.
  * Rewrite unitree_g1_groot_wbc.py to compose G1WholeBodyConnection
    + ControlCoordinator(adapter_type="transport_lcm") + dashboard
    via autoconnect.
  * Patch wholebody_connection.py to apply the macOS-cyclonedds fixes
    the dropped monolith was carrying:
      - Init(None, 0) instead of Init(callback, 10)
      - Hardcode mode_machine = 5 (the static value for 29-DOF G1)
      - publish_loop polls subscriber.Read() per tick
      - Drop the now-unused _on_low_state callback
_MJCF_PATH was a relative path "data/mujoco_sim/g1_gear_wbc.xml"
which only resolved when dimos was launched from the repo root.
The mujoco viewer subprocess inherits CWD from the launching shell,
so running ``dimos run unitree-g1-groot-wbc-sim`` from any other
directory tripped FileNotFoundError on viewer startup.

get_data("mujoco_sim") resolves the absolute install path and
auto-extracts the LFS tarball on first run.
  1. unitree_g1_groot_wbc_sim.py: resolve _MJCF_PATH via get_data so
     both MujocoSimModule and the DIMOS_MUJOCO_VIEW=1 subprocess open
     the file regardless of shell CWD.  Earlier the relative path
     "data/mujoco_sim/g1_gear_wbc.xml" only worked from the repo root.

  2. wholebody_connection.py: hardcoded mode_machine = 5 has no
     fallback if a future G1 firmware revision changes the value.
     Add a one-shot warning the first time we read a LowState whose
     mode_machine doesn't match.  Self-disables after the first log
     line so it doesn't spam the operator.

  3. test_unitree_g1_groot_wbc.py: composition smoke test verifying
     the three deployed modules, bridge adapter binding (transport_lcm,
     confirming the migration off the deleted unitree_g1 monolith),
     real-hw safety profile (auto_arm=False, auto_dry_run=True,
     default_ramp_seconds=10.0), servo_arms defaults, and all bridge
     + dashboard LCM topics.  No DDS, no hardware — just module
     imports.
Two safety fixes for back-to-back ``dimos run`` cycles on real
hardware:

  1. ``stop()`` now sends a final ``LowCmd_`` with ``mode=0x00``
     (motors disabled) for every motor slot before closing the DDS
     publisher.  Previously the last commanded pose lingered in the
     motor controllers — when the next process opened a publisher
     and re-released sport mode, those stale stiff commands fought
     the new participant during the handoff window, producing
     audible gearbox stress.

  2. ``_release_sport_mode()`` bails immediately if the first
     ``CheckMode()`` reports nothing active (``result is None`` or
     ``{"name": ""}``).  A clean second start now logs "Sport mode
     already released — skipping ReleaseMode" and skips the
     loop-and-poll dance entirely, removing the handoff window.
``dimos/project/test_no_sections.py`` forbids ``# -----`` separator
banners and ``# --- text ---`` inline section headers as a project
style rule.  Three files in this PR carried the pattern from earlier
drafts; convert inline-section to plain ``# text`` and drop pure
separators.

No code-behaviour change.
  * tick_loop.py / groot_wbc_task.py: import the class
    (``from dimos.msgs.geometry_msgs.PoseStamped import PoseStamped``,
    same for ``Twist``), not the module — module-as-type tripped
    mypy ``[valid-type]``.
  * coordinator.py: when constructing ``GrootWBCTask``, narrow ``hw``
    to ``ConnectedWholeBody`` via isinstance before passing
    ``hw.adapter`` (else mypy unions the manipulator + whole-body
    adapter types).  Also raises a clear TypeError if a non-whole-body
    component is referenced by a groot_wbc task.
  * test_unitree_g1_groot_wbc.py: type ``_coordinator_kwargs() ->
    dict[str, Any]`` so list/iter usages on the values typecheck.

``mypy dimos`` is clean (670 files).
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 8, 2026

Greptile Summary

Adds a GrootWBCTask that runs the GR00T balance/walk ONNX policies inside the coordinator tick loop, wired to two new blueprints (unitree-g1-groot-wbc real-hw and unitree-g1-groot-wbc-sim). The coordinator gains activate/dry_run/twist_command routing, ConnectedWholeBody resolves per-joint PD gains into MotorCommands, and the MuJoCo adapter communicates via an extended SHM layout that adds IMU and PD-gain/torque channels.

  • GrootWBCTask implements a 6-frame obs-history inference loop with an arming ramp (current pose → default_15), dry-run mode, and decimated ONNX inference; ControlCoordinator gains duck-typed arm()/disarm()/set_dry_run() dispatch via new activate/dry_run In-ports wired to dashboard buttons.
  • SimMujocoG1WholeBodyAdapter bridges coordinator ↔ MujocoSimModule via SHM; the sim's pre-step hook applies PD+feedforward torques from the latched gains instead of hard position commands.
  • G1WholeBodyConnection is patched for macOS cyclonedds reliability (Init(None,0) passive subscriber, hardcoded mode_machine=5 with one-shot mismatch warning) and emits a safe-stop lowcmd on disconnect.

Confidence Score: 3/5

The coordinator, blueprint wiring, and G1 DDS connection are solid. The core task's state-machine flags shared between the tick-loop and the operator-button subscription path need locking before this is used on real hardware.

The armed/arming/arm_pending flags are written from the LCM subscription thread and read/written from the 500 Hz tick thread without any lock, and arm() doesn't short-circuit when a ramp is already in progress. On real hardware these could allow a disarm command to be silently lost or a ramp to restart mid-motion.

dimos/control/tasks/groot_wbc_task.py (state-machine thread safety and arm() guard) and dimos/hardware/whole_body/mujoco/g1/adapter.py (SHM command write ordering)

Important Files Changed

Filename Overview
dimos/control/tasks/groot_wbc_task.py Core GR00T WBC task: state-machine flags (_arm_pending/_arming/_armed) are unprotected across the tick-loop and subscription threads, arm() mid-ramp silently restarts the ramp, and dry_run doesn't suppress ramp-phase commands.
dimos/control/coordinator.py Adds activate/dry_run In ports, duck-typed arm()/disarm()/set_dry_run() routing, and groot_wbc task factory; logic is clean and consistent with existing patterns.
dimos/control/hardware_interface.py Adds ConnectedWholeBody: resolves per-joint PD gains from wb_config, converts position commands to MotorCommand, non-blocking initialization. Logic looks correct.
dimos/hardware/whole_body/mujoco/g1/adapter.py Sim WholeBodyAdapter via SHM; write_motor_commands makes four sequential writes that transiently set CMD_MODE_POSITION before switching to CMD_MODE_PD_TAU, creating a race window with the sim pre-step hook.
dimos/robot/unitree/g1/wholebody_connection.py G1 DDS module patched for macOS: passive Init(None,0) subscriber, hardcoded mode_machine=5 with one-shot warning, safe-stop lowcmd on disconnect. Changes are well-reasoned and safe.
dimos/simulation/engines/mujoco_shm.py Extends SHM layout with IMU and per-joint PD gain/torque buffers; protocol is well-structured. Sequence counters are written by one side and read by the other, avoiding most races.
dimos/simulation/engines/mujoco_sim_module.py Adds IMU publishing and PD+feedforward pre-step hook; latches all four PD pieces before applying torques. Logic is correct but vulnerable to the partial-write race from the adapter side.
dimos/robot/unitree/g1/blueprints/basic/unitree_g1_groot_wbc.py Real-hardware blueprint: comes up unarmed + dry-run, 10s ramp, Arm/Dry-Run dashboard buttons wired via LCM. Transport mappings look complete and correct.
dimos/robot/unitree/g1/blueprints/basic/_groot_wbc_common.py Shared joint lists and PD gain tables lifted verbatim from g1-control-api config; single source of truth for both blueprints. Values look correct.
dimos/web/websocket_vis/websocket_vis_module.py Adds activate/dry_run Out[DimosBool] ports for Arm and Dry-Run dashboard buttons; additive change, no regressions to existing navigation UI.

Sequence Diagram

sequenceDiagram
    participant Op as Operator Dashboard
    participant WsVis as WebsocketVisModule
    participant Coord as ControlCoordinator
    participant WBC as GrootWBCTask
    participant CWB as ConnectedWholeBody
    participant Adapter as WholeBodyAdapter
    participant HW as G1 Hardware or MuJoCo

    Op->>WsVis: Click Arm
    WsVis->>Coord: "activate Bool=True"
    Coord->>WBC: arm()
    Note over WBC: _arm_pending = True

    loop 500 Hz Tick Loop
        Coord->>CWB: read_state()
        CWB->>Adapter: read_motor_states + read_imu
        Adapter-->>CWB: MotorState x29 + IMUState
        CWB-->>Coord: JointStateSnapshot
        Coord->>WBC: compute(state)
        Note over WBC: Ramp current pose to default_15 over 10s
        WBC-->>Coord: JointCommandOutput 15 joints
        Coord->>CWB: write_command positions SERVO_POSITION
        CWB->>Adapter: write_motor_commands MotorCommand q kp kd
        Adapter->>HW: DDS lowcmd or SHM PD tau
    end

    Op->>WsVis: Toggle Dry-Run OFF
    WsVis->>Coord: "dry_run Bool=False"
    Coord->>WBC: set_dry_run False
    Note over WBC: Policy outputs now emitted live

    Op->>WsVis: WASD key
    WsVis->>Coord: cmd_vel Twist
    Coord->>WBC: set_velocity_command vx vy wz
    Note over WBC: Walk model selected when cmd_norm exceeds threshold
Loading

Reviews (1): Last reviewed commit: "fix(g1-wbc): mypy errors uncovered by fu..." | Re-trigger Greptile

Comment on lines +256 to +270
# pure damping (kd-only) — this mirrors the reference backend's
# "hold current pose" inactive state.
self._active = False
self._armed = False
self._arming = False
self._arm_pending = False
self._dry_run = bool(config.auto_dry_run)
self._arming_duration = 0.0
self._arming_start_t = 0.0
self._ramp_start: np.ndarray | None = None
self._last_dry_run_log_t: float = 0.0

self._cmd_lock = threading.Lock()
self._cmd = np.zeros(3, dtype=np.float32)
self._last_cmd_time: float = 0.0
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 State-machine flags are unprotected across threads

arm(), disarm(), and set_dry_run() are called from the LCM/subscription thread (via _on_activatehandler()), while compute() runs in the TickLoop daemon thread. The flags _arm_pending, _arming, _armed, and the arrays _obs_buf/_last_action/_tick_count are accessed by both threads without a lock.

Concrete race on real hardware: the operator clicks Arm (sets _arm_pending=True), immediately clicks Disarm (sets _arm_pending=False, _arming=False), but compute() had already read _arm_pending=True and is mid-execution of the arm block — it then sets _arming=True, starting a ramp that the operator just cancelled. Similarly, _reset_policy_state() inside disarm() can zero _obs_buf while compute() is mid-inference on the same buffer.

The velocity command _cmd/_last_cmd_time pair is already protected by _cmd_lock; the same pattern should be applied (a single _state_lock) to the armed/arming/arm_pending flags and the inference state arrays.

Comment on lines +509 to +533
def arm(self, ramp_seconds: float | None = None) -> bool:
"""Begin the arming sequence.

``compute()`` will snapshot the current joint positions on the
next tick, lerp toward ``default_15`` over ``ramp_seconds``,
then flip ``_armed`` true and hand control to the ONNX policy.
A ramp of 0 arms immediately with no interpolation (what sim
uses — the subprocess already holds the MJCF's default pose).

Safe to call redundantly; subsequent calls while already armed
are ignored. No-op if the task is not ``_active``.
"""
if not self._active:
logger.warning(f"GrootWBCTask '{self._name}' arm() called before start() — ignoring")
return False
if self._armed:
logger.info(f"GrootWBCTask '{self._name}' already armed — arm() ignored")
return False
ramp = ramp_seconds if ramp_seconds is not None else self._config.default_ramp_seconds
self._arming_duration = max(0.0, float(ramp))
self._arm_pending = True
logger.info(
f"GrootWBCTask '{self._name}' arm requested (ramp={self._arming_duration:.1f}s)"
)
return True
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 arm() during active ramp restarts it silently

arm() guards against self._armed but not against self._arming (ramp in progress) or self._arm_pending (queued). If the operator clicks Arm a second time during the 10-second ramp, _arm_pending is set again. On the next compute() tick the ramp-start snapshot is reset to the current mid-ramp pose and _arming_start_t is reset to now — the full ramp duration restarts from wherever the robot is. On real hardware, rapidly double-clicking or a network duplicate delivery could extend or thrash the ramp indefinitely. Adding an early-return for self._arming or self._arm_pending (analogous to the existing self._armed guard) would make the method idempotent once motion has started.

Comment on lines +189 to +208
def write_motor_commands(self, commands: list[MotorCommand]) -> bool:
if not self._connected or self._shm is None:
return False
if len(commands) != _NUM_MOTORS:
logger.error(
f"SimMujocoG1WholeBodyAdapter: expected {_NUM_MOTORS} commands, got {len(commands)}"
)
return False
# Flatten the per-motor command into per-joint arrays. POS_STOP
# ("no command") is replaced with 0.0 — the engine's PD only
# acts when kp > 0 anyway, so a zeroed q is harmless.
q = [cmd.q if cmd.q != POS_STOP else 0.0 for cmd in commands]
kp = [cmd.kp for cmd in commands]
kd = [cmd.kd for cmd in commands]
tau = [cmd.tau for cmd in commands]
self._shm.write_position_command(q)
self._shm.write_kp_command(kp)
self._shm.write_kd_command(kd)
self._shm.write_tau_command(tau)
return True
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Partial-write window exposes CMD_MODE_POSITION to the sim pre-step hook

write_motor_commands issues four sequential SHM writes. write_position_command(q) sets CTRL_COMMAND_MODE = CMD_MODE_POSITION and increments SEQ_POSITION_CMD. write_kp_command(kp) then overwrites the mode to CMD_MODE_PD_TAU.

If the MuJoCo engine's _apply_shm_commands pre-step hook fires in between those two writes it observes CMD_MODE_POSITION, follows the else branch in the position-read block (engine.write_joint_command(JointState(position=...))), and bypasses the PD+feedforward torque path entirely for that step — the robot receives a direct hard position command instead of a soft PD torque. The GR00T policy expects the latter; a hard position step could cause a visible jerk in simulation.

A single-write approach (set mode only once after all four data fields are written, or use a transaction counter that the sim polls) would close this window.

Comment on lines +328 to +349
# Arming: lerp ramp_start → default_15 over arming_duration.
if self._arming:
assert self._ramp_start is not None
elapsed = state.t_now - self._arming_start_t
alpha = (
1.0 if self._arming_duration <= 0.0 else min(1.0, elapsed / self._arming_duration)
)
target = self._ramp_start + alpha * (self._default_15 - self._ramp_start)
self._last_targets = target.tolist()
if alpha >= 1.0:
self._arming = False
self._armed = True
self._reset_policy_state()
logger.info(
f"GrootWBCTask '{self._name}' ramp complete — policy armed "
f"({'dry-run' if self._dry_run else 'live'})"
)
return JointCommandOutput(
joint_names=self._joint_names_list,
positions=self._last_targets,
mode=ControlMode.SERVO_POSITION,
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Ramp-phase commands are emitted even when dry_run=True

During the arming ramp (_arming=True), compute() always returns a JointCommandOutput, regardless of the _dry_run flag. The check if self._dry_run: return None only applies in the armed/inference branch below.

The real-hardware blueprint starts with auto_dry_run=True. When the operator then clicks Arm, the robot immediately begins interpolating toward default_15 and committing those position targets to the motors — despite dry-run being active. The PR description implies this is intentional, but the behaviour conflicts with the intuitive meaning of "dry run = no motor torques." At minimum, a comment in the ramp block and in the arm() docstring should call out that dry-run does not suppress ramp-phase commands.

@leshy
Copy link
Copy Markdown
Contributor

leshy commented May 8, 2026

2026-05-08_13-14

DIMOS_MUJOCO_VIEW=1 dimos run unitree-g1-groot-wbc-sim

crashes for me, I think you forgot to put something into LFS

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants