Skip to content

Ivan/record replay wrap#1976

Open
leshy wants to merge 40 commits intolegacy-dev-dont-mergefrom
ivan/record-replay-wrap
Open

Ivan/record replay wrap#1976
leshy wants to merge 40 commits intolegacy-dev-dont-mergefrom
ivan/record-replay-wrap

Conversation

@leshy
Copy link
Copy Markdown
Contributor

@leshy leshy commented May 5, 2026

Problem

Closes DIM-XXX

Solution

Breaking Changes

How to Test

Contributor License Agreement

  • I have read and approved the CLA.

@leshy leshy marked this pull request as ready for review May 5, 2026 07:15
Comment thread dimos/record/record_replay.py Outdated
Comment thread dimos/core/module.py Outdated
Comment thread dimos/core/module.py
_SANITIZE_RE = re.compile(r"[^A-Za-z0-9_]")


def topic_to_stream_name(channel: str) -> str:
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is no need for update_stream_meta and store metadata interventions to save the type of a stream and full stream name, given this function is reversible? streams know their type in their metadata, so they can reconstruct full topics?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the function is reversible. /color/image and /color_image both get transformed to _color_image, no?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh no you are right

Comment thread dimos/core/coordination/module_coordinator.py
Comment thread dimos/core/coordination/module_coordinator.py
Comment thread dimos/record/record_replay.py
Comment thread dimos/memory2/observationstore/sqlite.py
@dimensionalOS dimensionalOS deleted a comment from greptile-apps Bot May 5, 2026
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 5, 2026

Greptile Summary

This PR introduces a RecordReplay class backed by SQLite for recording and replaying pub/sub topics, wires it into ModuleCoordinator so that --replay-file and --option record_path= are handled automatically at blueprint build time, and ships a Textual TUI (recorder) for interactive recording, playback, seek, and trim.

  • RecordReplay: new core class with start_recording, stop_recording, play, seek, trim, delete_range, pause/resume, and async with support.
  • ModuleCoordinator.replay(): auto-disables modules whose outputs are covered by a recording, then starts playback.
  • RecorderApp: full Textual TUI with LCM spy integration, sparklines, timeline bar, and keybinding-driven workflow.

Confidence Score: 3/5

Not ready to merge: the module-level docstring advertises two broken API calls that raise TypeError, and the RecordReplay object created in replay() leaks its SQLite connection and background task.

The primary usage docstring shows rec.start_recording([LCM()]) missing the required topics argument and rec.play(pubsub=LCM(), ...) with a nonexistent parameter — both raise TypeError immediately. The RecordReplay instance in ModuleCoordinator.replay() is a local variable never stored on the coordinator, so close() is never called on shutdown and the SQLite handle and playback task both leak.

dimos/record/record_replay.py (broken docstring examples), dimos/core/coordination/module_coordinator.py (leaked RecordReplay instance, no LCM publish during playback)

Important Files Changed

Filename Overview
dimos/record/record_replay.py New RecordReplay class; module docstring contains two broken API examples that raise TypeError for users who copy them.
dimos/core/coordination/module_coordinator.py Adds replay() classmethod and auto-record wiring; RecordReplay never stored/closed (previously flagged), messages not published to LCM during playback (previously flagged), malformed warning string, no log when record_path set but record_modules empty.
dimos/core/module.py Adds start_recording_outputs / stop_recording_outputs RPCs wired into stop(); logic appears sound.
dimos/utils/cli/recorder/run_recorder.py New Textual TUI for interactive recording and playback; seek and trim operations look correct.
dimos/memory2/observationstore/sqlite.py Adds delete_range with proper commit/rollback outside the if-ids branch.
dimos/core/coordination/blueprints.py Adds record_modules field and default_record_modules() builder; propagated correctly through autoconnect.
dimos/record/test_record_replay.py New async test suite covering record, query, duration, trim, delete, seek, playback, and multi-pubsub.

Comments Outside Diff (1)

  1. dimos/record/record_replay.py, line 663-668 (link)

    P1 Module docstring shows invalid start_recording and play signatures

    The usage example at line 664 calls rec.start_recording([LCM()]) with only one positional argument, but the actual signature is start_recording(pubsubs, topics)topics is required. Any user copying this example will immediately get TypeError: start_recording() missing 1 required positional argument: 'topics'.

    Line 668 likewise shows rec.play(pubsub=LCM(), speed=1.0), but play() has no pubsub parameter (see line 851). Calling it that way raises TypeError: play() got an unexpected keyword argument 'pubsub'.

Reviews (5): Last reviewed commit: "Merge remote-tracking branch 'origin/dev..." | Re-trigger Greptile

result.append(info)
return tuple(result)

def play(self, speed: float = 1.0) -> None:
Copy link
Copy Markdown
Contributor

@paul-nechifor paul-nechifor May 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment at the start of the file says rec.play(pubsub=LCM(), speed=1.0). Where is this play actually publishing?

rec.stop_recording()

# Replay into LCM (viewable via rerun-bridge)
rec.play(pubsub=LCM(), speed=1.0)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should have docs for replay in docs. Note, BTW, that rec.play doesn't take pubsub currently.

Comment on lines +164 to +165
stream_name = topic_to_stream_name(topic.pattern)
self._store.stream(stream_name, type(msg)).append(msg)
Copy link
Copy Markdown
Contributor

@paul-nechifor paul-nechifor May 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're not storing the topic. How does the stream know where the message should be replayed to?

The previous PR stored the topic. This tries to guess that the topic is.

else:
from typing import TypeGuard as TypeIs

RERUN_GRPC_PORT = 9876
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is 9877 to 9876 intentional? There are hardcoded places where the 9877 is still used:

dimos/core/test_daemon.py:47:    grpc_port: int = 9877,
dimos/core/test_daemon.py:143:        entry = _make_entry(pid=os.getpid(), grpc_port=9877)
dimos/core/test_daemon.py:146:        conflict = check_port_conflicts(grpc_port=9877)
dimos/core/test_daemon.py:154:        conflict = check_port_conflicts(grpc_port=9877)
dimos/core/run_registry.py:47:    grpc_port: int = 9877
dimos/core/run_registry.py:144:def check_port_conflicts(grpc_port: int = 9877) -> RunEntry | None:
dimos/visualization/rerun/bridge.py:196:    connect_url: str = "rerun+http://127.0.0.1:9877/proxy"


@rpc
def start(self) -> None:
# Delay import to reduce import time (~2.4s)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You removed from dimos.msgs.sensor_msgs.PointCloud2 import register_colormap_annotation but left the comment.

# Conflicts:
#	dimos/core/global_config.py
#	dimos/robot/cli/dimos.py
#	dimos/robot/unitree/type/lidar.py
#	pyproject.toml
@codecov
Copy link
Copy Markdown

codecov Bot commented May 8, 2026

❌ 1 Tests Failed:

Tests completed Failed Passed Skipped
1883 1 1882 59
View the top 1 failed test(s) by shortest run time
dimos.robot.cli.test_cli_startup::test_help_startup_time
Stack Traces | 10.2s run time
def test_help_startup_time() -> None:
        """`dimos --help` must finish in under {HELP_TIMEOUT_SECONDS}s."""
        start = time.monotonic()
        result = subprocess.run(
            [sys.executable, "-m", "dimos.robot.cli.dimos", "--help"],
            capture_output=True,
            text=True,
            timeout=HELP_TIMEOUT_SECONDS + 5,  # hard kill safety margin
        )
        elapsed = time.monotonic() - start
        assert result.returncode == 0, f"dimos --help failed:\n{result.stderr}"
>       assert elapsed < HELP_TIMEOUT_SECONDS, (
            f"dimos --help took {elapsed:.1f}s (limit: {HELP_TIMEOUT_SECONDS}s). "
            f"Check for heavy imports in the CLI entrypoint or GlobalConfig."
        )
E       AssertionError: dimos --help took 10.2s (limit: 8s). Check for heavy imports in the CLI entrypoint or GlobalConfig.
E       assert 10.191117729060352 < 8

.../robot/cli/test_cli_startup.py:60: AssertionError

To view more test analytics, go to the Test Analytics Dashboard
📋 Got 3 mins? Take this short survey to help us improve Test Analytics.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants