
dogfood: run diffscope on evalops platform PRs — replace / complement Cursor Bugbot #90

@haasonsaas

Description

Diffscope advertises "adaptive learning to suppress low-value recurring feedback" and "scoped custom context." That's exactly what evalops platform PRs need. Cursor Bugbot currently reviews there (see `evalops/platform#601`, `#603`, `#613`), and in the last session only one of its three flags held up: one real bug caught, two false positives.

The evalops org doesn't dogfood its own code-review engine on its own repos today. That's a miss on both sides: diffscope loses the best possible training signal (a busy monorepo whose review feedback recurs in recognizable patterns), and platform PRs miss out on the adaptive-suppression feature that's supposed to reduce noise over time.

Ask

  • Install diffscope as a GitHub Action on `evalops/platform`, running side-by-side with Cursor Bugbot at first (to compare, not to replace)
  • Add scoped rules for the recurring "same template" bug patterns this codebase produces (populated-field-zero-consumer, sort comparator coverage, etc.)
  • Track the false-positive rate over 30 days; if diffscope's adaptive suppression works as advertised, consider promoting it to the primary review surface
  • If that goes well, roll out to the other active repos (maestro-internal, chat, console, ensemble)
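A minimal sketch of how the 30-day false-positive comparison could be tallied, assuming every review comment from either bot is triaged by hand as a real bug or a false positive. The `Finding` record and `false_positive_rates` helper are hypothetical; neither diffscope nor Bugbot is assumed to export data in this shape.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Finding:
    """One review comment from a bot, after human triage (hypothetical record)."""
    tool: str        # e.g. "diffscope" or "cursor-bugbot"
    posted: date     # day the bot posted the comment
    real_bug: bool   # True if triage confirmed a real bug, False if a false positive


def false_positive_rates(findings, end, window_days=30):
    """Return {tool: (false_positives, total)} for findings posted in (end - window_days, end]."""
    start = end - timedelta(days=window_days)
    counts = defaultdict(lambda: [0, 0])  # tool -> [false_positives, total]
    for f in findings:
        if start < f.posted <= end:
            c = counts[f.tool]
            c[1] += 1
            if not f.real_bug:
                c[0] += 1
    return {tool: (fp, total) for tool, (fp, total) in counts.items()}
```

Keeping raw counts rather than a ratio avoids dividing by zero for a tool with no findings in the window. Fed the last Bugbot session (one real bug, two false positives, all inside the window), it would return `{"cursor-bugbot": (2, 3)}`.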

Related context

  • Cursor Bugbot's recent false positive on `evalops/platform#613` (a case-glob pattern claim, refuted empirically in the PR thread) is exactly the kind of signal adaptive suppression should learn to quiet
  • The rank-coverage-check tool from `evalops/platform#613` encodes the kind of structural pattern diffscope's custom-context feature could ingest for higher-precision reviews

Scope

~3 days: install, configure, and run a baseline metrics pass. The adaptive learning happens afterward.

Metadata

Assignees

No one assigned

    Labels

    enhancement (New feature or request)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests