FluffyLabs/jam-testing

jam-testing

Smoke testing, performance measurement, and fuzz testing suite for JAM implementations. Each team provides a Docker image that speaks the JAM Fuzz protocol, and the suite runs three stages against it:

  • Minifuzz — runs the bare-minimum forks/no_forks protocol examples first, then replays STF-based traces for the fallback, safrole, storage, and storage_light suites and validates that the implementation returns the expected responses. It acts as a gate: if any minifuzz suite fails, performance tests are skipped.
  • Picofuzz — runs the same four STF suites (fallback, safrole, storage, storage_light) but does not check responses. Its only purpose is to measure block import performance (timings are displayed on the dashboard).
  • Fuzz testing — one implementation (the "source") generates random blocks and another (the "target") must process them without crashing. Currently graymatter is available as a fuzz source, run against both the JAM tiny and full specs. Every team gets two demo fuzz jobs on a shared runner — one per spec — sized so each fits the runner's budget (5 000 blocks for tiny, 500 for full); dedicated long-running runs cover both specs in a single matrix.
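
The gate described above can be sketched as follows. This is a hypothetical orchestration sketch, not the real harness: the suite names come from this document, and `run_suite` stands in for the actual test runner.

```python
# Hypothetical sketch of the minifuzz -> picofuzz gating described above.
# Suite names are from the document; run_suite is a stand-in callback.

MINIFUZZ_SUITES = ["forks", "no_forks", "fallback", "safrole", "storage", "storage_light"]
PICOFUZZ_SUITES = ["fallback", "safrole", "storage", "storage_light"]

def run_pipeline(run_suite):
    """run_suite(stage, name) -> bool. Returns the (stage, suite) pairs executed."""
    executed = []
    minifuzz_ok = True
    for name in MINIFUZZ_SUITES:
        executed.append(("minifuzz", name))
        if not run_suite("minifuzz", name):
            minifuzz_ok = False
    if not minifuzz_ok:
        return executed  # gate: picofuzz (performance) is skipped entirely
    for name in PICOFUZZ_SUITES:
        executed.append(("picofuzz", name))
        run_suite("picofuzz", name)  # timing only; the result is not checked
    return executed
```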

Status

The Performance column covers minifuzz (conformance gate) + picofuzz (timing). Demo (tiny) and Demo (full) are short fuzz runs on a shared runner executing the JAM tiny and full specs respectively (5 000 blocks for tiny, 500 for full — full-spec blocks are heavier, so the count is scaled down to fit a comparable wall-time budget). Long-run is a dedicated, multi-hour fuzz run that exercises both specs in a matrix (single badge — red if either spec fails). Targets pick which spec to run from the JAM_FUZZ_SPEC environment variable; the matching --spec <value> is also passed to the graymatter source command by the workflow.

Team Performance Demo (tiny) Demo (full) Long-run
typeberry Performance: typeberry Demo (tiny): typeberry Demo (full): typeberry Fuzz: typeberry
pyjamaz Performance: pyjamaz Demo (tiny): pyjamaz Demo (full): pyjamaz
boka Performance: boka Demo (tiny): boka Demo (full): boka
turbojam Performance: turbojam Demo (tiny): turbojam Demo (full): turbojam Fuzz: turbojam
graymatter Performance: graymatter Demo (tiny): graymatter Demo (full): graymatter
jam4s Performance: jam4s Demo (tiny): jam4s Demo (full): jam4s
pbnjam Performance: pbnjam Demo (tiny): pbnjam Demo (full): pbnjam
javajam Performance: javajam Demo (tiny): javajam Demo (full): javajam
jamforge Performance: jamforge Demo (tiny): jamforge Demo (full): jamforge
jotl Performance: jotl Demo (tiny): jotl Demo (full): jotl
jamzilla Performance: jamzilla Demo (tiny): jamzilla Demo (full): jamzilla
jamzilla-int Performance: jamzilla-int Demo (tiny): jamzilla-int Demo (full): jamzilla-int
jampy Performance: jampy Demo (tiny): jampy Demo (full): jampy
jampy-recompiler Performance: jampy-recompiler Demo (tiny): jampy-recompiler Demo (full): jampy-recompiler
new-jamneration Performance: new-jamneration Demo (tiny): new-jamneration Demo (full): new-jamneration
vinwolf Performance: vinwolf Demo (tiny): vinwolf Demo (full): vinwolf
jamduna Performance: jamduna Demo (tiny): jamduna Demo (full): jamduna
jamzig Performance: jamzig Demo (tiny): jamzig Demo (full): jamzig
tessera Performance: tessera Demo (tiny): tessera Demo (full): tessera
tsjam Performance: tsjam Demo (tiny): tsjam Demo (full): tsjam
jambda Performance: jambda Demo (tiny): jambda Demo (full): jambda
jamixir Performance: jamixir Demo (tiny): jamixir Demo (full): jamixir
spacejam Performance: spacejam Demo (tiny): spacejam Demo (full): spacejam

Long-running fuzzing (dedicated)

If your team wants extended fuzz runs (more blocks, multiple runs, a dedicated runner), reach out by commenting on issue #1. We'll set up a dedicated <team>-fuzz.yml workflow with a self-hosted runner labeled for your team. Long-running workflows run a [tiny, full] matrix under a single badge; both specs share one workflow file, so size num_blocks so that the two per-spec runs together fit your runner's wall-time window.
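
As a sanity check, the sizing rule amounts to simple arithmetic. The per-block costs below are made-up placeholders for illustration, not measured numbers; measure your own implementation before sizing num_blocks.

```python
# Hypothetical budget check for a [tiny, full] matrix on one runner.
# secs_per_block values are placeholders; measure your own implementation.

def fits_window(num_blocks, secs_per_block, window_secs):
    """True if both matrix legs together fit the runner's wall-time window."""
    total = sum(num_blocks[spec] * secs_per_block[spec] for spec in num_blocks)
    return total <= window_secs

blocks = {"tiny": 5000, "full": 500}
rates = {"tiny": 0.1, "full": 2.0}   # placeholder seconds per block
# 5000 * 0.1 + 500 * 2.0 = 1500 s total, which fits a 6-hour (21600 s) window.
```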

How it works

  1. A reusable GitHub Actions workflow pulls your Docker image, starts it with a shared Unix socket volume, and runs tests against it.
  2. Minifuzz runs first as a gate. It has two stages:
    • Bare-minimum examples (forks, no_forks) — validates protocol basics.
    • STF conformance (fallback, safrole, storage, storage_light) — replays pre-captured request-response pairs and checks that the implementation returns the expected responses. If any minifuzz suite fails, picofuzz is skipped entirely.
  3. Picofuzz runs the same four STF suites and collects per-trace timing statistics (it does not verify responses).
  4. Each team has its own workflow file (e.g. typeberry-performance.yml) that passes team-specific config (image, command, env vars, memory) to the reusable workflow.
  5. Tests run on a self-hosted runner. Timing results (CSV with per-trace percentiles) are uploaded as artifacts and displayed on the dashboard.
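
The per-trace timing aggregation in step 5 might look roughly like this. The real CSV schema is not specified in this document, so the column names and the nearest-rank percentile method below are assumptions.

```python
# Hypothetical sketch of per-trace percentile aggregation into CSV.
# Column names and the nearest-rank method are assumptions, not the real schema.
import csv
import io

def percentile(samples, p):
    """Nearest-rank percentile over a non-empty list of timing samples."""
    ordered = sorted(samples)
    rank = max(1, round(p / 100 * len(ordered)))  # 1-based nearest rank
    return ordered[rank - 1]

def to_csv(timings_by_trace):
    """Render one CSV row per trace with p50/p90/p99 columns."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["trace", "p50", "p90", "p99"])
    for trace, samples in timings_by_trace.items():
        writer.writerow([trace] + [percentile(samples, p) for p in (50, 90, 99)])
    return out.getvalue()
```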

Readiness detection

The suite needs to know when your implementation is ready to accept connections on the Unix socket. Two modes are supported:

  • Log pattern (recommended if your impl prints a startup message): set readiness_pattern to a regex matching your ready log line.
  • Socket probe (default): polls for the socket file to appear inside the Docker volume. Works with any implementation, no config needed.
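
A minimal sketch of the two modes, assuming the harness can tail container stdout and see the shared volume (these helpers are illustrative, not the suite's actual code):

```python
# Hypothetical readiness helpers mirroring the two modes above.
import os
import re
import time

def log_ready(log_lines, readiness_pattern):
    """Log-pattern mode: ready once any stdout line matches the regex."""
    rx = re.compile(readiness_pattern)
    return any(rx.search(line) for line in log_lines)

def wait_for_socket(sock_path, timeout=30.0, poll=0.2):
    """Socket-probe mode: poll until the socket file appears, or time out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.exists(sock_path):
            return True
        time.sleep(poll)
    return False
```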

Adding your team

  1. Provide a Docker image that speaks the JAM Fuzz protocol and follows the standard target packaging. The image must be publicly pullable (or accessible to the runner).

    The harness sets the standard env vars on every target container, and your image must read its configuration from them rather than from CLI args:

    Env var Value
    JAM_FUZZ 1
    JAM_FUZZ_SPEC tiny or full (per-workflow)
    JAM_FUZZ_DATA_PATH /shared/data
    JAM_FUZZ_SOCK_PATH /shared/jam_target.sock
    JAM_FUZZ_LOG_LEVEL debug

    Concretely:

    • Your image must support both tiny and full and pick the right one from JAM_FUZZ_SPEC.
    • Your image must read the socket path from JAM_FUZZ_SOCK_PATH and be launchable with its own default CMD; do not set docker_cmd in any of the workflow files below. New teams should leave it unset.
    • Anything you put in docker_env is appended after the standard vars and can override them; use it for impl-specific tuning only.
    • JAM_FUZZ_DATA_PATH is wiped between sequential fuzz-source runs to match official testing's fresh-init behavior.

    docker_cmd exists only as a backwards-compatibility shim for a few already-onboarded targets that predate standard packaging — when set, the legacy {TARGET_SOCK} placeholder is substituted with JAM_FUZZ_SOCK_PATH. Full-spec workflows must omit it unconditionally.
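
    Putting the rules above together, the target's environment is assembled roughly like this. This is a sketch: build_env is a hypothetical helper, but the variable names, values, and layering order (standard vars first, docker_env appended so it can override) are as described in the text.

```python
# Hypothetical sketch of target environment assembly.
# Standard vars come first; docker_env pairs are appended and may override them.

STANDARD_ENV = {
    "JAM_FUZZ": "1",
    "JAM_FUZZ_SPEC": "tiny",  # or "full", set per workflow
    "JAM_FUZZ_DATA_PATH": "/shared/data",
    "JAM_FUZZ_SOCK_PATH": "/shared/jam_target.sock",
    "JAM_FUZZ_LOG_LEVEL": "debug",
}

def build_env(docker_env=""):
    """docker_env is the space-separated KEY=VALUE string from the workflow input."""
    env = dict(STANDARD_ENV)
    for pair in docker_env.split():
        key, _, value = pair.partition("=")
        env[key] = value  # later pairs override the standard vars
    return env
```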

  2. Create the performance workflow at .github/workflows/<team>-performance.yml:

    name: "Performance: myteam"
    
    on:
      schedule:
        - cron: '0 6 * * *'
      workflow_dispatch:
    
    jobs:
      test:
        uses: ./.github/workflows/reusable-picofuzz.yml
        with:
          target_name: myteam
          docker_image: 'ghcr.io/myorg/myimage:latest'
          # Optional overrides (do NOT set docker_cmd — see step 1):
          # docker_env: 'MY_VAR=value'
          # docker_memory: '512m'
          # docker_platform: 'linux/amd64'
          # readiness_pattern: 'Server ready'
  3. Create the two demo fuzz workflows, one for tiny and one for full. Both rely on standard target packaging (env-only invocation); neither should set docker_cmd:

    # .github/workflows/myteam-demo-tiny.yml
    name: "Demo (tiny): myteam"
    
    on:
      schedule:
        - cron: '0 18 * * *'
      workflow_dispatch:
      pull_request:
        paths:
          - '.github/workflows/myteam-demo-tiny.yml'
          - '.github/workflows/demo-source.yml'
    
    permissions:
      contents: read
      issues: write
    
    jobs:
      demo:
        uses: ./.github/workflows/demo-source.yml
        with:
          target_name: myteam
          docker_image: 'ghcr.io/myorg/myimage:latest'
          spec: tiny
          mention: yourgithub
    # .github/workflows/myteam-demo-full.yml
    # Identical to demo-tiny except: spec: full and num_blocks scaled down.
    name: "Demo (full): myteam"
    
    on:
      schedule:
        - cron: '0 18 * * *'
      workflow_dispatch:
      pull_request:
        paths:
          - '.github/workflows/myteam-demo-full.yml'
          - '.github/workflows/demo-source.yml'
    
    permissions:
      contents: read
      issues: write
    
    jobs:
      demo:
        uses: ./.github/workflows/demo-source.yml
        with:
          target_name: myteam
          docker_image: 'ghcr.io/myorg/myimage:latest'
          spec: full
          num_blocks: 500   # full-spec blocks are heavier; tiny uses the 5000 default
          mention: yourgithub

    Your target image must support both tiny and full (selected via JAM_FUZZ_SPEC). The --spec <value> argument is passed to the graymatter source by the workflow; your target receives no spec-related CLI args.

  4. Create a team directory at teams/<team>/ for any team-specific scripts or data you might add later.

  5. Open a PR and trigger the workflows via workflow_dispatch to verify everything works.

Workflow inputs reference

Input Required Default Description
target_name yes — Your implementation name
docker_image yes — Full image reference
docker_cmd no "" Legacy compat shim — new teams should leave this unset. Overrides the container command for pre-standard-packaging targets; {TARGET_SOCK} is substituted with JAM_FUZZ_SOCK_PATH. Must be empty in any full-spec workflow.
docker_env no "" Space-separated KEY=VALUE pairs passed as -e flags
docker_memory no "512m" Container memory limit
docker_platform no "linux/amd64" Platform for docker pull
readiness_pattern no "" Regex matched against stdout to detect readiness
timeout_minutes no 10 Per-suite timeout
test_suites no all four JSON array of picofuzz suite names to run
minifuzz_suites no all six JSON array of minifuzz suite names to run
spec no "tiny" (Demo / long-run only) JAM spec to test against (tiny or full). The reusable workflow sets the JAM_FUZZ_SPEC env var on the target and appends --spec <value> to the graymatter source.

Running locally

# Install dependencies
npm ci

# Build the picofuzz Docker image
npm run build-docker -w @fluffylabs/picofuzz

# Pull the target image
docker pull --platform=linux/amd64 ghcr.io/fluffylabs/typeberry:latest

# Prepare results directory
mkdir -p ./picofuzz-result

# Run a single suite
TARGET_NAME=typeberry \
TARGET_IMAGE='ghcr.io/fluffylabs/typeberry:latest' \
TARGET_CMD='--version=1 fuzz-target {TARGET_SOCK}' \
TARGET_READINESS_PATTERN='PVM Backend' \
npx tsx --test tests/picofuzz/fallback.test.ts

Environment variables

Variable Required Description
TARGET_NAME yes Implementation name
TARGET_IMAGE yes Docker image to test
TARGET_CMD no Container command override ({TARGET_SOCK} is replaced with the socket path). Empty/unset uses the image's default CMD.
TARGET_ENV no Space-separated KEY=VALUE pairs
TARGET_MEMORY no Container memory limit (default 512m)
TARGET_READINESS_PATTERN no Regex for log-based readiness; default is socket-probe
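
The TARGET_CMD handling can be sketched as follows (resolve_cmd is a hypothetical helper; the placeholder substitution and the empty-means-default behavior are as described in the table above):

```python
# Hypothetical sketch of the legacy {TARGET_SOCK} substitution for TARGET_CMD.

def resolve_cmd(target_cmd, sock_path="/shared/jam_target.sock"):
    """Empty/unset command means: use the image's default CMD (return None)."""
    if not target_cmd:
        return None
    return target_cmd.replace("{TARGET_SOCK}", sock_path).split()
```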

Regenerating minifuzz traces

The minifuzz-traces/ directory contains pre-captured request-response pairs generated by running picofuzz in capture mode against typeberry (the reference implementation). These traces must be regenerated whenever the STF test data in picofuzz-stf-data/ is updated:

./minifuzz-traces/populate.sh

The script builds picofuzz, pulls typeberry, and runs capture for all four suites. It tracks the STF data version and skips regeneration if already up-to-date.
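
The skip logic might be implemented along these lines. This is a sketch only: populate.sh's actual version-tracking mechanism is not shown in this document, and the digest-plus-stamp-file approach below is an assumption.

```python
# Hypothetical "skip if already up-to-date" check: digest the STF data tree
# and compare against a stored stamp file. The mechanism is an assumption.
import hashlib
import os

def tree_digest(root):
    """Stable digest over relative paths and file contents under root."""
    h = hashlib.sha256()
    for dirpath, _dirnames, filenames in sorted(os.walk(root)):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            h.update(os.path.relpath(path, root).encode())
            with open(path, "rb") as f:
                h.update(f.read())
    return h.hexdigest()

def needs_regeneration(data_root, stamp_file):
    """False only if the stamp matches the current data digest."""
    digest = tree_digest(data_root)
    if os.path.exists(stamp_file):
        with open(stamp_file) as f:
            if f.read().strip() == digest:
                return False  # traces already match this data version
    return True
```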

Project structure

.github/workflows/
  ci.yml                        # Unit tests for the suite itself
  deploy-dashboard.yml          # Builds and deploys the dashboard to GitHub Pages
  reusable-picofuzz.yml         # Core reusable workflow (minifuzz + picofuzz)
  demo-source.yml               # Reusable demo fuzz source workflow (tiny|full)
  graymatter-fuzz-source.yml    # Reusable long-running fuzz source workflow
  <team>-performance.yml        # Per-team performance workflow files
  <team>-demo-tiny.yml          # Per-team demo fuzz against the tiny spec
  <team>-demo-full.yml          # Per-team demo fuzz against the full spec
  <team>-fuzz.yml               # Per-team long-running fuzz (matrix over [tiny, full])
                                #   — only for teams with dedicated runners
agents.md                     # Contributor / AI-agent guide for this repo
minifuzz/                     # Minifuzz Docker image (Python fuzz example runner)
minifuzz-traces/              # Captured request-response pairs from typeberry
  populate.sh                 # Script to regenerate traces
  {suite}/                    # Per-suite trace files (fallback, safrole, etc.)
picofuzz/                     # Picofuzz tool (fuzz protocol client + capture mode)
scripts/                      # Helpers used by deploy-dashboard.yml (CSV → JSON, history)
tests/
  common.ts                   # Target startup & shared helpers
  external-process.ts         # Docker process management
  config.test.ts              # Unit tests over the workflow config files
  minifuzz/
    common.ts                 # Minifuzz test harness
    *.test.ts                 # Minifuzz suites (forks, no_forks, fallback, safrole, ...)
  picofuzz/
    common.ts                 # Picofuzz test harness
    *.test.ts                 # Per-suite test files
  fuzz-source/
    common.ts                 # Fuzz source test harness
    fuzz.test.ts              # Fuzz source test entry point
teams/<team>/                 # Team-specific scripts & data
dashboard/                    # Git submodule: jam-conformance-dashboard (Next.js app)
picofuzz-stf-data/            # Git submodule: STF test traces
picofuzz-conformance-data/    # Git submodule: jam-conformance (minifuzz examples)

About

JAM Testing self-service
