
vitest worker SIGBUS on /dev/shm-constrained hosts with task cache enabled #353

@TheHolyWaffle

Description


Describe the bug

On any host with a small /dev/shm (e.g. the 64 MiB default on GitLab Kubernetes runners, or Docker's 64 MiB default), vp run ... test reproducibly kills a vitest forked worker with SIGBUS as soon as the task's file-access tracking has written enough path records to fill /dev/shm:

Bus error (core dumped)
⎯⎯⎯⎯⎯⎯ Unhandled Errors ⎯⎯⎯⎯⎯⎯
Error: [vitest-pool]: Worker forks emitted error.
Caused by: Error: Worker exited unexpectedly

Probing the child worker env mid-run shows /dev/shm usage climbing fast:

SHM usage: Filesystem      Size  Used Avail Use% Mounted on
shm              64M   60M  4.6M  93% /dev/shm

This matches the 4 GiB shared-memory IPC mapping in crates/fspy/src/ipc.rs (SHM_CAPACITY = 4 * 1024 * 1024 * 1024). Once a child touches pages that can't be backed by /dev/shm, the mapping faults with SIGBUS — and the vitest forked worker process dies mid-test.

Setting cache: false on the task resolves to UserCacheConfig::Disabled in vite_task_graph::config, which skips the fspy IPC channel entirely, and the same workload completes cleanly. So this is specifically fspy::ipc::channel behavior under a constrained /dev/shm.

Filed originally at voidzero-dev/vite-plus#1453, closed and moved here since the affected code lives in this repo. Related: #340 (LD_PRELOAD) and the target_env = "musl" gate added in #328 also touch this module.

Reproduction

https://github.com/TheHolyWaffle/vite-plus-sigbus-repro

git clone https://github.com/TheHolyWaffle/vite-plus-sigbus-repro
cd vite-plus-sigbus-repro
npm ci

# Fails — "Bus error (core dumped)" + "Worker exited unexpectedly" after ~10s
docker run --rm --shm-size=64m -v "$PWD":/work -w /work node:24.15.0 \
  bash -c 'npm ci --prefer-offline && npx vp run --filter "*" test --no-cache'

Change the test task in vite.config.ts from { command: 'vp test --run' } to { command: 'vp test --run', cache: false } and re-run — it passes cleanly (~49 s). The repo also has a .github/workflows/repro.yml that runs both cases on ubuntu-latest under Docker.

System info

  • vite-plus: 0.1.19 (also reproduced on 0.1.16). Pre-fspy versions (0.1.14) are not affected on the same workload.
  • Node: v24.15.0
  • Host for the repro: node:24.15.0 container (glibc, Debian base), aarch64 on Rancher Desktop. Also reproduced on x86_64 GitLab K8s runners with the same node image.
  • Docker: --shm-size=64m (matches typical K8s runner defaults).

Failing logs

=== RUN ===
 RUN  /work

⎯⎯⎯⎯⎯⎯ Unhandled Errors ⎯⎯⎯⎯⎯⎯

Vitest caught 1 unhandled error during the test run.

⎯⎯⎯⎯⎯⎯ Unhandled Error ⎯⎯⎯⎯⎯⎯⎯
Error: [vitest-pool]: Worker forks emitted error.
 ❯ EventEmitter.<anonymous> node_modules/@voidzero-dev/vite-plus-test/dist/chunks/cli-api.lDy4N9kC.js:3444:27
 ❯ EventEmitter.emit node:events:509:28
 ❯ ChildProcess.emitUnexpectedExit node_modules/@voidzero-dev/vite-plus-test/dist/chunks/cli-api.lDy4N9kC.js:3011:24
 ❯ ChildProcess.emit node:events:509:28
 ❯ Process.ChildProcess._handle.onexit node:internal/child_process:295:12

Caused by: Error: Worker exited unexpectedly
 ❯ ChildProcess.emitUnexpectedExit node_modules/@voidzero-dev/vite-plus-test/dist/chunks/cli-api.lDy4N9kC.js:3010:35
 ❯ ChildProcess.emit node:events:509:28
 ❯ Process.ChildProcess._handle.onexit node:internal/child_process:295:12

 Test Files  (1)
      Tests  (1)
     Errors  1 error
   Duration  10.13s

Bus error (core dumped)

Passing run with cache: false:

=== RUN (cache: false) ===
$ vp test --run --no-cache ⊘ cache disabled
 RUN  /work

 ✓ packages/heavy-io-test/src/heavy.spec.ts (1 test) 48896ms
 Test Files  1 passed (1)
      Tests  1 passed (1)
   Duration  49.02s

Possible fixes

  1. Probe /dev/shm size at startup and either reduce SHM_CAPACITY to fit, or fall back to a file/pipe-backed IPC when the backing is too small.
  2. Catch the SIGBUS (or detect the failed write via a sentinel) and degrade gracefully with a warning, instead of letting the test worker die.
  3. The CLI --no-cache flag disables cache hit/store but still resolves the task to UserCacheConfig::Enabled, so fspy tracking still runs. Either make --no-cache imply no tracking (what users typically expect in CI) or document the distinction so users know cache: false is the right knob.

Happy to test patches.
