Skip to content

Honor futex_waitv clockid and wake without polling#9

Merged
jserv merged 1 commit intomainfrom
futex-waitv
May 5, 2026
Merged

Honor futex_waitv clockid and wake without polling#9
jserv merged 1 commit intomainfrom
futex-waitv

Conversation

@jserv
Copy link
Copy Markdown
Contributor

@jserv jserv commented May 5, 2026

futex_waitv (SYS 449) is SYSCALL_DEFINE5 in the kernel: (waiters, nr_futexes, flags, timeout, clockid). The dispatch wrapper sc_futex_waitv was forwarding only x0..x3, dropping x4. The implementation hardcoded a CLOCK_MONOTONIC->CLOCK_REALTIME deadline conversion, so a guest asking for CLOCK_REALTIME got monotonic semantics regardless of what it passed. Wire x4 through and give sys_futex_waitv a clockid parameter. Validate against CLOCK_REALTIME or CLOCK_MONOTONIC when timeout is non-NULL (Linux returns EINVAL otherwise) and branch the absolute-deadline conversion on clockid.

The pre-existing waitv loop slept on a private cond that no wake site ever signalled; forward progress relied on a 50ms poll. Add optional group_lock / group_cond pointers to futex_waiter_t plus a futex_waiter_notify_group helper. Each wake site (futex_wake, futex_requeue, futex_wake_op both passes, futex_unlock_pi) calls it after marking woken=1 under the bucket lock. waitv blocks on shared.cond directly; the loop's bounded sleep is now 500ms, kept only so exit_group and timeout edges are still observed when no signal arrives.

Lock ordering is bucket -> group_lock; futex_waiter_notify_group is only called by wake sites that already hold the bucket lock, and the waitv thread never holds shared.lock while taking a bucket lock. Stack lifetime of shared.lock and shared.cond is protected by the bucket-lock pairing in waitv_unlink, which synchronizes with every wake's notify_group before the destroys run.

Locked in by tests/test-futex-waitv.c (19 cases): single-element wake returning index 0, multi-element wake returning the woken index, EAGAIN on stale val, eight EINVAL paths covering nr=0, nr>128, top-level flags, reserved!=0, element flags reserved bits, size!=U32, malformed CLOCK_REALTIME / CLOCK_MONOTONIC nsec, unaligned uaddr, NULL waiters_gva, bad clockid with timeout; two EFAULT paths via PROT_NONE pages; the inclusive nr_futexes==128 boundary; and ETIMEDOUT under both CLOCK_MONOTONIC and CLOCK_REALTIME deadlines. The CLOCK_REALTIME case exists specifically to catch a regression of the dropped-clockid bug. Verified against Linux ground truth via tests/qemu-runner.sh; all 19 cases match.


Summary by cubic

Fixes futex_waitv to honor the clockid argument and switches waitv to signal-driven wakeups, removing the 50ms poll and matching Linux behavior. This improves correctness (timeouts) and reduces wake latency.

  • Bug Fixes

    • Forward clockid to sys_futex_waitv; support CLOCK_REALTIME and CLOCK_MONOTONIC, return EINVAL for others when a timeout is provided.
    • Convert absolute deadlines based on the selected clock; validate timespec (EINVAL) and faulting pointers (EFAULT) to mirror Linux.
    • Validate per-element fields (reserved bits, size U32, 4-byte alignment), reject NULL waiters, and enforce 1..128 nr_futexes.
  • Refactors

    • Replace polling with group-condition signaling from all wake sites to unblock waitv; keep a 500ms bounded sleep to observe exit/timeouts.
    • Strengthen waiter unlinking to handle requeue races and enforce lock order (bucket -> group lock).
    • Add tests/test-futex-waitv (19 cases) with a Makefile rule using -lpthread, and register it in tests/manifest.txt.

Written for commit 1423b4a. Summary will update on new commits.

futex_waitv (SYS 449) is SYSCALL_DEFINE5 in the kernel: (waiters,
nr_futexes, flags, timeout, clockid). The dispatch wrapper sc_futex_waitv
was forwarding only x0..x3, dropping x4. The implementation hardcoded
a CLOCK_MONOTONIC->CLOCK_REALTIME deadline conversion, so a guest asking
for CLOCK_REALTIME got monotonic semantics regardless of what it passed.
Wire x4 through and give sys_futex_waitv a clockid parameter. Validate
against CLOCK_REALTIME or CLOCK_MONOTONIC when timeout is non-NULL
(Linux returns EINVAL otherwise) and branch the absolute-deadline
conversion on clockid.

The pre-existing waitv loop slept on a private cond that no wake site
ever signalled; forward progress relied on a 50ms poll. Add optional
group_lock / group_cond pointers to futex_waiter_t plus a
futex_waiter_notify_group helper. Each wake site (futex_wake,
futex_requeue, futex_wake_op both passes, futex_unlock_pi) calls it
after marking woken=1 under the bucket lock. waitv blocks on
shared.cond directly; the loop's bounded sleep is now 500ms, kept only
so exit_group and timeout edges are still observed when no signal
arrives.

Lock ordering is bucket -> group_lock; futex_waiter_notify_group is
only called by wake sites that already hold the bucket lock, and the
waitv thread never holds shared.lock while taking a bucket lock.
Stack lifetime of shared.lock and shared.cond is protected by the
bucket-lock pairing in waitv_unlink, which synchronizes with every
wake's notify_group before the destroys run.

Locked in by tests/test-futex-waitv.c (19 cases): single-element
wake returning index 0, multi-element wake returning the woken
index, EAGAIN on stale val, eight EINVAL paths covering nr=0,
nr>128, top-level flags, reserved!=0, element flags reserved bits,
size!=U32, malformed CLOCK_REALTIME / CLOCK_MONOTONIC nsec,
unaligned uaddr, NULL waiters_gva, bad clockid with timeout; two
EFAULT paths via PROT_NONE pages; the inclusive nr_futexes==128
boundary; and ETIMEDOUT under both CLOCK_MONOTONIC and
CLOCK_REALTIME deadlines. The CLOCK_REALTIME case exists
specifically to catch a regression of the dropped-clockid bug.
Verified against Linux ground truth via tests/qemu-runner.sh; all
19 cases match.
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found across 6 files

@jserv jserv merged commit 34c741f into main May 5, 2026
5 checks passed
@jserv jserv deleted the futex-waitv branch May 5, 2026 13:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant