Commit 7620898
feat(simd_caps): CPUID 7,1 + AMX-FP16/AVX512-FP16/VP2INTERSECT bits + AMX OS-gate in cpu_ops
Salvages the detection-only subset of closed PR #190 — three real
gaps in the substrate runtime dispatch without inheriting any of
PR #190's consumer-facing additions (no SimdProfile enum, no
public dispatch-identity API, no cpu-* features).
What lands here:
1) CPUID leaf 7,1 read for AMX-FP16 (CPUID.07H.1H:EAX bit 21).
Lives on a different subleaf than the existing AMX bits;
GraniteRapids is the only silicon advertising it today.
Guarded by leaf 7,0 EAX >= 1 so older CPUs that don't expose
subleaf 1 stay correct.
2) Three new SimdCaps fields (additive, all default false on
non-x86):
- avx512fp16 — CPUID.07H.0H:EDX bit 23 — `__m512h` math.
Discriminates SPR-class from CascadeLake/
IceLakeSp/SkylakeX for any future FP16 kernel.
- avx512vp2intersect — CPUID.07H.0H:EDX bit 8 — TigerLake
mobile only on Intel; absent from Ice Lake-SP and every
later Intel server part. Exposed for completeness.
- amx_fp16 — CPUID.07H.1H:EAX bit 21 — Granite Rapids.
Plus convenience methods has_avx512_fp16() and has_amx_fp16()
(the latter also requires the amx_tile bit, as defense in depth).
3) AMX OS-state gate in cpu_ops() selection. The CPU-reports-AMX
path now AND-gates on `simd_amx::amx_available()` which runs
the full four-step check (CPUID + OSXSAVE + XCR0[17,18] +
arch_prctl(ARCH_REQ_XCOMP_PERM, 18) on Linux 5.16+). This closes the
SIGILL hole when a hypervisor masks XCR0 or the OS hasn't
honoured the prctl: previously cpu_ops() would route to
CPU_OPS_AMX_INT8 and AMX instructions would SIGILL despite
the CPUID bit. Now it demotes to CPU_OPS_AVX512_VNNI cleanly.
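The detection in items 1 and 2 can be sketched as a pure decode over raw CPUID register values. This is a minimal illustration, not the code from this commit: `ExtCaps`, `decode_leaf7`, and `detect` are hypothetical names standing in for the real `SimdCaps` plumbing, and the AMX-TILE position (CPUID.07H.0H:EDX bit 24) is taken from the Intel SDM rather than from the message above.

```rust
/// Hypothetical stand-in for the new detection fields plus the existing
/// AMX tile bit. The real struct in the crate is `SimdCaps`.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct ExtCaps {
    pub amx_tile: bool,
    pub avx512fp16: bool,
    pub avx512vp2intersect: bool,
    pub amx_fp16: bool,
}

impl ExtCaps {
    pub fn has_avx512_fp16(&self) -> bool {
        self.avx512fp16
    }

    /// Defense in depth: AMX-FP16 is only reported together with AMX-TILE.
    pub fn has_amx_fp16(&self) -> bool {
        self.amx_tile && self.amx_fp16
    }
}

/// Decode the bits from raw registers.
/// `leaf7_0_eax` is CPUID.07H.0H:EAX (the maximum supported subleaf),
/// `leaf7_0_edx` is CPUID.07H.0H:EDX, `leaf7_1_eax` is CPUID.07H.1H:EAX.
pub fn decode_leaf7(leaf7_0_eax: u32, leaf7_0_edx: u32, leaf7_1_eax: u32) -> ExtCaps {
    let bit = |reg: u32, n: u32| ((reg >> n) & 1) == 1;
    // Subleaf 1 is only meaningful when leaf 7,0 EAX advertises it.
    let has_subleaf1 = leaf7_0_eax >= 1;
    ExtCaps {
        amx_tile: bit(leaf7_0_edx, 24),
        avx512fp16: bit(leaf7_0_edx, 23),
        avx512vp2intersect: bit(leaf7_0_edx, 8),
        amx_fp16: has_subleaf1 && bit(leaf7_1_eax, 21),
    }
}

/// Read the registers on x86_64; every field stays false elsewhere.
#[cfg(target_arch = "x86_64")]
#[allow(unused_unsafe)]
pub fn detect() -> ExtCaps {
    use core::arch::x86_64::{__cpuid, __cpuid_count};
    unsafe {
        // Leaf 7 itself must be supported before it is queried.
        if __cpuid(0).eax < 7 {
            return ExtCaps::default();
        }
        let l0 = __cpuid_count(7, 0);
        let eax1 = if l0.eax >= 1 { __cpuid_count(7, 1).eax } else { 0 };
        decode_leaf7(l0.eax, l0.edx, eax1)
    }
}

#[cfg(not(target_arch = "x86_64"))]
pub fn detect() -> ExtCaps {
    ExtCaps::default()
}
```

Keeping the decode separate from the CPUID read lets the subleaf guard and the amx_tile requirement be exercised with synthetic register values on any host.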
What's deliberately NOT here (rejected from PR #190):
- No `SimdProfile` enum — would expose dispatch identity to
consumer code and invite `match profile { ... }` arms that
defeat the polyfill contract.
- No `cpu-*` cargo features — build-time silicon pinning that
defeats polyfill at an earlier binding time.
- No `simd_profile_probe` example — diagnostic-only, rebuilds
the SimdProfile surface this PR doesn't bring.
- No public dispatch-identity API at any layer. The new bits
are internal substrate detection; consumers continue to use
`crate::simd::*` polyfilled types and `crate::simd_runtime::*`
per-op trampolines.
The new fields slot into existing `cpu_ops()` selection by
extension (e.g. a future AMX-FP16 tier would AND-gate on
`caps.amx_fp16 && simd_amx::amx_available()` between the
AMX-INT8 and AVX-512-VNNI arms). No selection logic uses them
yet — they're laying the runway, not consuming it.
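The gated selection described above might look roughly like the sketch below. It is written under assumptions: `OpsTier`, `Caps`, and `select_tier` are hypothetical names, the `Fallback` tier stands in for whatever lower tiers the real `cpu_ops()` has, and the OS check is passed in as a plain boolean instead of calling `simd_amx::amx_available()`.

```rust
/// Hypothetical tier tag used only inside this sketch. It is `pub` so the
/// sketch can be exercised in isolation; the commit itself keeps dispatch
/// identity internal to the substrate.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OpsTier {
    AmxInt8,
    Avx512Vnni,
    Fallback,
}

/// Hypothetical subset of the CPUID-derived capability bits.
pub struct Caps {
    pub amx_int8: bool,
    pub avx512vnni: bool,
}

/// `amx_os_enabled` stands in for the result of the four-step OS check.
pub fn select_tier(caps: &Caps, amx_os_enabled: bool) -> OpsTier {
    // The CPUID bit alone is not enough: the OS must also have enabled
    // the AMX tile state, otherwise tile instructions would fault.
    if caps.amx_int8 && amx_os_enabled {
        return OpsTier::AmxInt8;
    }
    // Per the message above, a future AMX-FP16 arm would sit here, gated
    // on the amx_fp16 bit AND the same OS check.
    if caps.avx512vnni {
        return OpsTier::Avx512Vnni;
    }
    OpsTier::Fallback
}
```

With `amx_os_enabled` false, a CPU that reports AMX-INT8 falls through to the AVX-512-VNNI arm, which mirrors the demotion the commit describes.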
Tests:
- 4 new simd_caps tests: cpuid_extended_bits_smoke,
has_amx_fp16_requires_amx_tile,
x86_extended_bits_are_false_on_non_x86, plus extended
determinism coverage.
- All 6 existing cpu_ops tests still pass; the AMX OS-gate
change passes through transparently on hosts where
amx_available() agrees with CPUID (the typical case).
- fmt + clippy clean on `--features runtime-dispatch`.
1 parent c10e1e0 commit 7620898
2 files changed
Lines changed: 133 additions & 1 deletion