scripts/cfify: don't re-anchor CFA on scratch `movq %rsp, %REG` #1128
CBMC Results (ML-DSA-65, REDUCE-RAM): Full Results (200 proofs)
CBMC Results (ML-DSA-44, REDUCE-RAM): Full Results (200 proofs)
CBMC Results (ML-DSA-87, REDUCE-RAM): Full Results (200 proofs)
CBMC Results (ML-DSA-87): Full Results (200 proofs)
CBMC Results (ML-DSA-44): Full Results (200 proofs)
CBMC Results (ML-DSA-65): Full Results (200 proofs)
The original `mov rsp, %REG` -> `.cfi_def_cfa_register %REG` rule fires on every `movq %rsp, %REG`, including scratch base-pointer copies (e.g. the `rep movsb` source setup in mlkem-native's rej_uniform_avx2_asm.S) with no intent to re-anchor. That misclassifies the scratch copy as an alignment anchor, drops the legitimate `.cfi_adjust_cfa_offset` on the matching `addq $N, %rsp`, and emits a spurious `.cfi_def_cfa_register`.

cfify now scans forward to the next ret and only re-anchors when a matching `movq %REG, %rsp` restore is found in the same function body.

Ported from mlkem-native commit 6ac47cb41. mldsa-native has no inputs that hit the bad path today (the only `movq %rsp, %REG` site, in keccak_f1600_x4_avx2_asm.S, has a matching restore), so the change is preventative; regenerated assembly is byte-identical.

Signed-off-by: Hanno Becker <beckphan@amazon.co.uk>
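The forward scan described in the commit message can be sketched roughly as follows. This is an illustrative reconstruction, not the code in scripts/cfify: the function name `is_reanchor`, the regular expressions, and the list-of-lines input are all assumptions, and the real script may handle more cases (other syntaxes, labels, multiple exits).

```python
import re

# Hypothetical patterns for AT&T-syntax x86_64 assembly; not taken from cfify.
SAVE_RE = re.compile(r"^\s*movq\s+%rsp,\s*(%\w+)")
RET_RE = re.compile(r"^\s*retq?\b")


def is_reanchor(lines, idx):
    """Return True if the `movq %rsp, %REG` at lines[idx] is restored by a
    matching `movq %REG, %rsp` before the next `ret`."""
    m = SAVE_RE.match(lines[idx])
    if m is None:
        return False  # not a stack-pointer copy at all
    reg = re.escape(m.group(1))
    restore_re = re.compile(r"^\s*movq\s+" + reg + r",\s*%rsp\b")
    for line in lines[idx + 1:]:
        if restore_re.match(line):
            return True  # copy is later used to restore %rsp: treat as anchor
        if RET_RE.match(line):
            return False  # reached the end of the function without a restore
    return False


# Copy that is restored before `ret`: a genuine re-anchor.
aligned = [
    "pushq %rbp",
    "movq %rsp, %rbp",
    "andq $-32, %rsp",
    "movq %rbp, %rsp",
    "popq %rbp",
    "ret",
]

# Scratch copy used only as a `rep movsb` source: no restore follows.
scratch = [
    "subq $64, %rsp",
    "movq %rsp, %rsi",
    "rep movsb",
    "addq $64, %rsp",
    "ret",
]
```

With these inputs, `is_reanchor(aligned, 1)` is true and `is_reanchor(scratch, 1)` is false, so under this sketch only the first function would get a `.cfi_def_cfa_register`, while the second keeps the `.cfi_adjust_cfa_offset` on its `addq`.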
3fc0719 to 219b81a
The Cortex-A76 runner is currently erroring due to a full disk. This commit (temporarily) removes it from the benchmarking CI.

Signed-off-by: Matthias J. Kannwischer <matthias@zerorisc.com>
mkannwischer
left a comment
LGTM.
The benchmarking CI was failing due to the A76 runner having a full disk. I can't access that machine anymore, so I have temporarily disabled the benchmarks in a commit stapled onto this PR.
Mac Mini (M1, 2020) benchmarks (opt)

| Benchmark suite | Current: 9458677 | Previous: 4befe1f | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 46509 cycles | 46505 cycles | 1.00 |
| ML-DSA-44 sign | 131099 cycles | 131088 cycles | 1.00 |
| ML-DSA-44 verify | 47318 cycles | 47315 cycles | 1.00 |
| ML-DSA-65 keypair | 81691 cycles | 81692 cycles | 1.00 |
| ML-DSA-65 sign | 215354 cycles | 215328 cycles | 1.00 |
| ML-DSA-65 verify | 79304 cycles | 79306 cycles | 1.00 |
| ML-DSA-87 keypair | 132414 cycles | 132407 cycles | 1.00 |
| ML-DSA-87 sign | 277494 cycles | 277479 cycles | 1.00 |
| ML-DSA-87 verify | 134238 cycles | 134234 cycles | 1.00 |

This comment was automatically generated by workflow using github-action-benchmark.
Mac Mini (M1, 2020) benchmarks (no-opt)

| Benchmark suite | Current: 9458677 | Previous: 4befe1f | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 112803 cycles | 112779 cycles | 1.00 |
| ML-DSA-44 sign | 401151 cycles | 400873 cycles | 1.00 |
| ML-DSA-44 verify | 120185 cycles | 120116 cycles | 1.00 |
| ML-DSA-65 keypair | 192892 cycles | 192884 cycles | 1.00 |
| ML-DSA-65 sign | 649940 cycles | 649918 cycles | 1.00 |
| ML-DSA-65 verify | 192941 cycles | 192952 cycles | 1.00 |
| ML-DSA-87 keypair | 318749 cycles | 318774 cycles | 1.00 |
| ML-DSA-87 sign | 828889 cycles | 828746 cycles | 1.00 |
| ML-DSA-87 verify | 326660 cycles | 326679 cycles | 1.00 |
oqs-bot
left a comment
Intel Xeon 4th gen (c7i)

| Benchmark suite | Current: 9458677 | Previous: 4befe1f | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 43960 cycles | 44008 cycles | 1.00 |
| ML-DSA-44 sign | 133454 cycles | 133367 cycles | 1.00 |
| ML-DSA-44 verify | 46018 cycles | 45934 cycles | 1.00 |
| ML-DSA-65 keypair | 76054 cycles | 76220 cycles | 1.00 |
| ML-DSA-65 sign | 217883 cycles | 218523 cycles | 1.00 |
| ML-DSA-65 verify | 75623 cycles | 75768 cycles | 1.00 |
| ML-DSA-87 keypair | 124534 cycles | 124192 cycles | 1.00 |
| ML-DSA-87 sign | 276511 cycles | 276611 cycles | 1.00 |
| ML-DSA-87 verify | 121693 cycles | 121754 cycles | 1.00 |
oqs-bot
left a comment
Intel Xeon 4th gen (c7i) (no-opt)

| Benchmark suite | Current: 9458677 | Previous: 4befe1f | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 94179 cycles | 94276 cycles | 1.00 |
| ML-DSA-44 sign | 330067 cycles | 330354 cycles | 1.00 |
| ML-DSA-44 verify | 98853 cycles | 99007 cycles | 1.00 |
| ML-DSA-65 keypair | 161804 cycles | 161625 cycles | 1.00 |
| ML-DSA-65 sign | 538872 cycles | 538458 cycles | 1.00 |
| ML-DSA-65 verify | 160314 cycles | 160254 cycles | 1.00 |
| ML-DSA-87 keypair | 264240 cycles | 264244 cycles | 1.00 |
| ML-DSA-87 sign | 695949 cycles | 695352 cycles | 1.00 |
| ML-DSA-87 verify | 266114 cycles | 265677 cycles | 1.00 |
oqs-bot
left a comment
AMD EPYC 3rd gen (c6a)

| Benchmark suite | Current: 9458677 | Previous: 4befe1f | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 55943 cycles | 55677 cycles | 1.00 |
| ML-DSA-44 sign | 165542 cycles | 165561 cycles | 1.00 |
| ML-DSA-44 verify | 58036 cycles | 58151 cycles | 1.00 |
| ML-DSA-65 keypair | 95639 cycles | 95533 cycles | 1.00 |
| ML-DSA-65 sign | 268516 cycles | 267735 cycles | 1.00 |
| ML-DSA-65 verify | 96401 cycles | 96655 cycles | 1.00 |
| ML-DSA-87 keypair | 154964 cycles | 155415 cycles | 1.00 |
| ML-DSA-87 sign | 328632 cycles | 327721 cycles | 1.00 |
| ML-DSA-87 verify | 152057 cycles | 151942 cycles | 1.00 |
oqs-bot
left a comment
Graviton2

| Benchmark suite | Current: 9458677 | Previous: 4befe1f | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 112675 cycles | 112549 cycles | 1.00 |
| ML-DSA-44 sign | 354836 cycles | 354929 cycles | 1.00 |
| ML-DSA-44 verify | 117386 cycles | 117407 cycles | 1.00 |
| ML-DSA-65 keypair | 194785 cycles | 194820 cycles | 1.00 |
| ML-DSA-65 sign | 585048 cycles | 585672 cycles | 1.00 |
| ML-DSA-65 verify | 193353 cycles | 193448 cycles | 1.00 |
| ML-DSA-87 keypair | 321471 cycles | 321313 cycles | 1.00 |
| ML-DSA-87 sign | 751124 cycles | 750153 cycles | 1.00 |
| ML-DSA-87 verify | 319101 cycles | 318389 cycles | 1.00 |
oqs-bot
left a comment
AMD EPYC 4th gen (c7a)

| Benchmark suite | Current: 9458677 | Previous: 4befe1f | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 46741 cycles | 46670 cycles | 1.00 |
| ML-DSA-44 sign | 142957 cycles | 146804 cycles | 0.97 |
| ML-DSA-44 verify | 49803 cycles | 51371 cycles | 0.97 |
| ML-DSA-65 keypair | 83298 cycles | 82362 cycles | 1.01 |
| ML-DSA-65 sign | 228292 cycles | 227896 cycles | 1.00 |
| ML-DSA-65 verify | 82882 cycles | 82501 cycles | 1.00 |
| ML-DSA-87 keypair | 129997 cycles | 130440 cycles | 1.00 |
| ML-DSA-87 sign | 279952 cycles | 280103 cycles | 1.00 |
| ML-DSA-87 verify | 129518 cycles | 128666 cycles | 1.01 |
oqs-bot
left a comment
AMD EPYC 3rd gen (c6a) (no-opt)

| Benchmark suite | Current: 9458677 | Previous: 4befe1f | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 135142 cycles | 135051 cycles | 1.00 |
| ML-DSA-44 sign | 526136 cycles | 527871 cycles | 1.00 |
| ML-DSA-44 verify | 148081 cycles | 148230 cycles | 1.00 |
| ML-DSA-65 keypair | 225257 cycles | 223667 cycles | 1.01 |
| ML-DSA-65 sign | 855205 cycles | 850261 cycles | 1.01 |
| ML-DSA-65 verify | 235006 cycles | 232851 cycles | 1.01 |
| ML-DSA-87 keypair | 371721 cycles | 372601 cycles | 1.00 |
| ML-DSA-87 sign | 1072897 cycles | 1074087 cycles | 1.00 |
| ML-DSA-87 verify | 384487 cycles | 384763 cycles | 1.00 |
oqs-bot
left a comment
Graviton4

| Benchmark suite | Current: 9458677 | Previous: 4befe1f | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 67270 cycles | 67263 cycles | 1.00 |
| ML-DSA-44 sign | 201449 cycles | 201398 cycles | 1.00 |
| ML-DSA-44 verify | 70386 cycles | 70245 cycles | 1.00 |
| ML-DSA-65 keypair | 119203 cycles | 119311 cycles | 1.00 |
| ML-DSA-65 sign | 328313 cycles | 328465 cycles | 1.00 |
| ML-DSA-65 verify | 116909 cycles | 116854 cycles | 1.00 |
| ML-DSA-87 keypair | 196485 cycles | 196650 cycles | 1.00 |
| ML-DSA-87 sign | 424532 cycles | 424697 cycles | 1.00 |
| ML-DSA-87 verify | 193215 cycles | 192976 cycles | 1.00 |
oqs-bot
left a comment
AMD EPYC 4th gen (c7a) (no-opt)

| Benchmark suite | Current: 9458677 | Previous: 4befe1f | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 118824 cycles | 120211 cycles | 0.99 |
| ML-DSA-44 sign | 445469 cycles | 450772 cycles | 0.99 |
| ML-DSA-44 verify | 128668 cycles | 129282 cycles | 1.00 |
| ML-DSA-65 keypair | 201760 cycles | 202767 cycles | 1.00 |
| ML-DSA-65 sign | 717592 cycles | 720219 cycles | 1.00 |
| ML-DSA-65 verify | 206570 cycles | 210509 cycles | 0.98 |
| ML-DSA-87 keypair | 333323 cycles | 333010 cycles | 1.00 |
| ML-DSA-87 sign | 915820 cycles | 913423 cycles | 1.00 |
| ML-DSA-87 verify | 341127 cycles | 340380 cycles | 1.00 |
oqs-bot
left a comment
Intel Xeon 3rd gen (c6i)

| Benchmark suite | Current: 9458677 | Previous: 4befe1f | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 61949 cycles | 61829 cycles | 1.00 |
| ML-DSA-44 sign | 190920 cycles | 191167 cycles | 1.00 |
| ML-DSA-44 verify | 66632 cycles | 66571 cycles | 1.00 |
| ML-DSA-65 keypair | 112362 cycles | 112000 cycles | 1.00 |
| ML-DSA-65 sign | 319275 cycles | 319742 cycles | 1.00 |
| ML-DSA-65 verify | 111903 cycles | 111741 cycles | 1.00 |
| ML-DSA-87 keypair | 174508 cycles | 172978 cycles | 1.01 |
| ML-DSA-87 sign | 387327 cycles | 385814 cycles | 1.00 |
| ML-DSA-87 verify | 173862 cycles | 175361 cycles | 0.99 |
oqs-bot
left a comment
Graviton4 (no-opt)

| Benchmark suite | Current: 9458677 | Previous: 4befe1f | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 128463 cycles | 128473 cycles | 1.00 |
| ML-DSA-44 sign | 445201 cycles | 444962 cycles | 1.00 |
| ML-DSA-44 verify | 136660 cycles | 136562 cycles | 1.00 |
| ML-DSA-65 keypair | 220324 cycles | 220140 cycles | 1.00 |
| ML-DSA-65 sign | 717850 cycles | 718759 cycles | 1.00 |
| ML-DSA-65 verify | 220959 cycles | 221073 cycles | 1.00 |
| ML-DSA-87 keypair | 365911 cycles | 365521 cycles | 1.00 |
| ML-DSA-87 sign | 918902 cycles | 917777 cycles | 1.00 |
| ML-DSA-87 verify | 371139 cycles | 371495 cycles | 1.00 |
oqs-bot
left a comment
Graviton2 (no-opt)

| Benchmark suite | Current: 9458677 | Previous: 4befe1f | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 213472 cycles | 212366 cycles | 1.01 |
| ML-DSA-44 sign | 757344 cycles | 758048 cycles | 1.00 |
| ML-DSA-44 verify | 229650 cycles | 229915 cycles | 1.00 |
| ML-DSA-65 keypair | 378884 cycles | 378524 cycles | 1.00 |
| ML-DSA-65 sign | 1240887 cycles | 1241386 cycles | 1.00 |
| ML-DSA-65 verify | 372309 cycles | 372701 cycles | 1.00 |
| ML-DSA-87 keypair | 602452 cycles | 603882 cycles | 1.00 |
| ML-DSA-87 sign | 1581826 cycles | 1581660 cycles | 1.00 |
| ML-DSA-87 verify | 619027 cycles | 618470 cycles | 1.00 |
oqs-bot
left a comment
Intel Xeon 3rd gen (c6i) (no-opt)

| Benchmark suite | Current: 9458677 | Previous: 4befe1f | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 150378 cycles | 150445 cycles | 1.00 |
| ML-DSA-44 sign | 543745 cycles | 543209 cycles | 1.00 |
| ML-DSA-44 verify | 162977 cycles | 163116 cycles | 1.00 |
| ML-DSA-65 keypair | 253816 cycles | 253983 cycles | 1.00 |
| ML-DSA-65 sign | 881928 cycles | 883915 cycles | 1.00 |
| ML-DSA-65 verify | 261356 cycles | 261379 cycles | 1.00 |
| ML-DSA-87 keypair | 425165 cycles | 424631 cycles | 1.00 |
| ML-DSA-87 sign | 1135002 cycles | 1139656 cycles | 1.00 |
| ML-DSA-87 verify | 436997 cycles | 436849 cycles | 1.00 |
oqs-bot
left a comment
Graviton3

| Benchmark suite | Current: 9458677 | Previous: 4befe1f | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 71569 cycles | 71491 cycles | 1.00 |
| ML-DSA-44 sign | 211525 cycles | 211355 cycles | 1.00 |
| ML-DSA-44 verify | 74932 cycles | 74933 cycles | 1.00 |
| ML-DSA-65 keypair | 125928 cycles | 125917 cycles | 1.00 |
| ML-DSA-65 sign | 347635 cycles | 348043 cycles | 1.00 |
| ML-DSA-65 verify | 123996 cycles | 124061 cycles | 1.00 |
| ML-DSA-87 keypair | 206199 cycles | 206736 cycles | 1.00 |
| ML-DSA-87 sign | 442889 cycles | 447534 cycles | 0.99 |
| ML-DSA-87 verify | 204254 cycles | 204204 cycles | 1.00 |
oqs-bot
left a comment
Graviton3 (no-opt)

| Benchmark suite | Current: 9458677 | Previous: 4befe1f | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 137921 cycles | 137978 cycles | 1.00 |
| ML-DSA-44 sign | 481844 cycles | 481727 cycles | 1.00 |
| ML-DSA-44 verify | 148893 cycles | 148697 cycles | 1.00 |
| ML-DSA-65 keypair | 240809 cycles | 240574 cycles | 1.00 |
| ML-DSA-65 sign | 784576 cycles | 784999 cycles | 1.00 |
| ML-DSA-65 verify | 240727 cycles | 241065 cycles | 1.00 |
| ML-DSA-87 keypair | 395694 cycles | 395107 cycles | 1.00 |
| ML-DSA-87 sign | 1005598 cycles | 1004833 cycles | 1.00 |
| ML-DSA-87 verify | 402780 cycles | 403256 cycles | 1.00 |
SpacemiT K1 8 (Banana Pi F3) benchmarks (no-opt)

| Benchmark suite | Current: 9458677 | Previous: 4befe1f | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 822143 cycles | 820141 cycles | 1.00 |
| ML-DSA-44 sign | 3231592 cycles | 3223234 cycles | 1.00 |
| ML-DSA-44 verify | 918939 cycles | 916931 cycles | 1.00 |
| ML-DSA-65 keypair | 1396005 cycles | 1392160 cycles | 1.00 |
| ML-DSA-65 sign | 5262428 cycles | 5236614 cycles | 1.00 |
| ML-DSA-65 verify | 1469879 cycles | 1465580 cycles | 1.00 |
| ML-DSA-87 keypair | 2306109 cycles | 2298772 cycles | 1.00 |
| ML-DSA-87 sign | 6643344 cycles | 6614333 cycles | 1.00 |
| ML-DSA-87 verify | 2412410 cycles | 2407214 cycles | 1.00 |
Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)

| Benchmark suite | Current: 9458677 | Previous: 4befe1f | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 224542 cycles | 217154 cycles | 1.03 |
| ML-DSA-44 sign | 610065 cycles | 592400 cycles | 1.03 |
| ML-DSA-44 verify | 227196 cycles | 214434 cycles | 1.06 |
| ML-DSA-65 keypair | 406048 cycles | 388464 cycles | 1.05 |
| ML-DSA-65 sign | 1060911 cycles | 1019719 cycles | 1.04 |
| ML-DSA-65 verify | 382007 cycles | 369115 cycles | 1.03 |
| ML-DSA-87 keypair | 648762 cycles | 655486 cycles | 0.99 |
| ML-DSA-87 sign | 1356522 cycles | 1365800 cycles | 0.99 |
| ML-DSA-87 verify | 627699 cycles | 634082 cycles | 0.99 |
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

| Benchmark suite | Current: 9458677 | Previous: 4befe1f | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 224542 cycles | 217154 cycles | 1.03 |
| ML-DSA-44 verify | 227196 cycles | 214434 cycles | 1.06 |
| ML-DSA-65 keypair | 406048 cycles | 388464 cycles | 1.05 |
| ML-DSA-65 sign | 1060911 cycles | 1019719 cycles | 1.04 |
| ML-DSA-65 verify | 382007 cycles | 369115 cycles | 1.03 |
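The alert lists ML-DSA-44 keypair at a displayed ratio of 1.03 but omits ML-DSA-44 sign, which also shows 1.03 in the full Cortex-A72 (opt) table. A plausible reading, inferred from the numbers in this thread rather than stated by the bot, is that the check compares the unrounded current/previous ratio strictly against the threshold. The sketch below reproduces the alert list under that assumption; the function name `regressions` is hypothetical.

```python
def regressions(results, threshold=1.03):
    """Return (name, ratio) pairs whose current/previous ratio exceeds threshold.

    Assumes a strict comparison on the unrounded ratio (an inference, not a
    documented behaviour quoted from the benchmark action).
    """
    flagged = []
    for name, current, previous in results:
        ratio = current / previous
        if ratio > threshold:
            flagged.append((name, round(ratio, 4)))
    return flagged


# Cycle counts copied from the Cortex-A72 (opt) benchmark comment.
a72_opt = [
    ("ML-DSA-44 keypair", 224542, 217154),
    ("ML-DSA-44 sign", 610065, 592400),
    ("ML-DSA-44 verify", 227196, 214434),
    ("ML-DSA-65 keypair", 406048, 388464),
    ("ML-DSA-65 sign", 1060911, 1019719),
    ("ML-DSA-65 verify", 382007, 369115),
    ("ML-DSA-87 keypair", 648762, 655486),
    ("ML-DSA-87 sign", 1356522, 1365800),
    ("ML-DSA-87 verify", 627699, 634082),
]

flagged_names = [name for name, _ in regressions(a72_opt)]
```

Under this assumption `flagged_names` contains exactly the five rows in the alert. ML-DSA-44 sign works out to roughly 1.0298, which rounds to 1.03 for display but stays below the threshold.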
Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)

| Benchmark suite | Current: 9458677 | Previous: 4befe1f | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 300634 cycles | 306819 cycles | 0.98 |
| ML-DSA-44 sign | 1159057 cycles | 1171685 cycles | 0.99 |
| ML-DSA-44 verify | 334977 cycles | 332614 cycles | 1.01 |
| ML-DSA-65 keypair | 559097 cycles | 558115 cycles | 1.00 |
| ML-DSA-65 sign | 1872283 cycles | 1879578 cycles | 1.00 |
| ML-DSA-65 verify | 529139 cycles | 535583 cycles | 0.99 |
| ML-DSA-87 keypair | 865428 cycles | 841771 cycles | 1.03 |
| ML-DSA-87 sign | 2444447 cycles | 2376953 cycles | 1.03 |
| ML-DSA-87 verify | 887445 cycles | 866018 cycles | 1.02 |
Arm Cortex-A55 (Snapdragon 888) benchmarks (opt)

| Benchmark suite | Current: 9458677 | Previous: 4befe1f | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 268593 cycles | 266643 cycles | 1.01 |
| ML-DSA-44 sign | 804184 cycles | 801233 cycles | 1.00 |
| ML-DSA-44 verify | 270406 cycles | 269674 cycles | 1.00 |
| ML-DSA-65 keypair | 460374 cycles | 462006 cycles | 1.00 |
| ML-DSA-65 sign | 1308379 cycles | 1313696 cycles | 1.00 |
| ML-DSA-65 verify | 446458 cycles | 448446 cycles | 1.00 |
| ML-DSA-87 keypair | 803255 cycles | 794100 cycles | 1.01 |
| ML-DSA-87 sign | 1806457 cycles | 1807734 cycles | 1.00 |
| ML-DSA-87 verify | 779743 cycles | 773129 cycles | 1.01 |
Arm Cortex-A55 (Snapdragon 888) benchmarks (no-opt)

| Benchmark suite | Current: 9458677 | Previous: 4befe1f | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 455939 cycles | 455727 cycles | 1.00 |
| ML-DSA-44 sign | 2117313 cycles | 2115182 cycles | 1.00 |
| ML-DSA-44 verify | 549275 cycles | 548271 cycles | 1.00 |
| ML-DSA-65 keypair | 766705 cycles | 767842 cycles | 1.00 |
| ML-DSA-65 sign | 3453801 cycles | 3456936 cycles | 1.00 |
| ML-DSA-65 verify | 852645 cycles | 852036 cycles | 1.00 |
| ML-DSA-87 keypair | 1241440 cycles | 1239904 cycles | 1.00 |
| ML-DSA-87 sign | 4307303 cycles | 4283490 cycles | 1.01 |
| ML-DSA-87 verify | 1364387 cycles | 1367394 cycles | 1.00 |