Skip to content

Modernize build, migrate src/ to C++17, and assorted fixes#4

Merged
rantala merged 16 commits into
masterfrom
work
May 16, 2026
Merged

Modernize build, migrate src/ to C++17, and assorted fixes#4
rantala merged 16 commits into
masterfrom
work

Conversation

@rantala
Copy link
Copy Markdown
Owner

@rantala rantala commented May 16, 2026

Summary

16 commits spanning build modernization, completing the C→C++ migration in src/, and a few correctness fixes the static analyzers surfaced along the way.

Build & CI

  • ENABLE_GCC_ANALYZER / ENABLE_CLANG_ANALYZER cmake options (scoped to project sources via OBJECT libraries; external/ excluded). Builds are not warning-clean when either is on — they're diagnostic-surfacing modes, not gates.
  • CMakeLists.txt modernized (target-level properties, OpenMP via find_package, CXX_STANDARD per target); cmake_minimum_required bumped to 3.16.
  • build.yml: refresh runner pins, dependencies, and actions/checkout@v6 (clears the Node.js 20 deprecation annotation).
  • New MARCH cache variable (default native) — useful for -DMARCH=x86-64 when running under Valgrind on AVX-512 hosts.

src/ migration to C++

  • util/vmainfo, util/cpus_allowed, util/timing, routines, sortstring rewritten in C++. The only remaining .c files are in external/. routine.h keeps its extern "C" guard (149 ROUTINE_REGISTER callsites in external/*.c).
  • Standard bumped from C++11 to C++17. Cleanups taking advantage of it (string_view, optional, structured bindings, [[nodiscard]], if-with-init) scoped to sortstring.cpp and util/; algorithm files unchanged.

Correctness fixes

  • vector_realloc: don't lose realloc pointers on failure (GCC analyzer finding).
  • Explicit abort() on every malloc failure across algorithm files.
  • Handle empty input (n == 0) in every sort routine — the abort() guards above could fire on malloc(0) returning NULL (conforming behavior); also fixed a pre-existing segfault in burstsort_mkq_* and an underflow in check_result. Unit-test now starts each k-loop at 0, locking in the contract.
  • median.h: rewrite value-form med3char as max(min(a,b), min(max(a,b), c)) — branchless, clears a clang-analyzer false positive, and shrinks pseudo_median<CharT> static instruction count by 21–33%.

Banner

  • cpu_information now reports turbo/boost state (cross-driver: cpufreq/boost then intel_pstate/no_turbo), per-CPU hardware peak frequency when it exceeds the scaling cap (amd_pstate_max_freq / cpuinfo_max_freq — surfaces P/E split on AMD by value), and explicit (P-core) / (E-core) labels on Intel hybrid.

Funnelsort compile-time

  • Split funnelsort.cpp per K × BufferLayout into independent TUs — parallel builds no longer bottleneck on this file.

Test plan

  • Clean Release build (Ninja + GCC and Clang) passes with zero warnings.
  • ./build/unit-test*** All OK ***.
  • CI green on all 8 matrix cells (Release/Debug × GCC/Clang × ubuntu-22.04/ubuntu-24.04): https://github.com/rantala/string-sorting/actions/runs/25972414260
  • ./sortstring -L, -A, --help, and a full sort run (incl. taskset-restricted) output byte-identical to pre-rewrite binary (modulo ASLR mmap addresses, RSS/Pss, random seed, timing values).
  • External .c routines (quicksort, msd_nilsson) still self-register through the extern "C" routine_register boundary.
  • n == 0 invocation of every registered routine returns without abort or crash.

🤖 Generated with Claude Code

rantala and others added 16 commits May 16, 2026 22:41
Run GCC -fanalyzer on the project's internal sources, scoped via two
new OBJECT libraries so external/ third-party code is not analyzed.
Requires GCC >= 13 for both C and C++ to cover the C++ frontend;
configuration fails fast otherwise.

Enabling the option surfaces -fanalyzer diagnostics across the
internal sources; builds are not warning-clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Run clang-tidy with the clang-analyzer-* check group on the project's
internal sources via CMake's <LANG>_CLANG_TIDY target property; scope
matches ENABLE_GCC_ANALYZER (external/ third-party code excluded).
The configured C/C++ compiler does not need to be Clang. Configuration
fails fast if clang-tidy is not on PATH.

Enabling the option surfaces clang-analyzer-* diagnostics
across the internal sources; builds are not warning-clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bump cmake_minimum_required to 3.16. Move directory-level globals
(include_directories, link_libraries, add_definitions) to target-level
properties on the OBJECT libraries, propagating to the executables via
target_link_libraries. Replace hard-coded -fopenmp with find_package(OpenMP)
and OpenMP::OpenMP_C/CXX. Use C_STANDARD/CXX_STANDARD per target instead of
mutating CMAKE_*_FLAGS. Gate -march=native behind a new BUILD_NATIVE option
(default ON).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Check each dynamic memory allocation and for simplicity abort() on failure.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Commit 07d75ee added "if (!ptr) abort()" guards after every malloc()
in the algorithm files, but several of the affected callsites do
"malloc(n * sizeof(...))" where n is the input size received directly
from main(). On a conforming allocator, "malloc(0)" may legally return
NULL without indicating OOM, in which case the new guard fires and
aborts on an empty input.

The exposed sites are all public sort entry points -- the wrapper
functions that take (unsigned char** strings, size_t n) and allocate
a workspace immediately, before any "n < threshold" early-out can
absorb the empty case. Internal recursive helpers are unaffected
because they already start with an "if (n < 32) insertion_sort;
return;" guard that handles n == 0.

Add "if (n == 0) return;" at the very top of every affected entry:

  mergesort.cpp:           6 entries
  mergesort_lcp.cpp:       8 entries
  mergesort_losertree.cpp: 10 entries
  mergesort_unstable.cpp:  3 entries
  funnelsort_impl.h:       1 (funnelsort_Kway template)
  multikey_simd.cpp:       3 (multikey_simd_b_1/_2/_4)
  msd_a.cpp:               2 (msd_A, msd_A_adaptive)
  msd_a2.cpp:              2 (msd_A2, msd_A2_adaptive)
  msd_ce.cpp:              4 (msd_CE5/6/7/8)
  msd_lsd.cpp:             2 (msd_A_lsd, msd_A_lsd_adaptive templates)

Sortstring's main() rejects empty input before reaching any routine,
and the existing unit-test loops started at k=1, so this bug was not
exercised in practice. Extend test_routines() to start each k-loop
at 0, giving every registered routine n=0 coverage and locking in
the contract that "sorting an empty array succeeds."

Running the extended test exposed a related n == 0 bug in
burstsort_mkq_simpleburst and burstsort_mkq_recursiveburst (the
templates behind all six burstsort_mkq_* registrations): the first
operation is "pseudo_median<CharT>(strings, N, 0)", which dereferences
strings[N/2] -- segfault when N == 0. Same fix: "if (N == 0) return;"
at the top of both templates.

Also fix a latent underflow in util/debug.h's check_result: the loop
"for (size_t i=0; i < n-1; ++i)" wraps to SIZE_MAX iterations when
n == 0. Guard with "if (n < 2) return 0;" at the top.

With these fixes, ./build/unit-test passes end-to-end against the
broadened k=0 coverage.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Idea is to speed up parallel builds, previously funnelsort.cpp was one
of slowing files to compile.

Move template machinery to a new header src/funnelsort_impl.h, and split
the per-K instantiations and routine registrations into one TU per
BufferLayout: src/funnelsort_bfs.cpp and src/funnelsort_dfs.cpp.  Each
TU owns its layout's K=4 specialization too.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fix GCC analyzer findings where realloc results were assigned directly
to vector storage before checking for failure. Store realloc results in
temporary pointers first so the original allocation remains reachable if
realloc fails and aborts.

Co-authored-by: OpenAI Codex <codex@openai.com>
Default stays "native". Override the -march= value to target a different
ISA level.

Running unit_test executable under Valgrind can trip on AVX-512
instructions, can be avoided for example by building with `-DMARCH=x86-64`.

```
==1437032== valgrind: Unrecognised instruction at address 0x40c34ed.
==1437032==    at 0x40C34ED: mergesort_4way_unstable(unsigned char**, unsigned long, unsigned char**) (mergesort_unstable.cpp:1862)
==1437032==    by 0x40C3648: mergesort_4way_unstable(unsigned char**, unsigned long) (mergesort_unstable.cpp:1879)
==1437032==    by 0x40DD641: test_routines (main.cpp:226)
==1437032==    by 0x40DD641: main (main.cpp:302)
==1437032== Your program just tried to execute an instruction that Valgrind
==1437032== did not recognise.  There are two possible reasons for this.
==1437032== 1. Your program has a bug and erroneously jumped to a non-code
==1437032==    location.  If you are running Memcheck and you just saw a
==1437032==    warning about a bad jump, it's probably your program's fault.
==1437032== 2. The instruction is legitimate but Valgrind doesn't handle it,
==1437032==    i.e. it's Valgrind's fault.  If you think this is the case or
==1437032==    you are not sure, please let us know and we'll try to fix it.
==1437032== Either way, Valgrind will now raise a SIGILL signal which will
==1437032== probably kill your program.
==1437032==
==1437032== Process terminating with default action of signal 4 (SIGILL)
==1437032==  Illegal opcode at address 0x40C34ED
==1437032==    at 0x40C34ED: mergesort_4way_unstable(unsigned char**, unsigned long, unsigned char**) (mergesort_unstable.cpp:1862)
==1437032==    by 0x40C3648: mergesort_4way_unstable(unsigned char**, unsigned long) (mergesort_unstable.cpp:1879)
==1437032==    by 0x40DD641: test_routines (main.cpp:226)
==1437032==    by 0x40DD641: main (main.cpp:302)
```

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the cascading-if implementation of med3char(a, b, c) with the
branchless median identity max(min(a, b), min(max(a, b), c)).
Semantics match on all inputs including ties (a==b -> a, c==a -> a,
c==b -> b, strict order picks the middle).

Clears a clang-analyzer false positive at src/util/median.h:67 where
the analyzer could not prove that every path through the cascading-if
returned an initialized value (three clang-analyzer-core.CallAndMessage
warnings on the pseudo_median call site). std::min and std::max are
fully modeled.

The reference-and-comparator overload is not flagged and stays as-is.

Assembly impact (release build, gcc -O2 -march=native): the value-form
med3char is inlined into pseudo_median<CharT>; the three instantiations
shrink consistently across all four TUs that call them (burstsort_mkq,
multikey_block, multikey_dynamic, multikey_simd):

  pseudo_median<unsigned char>:   146 -> 98  insns (-33%)
  pseudo_median<unsigned short>:  301 -> 222 insns (-26%)
  pseudo_median<unsigned int>:    491 -> 388 insns (-21%)

The cascading-if produced ~8 insns and 4 conditional jumps per nested
med3char call (three string-equality early-outs plus a slow/fast split
the compiler could not fuse). The min/max identity compiles to mostly
straight-line cmp/cmov pairs with a single branch, so the hot path of
pseudo_median is now four nested cmp+cmov sequences followed by ret.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reimplement src/util/vmainfo.c as src/util/vmainfo.cpp using
std::ifstream, std::string, and std::vector. The C ABI is preserved
via extern "C" so the sole caller (src/sortstring.c) keeps working
unchanged. Output layout matches the original byte-for-byte.

The rewrite incidentally clears the clang-analyzer diagnostics
flagged on the C version:
  - 16 clang-analyzer-security.insecureAPI.strcpy (unbounded strcat)
  - 8 clang-analyzer-unix.Stream (getline after EOF)
  - 2 clang-analyzer-unix.Malloc (getline buffer leaked at done:)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirror the vmainfo C++ rewrite for the remaining /proc-parsing helper.
std::ifstream / std::string / std::ostringstream replace the manual
getline + goto-cleanup + asprintf chains. C linkage is preserved via
extern "C" so src/sortstring.c keeps building unchanged.

Incidental correctness fix: CPU_ALLOC returns uninitialized memory, so
the original cpus_allowed left bits for cleared mask positions in
indeterminate state. The new implementation CPU_ZERO_S's the set
before populating it from the hex string.

Output verified identical to the C version on the unrestricted CPU
set and a taskset-restricted subset.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirror the vmainfo / cpus_allowed C++ rewrite for the remaining util.
Anonymous-namespace statics replace file-scope statics; a small
ms_between() helper deduplicates the per-clock delta computation
(four call sites collapse to one expression each). C linkage is
preserved via extern "C".

Incidental precision fix: the C version divided tv_nsec by an integer
1000000, truncating sub-millisecond resolution for wall-clock and
PROCESS_CPUTIME. user/sys already had microsecond resolution via
tv_usec/1e3. Switching to /1000000.0 makes all five reported timings
consistent at microsecond precision.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Finish the src/ migration: the executable entry point and the routine
registry are the last two C translation units. Convert them and drop
the C-compat scaffolding that was carrying their callers.

Mechanical translation:
- File-scope statics move into anonymous namespaces.
- C-style void-pointer casts become static_cast<>.
- Implicit conversions through munmap drop their casts.
- Stdlib includes use the <c*> form.
- nullptr, std::strcmp, std::qsort, std::fopen etc.

Linkage cleanup:
- routine_register stays extern "C" (called by external/*.c at 149
  ROUTINE_REGISTER sites; src/routine.h keeps its extern "C" guard).
- routine_from_name and routine_get_all become plain C++ linkage
  (only C++ callers remain).
- src/util/{vmainfo,cpus_allowed,timing}.h drop their extern "C"
  guards along with the extern "C" specifiers on the .cpp definitions.
- _GNU_SOURCE defines drop too: g++ predefines it for libstdc++.

Behavior preserved: output of -L / -A / --help / a full sort run
(modulo ASLR mmap addresses and timing values) is byte-identical
to the pre-rewrite binary. Error-path stderr and exit codes match.
external/*.c routines still self-register correctly via the
ROUTINE_REGISTER macro.

clang-analyzer warning count: 29 -> 0.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CMakeLists.txt bumps both target properties from CXX_STANDARD 11 to 17.
C_STANDARD stays at 99 (external/*.c is untouched). The algorithm
implementation files in src/ compile under the new standard with no
source changes -- no removed/deprecated C++17 constructs were in use
(no register, auto_ptr, random_shuffle, bind1st/bind2nd, ptr_fun,
unary/binary_function, dynamic exception specs, or trigraphs).

Cleanups taking advantage of the new standard, scoped to sortstring +
util + routines per the user's guidance ("minimal in algorithm files,
extensive in sortstring/util"):

vma_info now returns std::string instead of a malloc'd char*; the
strdup_malloc helper goes away and the caller's free()/strcmp pair
collapses to operator==.

cpus_allowed_list returns std::string ("" means not present).
cpus_allowed returns a cpus_info struct -- the two out-params turn
into a structured-binding consumer. cpu_scaling_{min,max}_freq
return std::optional<int>, retiring the -1 sentinel. Internal
helpers (status_entry, high_bit_order, set_cpu_bits) take
string_view. [[nodiscard]] on all four public entry points.
Incidental fix at the caller: std::free(cpus) -> CPU_FREE(cpus)
(functionally identical on glibc, but spec-correct).

routine_from_name returns std::optional<const routine*> and takes
string_view; the call site uses if-with-init. routine_get_all
returns std::pair, consumed by structured bindings in both
sortstring.cpp and unit-test/main.cpp. qsort -> std::sort with a
typed lambda comparator. [[nodiscard]] on both.

sortstring.cpp: opts bitfields become plain bools with default
member initializers; write_filename is std::string. Two asprintf
sites (log file and write-output path) become std::string
concatenation, dropping the malloc/free/return-code dance. bazename
returns std::string, retiring the static char buf[300] (not
thread-safe, though never actually exercised concurrently here).
[[nodiscard]] on alloc_bytes, alloc_text, alloc_pointers, file_size.

util/debug.h: [[nodiscard]] on check_result.

Verified output byte-identical to pre-change binary for -L, -A,
--help, full sort run (regular and taskset-restricted), and all
error paths (modulo ASLR addresses, random seed, timing values,
and RSS/Pss). Unit tests pass. Algorithm-file object code unchanged
in shape (standard bump only).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The existing per-CPU line printed only the kernel governor's scaling
window via scaling_min_freq / scaling_max_freq. On a system with
boost disabled, or on hybrid CPUs where different cores have
different boost ceilings, that window is the same number on every
core and conveys nothing about whether the CPU can actually run
faster than displayed.

Add three signals so the benchmark banner reflects what the silicon
can do:

1. Turbo/boost state -- a single global line. Reads
   /sys/devices/system/cpu/cpufreq/boost first (AMD CPB,
   acpi-cpufreq, amd-pstate); falls back to
   /sys/devices/system/cpu/intel_pstate/no_turbo (intel_pstate,
   inverse-sense). Reported as on/off; the line is omitted when
   neither file is readable.

2. Hardware peak frequency -- per CPU, when it exceeds the
   scaling cap. Reads amd_pstate_max_freq first (the per-core
   boost ceiling reported by amd-pstate; differs across cores on
   AMD hybrid parts), then cpuinfo_max_freq (Intel turbo cap).
   Suppressed when equal to scaling_max_freq so the line stays
   short on non-hybrid / boost-on systems.

3. P-core / E-core label -- per CPU. Resolved from
   /sys/devices/cpu_core/cpus (P) and /sys/devices/cpu_atom/cpus
   (E), the Intel hybrid PMU sysfs. Empty on non-hybrid Intel and
   on AMD (where the hardware-peak value already exposes the
   split, e.g. 5090 MHz Zen 5 vs 3506 MHz Zen 5c on a Ryzen AI
   7 PRO 350).

Sample on the AMD Ryzen AI 7 PRO 350 (boost disabled,
2 GHz scaling cap):

  CPU information:
      CPUs allowed: 0-1,4-5
      Turbo/boost: off
      CPU0, scaling [623MHz .. 2000MHz], hw peak 5090MHz
      CPU1, scaling [623MHz .. 2000MHz], hw peak 3506MHz
      CPU4, scaling [623MHz .. 2000MHz], hw peak 5090MHz
      CPU5, scaling [623MHz .. 2000MHz], hw peak 3506MHz

Incidental cleanup: the scaling range now prints [min .. max]
(was [max .. min]) since "scaling ... hw peak X" only reads
naturally with an ascending range.

API shape: cpu_scaling_min_freq / cpu_scaling_max_freq collapse
into a single cpu_info_for(cpu) returning a struct (scaling
min/max, hw peak, core class). cpu_boost_enabled() returns
std::optional<bool>.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@rantala rantala merged commit c64640d into master May 16, 2026
16 checks passed
@rantala rantala deleted the work branch May 16, 2026 20:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant