[build] Add CMake build system alongside xmake (Phase 1: CPU + 6 backends) #1156

@voltjia

Goal

Replace xmake with a modern, idiomatic CMake build system for InfiniCore without breaking existing users. Phase 1 ships CMake side-by-side with the existing `xmake.lua` and is validated end-to-end on the available test hardware.

The strategic goal beyond Phase 1 is full xmake removal (Phase 3); Phase 1 establishes the topology and proves the per-backend toolchain story on the most-used backends.

Phase 1 scope

  • New top-level `CMakeLists.txt` covering targets equivalent to xmake's `infini-utils`, `infinirt`, `infiniop`, `infiniccl`, `infinicore_cpp_api`, `_infinicore`, plus the test binaries (`infinirt-test`, `infiniop-test`, `infiniccl-test`, `infinicore-test`, `infiniutils-test`).
  • Backends ported and verified end-to-end: CPU, NVIDIA, MetaX, Iluvatar, Moore Threads, Cambricon, Ascend (7 of 11).
  • New parallel CMake CI workflow alongside the existing xmake one. Both must pass to merge.
  • `scripts/cmake_install.py` peer to `scripts/install.py`, preserving the `--=y` UX.
  • `setup.py` becomes build-system-aware via env var `INFINICORE_BUILD_SYSTEM` (default `cmake`, fallback `xmake` for users on Phase-2-deferred backends).
  • README documents both build systems.
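
The `INFINICORE_BUILD_SYSTEM` switch in `setup.py` could look roughly like the sketch below. Only the variable name, the `cmake` default, and the `xmake` fallback come from the scope above; the function name `select_build_system`, the case-insensitive matching, and the rejection of unknown values are assumptions made for illustration.

```python
import os

# The two build systems named in this issue.
VALID_BUILD_SYSTEMS = ("cmake", "xmake")


def select_build_system(environ=None):
    """Return the build system requested via INFINICORE_BUILD_SYSTEM.

    Hypothetical helper: defaults to "cmake" when the variable is unset
    or empty, and raises ValueError for anything other than cmake/xmake.
    """
    env = os.environ if environ is None else environ
    choice = env.get("INFINICORE_BUILD_SYSTEM", "").strip().lower() or "cmake"
    if choice not in VALID_BUILD_SYSTEMS:
        raise ValueError(
            f"INFINICORE_BUILD_SYSTEM={choice!r} is not supported; "
            f"expected one of {VALID_BUILD_SYSTEMS}"
        )
    return choice


if __name__ == "__main__":
    print(select_build_system())
```

Failing loudly on an unknown value, rather than silently falling back to the default, is one possible design choice; it keeps a typo from quietly selecting the wrong build system.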

Non-goals (Phase 1)

  • Removing `xmake.lua`. It stays untouched. (Phase 3.)
  • CMake support for Hygon DCU, Kunlun XPU, Ali PPU, Qy GPU. Their `ENABLE_*` options are stubs that fail-fast with a redirect to xmake. (Phase 2.)
  • `flash-attn`, `aten`/`torch`, `ninetoothed` integrations. Preserved as no-op options. (Phase 2.)
  • Verified Windows MSVC builds. The MSVC code paths are written but unverified — no Windows test runner in scope. README flags this explicitly.
  • InfiniLM CMake migration. (Phase 3.)
  • Restructuring source code. Only build-system files change.

Design highlights

Modern, idiomatic CMake. Modular subdirectory `CMakeLists.txt` per backend; `find_package` for spdlog/json/pybind11/OpenMP/CUDA/Boost; modern `target_link_libraries` / `target_include_directories` / `target_compile_features`; generator expressions for per-backend flags.

Per-backend integration (Approach A). CUDA-flavored backends (NVIDIA, Iluvatar) use `-DCMAKE_TOOLCHAIN_FILE=cmake/toolchains/.cmake` with `enable_language(CUDA)` and a swapped `CMAKE_CUDA_COMPILER`. Custom-compiler backends (Cambricon `cncc`, Moore `mcc`, MetaX `htcc`/`mxcc`) get helper functions `infinicore_add_bang_library` / `infinicore_add_musa_library` / `infinicore_add_maca_library` that wrap `add_custom_command` per device source and bundle the resulting `.o` files into a normal STATIC library alongside host `.cc` sources.

Ascend special case. Today's xmake build invokes `make` inside `src/infiniop/devices/ascend/`, which in turn runs CMake. The migration pulls that nested CMake project in directly via `add_subdirectory`, dropping the Makefile shim.

Install layout & target naming preserved. Installed shared library filenames are byte-identical (`libinfiniop.so` etc.); pybind module `_infinicore.cpython-3xx-*.so` keeps its soabi suffix; `$INFINI_ROOT` install layout matches xmake exactly. Target names switch from hyphenated (`infiniop-nvidia`) to underscored (`infiniop_nvidia`) internally, but this is invisible externally.
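
One way to check the layout claim is to compare the relative file listings of an xmake install tree and a CMake install tree. The sketch below is not part of the planned tooling: the helper names `list_layout` and `compare_layouts` are hypothetical, and it compares paths only, not file contents.

```python
from pathlib import Path


def list_layout(root):
    """Return the set of file paths under root, relative to root (POSIX style)."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() or p.is_symlink()
    }


def compare_layouts(xmake_files, cmake_files):
    """Return (only_in_xmake, only_in_cmake) as sorted lists of relative paths."""
    xmake_files, cmake_files = set(xmake_files), set(cmake_files)
    return sorted(xmake_files - cmake_files), sorted(cmake_files - xmake_files)


def hyphen_to_underscore(target):
    """Map an xmake-style target name to the underscored internal CMake name."""
    return target.replace("-", "_")
```

Both returned lists being empty would indicate that the two install trees contain the same set of paths, for example after installing each build into a separate scratch directory.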

CI. New `.github/workflows/build-cmake.yml` matrix [ubuntu-latest, windows-latest] × [Debug, Release], CPU only (no GPU runners on GHA). Existing `build.yml` (xmake) untouched.

Validation plan

Validated on six remote test servers, one per backend (NVIDIA, MetaX, Iluvatar, Moore, Cambricon, Ascend), with the following per-server flow:

  1. Baseline: `scripts/install.py` (xmake) + `scripts/python_test.py --` — capture baseline pass list.
  2. CMake: `scripts/cmake_install.py` + `pip install .` + `scripts/python_test.py --` — capture CMake pass list.
  3. Diff. Any pass→fail regression blocks the PR; any test failing in both is filed as a pre-existing issue.
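
The diff step above can be sketched as a small classification over the two result sets. This is an illustration under assumptions: the result format (a mapping from test name to pass/fail), the function names, and the separate `missing` bucket for tests that did not run under CMake are not specified in this issue.

```python
def diff_results(baseline, cmake):
    """Classify tests by comparing xmake (baseline) and CMake results.

    Both arguments map test name -> True (passed) or False (failed).
    """
    regressions, pre_existing, missing = [], [], []
    for test, passed in sorted(baseline.items()):
        if test not in cmake:
            # Assumption: a test absent from the CMake run is reported
            # separately for manual review rather than counted as a failure.
            missing.append(test)
        elif not cmake[test]:
            # Failed under CMake: regression if it passed under xmake,
            # otherwise a failure that exists in both builds.
            (regressions if passed else pre_existing).append(test)
    return {
        "regressions": regressions,
        "pre_existing": pre_existing,
        "missing": missing,
    }


def blocks_pr(diff):
    """Per the plan, only pass-to-fail regressions block the PR."""
    return bool(diff["regressions"])


if __name__ == "__main__":
    # Hypothetical test names, for illustration only.
    baseline = {"add": True, "matmul": True, "rope": False}
    cmake = {"add": True, "matmul": False, "rope": False}
    result = diff_results(baseline, cmake)
    print(result, "blocks PR:", blocks_pr(result))
```

The `regressions` and `pre_existing` lists map directly onto the per-server pass/fail diff table planned for the PR description.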

PR description will include the per-server pass/fail diff table and link to build logs.

Phase 2 / 3 (not in this issue)

  • Phase 2: CMake support for Hygon, Kunlun, Ali, Qy. Port `flash-attn`, `aten`, `ninetoothed`. Each requires its own toolchain file or helper plus a test box.
  • Phase 3: Delete `xmake.lua`, the xmake CI workflow, the `INFINICORE_BUILD_SYSTEM` switch in `setup.py`. Migrate InfiniLM the same way.

Branch / PR

Branch `issue/`, pushed to `InfiniTensor/InfiniCore`. PR opened against `main`. Commits structured one-per-backend so the change is bisectable.
