Environment:
OS: Windows 11 Pro [Version 10.0.26200.8457]
CPU: Intel Core i9-12900K (Alder Lake Architecture)
GPU: NVIDIA GeForce RTX 3090 (24 GB, CUDA 12.4)
Python: 3.12.10
Package: llama_cpp_python-0.3.39+cu124-cp312-cp312-win_amd64.whl
Issue Description:
After upgrading to v0.3.39, the model fails to load during load_tensors with ValueError: Failed to load model from file.
My CPU is a 12th Gen i9-12900K (Alder Lake), so I expected the loader to pick up ggml-cpu-alderlake.dll. However, the logs suggest that ggml_backend_load_all_from_path() did not load any CPU backend DLL, leaving a registry count of only 1 (CUDA only):
load_backend: loaded CUDA backend from ...\llama_cpp\lib\ggml-cuda.dll
Llama.init: Loaded ggml backend registry count: 1
...
load_tensors: loading model tensors...
ValueError: Failed to load model from file
Downgrading to v0.3.38+cu124.basic resolved the issue, and the model loads normally. This may be a regression in v0.3.39, or a missing runtime dependency (such as LLVM/Clang runtime artifacts) that is needed when the alderlake CPU backend is loaded dynamically on certain Windows environments.
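To help narrow down whether a CPU backend DLL fails because of a missing dependency, here is a minimal diagnostic sketch. It tries to load each CPU backend DLL directly with ctypes and records the OS error, if any. The helper names (probe_backend_libs, find_llama_cpp_lib_dir) and the ggml-cpu-*.dll file pattern are my own assumptions for illustration; only the llama_cpp\lib location and the ggml-cpu-alderlake.dll name come from the report above. Note that ctypes may search for dependent DLLs differently than ggml's own loader does, so a successful load here does not prove the backend loader would succeed.

```python
import ctypes
import importlib.util
import os
from pathlib import Path


def probe_backend_libs(lib_dir, pattern="ggml-cpu-*.dll", loader=ctypes.CDLL):
    """Try to load each matching library.

    Returns a dict mapping file name -> None (loaded) or the error text.
    The default pattern is an assumption about how the CPU backends are named.
    """
    lib_dir = Path(lib_dir)
    results = {}
    # os.add_dll_directory exists only on Windows; it lets sibling DLLs
    # in the same folder be found while resolving dependencies.
    handle = None
    if hasattr(os, "add_dll_directory"):
        handle = os.add_dll_directory(str(lib_dir))
    try:
        for path in sorted(lib_dir.glob(pattern)):
            try:
                loader(str(path))
                results[path.name] = None
            except OSError as exc:
                # On Windows, a missing dependent DLL typically surfaces here
                # as "module could not be found" (WinError 126).
                results[path.name] = str(exc)
    finally:
        if handle is not None:
            handle.close()
    return results


def find_llama_cpp_lib_dir():
    """Locate <package dir>/lib for llama_cpp without importing the package."""
    spec = importlib.util.find_spec("llama_cpp")
    if spec is None or not spec.submodule_search_locations:
        return None
    return Path(list(spec.submodule_search_locations)[0]) / "lib"


if __name__ == "__main__":
    lib_dir = find_llama_cpp_lib_dir()
    if lib_dir is None:
        print("llama_cpp is not installed in this environment")
    else:
        for name, error in probe_backend_libs(lib_dir).items():
            status = "OK" if error is None else "FAILED: " + error
            print(f"{name}: {status}")
```

If ggml-cpu-alderlake.dll shows up as FAILED while other variants load, that would point to a dependency of that specific DLL rather than to the backend selection logic. Running the same script under v0.3.38 and v0.3.39 should make the difference visible.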