Enable seminumerical exact exchange calculation with CUDA #183
vmitq wants to merge 2 commits into wavefunction91:master
Co-authored-by: Lukas Gergs <lg@terraquantum.swiss>
Pull request overview
This PR adds support for computing the seminumerical exact-exchange (EXX) contribution to the nuclear gradient on CUDA device backends, and exposes it through the public XCIntegrator / replicated-integrator APIs.
Changes:
- Add `eval_exx_grad` API plumbing across `XCIntegrator` and replicated integrator layers.
- Add device storage + local-work driver support for EXX-gradient intermediates and reduction.
- Extend the standalone driver to load/write/print/compare `EXX_GRAD` from HDF5 references.
Reviewed changes
Copilot reviewed 33 out of 33 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| tests/standalone_driver.cxx | Loads/writes/prints EXX_GRAD reference data and compares norms. |
| src/xc_integrator/xc_data/device/xc_device_stack_data.hpp | Adds device pointers and interface hooks for EXX-gradient buffers. |
| src/xc_integrator/xc_data/device/xc_device_stack_data.cxx | Allocates/zeros/retrieves EXX-gradient intermediates on device. |
| src/xc_integrator/xc_data/device/xc_device_data.hpp | Adds exx_grad term tracking and device-data virtual interface for EXX gradient. |
| src/xc_integrator/shell_batched/shell_batched_replicated_xc_integrator.hpp | Declares eval_exx_grad_ in shell-batched replicated integrator. |
| src/xc_integrator/shell_batched/shell_batched_replicated_xc_integrator_exx_grad.hpp | Adds NYI stub for shell-batched EXX gradient. |
| src/xc_integrator/replicated/replicated_xc_integrator_impl.cxx | Adds replicated pimpl forwarding function eval_exx_grad. |
| src/xc_integrator/replicated/host/shell_batched_replicated_xc_host_integrator.cxx | Wires in the shell-batched EXX-gradient stub header. |
| src/xc_integrator/replicated/host/reference_replicated_xc_host_integrator.hpp | Declares eval_exx_grad_ for reference host integrator. |
| src/xc_integrator/replicated/host/reference_replicated_xc_host_integrator.cxx | Wires in the reference EXX-gradient stub header. |
| src/xc_integrator/replicated/host/reference_replicated_xc_host_integrator_exx_grad.hpp | Adds NYI stub for reference host EXX gradient. |
| src/xc_integrator/replicated/device/shell_batched_replicated_xc_device_integrator.cxx | Wires in the shell-batched EXX-gradient stub header on device build. |
| src/xc_integrator/replicated/device/incore_replicated_xc_device_integrator.hpp | Declares EXX-gradient evaluation and local-work helpers. |
| src/xc_integrator/replicated/device/incore_replicated_xc_device_integrator.cxx | Includes the new EXX-gradient device implementation header. |
| src/xc_integrator/replicated/device/incore_replicated_xc_device_integrator_exx_grad.hpp | Implements EXX gradient evaluation workflow on device. |
| src/xc_integrator/local_work_driver/device/scheme1_magma_base.hpp | Adds EXX-gradient driver API surface for MAGMA scheme (NYI). |
| src/xc_integrator/local_work_driver/device/scheme1_magma_base.cxx | Adds NYI throws for EXX-gradient operations under MAGMA. |
| src/xc_integrator/local_work_driver/device/scheme1_base.hpp | Extends scheme1 base interface with EXX-gradient ops. |
| src/xc_integrator/local_work_driver/device/scheme1_base.cxx | Implements EXX K-derivative accumulation and contraction to basis-function gradients. |
| src/xc_integrator/local_work_driver/device/local_device_work_driver.hpp | Exposes EXX-gradient operations on the local device work driver. |
| src/xc_integrator/local_work_driver/device/local_device_work_driver.cxx | Forwards EXX-gradient calls to the device-driver PIMPL. |
| src/xc_integrator/local_work_driver/device/local_device_work_driver_pimpl.hpp | Adds pure-virtual EXX-gradient hooks for device-driver implementations. |
| src/xc_integrator/local_work_driver/device/cuda/kernels/cublas_extensions.cu | Adds CUDA kernels for matrix row/column reductions used in EXX gradient assembly. |
| src/xc_integrator/local_work_driver/device/common/device_blas.hpp | Declares matrix_reduce_rows/cols device BLAS helpers. |
| src/runtime_environment/device/device_runtime_environment.cxx | Adds DeviceRuntimeEnvironment constructor taking an explicit byte count. |
| src/runtime_environment/device/device_runtime_environment_impl.hpp | Implements explicit-size device-buffer allocation constructor. |
| include/gauxc/xc_integrator/xc_integrator_impl.hpp | Adds eval_exx_grad to the virtual integrator interface and public wrapper. |
| include/gauxc/xc_integrator/replicated/replicated_xc_integrator_impl.hpp | Adds low-level replicated EXX-gradient virtual + public entrypoint. |
| include/gauxc/xc_integrator/replicated/impl.hpp | Implements ReplicatedXCIntegrator::eval_exx_grad_ wrapper returning a vector. |
| include/gauxc/xc_integrator/replicated_xc_integrator.hpp | Plumbs eval_exx_grad_ through replicated integrator type. |
| include/gauxc/xc_integrator/impl.hpp | Adds XCIntegrator::eval_exx_grad public wrapper. |
| include/gauxc/xc_integrator.hpp | Adds public eval_exx_grad API type and method declaration. |
| include/gauxc/runtime_environment/decl.hpp | Declares the new explicit-size DeviceRuntimeEnvironment constructor. |
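
Several rows above describe the same layering: a public driver forwards the EXX-gradient call to a PIMPL, backends implement a pure-virtual hook, and unsupported backends (such as the MAGMA scheme) throw an NYI error. A minimal sketch of that pattern follows. Every class and function name in it is a hypothetical stand-in chosen for illustration, not GauXC's actual interface, and the supported backend only returns zeros as a placeholder.

```cpp
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

// All names below are hypothetical stand-ins, not GauXC's real classes.
struct DriverImplBase {
  virtual ~DriverImplBase() = default;
  // Pure-virtual hook that every backend must provide.
  virtual std::vector<double> eval_exx_grad(std::size_t natoms) = 0;
};

// Backend that supports the operation (zeros stand in for a real result).
struct SupportedBackend : DriverImplBase {
  std::vector<double> eval_exx_grad(std::size_t natoms) override {
    return std::vector<double>(3 * natoms, 0.0);
  }
};

// Backend where the operation is not yet implemented (NYI).
struct NyiBackend : DriverImplBase {
  std::vector<double> eval_exx_grad(std::size_t) override {
    throw std::runtime_error("eval_exx_grad NYI for this backend");
  }
};

// Public-facing class: owns the implementation and forwards the call.
class Driver {
 public:
  explicit Driver(std::unique_ptr<DriverImplBase> impl)
      : pimpl_(std::move(impl)) {}

  std::vector<double> eval_exx_grad(std::size_t natoms) {
    if (!pimpl_) throw std::runtime_error("driver not initialized");
    return pimpl_->eval_exx_grad(natoms);
  }

 private:
  std::unique_ptr<DriverImplBase> pimpl_;
};
```

With this shape, adding a new operation means touching the public class, the pure-virtual base, and each backend, which is consistent with the number of files listed in the table.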
```cpp
if( integrate_exc_grad ) {
  if( rks ) {
    EXX_GRAD = integrator.eval_exx_grad( P, sn_link_settings );
  }
  else if( uks ) {
    std::cout << "Warning: eval_exx_grad + UKS NYI!" << std::endl;
  }
  else if( gks ) {
    std::cout << "Warning: eval_exx_grad + GKS NYI!" << std::endl;
  }
  if(!world_rank) {
    std::cout << "EXX Gradient:" << std::endl;
    std::cout << std::scientific << std::setprecision(6);
    for( auto iAt = 0; iAt < mol.size(); ++iAt ) {
      std::cout << " "
                << std::setw(16) << EXX_GRAD[3*iAt + 0]
                << std::setw(16) << EXX_GRAD[3*iAt + 1]
                << std::setw(16) << EXX_GRAD[3*iAt + 2]
                << std::endl;
```
If integrate_exc_grad is enabled but the calculation is UKS/GKS, EXX_GRAD is never populated/resized (only a warning is printed) but is still indexed in the printing loop. This can cause out-of-bounds access/crash. Guard the printout behind the same condition that actually computes EXX_GRAD (RKS), or ensure EXX_GRAD is resized/filled with zeros in the NYI branches.
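
One way to apply the first suggestion is to check the vector's size before indexing it. The sketch below is an illustration only: `print_exx_grad` is a hypothetical helper that does not exist in the driver, and the 3-components-per-atom layout is taken from the indexing in the excerpt above.

```cpp
#include <cstddef>
#include <iomanip>
#include <ostream>
#include <sstream>
#include <vector>

// Hypothetical helper (not part of GauXC): prints one row of x/y/z
// gradient components per atom, but only if the vector actually holds
// 3*natoms entries. Returns true if the gradient was printed.
bool print_exx_grad(std::ostream& os, const std::vector<double>& exx_grad,
                    std::size_t natoms) {
  if (exx_grad.size() != 3 * natoms) {
    // Covers the UKS/GKS case, where the vector was never filled.
    os << "EXX Gradient: not computed (skipping printout)\n";
    return false;
  }
  os << "EXX Gradient:\n" << std::scientific << std::setprecision(6);
  for (std::size_t iAt = 0; iAt < natoms; ++iAt) {
    os << "  " << std::setw(16) << exx_grad[3 * iAt + 0]
       << std::setw(16) << exx_grad[3 * iAt + 1]
       << std::setw(16) << exx_grad[3 * iAt + 2] << '\n';
  }
  return true;
}
```

Checking the size rather than the `rks` flag keeps the printout safe even if another branch later leaves the vector empty. The alternative from the comment, resizing to zeros in the NYI branches, would also avoid the out-of-bounds access but would print a gradient that was never computed.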
- Fix allreduce length from nbf to 3*nbf
- Guard EXX_GRAD access in standalone_driver for UKS/GKS where the vector is empty
- Add missing settings to util::unused
- Fix typos
This PR implements a new feature: the calculation of the seminumerical exact exchange contribution to the gradient. Currently, the EXX gradient is supported only for CUDA+cuBLAS.