
Enable seminumerical exact exchange calculation with CUDA #183

Open
vmitq wants to merge 2 commits into wavefunction91:master from vmitq:feature/exx_gradient_cuda

vmitq commented Mar 13, 2026

This PR implements a new feature: the calculation of the seminumerical exact exchange contribution to the gradient. Currently, the EXX gradient is supported only for CUDA+cuBLAS.

Co-authored-by: Lukas Gergs <lg@terraquantum.swiss>

Copilot AI left a comment


Pull request overview

This PR adds support for computing the seminumerical exact-exchange (EXX) contribution to the nuclear gradient on CUDA device backends, and exposes it through the public XCIntegrator / replicated-integrator APIs.

Changes:

  • Add eval_exx_grad API plumbing across XCIntegrator and replicated integrator layers.
  • Add device storage + local-work driver support for EXX-gradient intermediates and reduction.
  • Extend the standalone driver to load/write/print/compare EXX_GRAD from HDF5 references.

Reviewed changes

Copilot reviewed 33 out of 33 changed files in this pull request and generated 10 comments.

| File | Description |
| --- | --- |
| tests/standalone_driver.cxx | Loads/writes/prints EXX_GRAD reference data and compares norms. |
| src/xc_integrator/xc_data/device/xc_device_stack_data.hpp | Adds device pointers and interface hooks for EXX-gradient buffers. |
| src/xc_integrator/xc_data/device/xc_device_stack_data.cxx | Allocates/zeros/retrieves EXX-gradient intermediates on device. |
| src/xc_integrator/xc_data/device/xc_device_data.hpp | Adds exx_grad term tracking and device-data virtual interface for EXX gradient. |
| src/xc_integrator/shell_batched/shell_batched_replicated_xc_integrator.hpp | Declares eval_exx_grad_ in shell-batched replicated integrator. |
| src/xc_integrator/shell_batched/shell_batched_replicated_xc_integrator_exx_grad.hpp | Adds NYI stub for shell-batched EXX gradient. |
| src/xc_integrator/replicated/replicated_xc_integrator_impl.cxx | Adds replicated pimpl forwarding function eval_exx_grad. |
| src/xc_integrator/replicated/host/shell_batched_replicated_xc_host_integrator.cxx | Wires in the shell-batched EXX-gradient stub header. |
| src/xc_integrator/replicated/host/reference_replicated_xc_host_integrator.hpp | Declares eval_exx_grad_ for reference host integrator. |
| src/xc_integrator/replicated/host/reference_replicated_xc_host_integrator.cxx | Wires in the reference EXX-gradient stub header. |
| src/xc_integrator/replicated/host/reference_replicated_xc_host_integrator_exx_grad.hpp | Adds NYI stub for reference host EXX gradient. |
| src/xc_integrator/replicated/device/shell_batched_replicated_xc_device_integrator.cxx | Wires in the shell-batched EXX-gradient stub header on device build. |
| src/xc_integrator/replicated/device/incore_replicated_xc_device_integrator.hpp | Declares EXX-gradient evaluation and local-work helpers. |
| src/xc_integrator/replicated/device/incore_replicated_xc_device_integrator.cxx | Includes the new EXX-gradient device implementation header. |
| src/xc_integrator/replicated/device/incore_replicated_xc_device_integrator_exx_grad.hpp | Implements EXX gradient evaluation workflow on device. |
| src/xc_integrator/local_work_driver/device/scheme1_magma_base.hpp | Adds EXX-gradient driver API surface for MAGMA scheme (NYI). |
| src/xc_integrator/local_work_driver/device/scheme1_magma_base.cxx | Adds NYI throws for EXX-gradient operations under MAGMA. |
| src/xc_integrator/local_work_driver/device/scheme1_base.hpp | Extends scheme1 base interface with EXX-gradient ops. |
| src/xc_integrator/local_work_driver/device/scheme1_base.cxx | Implements EXX K-derivative accumulation and contraction to basis-function gradients. |
| src/xc_integrator/local_work_driver/device/local_device_work_driver.hpp | Exposes EXX-gradient operations on the local device work driver. |
| src/xc_integrator/local_work_driver/device/local_device_work_driver.cxx | Forwards EXX-gradient calls to the device-driver PIMPL. |
| src/xc_integrator/local_work_driver/device/local_device_work_driver_pimpl.hpp | Adds pure-virtual EXX-gradient hooks for device-driver implementations. |
| src/xc_integrator/local_work_driver/device/cuda/kernels/cublas_extensions.cu | Adds CUDA kernels for matrix row/column reductions used in EXX gradient assembly. |
| src/xc_integrator/local_work_driver/device/common/device_blas.hpp | Declares matrix_reduce_rows/cols device BLAS helpers. |
| src/runtime_environment/device/device_runtime_environment.cxx | Adds DeviceRuntimeEnvironment constructor taking an explicit byte count. |
| src/runtime_environment/device/device_runtime_environment_impl.hpp | Implements explicit-size device-buffer allocation constructor. |
| include/gauxc/xc_integrator/xc_integrator_impl.hpp | Adds eval_exx_grad to the virtual integrator interface and public wrapper. |
| include/gauxc/xc_integrator/replicated/replicated_xc_integrator_impl.hpp | Adds low-level replicated EXX-gradient virtual + public entrypoint. |
| include/gauxc/xc_integrator/replicated/impl.hpp | Implements ReplicatedXCIntegrator::eval_exx_grad_ wrapper returning a vector. |
| include/gauxc/xc_integrator/replicated_xc_integrator.hpp | Plumbs eval_exx_grad_ through replicated integrator type. |
| include/gauxc/xc_integrator/impl.hpp | Adds XCIntegrator::eval_exx_grad public wrapper. |
| include/gauxc/xc_integrator.hpp | Adds public eval_exx_grad API type and method declaration. |
| include/gauxc/runtime_environment/decl.hpp | Declares the new explicit-size DeviceRuntimeEnvironment constructor. |
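
The file summary above mentions matrix row/column reductions used in EXX gradient assembly. As a rough host-side illustration of what such reductions compute, here is a minimal C++ sketch. It is not the PR's CUDA code: the names `row_sums` and `col_sums` are hypothetical, the column-major layout is an assumption, and the actual `matrix_reduce_rows`/`matrix_reduce_cols` helpers may use a different signature or convention.

```cpp
#include <cstddef>
#include <vector>

// Host-side illustration only (hypothetical helpers, not the PR's kernels).
// A is assumed column-major with leading dimension lda >= m:
//   A(i,j) = A[i + j*lda],  0 <= i < m,  0 <= j < n.

// Sum along each row: out[i] = sum over j of A(i,j).
std::vector<double> row_sums(const std::vector<double>& A,
                             std::size_t m, std::size_t n, std::size_t lda) {
  std::vector<double> out(m, 0.0);
  for (std::size_t j = 0; j < n; ++j)
    for (std::size_t i = 0; i < m; ++i)
      out[i] += A[i + j * lda];
  return out;
}

// Sum along each column: out[j] = sum over i of A(i,j).
std::vector<double> col_sums(const std::vector<double>& A,
                             std::size_t m, std::size_t n, std::size_t lda) {
  std::vector<double> out(n, 0.0);
  for (std::size_t j = 0; j < n; ++j)
    for (std::size_t i = 0; i < m; ++i)
      out[j] += A[i + j * lda];
  return out;
}
```

A device version would typically assign one thread or block per output element. A serial reference like this one is mainly useful for checking such kernels on small inputs.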


Comment on lines +608 to +626
if( integrate_exc_grad ) {
  if( rks ) {
    EXX_GRAD = integrator.eval_exx_grad( P, sn_link_settings );
  }
  else if( uks ) {
    std::cout << "Warning: eval_exx_grad + UKS NYI!" << std::endl;
  }
  else if( gks ) {
    std::cout << "Warning: eval_exx_grad + GKS NYI!" << std::endl;
  }
  if(!world_rank) {
    std::cout << "EXX Gradient:" << std::endl;
    std::cout << std::scientific << std::setprecision(6);
    for( auto iAt = 0; iAt < mol.size(); ++iAt ) {
      std::cout << " "
                << std::setw(16) << EXX_GRAD[3*iAt + 0]
                << std::setw(16) << EXX_GRAD[3*iAt + 1]
                << std::setw(16) << EXX_GRAD[3*iAt + 2]
                << std::endl;

Copilot AI Apr 6, 2026


If integrate_exc_grad is enabled but the calculation is UKS/GKS, EXX_GRAD is never populated/resized (only a warning is printed) but is still indexed in the printing loop. This can cause out-of-bounds access/crash. Guard the printout behind the same condition that actually computes EXX_GRAD (RKS), or ensure EXX_GRAD is resized/filled with zeros in the NYI branches.

vmitq (Author) replied:

Addressed in commit d0f683d
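
The guard suggested in the review comment can be sketched as below. This is a minimal, self-contained illustration, not the code from the referenced commit: `format_gradient` is a hypothetical helper, and the flat `[x0, y0, z0, x1, ...]` layout is inferred from the `EXX_GRAD[3*iAt + k]` indexing in the reviewed snippet.

```cpp
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of the PR): format a flat gradient vector
// laid out as [x0, y0, z0, x1, y1, z1, ...], one output row per atom.
// Returns an empty string when the vector does not hold exactly 3*natoms
// entries, e.g. when the gradient was never computed (the UKS/GKS branches
// in the reviewed snippet), so no element is read out of bounds.
std::string format_gradient(const std::vector<double>& grad, std::size_t natoms) {
  if (grad.size() != 3 * natoms) return {};  // guard: nothing safe to index

  std::ostringstream out;
  out << std::scientific << std::setprecision(6);
  for (std::size_t iAt = 0; iAt < natoms; ++iAt) {
    out << "  "
        << std::setw(16) << grad[3 * iAt + 0]
        << std::setw(16) << grad[3 * iAt + 1]
        << std::setw(16) << grad[3 * iAt + 2] << '\n';
  }
  return out.str();
}
```

A caller would print the "EXX Gradient:" header only when the returned string is non-empty. An equivalent alternative is to move the printing loop inside the branch that actually computes the gradient.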

Comment thread tests/standalone_driver.cxx Outdated
Comment thread tests/standalone_driver.cxx Outdated
Comment thread src/xc_integrator/xc_data/device/xc_device_stack_data.cxx
Comment thread src/xc_integrator/xc_data/device/xc_device_stack_data.hpp
Comment thread include/gauxc/xc_integrator/xc_integrator_impl.hpp
Comment thread src/runtime_environment/device/device_runtime_environment_impl.hpp
- Fix allreduce length from nbf to 3*nbf
- Guard EXX_GRAD access in standalone_driver for UKS/GKS where the
  vector is empty
- Add missing settings to util::unused
- Fix typos
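
The first item in the commit message describes a reduction-length fix. The toy C++ sketch below shows why the count matters. It assumes, hypothetically, a buffer that stores three components per entry (total length `3*n`), and it simulates a sum-allreduce serially rather than calling MPI. `toy_allreduce_sum` is an illustrative stand-in, not a function from the PR.

```cpp
#include <cstddef>
#include <vector>

// Toy stand-in for a sum-allreduce over the first `count` elements:
// afterwards every "rank" buffer holds the element-wise sum for indices
// [0, count); indices >= count are left untouched (i.e. not reduced).
// Each buffer must hold at least `count` elements.
void toy_allreduce_sum(std::vector<std::vector<double>>& ranks, std::size_t count) {
  std::vector<double> total(count, 0.0);
  for (const auto& buf : ranks)
    for (std::size_t i = 0; i < count; ++i) total[i] += buf[i];
  for (auto& buf : ranks)
    for (std::size_t i = 0; i < count; ++i) buf[i] = total[i];
}
```

With `n = 2` and two ranks holding six values each, reducing with `count = n` sums only the first two entries and leaves the remaining four as rank-local partial results. Reducing with `count = 3*n` covers the whole buffer.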
Development

Successfully merging this pull request may close these issues.

Add seminumerical exact exchange (EXX) gradient for CUDA
