PyTorch NaNs are silent killers. This hook catches them at the exact layer and batch where they occur, with ~3 ms of overhead versus ~7 ms for torch.autograd.set_detect_anomaly.
pytorch autograd model-debugging training-stability gradient-explosion nan-detection deep-learning-debugging forward-hooks
Updated Apr 25, 2026 · Python
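A minimal sketch of the forward-hook idea, assuming the check is simply `torch.isfinite` on each module's output. `attach_nan_hooks`, `NaNDetectedError`, and the `state` dict that carries the batch index are hypothetical names for illustration, not this repository's API. Every module gets a hook, and because forward hooks fire when a module's `forward` returns, the innermost layer that produces a NaN or Inf raises first and reports its name.

```python
import torch
import torch.nn as nn


class NaNDetectedError(RuntimeError):
    """Hypothetical error type raised when a layer emits NaN or Inf."""


def _iter_tensors(obj):
    # Module outputs may be a tensor or a nested tuple/list/dict of tensors.
    if isinstance(obj, torch.Tensor):
        yield obj
    elif isinstance(obj, (list, tuple)):
        for item in obj:
            yield from _iter_tensors(item)
    elif isinstance(obj, dict):
        for item in obj.values():
            yield from _iter_tensors(item)


def attach_nan_hooks(model: nn.Module, state: dict) -> list:
    """Hypothetical helper: register a NaN/Inf check on every module, return handles."""
    handles = []
    for name, module in model.named_modules():
        label = name or "<root>"

        # Forward hooks fire after a module's forward returns, so an inner layer
        # raises before its parent container's hook runs.
        def hook(mod, inputs, output, label=label):
            for t in _iter_tensors(output):
                # Converting .all() to a Python bool forces a device sync on GPU.
                if t.is_floating_point() and not bool(torch.isfinite(t).all()):
                    raise NaNDetectedError(
                        f"non-finite output in layer '{label}' "
                        f"({type(mod).__name__}) at batch {state['batch']}"
                    )

        handles.append(module.register_forward_hook(hook))
    return handles


# Usage: a toy model where batch 2 carries a NaN input.
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 1))
state = {"batch": 0}
handles = attach_nan_hooks(model, state)
n_handles = len(handles)

caught = None
for batch_idx in range(3):
    state["batch"] = batch_idx
    x = torch.randn(2, 4)
    if batch_idx == 2:
        x[0, 0] = float("nan")  # simulate a corrupted sample
    try:
        model(x)
    except NaNDetectedError as err:
        caught = str(err)
        break

for h in handles:
    h.remove()  # detach the checks once debugging is done

print(caught)  # non-finite output in layer '0' (Linear) at batch 2
```

This sketch only inspects forward activations. NaNs that first appear in gradients would need a backward-side check (for example `nn.Module.register_full_backward_hook`), which is the case `set_detect_anomaly` covers at a higher cost.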