Pull requests: NVIDIA/TransformerEngine

Pull requests list

Add NVFP4 1x64 Local Encode Recipe
#2941 opened Apr 29, 2026 by cael-ling Contributor Draft
1 of 13 tasks
[Common/PyTorch/JAX] make offset of ClampedSwiGLU configurable
#2938 opened Apr 28, 2026 by hxbai Contributor
13 tasks
Fix CUDA graph parameter grad lifetime
#2937 opened Apr 28, 2026 by buptzyb Contributor
[JAX] Fix MNIST L2 jax test instability
#2933 opened Apr 27, 2026 by tdophung Collaborator
6 of 13 tasks
[PyTorch] Enable head dim 256 for FA4
#2932 opened Apr 27, 2026 by yaox12 Member Draft
13 tasks
Implement per-token NVFP4 fprop recipe
#2931 opened Apr 27, 2026 by zianglih Contributor
8 of 13 tasks
[Common/PyTorch] Add MXFP8 cast-and-transpose op (label: community-contribution, for PRs from external contributors outside the core maintainers)
#2930 opened Apr 26, 2026 by jeweldave
[PyTorch] Avoid removing usages from quantized weight tensors (labels: 2.15.0, bug)
#2929 opened Apr 25, 2026 by timmoon10 Collaborator
8 of 13 tasks
Fix WHEEL Tag mismatch in transformer-engine-cu12 wheels
#2928 opened Apr 25, 2026 by eyupcanakman
7 of 13 tasks
[PyTorch] Fix stale columnwise data usage
#2925 opened Apr 25, 2026 by ksivaman Member
7 of 13 tasks
Make TE Sequential Grouped linear Op CUDA graphable
#2923 opened Apr 24, 2026 by vthumbe1503 Collaborator
13 tasks
[PyTorch] Add distributed Muon optimizer (label: 2.16.0)
#2920 opened Apr 23, 2026 by vcherepanov-nv Collaborator
5 of 13 tasks
guard fuser grad checks on non-leaf nodes
#2919 opened Apr 23, 2026 by CarlosGomes98 Contributor Draft
13 tasks
[PyTorch][CP] Reduce P2P forward peak memory: O(C) → O(1)
#2916 opened Apr 22, 2026 by sudhakarsingh27 Collaborator Draft
1 of 3 tasks
Variable Grouped Swizzle
#2914 opened Apr 22, 2026 by int-smart Contributor
8 of 13 tasks
NVFP4 per-token recipe
#2913 opened Apr 21, 2026 by YigongQin Draft
1 of 13 tasks
feat: auto-pad FP8 GEMM dimensions for unaligned sequence packing (label: community-contribution)
#2911 opened Apr 21, 2026 by NoonePauseferg
[Common][PyTorch] Fix int32 overflow and -1 sentinel handling in moe_permute (label: community-contribution)
#2907 opened Apr 21, 2026 by jing-4369
3 of 4 tasks
Add head dim 256 support for SDPA on Blackwell
#2906 opened Apr 21, 2026 by yaox12 Member
1 of 13 tasks
[PyTorch] Expose function to bulk-allocate tensors backed by the same buffer
#2900 opened Apr 18, 2026 by timmoon10 Collaborator
9 of 13 tasks
Improve the dimension checks for the FP8 recipes
#2894 opened Apr 16, 2026 by ptrendx Member
13 tasks
Add AI written qwen3_moe example
#2887 opened Apr 15, 2026 by skyw
4 of 13 tasks