TransArch: Hardware-Aware Architecture Transfer for Foundation Models

Modern large language models are increasingly bottlenecked by communication rather than computation on today's hardware. TransArch is a collection of research projects that design hardware-friendly model architectures and migrate existing pre-trained models into these new architectures with minimal performance loss.

📦 Projects

Project	Venue	Description	Paper
CLOVER	ICML 2025	Cross-layer SVD pruning of Q-K and V-O pairs in attention heads; when combined with TransMLA, it achieves up to 11.1× speedup over LLaMA-2-7B	arXiv
TransMLA	NeurIPS 2025 Spotlight	Convert GQA models (LLaMA, Qwen, Mixtral, …) to DeepSeek-MLA with full Absorb compatibility and up to 10.6× inference speedup	arXiv
TPLA	ASPLOS 2026	Tensor Parallel Latent Attention — partitions latent representations across devices, achieving 1.79×/1.93× speedup on DeepSeek-V3/Kimi-K2	arXiv
HISA	Preprint	Hierarchical two-stage indexer for fine-grained sparse attention, achieving 2×–4× speedup at 32K–128K context	arXiv
MISA	Preprint	Mixture-of-experts routing over DSA indexer heads — matches dense DSA with 8×/4× fewer indexer heads and ~3.82× kernel speedup on H200	arXiv
GQLA	Preprint	Group-Query Latent Attention — one set of weights, two decoding paths (MQA-absorb for H100, GQA+MTP for H20), with up to 8-way TP	arXiv
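
CLOVER's Q-K pruning relies on the fact that a head's attention logits depend on W_Q and W_K only through their product W_Q W_K^T. The NumPy sketch below illustrates this for a single head. It makes several assumptions: the function name `prune_qk_head` is hypothetical, the weights are random stand-ins, and pruning is done by plain truncated SVD of the product (the best rank-r approximation in Frobenius norm). It does not necessarily match the exact criterion or the fine-tuning procedure used in the CLOVER paper.

```python
import numpy as np

def prune_qk_head(w_q, w_k, rank):
    """Rank-`rank` factors whose product best approximates w_q @ w_k.T (Frobenius norm).

    w_q, w_k: (d_model, d_head) query/key projections of one attention head.
    Returns (w_q_new, w_k_new, s): new factors of shape (d_model, rank) and the
    d_head leading singular values s of w_q @ w_k.T, in descending order.
    """
    # Thin QR of each factor: w_q @ w_k.T == q_q @ (r_q @ r_k.T) @ q_k.T,
    # so only a small (d_head x d_head) core needs an SVD.
    q_q, r_q = np.linalg.qr(w_q)
    q_k, r_k = np.linalg.qr(w_k)
    a, s, bt = np.linalg.svd(r_q @ r_k.T)
    # Keep the top-`rank` singular directions; split sqrt(s) evenly between Q and K.
    root_s = np.sqrt(s[:rank])
    w_q_new = (q_q @ a[:, :rank]) * root_s
    w_k_new = (q_k @ bt[:rank].T) * root_s
    return w_q_new, w_k_new, s

rng = np.random.default_rng(0)
w_q = rng.standard_normal((256, 64))
w_k = rng.standard_normal((256, 64))
w_q_new, w_k_new, s = prune_qk_head(w_q, w_k, rank=32)

# Eckart-Young: the approximation error equals the norm of the discarded singular values.
err = np.linalg.norm(w_q @ w_k.T - w_q_new @ w_k_new.T)
print(err, np.sqrt(np.sum(s[32:] ** 2)))
```

The same helper can also be applied to a V-O pair by passing `w_v` and `w_o.T`. A head's value-output map W_V W_O is likewise a product of two per-head factors, and the returned second factor, transposed, serves as the pruned W_O.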
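
The Absorb compatibility mentioned for TransMLA rests on an associativity identity. When every head's keys and values are up-projections of a shared latent c, the key up-projection can be folded into the query and the value up-projection into the output. Decoding can then attend to c directly, with all heads sharing one cache in an MQA-like way. This identity likely also underlies GQLA's MQA-absorb decoding path. The sketch below is a minimal NumPy check for one decode step. It uses random, hypothetical weights `w_uk`/`w_uv` and omits RoPE. In DeepSeek-style MLA, positional information is carried by a separate decoupled RoPE component, which this sketch ignores.

```python
import numpy as np

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)  # numerically stable softmax over the last axis
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def decode_materialized(q, c, w_uk, w_uv):
    # Path 1: expand the latent into per-head K and V, then ordinary multi-head attention.
    k = np.einsum("tc,hcd->htd", c, w_uk)                      # (heads, seq, d_head)
    v = np.einsum("tc,hcd->htd", c, w_uv)                      # (heads, seq, d_head)
    p = softmax(np.einsum("hd,htd->ht", q, k) / np.sqrt(q.shape[-1]))
    return np.einsum("ht,htd->hd", p, v)

def decode_absorbed(q, c, w_uk, w_uv):
    # Path 2: fold w_uk into the query and w_uv into the output; attend to c directly.
    q_lat = np.einsum("hd,hcd->hc", q, w_uk)                   # (heads, d_latent)
    p = softmax(q_lat @ c.T / np.sqrt(q.shape[-1]))            # all heads read the same c
    o_lat = p @ c                                              # (heads, d_latent)
    return np.einsum("hc,hcd->hd", o_lat, w_uv)

rng = np.random.default_rng(0)
n_heads, d_head, d_latent, seq_len = 4, 16, 32, 10
c = rng.standard_normal((seq_len, d_latent))            # cached latent per token, shared by heads
q = rng.standard_normal((n_heads, d_head))              # current decode-step query per head
w_uk = rng.standard_normal((n_heads, d_latent, d_head)) # hypothetical latent -> key up-projection
w_uv = rng.standard_normal((n_heads, d_latent, d_head)) # hypothetical latent -> value up-projection

out_mat = decode_materialized(q, c, w_uk, w_uv)
out_abs = decode_absorbed(q, c, w_uk, w_uv)
print(np.abs(out_mat - out_abs).max())  # the two paths agree up to floating-point rounding
```

Both paths compute the same function. The materialized path needs per-head K/V, while the absorbed path reads only the shared latent cache. Which path is faster therefore depends on the hardware's compute-to-bandwidth balance, which is consistent with the GQLA entry pairing different decoding paths with H100 and H20.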
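
HISA's hierarchical two-stage indexer can be pictured as coarse-to-fine selection. It first scores blocks of tokens, then scores individual tokens only inside the best blocks. The sketch below is a generic version of that pattern, not HISA's algorithm. The function name `two_stage_topk`, the mean-pooled block keys, and the plain dot-product scores are all assumptions.

```python
import numpy as np

def two_stage_topk(q, keys, block_size, top_blocks, top_k):
    """Hypothetical coarse-to-fine token selection for one query vector.

    q:    (d,)    query (indexer) vector
    keys: (S, d)  keys for S past tokens; S must be a multiple of block_size
    """
    n_tokens, dim = keys.shape
    assert n_tokens % block_size == 0, "sketch assumes S is a multiple of block_size"
    assert top_k <= top_blocks * block_size
    n_blocks = n_tokens // block_size

    # Stage 1: one summary key per block (mean pooling is an assumption); keep the best blocks.
    block_keys = keys.reshape(n_blocks, block_size, dim).mean(axis=1)
    kept = np.argsort(block_keys @ q)[-top_blocks:]

    # Stage 2: score only tokens inside the kept blocks and keep the best top_k of them.
    candidates = (kept[:, None] * block_size + np.arange(block_size)).ravel()
    best = np.argsort(keys[candidates] @ q)[-top_k:]
    return np.sort(candidates[best])

rng = np.random.default_rng(0)
keys = rng.standard_normal((1024, 32))
q = rng.standard_normal(32)
picked = two_stage_topk(q, keys, block_size=64, top_blocks=4, top_k=16)
print(picked)
```

With S tokens, block size B, and m kept blocks, the first stage computes S/B block scores and the second stage scores m·B tokens instead of all S. When every block is kept, the result reduces to ordinary dense top-k.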
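
DeepSeek's V3.2 report describes the DSA indexer as scoring each past token s for a query t with a weighted sum over indexer heads, I(t,s) = Σ_j w(t,j) · ReLU(q(t,j) · k(s)), then keeping the top-k tokens for sparse attention. MISA is described as mixture-of-experts routing over these heads. The sketch below illustrates that idea with a generic top-n router. Everything in it is an assumption for illustration: the function `routed_indexer_topk`, how `router_logits` would be produced (not shown), and the unnormalized head weights. It is not MISA's actual routing rule.

```python
import numpy as np

def routed_indexer_topk(q_idx, k_idx, head_w, router_logits, n_active, top_k):
    """Hypothetical routed DSA-style indexer for a single query token.

    q_idx:         (H, d)  indexer queries, one per indexer head
    k_idx:         (S, d)  indexer keys shared across heads, one per past token
    head_w:        (H,)    per-head weights w_j
    router_logits: (H,)    router scores used to choose which heads to evaluate
    """
    active = np.argsort(router_logits)[-n_active:]           # heads chosen by the router
    relu_scores = np.maximum(q_idx[active] @ k_idx.T, 0.0)    # (n_active, S)
    index = head_w[active] @ relu_scores                      # (S,) weighted sum over active heads
    selected = np.argsort(index)[-top_k:]                     # token positions kept
    return np.sort(selected), index

rng = np.random.default_rng(0)
n_heads, dim, seq_len = 32, 64, 4096
q_idx = rng.standard_normal((n_heads, dim))
k_idx = rng.standard_normal((seq_len, dim))
head_w = rng.random(n_heads)
router_logits = rng.standard_normal(n_heads)

# Evaluate 4 of the 32 heads (8x fewer) and keep 256 tokens.
tokens, index = routed_indexer_topk(q_idx, k_idx, head_w, router_logits, n_active=4, top_k=256)
```

Setting `n_active` to the total number of heads recovers the dense indexer score. The indexer cost then scales with the number of routed heads rather than with all H.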

📰 News

[2026.05] GQLA preprint released: arXiv:2605.15250.
[2026.05] MISA preprint released: arXiv:2605.07363.
[2026.03] HISA preprint released: arXiv:2603.28458.
[2026.02] 🎉 TransMLA has been adopted by Ant Group's latest 1T-parameter model, Ling-2.5-1T, demonstrating its scalability to ultra-large LLMs.
[2025.11] TPLA accepted at ASPLOS 2026 (Summer cycle).
[2025.09] TransMLA accepted at NeurIPS 2025 (Spotlight, Top 3.19%).
[2025.05] CLOVER accepted at ICML 2025.

📋 To-Do

Release TPLA code
Release HISA code
Release MISA code
Release GQLA code

📚 Citation

If you find our work useful, please cite the relevant paper(s):

@inproceedings{meng2025transmla,
  title={TransMLA: Multi-head Latent Attention Is All You Need},
  author={Meng, Fanxu and Tang, Pingzhi and Tang, Xiaojuan and Yao, Zengwei and Sun, Xing and Zhang, Muhan},
  booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
  year={2025}
}

@inproceedings{meng2025clover,
  title={CLOVER: Cross-Layer Orthogonal Vectors Pruning and Fine-Tuning},
  author={Meng, Fanxu and Tang, Pingzhi and Jiang, Fan and Zhang, Muhan},
  booktitle={International Conference on Machine Learning (ICML)},
  year={2025}
}

@inproceedings{tang2026tpla,
  title={TPLA: Tensor Parallel Latent Attention for Efficient Disaggregated Prefill and Decode Inference},
  author={Tang, Xiaojuan and Meng, Fanxu and Tang, Pingzhi and Wang, Yuxuan and Yin, Di and Sun, Xing and Zhang, Muhan},
  booktitle={International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS)},
  year={2026}
}

@article{xu2026hisa,
  title={HISA: Efficient Hierarchical Indexing for Fine-Grained Sparse Attention},
  author={Xu, Yufei and Meng, Fanxu and Jiang, Fan and Wang, Yuxuan and Zhou, Ruijie and Wang, Zhaohui and Wu, Jiexi and Pan, Zhixin and Tang, Xiaojuan and Pei, Wenjie and Liu, Tongxuan and Yin, Di and Sun, Xing and Zhang, Muhan},
  journal={arXiv preprint arXiv:2603.28458},
  year={2026}
}

@article{zhou2026misa,
  title={MISA: Mixture of Indexer Sparse Attention for Long-Context LLM Inference},
  author={Zhou, Ruijie and Meng, Fanxu and Xu, Yufei and Liu, Tongxuan and Lu, Guangming and Zhang, Muhan and Pei, Wenjie},
  journal={arXiv preprint arXiv:2605.07363},
  year={2026}
}

@article{meng2026gqla,
  title={GQLA: Group-Query Latent Attention for Hardware-Adaptive Large Language Model Decoding},
  author={Meng, Fanxu},
  journal={arXiv preprint arXiv:2605.15250},
  year={2026}
}
