Question: Applicability of TransMLA to DLM #37

@amandpkr

Hi Authors,

Thank you for the amazing work and for open-sourcing this project!

I had a quick question: do you think TransMLA could be applied to Diffusion Language Models (DLMs)?
I’m currently exploring ways to reduce KV cache memory during inference in diffusion-based language or vision–language models.

Since TransMLA provides a theoretical and practical framework for converting GQA-based architectures into MLA with compressed KV caches, I was wondering whether a similar idea could be used for the iterative denoising steps in diffusion models.
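To make the memory question concrete, here is a rough back-of-the-envelope sketch of per-token cache size for a GQA cache versus an MLA-style latent cache. It is only arithmetic, not TransMLA's conversion procedure, and every number in it (layer count, head count, head dimension, latent rank, RoPE dimension, sequence length) is a hypothetical configuration chosen for illustration. It also assumes the diffusion model actually keeps a KV cache across denoising steps, which depends on the model.

```python
# Back-of-the-envelope KV cache sizing. All configuration values below are
# hypothetical and chosen only for illustration.

def gqa_per_token_dim(num_kv_heads: int, head_dim: int) -> int:
    """Elements cached per token per layer under GQA: one K and one V per KV head."""
    return 2 * num_kv_heads * head_dim


def mla_per_token_dim(latent_rank: int, rope_dim: int) -> int:
    """Elements cached per token per layer under an MLA-style cache:
    a shared compressed KV latent plus a small decoupled RoPE key."""
    return latent_rank + rope_dim


def kv_cache_bytes(num_layers: int, seq_len: int, per_token_dim: int,
                   bytes_per_elem: int = 2) -> int:
    """Total cache size in bytes for one sequence (default 2 bytes, e.g. fp16/bf16)."""
    return num_layers * seq_len * per_token_dim * bytes_per_elem


# Hypothetical model: 32 layers, 8 KV heads of dimension 128, 4096-token context.
gqa_dim = gqa_per_token_dim(num_kv_heads=8, head_dim=128)   # 2048 elements
mla_dim = mla_per_token_dim(latent_rank=512, rope_dim=64)   # 576 elements

gqa_bytes = kv_cache_bytes(num_layers=32, seq_len=4096, per_token_dim=gqa_dim)
mla_bytes = kv_cache_bytes(num_layers=32, seq_len=4096, per_token_dim=mla_dim)

print(f"GQA cache: {gqa_bytes / 2**20:.0f} MiB")        # 512 MiB
print(f"Latent cache: {mla_bytes / 2**20:.0f} MiB")     # 144 MiB
print(f"Reduction: {gqa_dim / mla_dim:.2f}x")
```

Under these assumed numbers the latent cache is roughly 3.6x smaller per sequence. Whether a comparable saving carries over to diffusion-style inference is exactly what I am unsure about.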

Would love to hear your thoughts on whether TransMLA’s low-rank latent compression or RoRoPE decoupling could extend to diffusion-style attention or cross-attention blocks.

Thanks again for this great contribution!
