Implemented generic multimodal chat handler. by alcoftTAO · Pull Request #125 · JamePeng/llama-cpp-python

alcoftTAO · 2026-05-04T19:19:44Z

Implemented a generic/global multimodal chat handler.

What does it do?

It automatically uses the model's chat template and replaces all of the model's multimodal tags with the media_marker tag.

This allows a much easier implementation for multimodal models, since the chat template doesn't need to be hard-coded for each model.

How to use it?

It is as simple as passing the clip_model_path parameter to the Llama class when created.

Note

Using the previous implementation (e.g. Qwen35ChatHandler) still works.

I'm also looking forward to implement more model architectures. Please, reply if you want me to implement any.

JamePeng · 2026-05-05T21:30:48Z

You can take a look at how to improve the injection process. #110

JamePeng · 2026-05-13T16:43:04Z

It seems there's no work on how to perform URL injection for multimedia; simply replacing it with a media marker isn't enough.

This code also needs to be removed:

if hasattr(llama, 'input_ids'):
    llama.input_ids.fill(0)

Architecture-based tag guessing should not default unknown models to Qwen-style tags. Prefer detecting media tags from the actual chat template, or better, avoid tag guessing by normalizing OpenAI content parts into placeholders before rendering.

KNOWN_MEDIA_TAGS = [
    "<|image_pad|>",
    "<|audio_pad|>",
    "<|video_pad|>",
    "<|image|>",
    "<|audio|>",
    "<|video|>",
    "[IMG]",
]

and

self._chat_format_parser_tags = [
    tag for tag in KNOWN_MEDIA_TAGS
    if tag in self.chat_format
]

In addition, a check is needed to ensure that the number of replacement markers matches the number of incoming media.

alcoftTAO · 2026-05-16T04:45:56Z

@JamePeng What do you think of this code?

JamePeng · 2026-05-16T12:34:21Z

You can test the multimodal usage of qwen3vl, qwen3.5/3.6, and gemma4.
In particular, check if the omni function of gemma4 is affected.

Signed-off-by: JamePeng <jame_peng@sina.com>

- Add a PowerShell step to the Windows CI workflow to locate and copy `libomp140.x86_64.dll` from the Visual Studio redistributables. - Place the runtime DLL into the `llama_cpp\lib` package directory. This ensures that the dynamically loaded `ggml-cpu-*.dll` variants (which are built with LLVM OpenMP on Windows) have their required dependencies packaged in the wheel. Without this, `ggml_backend_load_all_from_path()` can silently fail to load the CPU backends at runtime on end-user machines. Signed-off-by: JamePeng <jame_peng@sina.com>

alcoftTAO added 4 commits May 4, 2026 20:58

Implemented generic multimodal chat handler.

1f5226b

Used text.replace()

a8d19d3

Fixed some bugs.

3e031d5

Implemented 'chat_handler_kwargs'.

389d0d9

alcoftTAO marked this pull request as draft May 14, 2026 15:19

fix

9187910

JamePeng force-pushed the main branch 8 times, most recently from e1caafb to 628373c Compare May 16, 2026 12:09

JamePeng and others added 3 commits May 19, 2026 19:25

Update Submodule vendor/llama.cpp 39cf5d6..6db1304

b48d57a

Signed-off-by: JamePeng <jame_peng@sina.com>

Merge branch 'JamePeng:main' into mtmd

4d9af07

JamePeng force-pushed the main branch from f309265 to dd61687 Compare May 19, 2026 14:29

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implemented generic multimodal chat handler.#125

Implemented generic multimodal chat handler.#125
alcoftTAO wants to merge 8 commits into
JamePeng:mainfrom
TAO71-AI:mtmd

alcoftTAO commented May 4, 2026 •

edited

Loading

Uh oh!

JamePeng commented May 5, 2026

Uh oh!

JamePeng commented May 13, 2026 •

edited

Loading

Uh oh!

alcoftTAO commented May 16, 2026

Uh oh!

JamePeng commented May 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

alcoftTAO commented May 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What does it do?

How to use it?

Uh oh!

JamePeng commented May 5, 2026

Uh oh!

JamePeng commented May 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

alcoftTAO commented May 16, 2026

Uh oh!

JamePeng commented May 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

alcoftTAO commented May 4, 2026 •

edited

Loading

JamePeng commented May 13, 2026 •

edited

Loading