Inject default thinking_token_budget and presence_penalty for vLLM, fixing the gap where --override-generation-config doesn't propagate these fields. Prevents Qwen3 thinking-mode infinite loops.
-
Updated
Apr 26, 2026 - Shell
Inject default thinking_token_budget and presence_penalty for vLLM, fixing the gap where --override-generation-config doesn't propagate these fields. Prevents Qwen3 thinking-mode infinite loops.
Add a description, image, and links to the sampling-params topic page so that developers can more easily learn about it.
To associate your repository with the sampling-params topic, visit your repo's landing page and select "manage topics."