sampling-params

Here is 1 public repository matching this topic:

palmfuture / vllm-default-thinking-budget

Inject default thinking_token_budget and presence_penalty for vLLM, fixing the gap where --override-generation-config doesn't propagate these fields. Prevents Qwen3 thinking-mode infinite loops.

monkey-patch vllm llm-inference qwen reasoning-models qwen3 thinking-budget sampling-params

Updated Apr 26, 2026
Language: Shell
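The core idea in the description is filling in default sampling fields only when a request leaves them out. The sketch below illustrates that "defaults fill gaps, explicit values win" rule on a plain request dictionary. It is not the repository's mechanism: the repository is a Shell project tagged monkey-patch, and this sketch does not touch vLLM internals. The function name `inject_defaults` and both default values are hypothetical placeholders. The field names `thinking_token_budget` and `presence_penalty` come from the description, and whether vLLM accepts each one at the request level is an assumption here.

```python
from typing import Any

# Hypothetical defaults: field names mirror the repository description,
# values are illustrative placeholders, not tuning recommendations.
DEFAULTS: dict[str, Any] = {
    "thinking_token_budget": 4096,
    "presence_penalty": 1.5,
}


def inject_defaults(
    request: dict[str, Any], defaults: dict[str, Any] = DEFAULTS
) -> dict[str, Any]:
    """Return a copy of `request` with each default filled in only where the
    field is missing or explicitly null; any other caller value (even 0) wins."""
    merged = dict(request)  # shallow copy so the caller's dict is not mutated
    for key, value in defaults.items():
        if merged.get(key) is None:
            merged[key] = value
    return merged


if __name__ == "__main__":
    req = {"model": "Qwen3", "presence_penalty": 0.0}
    print(inject_defaults(req))
```

In this example, the caller's explicit `presence_penalty` of 0.0 is kept, and only the missing thinking budget is filled in. The copy-then-fill approach leaves the original request untouched, which keeps the injection step free of side effects.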
