Pull requests: vllm-project/vllm
Remove Qwen Omni workaround that's no longer necessary
qwen
Related to Qwen models
ready
ONLY add when PR is ready to merge/full CI is needed
#21057
opened Jul 16, 2025 by
hmellor
[Feature][EPLB] Add EPLB support for MiniMax-01
#21056
opened Jul 16, 2025 by
haveheartt
1 of 4 tasks
[Bugfix]: Fix final_res_batch list index out of range error
frontend
#21055
opened Jul 16, 2025 by
chaunceyjiang
[Frontend] Add explicit validation of tool length when tool_choice="required" in OpenAI server
frontend
#21052
opened Jul 16, 2025 by
n0gu-furiosa
[fix] fix qwen image_embeds input
qwen
Related to Qwen models
ready
ONLY add when PR is ready to merge/full CI is needed
#21049
opened Jul 16, 2025 by
h-avsha
[V1] [KVConnector] Fix MultiprocExecutor worker output aggregation
v1
#21048
opened Jul 16, 2025 by
sdavidbd
Add test case for compiling multiple graphs
#21044
opened Jul 16, 2025 by
sarckk
3 of 4 tasks
Fix minor docs issues and fix metric requests
documentation
Improvements or additions to documentation
v1
#21040
opened Jul 16, 2025 by
SriRangaTarun
Draft
[XPU] Add xpu support for fusion and collective fusion
#21036
opened Jul 16, 2025 by
chaojun-zhang
4 tasks
[Feature][EPLB] Add eplb support for Qwen2
qwen
Related to Qwen models
#21035
opened Jul 16, 2025 by
lengrongfu
Draft
1 of 4 tasks
Add tokenization_kwargs to encode for embedding model truncation
v1
#21033
opened Jul 16, 2025 by
Receiling
[bugfix][WIP] Fix auto thread-binding when world_size > 1 in CPU backend and refactor code
ci/build
documentation
Improvements or additions to documentation
v1
#21032
opened Jul 16, 2025 by
bigPYJ1151
1 of 4 tasks
Enable sequence parallelism for full cuda graph without specifying compile sizes
#21031
opened Jul 16, 2025 by
cascade812
[Bugfix] weight loading use correct tp_group with patch_tensor_parallel_group
#21024
opened Jul 16, 2025 by
Kevin-XiongC
4 tasks
Add the instruction to run e2e validation manually before release
documentation
Improvements or additions to documentation
#21023
opened Jul 16, 2025 by
huydhn
4 tasks done
[XPU] Enable external_launcher to serve as an executor via torchrun
v1
#21021
opened Jul 16, 2025 by
chaojun-zhang
4 tasks
[Misc] Minor comment reorganization in capture_model()
v1
#21015
opened Jul 15, 2025 by
ruisearch42
3 of 4 tasks
[Docker] Allow FlashInfer to be built in the ARM CUDA Dockerfile
ci/build
ready
ONLY add when PR is ready to merge/full CI is needed
#21013
opened Jul 15, 2025 by
mgoin
4 tasks
[protocol] Add request_id to the Request object so they can be controlled better via external load balancers
frontend
ready
ONLY add when PR is ready to merge/full CI is needed
#21009
opened Jul 15, 2025 by
kouroshHakha
4 tasks
[Not for merge] Unshift eagle prefill
documentation
Improvements or additions to documentation
llama
Related to Llama models
needs-rebase
new-model
Requests to new models
speculative-decoding
v1
#21008
opened Jul 15, 2025 by
morgendave
Draft
4 tasks
[Core] Avoid KVCacheBlock.__eq__ invocations in FreeKVCacheBlockQueue
performance
Performance-related issues
v1
#21005
opened Jul 15, 2025 by
JialinOuyang-Meta
Start using py3.12 for TPU.
ci/build
documentation
Improvements or additions to documentation
tpu
Related to Google TPUs
#21000
opened Jul 15, 2025 by
vanbasten23
3 of 4 tasks