vllm CVE Vulnerabilities & CVE List (35)

Products (CPE): — CVEs: 35

vllm vulnerability overview

Aggregates CVE and security vulnerability intelligence across all vllm-related products, including CVSS, EPSS, publication dates, and vulnerability intelligence data.

Historical issues mainly involve vendor risk buffer overflow and vendor risk ssrf and related problems; some flaws may lead to vendor impact application crash, affecting vendor surface software deployment scenarios.

Vulnerability distribution trend (last 24 months)

CVE	Summary	Source	Max CVSS	EPSS %	Published	Updated
CVE-2026-44223	vLLM is an inference and serving engine for large language models (LLMs). From to before 0.20.0, the extract_hidden_states speculative decoding proposer in vLLM returns a tensor with an incorrect shape after the first decode step, causing a RuntimeError that crashes the EngineCore process. The crash is triggered when any request in the batch uses sampling penalty parameters (repetition_penalty, frequency_penalty, or presence_penalty). A single request with a penalty parameter (e.g., "repetition	[email protected]	6.5	0.04%	2026-05-12	2026-05-15
CVE-2026-44222	vLLM is an inference and serving engine for large language models (LLMs). From 0.6.1 to before 0.20.0, there is a a Token Injection vulnerability in vLLM’s multimodal processing. Unauthenticated, text-only prompts that spell special tokens are interpreted as control. Image and video placeholder sequences supplied without matching data cause vLLM to index into empty grids during input-position computation, raising an unhandled IndexError and terminating the worker or degrading availability. Multi	[email protected]	6.5	0.04%	2026-05-12	2026-05-14
CVE-2026-7141	A vulnerability was found in vllm up to 0.19.0. The affected element is the function has_mamba_layers of the file vllm/v1/kv_cache_interface.py of the component KV Block Handler. Performing a manipulation results in uninitialized resource. It is possible to initiate the attack remotely. The attack is considered to have high complexity. The exploitability is described as difficult. The exploit has been made public and could be used. The patch is named 1ad67864c0c20f167929e64c875f5c28e1aad9fd. To	[email protected]	2.9	0.05%	2026-04-27	2026-05-01
CVE-2026-34756	vLLM is an inference and serving engine for large language models (LLMs). From 0.1.0 to before 0.19.0, a Denial of Service vulnerability exists in the vLLM OpenAI-compatible API server. Due to the lack of an upper bound validation on the n parameter in the ChatCompletionRequest and CompletionRequest Pydantic models, an unauthenticated attacker can send a single HTTP request with an astronomically large n value. This completely blocks the Python asyncio event loop and causes immediate Out-Of-Memo	[email protected]	6.5	0.03%	2026-04-06	2026-04-20
CVE-2026-34755	vLLM is an inference and serving engine for large language models (LLMs). From 0.7.0 to before 0.19.0, the VideoMediaIO.load_base64() method at vllm/multimodal/media/video.py splits video/jpeg data URLs by comma to extract individual JPEG frames, but does not enforce a frame count limit. The num_frames parameter (default: 32), which is enforced by the load_bytes() code path, is completely bypassed in the video/jpeg base64 path. An attacker can send a single API request containing thousands of co	[email protected]	6.5	0.05%	2026-04-06	2026-04-20
CVE-2026-34753	vLLM is an inference and serving engine for large language models (LLMs). From 0.16.0 to before 0.19.0, a server-side request forgery (SSRF) vulnerability in download_bytes_from_url allows any actor who can control batch input JSON to make the vLLM batch runner issue arbitrary HTTP/HTTPS requests from the server, without any URL validation or domain restrictions. This can be used to target internal services (e.g. cloud metadata endpoints or internal HTTP APIs) reachable from the vLLM host. This	[email protected]	5.4	0.03%	2026-04-06	2026-04-20
CVE-2026-34760	vLLM is an inference and serving engine for large language models (LLMs). From version 0.5.5 to before version 0.18.0, Librosa defaults to using numpy.mean for mono downmixing (to_mono), while the international standard ITU-R BS.775-4 specifies a weighted downmixing algorithm. This discrepancy results in inconsistency between audio heard by humans (e.g., through headphones/regular speakers) and audio processed by AI models (Which infra via Librosa, such as vllm, transformer). This issue has been	[email protected]	5.9	0.06%	2026-04-02	2026-05-11
CVE-2026-27893	vLLM is an inference and serving engine for large language models (LLMs). Starting in version 0.10.1 and prior to version 0.18.0, two model implementation files hardcode `trust_remote_code=True` when loading sub-components, bypassing the user's explicit `--trust-remote-code=False` security opt-out. This enables remote code execution via malicious model repositories even when the user has explicitly disabled remote code trust. Version 0.18.0 patches the issue.	[email protected]	8.8	0.03%	2026-03-27	2026-03-30
CVE-2026-25960	vLLM is an inference and serving engine for large language models (LLMs). The SSRF protection fix for CVE-2026-24779 add in 0.15.1 can be bypassed in the load_from_url_async method due to inconsistent URL parsing behavior between the validation layer and the actual HTTP client. The SSRF fix uses urllib3.util.parse_url() to validate and extract the hostname from user-provided URLs. However, load_from_url_async uses aiohttp for making the actual HTTP requests, and aiohttp internally uses the yarl	[email protected]	7.1	0.02%	2026-03-09	2026-03-18
CVE-2026-22778	vLLM is an inference and serving engine for large language models (LLMs). From 0.8.3 to before 0.14.1, when an invalid image is sent to vLLM's multimodal endpoint, PIL throws an error. vLLM returns this error to the client, leaking a heap address. With this leak, we reduce ASLR from 4 billion guesses to ~8 guesses. This vulnerability can be chained a heap overflow with JPEG2000 decoder in OpenCV/FFmpeg to achieve remote code execution. This vulnerability is fixed in 0.14.1.	[email protected]	9.8	0.06%	2026-02-02	2026-02-23
CVE-2026-24779	vLLM is an inference and serving engine for large language models (LLMs). Prior to version 0.14.1, a Server-Side Request Forgery (SSRF) vulnerability exists in the `MediaConnector` class within the vLLM project's multimodal feature set. The load_from_url and load_from_url_async methods obtain and process media from URLs provided by users, using different Python parsing libraries when restricting the target host. These two parsing libraries have different interpretations of backslashes, which all	[email protected]	7.1	0.04%	2026-01-27	2026-01-30
CVE-2026-22807	vLLM is an inference and serving engine for large language models (LLMs). Starting in version 0.10.1 and prior to version 0.14.0, vLLM loads Hugging Face `auto_map` dynamic modules during model resolution without gating on `trust_remote_code`, allowing attacker-controlled Python code in a model repo/path to execute at server startup. An attacker who can influence the model repo/path (local directory or remote Hugging Face repo) can achieve arbitrary code execution on the vLLM host during model l	[email protected]	8.8	0.02%	2026-01-21	2026-01-30
CVE-2026-22773	vLLM is an inference and serving engine for large language models (LLMs). In versions from 0.6.4 to before 0.12.0, users can crash the vLLM engine serving multimodal models that use the Idefics3 vision model implementation by sending a specially crafted 1x1 pixel image. This causes a tensor dimension mismatch that results in an unhandled runtime error, leading to complete server termination. This issue has been patched in version 0.12.0.	[email protected]	6.5	0.02%	2026-01-10	2026-01-27
CVE-2025-66448	vLLM is an inference and serving engine for large language models (LLMs). Prior to 0.11.1, vllm has a critical remote code execution vector in a config class named Nemotron_Nano_VL_Config. When vllm loads a model config that contains an auto_map entry, the config class resolves that mapping with get_class_from_dynamic_module(...) and immediately instantiates the returned class. This fetches and executes Python from the remote repository referenced in the auto_map string. Crucially, this happens	[email protected]	7.1	0.04%	2025-12-01	2025-12-03
CVE-2025-62426	vLLM is an inference and serving engine for large language models (LLMs). From version 0.5.5 to before 0.11.1, the /v1/chat/completions and /tokenize endpoints allow a chat_template_kwargs request parameter that is used in the code before it is properly validated against the chat template. With the right chat_template_kwargs parameters, it is possible to block processing of the API server for long periods of time, delaying all other requests. This issue has been patched in version 0.11.1.	[email protected]	6.5	0.05%	2025-11-21	2025-12-04
CVE-2025-62372	vLLM is an inference and serving engine for large language models (LLMs). From version 0.5.5 to before 0.11.1, users can crash the vLLM engine serving multimodal models by passing multimodal embedding inputs with correct ndim but incorrect shape (e.g. hidden dimension is wrong), regardless of whether the model is intended to support such inputs (as defined in the Supported Models page). This issue has been patched in version 0.11.1.	[email protected]	8.3	0.06%	2025-11-21	2025-12-04
CVE-2025-62164	vLLM is an inference and serving engine for large language models (LLMs). From versions 0.10.2 to before 0.11.1, a memory corruption vulnerability could lead to a crash (denial-of-service) and potentially remote code execution (RCE), exists in the Completions API endpoint. When processing user-supplied prompt embeddings, the endpoint loads serialized tensors using torch.load() without sufficient validation. Due to a change introduced in PyTorch 2.8.0, sparse tensor integrity checks are disabled	[email protected]	8.8	0.19%	2025-11-21	2025-12-04
CVE-2025-59425	vLLM is an inference and serving engine for large language models (LLMs). Before version 0.11.0rc2, the API key support in vLLM performs validation using a method that was vulnerable to a timing attack. API key validation uses a string comparison that takes longer the more characters the provided API key gets correct. Data analysis across many attempts could allow an attacker to determine when it finds the next correct character in the key sequence. Deployments relying on vLLM's built-in API key	[email protected]	7.5	0.28%	2025-10-07	2025-10-16
CVE-2025-48956	vLLM is an inference and serving engine for large language models (LLMs). From 0.1.0 to before 0.10.1.1, a Denial of Service (DoS) vulnerability can be triggered by sending a single HTTP GET request with an extremely large header to an HTTP endpoint. This results in server memory exhaustion, potentially leading to a crash or unresponsiveness. The attack does not require authentication, making it exploitable by any remote user. This vulnerability is fixed in 0.10.1.1.	[email protected]	7.5	0.27%	2025-08-21	2025-10-09
CVE-2025-48944	vLLM is an inference and serving engine for large language models (LLMs). In version 0.8.0 up to but excluding 0.9.0, the vLLM backend used with the /v1/chat/completions OpenAPI endpoint fails to validate unexpected or malformed input in the "pattern" and "type" fields when the tools functionality is invoked. These inputs are not validated before being compiled or parsed, causing a crash of the inference worker with a single request. The worker will remain down until it is restarted. Version 0.9	[email protected]	6.5	0.32%	2025-05-30	2025-07-01

cvelogic Threat Intelligence