GHSA-83vm-p52w-f9pw · Severity: medium · Ecosystem: pip — vLLM: extract_hidden_states speculative decoding crashes server on any request with penalty parameters
vLLM is an inference and serving engine for large language models (LLMs). In versions 0.18.0 up to, but not including, 0.20.0, the extract_hidden_states speculative decoding proposer returns a tensor with an incorrect shape after the first decode step, which raises a RuntimeError that crashes the EngineCore process. The crash is triggered when any request in the batch sets a sampling penalty parameter (repetition_penalty, frequency_penalty, or presence_penalty), so a single request such as one carrying "repetition_penalty": 1.1 is enough to take the server down. The issue is fixed in 0.20.0.
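The affected range and the trigger condition described above can be checked mechanically. The following is a minimal sketch, not part of the advisory or of vLLM: the helper names (`parse_release`, `is_affected`, `active_penalties`) are hypothetical, and the neutral values (1.0 for repetition_penalty, 0.0 for the other two) are an assumption about which settings leave penalties disabled.

```python
import re

# Version bounds taken from the advisory text.
AFFECTED_MIN = (0, 18, 0)  # first affected release
FIXED_IN = (0, 20, 0)      # first fixed release

# Assumption: these values mean "penalty disabled"; anything else counts as a penalty.
NEUTRAL_PENALTIES = {
    "repetition_penalty": 1.0,
    "frequency_penalty": 0.0,
    "presence_penalty": 0.0,
}


def parse_release(version: str) -> tuple[int, int, int]:
    """Return the leading MAJOR.MINOR.PATCH numbers of a version string.

    Suffixes such as 'rc1' or '+cu121' are ignored, so a pre-release of the
    fixed version is reported as fixed; verify such builds by hand.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def is_affected(version: str) -> bool:
    """True if the version lies in the advisory's range [0.18.0, 0.20.0)."""
    return AFFECTED_MIN <= parse_release(version) < FIXED_IN


def active_penalties(payload: dict) -> dict:
    """Return the penalty parameters in a request body that differ from neutral."""
    return {
        name: payload[name]
        for name, neutral in NEUTRAL_PENALTIES.items()
        if payload.get(name) is not None and float(payload[name]) != neutral
    }
```

On a host where vLLM is installed, the version string could be obtained with `importlib.metadata.version("vllm")` and passed to `is_affected`. A gateway in front of an unpatched deployment that uses extract_hidden_states might call `active_penalties` on each request body and reject or strip the listed fields until the upgrade to 0.20.0 is done.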
Conclusion & alert: CVE-2026-44223 is rated Low Risk (36.2/100): CVSS Medium severity, with low exploitation likelihood (EPSS 0.37%). Mandatory action: Monitor for updates and reassess as exploit intelligence or EPSS changes.
Risk is dynamic; we continuously reassess and refresh what is shown on this page as upstream context changes.
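EPSS lead: The daily EPSS score estimates the probability that this CVE will be exploited in the wild within the next 30 days; the percentile ranks this CVE among all scored vulnerabilities (a higher percentile means a higher exploitation likelihood relative to other CVEs).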
| # | Date | Old EPSS score | New EPSS score | Delta (New - Old) |
|---|---|---|---|---|
| 1 | 2026-06-15 | 0.04% | 0.37% | +0.33% |
| 2 | 2026-05-13 | — | 0.04% | — |
CVSS metrics for this CVE.
| Base score | Version | Severity | Vector | Exploitability | Impact | Score source |
|---|---|---|---|---|---|---|
| 6.5 | 3.1 | MEDIUM |  | 2.8 | 3.6 | [email protected] |
| URL | Tags |
|---|---|
| https://github.com/vllm-project/vllm/pull/38610 | Issue Tracking Patch |
| https://github.com/vllm-project/vllm/security/advisories/GHSA-83vm-p52w-f9pw | Mitigation Vendor Advisory |