GHSA-f8m6-h2c7-8h9x · Severity: high · Ecosystem: pip — Inefficient Regular Expression Complexity in nltk (word_tokenize, sent_tokenize)
NLTK (Natural Language Toolkit) is a suite of open source Python modules, data sets, and tutorials supporting research and development in Natural Language Processing. Versions prior to 3.6.5 are vulnerable to regular expression denial of service (ReDoS) attacks. The vulnerability is present in the PunktSentenceTokenizer class and in the sent_tokenize and word_tokenize functions, so any code that uses this class or either of these functions is affected. In short, a specially crafted long input to any of them causes a significant increase in execution time. If your program relies on any of the vulnerable functions to tokenize unpredictable user input, we strongly recommend upgrading to a version of NLTK without the vulnerability. For users unable to upgrade, the execution time can be bounded by limiting the maximum length of the input passed to any of the vulnerable functions. Our recommendation is to implement such a limit.
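The two mitigations described above (checking the installed version against 3.6.5 and capping input length) might be sketched as follows. This is a minimal illustration, not code from the advisory or from NLTK: the names `is_vulnerable_version`, `bounded_tokenize`, `InputTooLongError`, and `MAX_INPUT_CHARS` are hypothetical, and the 10,000-character cap is an arbitrary assumption that should be tuned to the application.

```python
from itertools import takewhile
from typing import Callable, List

# Hypothetical cap; the advisory does not prescribe a specific value.
MAX_INPUT_CHARS = 10_000

# First version the advisory text treats as not vulnerable.
FIRST_FIXED_VERSION = (3, 6, 5)


class InputTooLongError(ValueError):
    """Raised when input exceeds the configured length cap (hypothetical)."""


def is_vulnerable_version(version: str) -> bool:
    """Return True if a version string such as '3.6.4' is below 3.6.5.

    Only the leading digits of the first three dot-separated components
    are compared; suffixes such as 'rc1' are ignored.
    """
    parts = []
    for piece in version.split(".")[:3]:
        digits = "".join(takewhile(str.isdigit, piece))
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts) < FIRST_FIXED_VERSION


def bounded_tokenize(
    text: str,
    tokenizer: Callable[[str], List[str]],
    max_chars: int = MAX_INPUT_CHARS,
) -> List[str]:
    """Reject over-long input before it reaches the tokenizer."""
    if len(text) > max_chars:
        raise InputTooLongError(
            f"input has {len(text)} characters; limit is {max_chars}"
        )
    return tokenizer(text)


# Usage with NLTK (assumes nltk and its tokenizer data are installed):
#   import nltk
#   if is_vulnerable_version(nltk.__version__):
#       tokens = bounded_tokenize(user_text, nltk.word_tokenize)
```

Passing the tokenizer in as a callable keeps the length check independent of which vulnerable function is being wrapped, so the same guard can sit in front of `sent_tokenize` or a `PunktSentenceTokenizer` instance's `tokenize` method. Rejecting long input bounds the worst-case running time but does not remove the underlying regular expression issue, so upgrading remains the preferred fix.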
Conclusion & alert: CVE-2021-43854 is rated High Exploit Risk (76.8/100): CVSS High severity, with medium exploitation likelihood (EPSS 2.67%). Core evidence: 3 public exploit reference(s) are indexed (Exploit-DB). EPSS rose +2.52% over the last day, indicating growing attacker interest. Mandatory action: Public exploits are available—assess exposure, apply mitigations, and prioritize patching.
Risk is dynamic; we continuously reassess and refresh what is shown on this page as upstream context changes.
| EDB-ID | Source | Kind | Published | Link |
|---|---|---|---|---|
| — | nvd_ref | exploit_tag | — | Exploit-DB |
| — | nvd_ref | exploit_tag | — | Exploit-DB |
| — | nvd_ref | exploit_tag | — | Exploit-DB |
EPSS: The daily EPSS score estimates the probability that this CVE will be exploited; the percentile ranks it among all scored vulnerabilities (a higher percentile means a higher relative likelihood of exploitation, not greater severity).
| # | Date | Old EPSS score | New EPSS score | Delta (New - Old) |
|---|---|---|---|---|
| 1 | 2026-06-15 | 0.14% | 2.67% | +2.52% |
| 2 | 2026-05-02 | 0.84% | 0.14% | -0.69% |
| 3 | 2025-11-21 | — | 0.84% | — |
Full EPSS history (15 records total)
CVSS metrics for this CVE.
| Base score | Version | Severity | Vector | Exploitability | Impact | Score source |
|---|---|---|---|---|---|---|
| 7.5 | 3.1 | HIGH | — | 3.9 | 3.6 | [email protected] |
| 5.0 | 2.0 | MEDIUM | — | 10.0 | 2.9 | [email protected] |
| vendor | priority | summary | link |
|---|---|---|---|
| debian | not yet assigned | CVE-2021-43854, priority not yet assigned: Debian tracks 1 source package (nltk) with 5 status rows across 5 suites (bookworm, bullseye, forky, sid, trixie): resolved 4, open 1. | https://security-tracker.debian.org/tracker/CVE-2021-43854 |
| ubuntu | medium | CVE-2021-43854, medium priority: Ubuntu tracks 1 source package (nltk) with 14 status rows across 14 suites (bionic, focal, hirsute, impish, jammy, kinetic, lunar, mantic, noble, oracular, plucky, trusty, upstream, xenial): not-affected 6, released 5, ignored 3. | https://ubuntu.com/security/CVE-2021-43854 |
| URL | Tags |
|---|---|
| https://github.com/nltk/nltk/commit/1405aad979c6b8080dbbc8e0858f89b2e3690341 | Patch Third Party Advisory |
| https://github.com/nltk/nltk/issues/2866 | Exploit Issue Tracking Patch Third Party Advisory |
| https://github.com/nltk/nltk/pull/2869 | Exploit Patch Third Party Advisory |
| https://github.com/nltk/nltk/security/advisories/GHSA-f8m6-h2c7-8h9x | Exploit Patch Third Party Advisory |