CVE-2021-43854 | Inefficient Regular Expression Complexity in nltk


NLTK (Natural Language Toolkit) is a suite of open-source Python modules, data sets, and tutorials supporting research and development in natural language processing. Versions prior to 3.6.5 are vulnerable to regular expression denial of service (ReDoS) attacks. The vulnerability is present in the PunktSentenceTokenizer class and in the sent_tokenize and word_tokenize functions; any code that uses this class or either function is vulnerable. In short, a specially crafted long input to any of these can make it take a very long time to execute. If your program uses any of the vulnerable functions to tokenize unpredictable user input, we strongly recommend upgrading to a version of NLTK without the vulnerability. Users unable to upgrade can bound execution time by limiting the maximum length of any input passed to the vulnerable functions, and we recommend implementing such a limit.
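The length-limit mitigation can be sketched as a thin wrapper that refuses oversized input before it ever reaches a tokenizer. This is a minimal sketch under stated assumptions: `bounded_tokenize`, `InputTooLongError`, and the 10,000-character default are hypothetical names and values chosen for illustration (NLTK does not prescribe a limit), and `str.split` stands in for the NLTK functions so the example runs without NLTK installed.

```python
from typing import Callable, List

# Hypothetical limit: NLTK does not prescribe one. Pick a value that covers the
# longest legitimate input your application expects.
MAX_INPUT_CHARS = 10_000


class InputTooLongError(ValueError):
    """Raised when text exceeds the tokenizer input limit (hypothetical helper)."""


def bounded_tokenize(
    text: str,
    tokenizer: Callable[[str], List[str]],
    max_chars: int = MAX_INPUT_CHARS,
) -> List[str]:
    """Call `tokenizer` only if `text` is at most `max_chars` characters long."""
    if len(text) > max_chars:
        # Reject instead of truncating so callers never get a partial result.
        raise InputTooLongError(
            f"input has {len(text)} characters; limit is {max_chars}"
        )
    return tokenizer(text)


if __name__ == "__main__":
    # str.split stands in for nltk.word_tokenize so this runs without NLTK.
    print(bounded_tokenize("Hello there, world.", str.split))
    try:
        bounded_tokenize("x" * (MAX_INPUT_CHARS + 1), str.split)
    except InputTooLongError as err:
        print("rejected:", err)
```

With NLTK installed, `nltk.word_tokenize` or `nltk.sent_tokenize` would be passed as the `tokenizer` argument. Rejecting oversized input is a design choice: truncating would also bound execution time, but it would silently tokenize only part of the document.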

Published: 2021-12-23 · Last update: 2026-06-17 · Assigner: [email protected] · Source: [email protected]

Conclusion & alert: CVE-2021-43854 is rated High Exploit Risk (76.8/100), combining a High CVSS severity with a medium likelihood of exploitation (EPSS 2.67%). Core evidence: 3 public exploit references are indexed (Exploit-DB). In its most recent change (2026-06-15), the EPSS score rose by 2.52 percentage points, from 0.14% to 2.67%, which may indicate growing attacker interest. Mandatory action: public exploits are available, so assess your exposure, apply mitigations, and prioritize patching.


Public exploit references (Exploit-DB) for CVE-2021-43854

EDB-ID Source Kind Published Link
nvd_ref exploit_tag Exploit-DB
nvd_ref exploit_tag Exploit-DB
nvd_ref exploit_tag Exploit-DB

Exploit prediction scoring system (EPSS) score for CVE-2021-43854

EPSS is a daily estimate of the probability that a vulnerability will be exploited in the wild within the next 30 days; the EPSS percentile ranks this CVE among all scored vulnerabilities (a higher percentile means a higher likelihood relative to other CVEs).

# Date Old EPSS score New EPSS score Delta (New - Old)
1 2026-06-15 0.14% 2.67% +2.52%
2 2026-05-02 0.84% 0.14% -0.69%
3 2025-11-21 0.84%

Full EPSS history: 15 records in total (3 shown above).

Common vulnerability scoring system (CVSS) metrics for CVE-2021-43854

CVSS metrics for this CVE.

CVSS 3.1 · Base score: 7.5 · Severity: HIGH · Exploitability: 3.9 · Impact: 3.6 · Score source: [email protected]
Vector: CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H
Attack vector (AV:N)
Exploitable over the internet or any routed network; the attacker does not need local access to the machine.
Attack complexity (AC:L)
Once the vulnerable code is reachable, exploitation is straightforward, with no race conditions or unusual setup required.
Privileges required (PR:N)
No account or special privileges are needed; an anonymous user is enough.
User interaction (UI:N)
No victim action is needed: nobody has to click "OK" or open a trap file.
Scope (S:U)
The impact stays within the security scope of the vulnerable component and does not spill into unrelated systems.
Confidentiality (C:N)
No meaningful disclosure of secrets.
Integrity (I:N)
Data is not meaningfully altered or forged.
Availability (A:H)
Can take the service down completely or make it unusable for the people who depend on it.
CVSS 2.0 · Base score: 5.0 · Severity: MEDIUM · Exploitability: 10.0 · Impact: 2.9 · Score source: [email protected]
Vector: AV:N/AC:L/Au:N/C:N/I:N/A:P
Access vector (AV:N)
Exploitable remotely over the network.
Access complexity (AC:L)
Exploitation conditions are straightforward and predictable.
Authentication (Au:N)
No authentication is required.
Confidentiality impact (C:N)
No confidentiality impact.
Integrity impact (I:N)
No integrity impact.
Availability impact (A:P)
Partial availability impact: reduced performance or interruptions in resource availability.
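The listed scores (7.5 / 3.9 / 3.6 for CVSS 3.1 and 5.0 / 10.0 / 2.9 for CVSS 2.0) follow from their vectors through the base equations in the FIRST CVSS v3.1 specification and the FIRST CVSS v2 guide. The sketch below recomputes them as a sanity check. It hard-codes the published metric weights, handles only Scope: Unchanged for v3.1 (the case this CVE uses), and its function names (`parse_vector`, `roundup`, `cvss31_scores`, `cvss2_scores`) are illustrative, not a library API.

```python
import math

# CVSS v3.1 base metric weights (FIRST CVSS v3.1 specification).
V3_AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.20}
V3_AC = {"L": 0.77, "H": 0.44}
V3_PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}  # Scope:Unchanged weights only
V3_UI = {"N": 0.85, "R": 0.62}
V3_CIA = {"H": 0.56, "L": 0.22, "N": 0.0}

# CVSS v2 base metric weights (FIRST CVSS v2 guide).
V2_AV = {"L": 0.395, "A": 0.646, "N": 1.0}
V2_AC = {"H": 0.35, "M": 0.61, "L": 0.71}
V2_AU = {"M": 0.45, "S": 0.56, "N": 0.704}
V2_CIA = {"N": 0.0, "P": 0.275, "C": 0.660}


def parse_vector(vector: str) -> dict:
    """Turn 'CVSS:3.1/AV:N/...' or 'AV:N/...' into {'AV': 'N', ...}."""
    parts = (p for p in vector.split("/") if not p.startswith("CVSS:"))
    return dict(p.split(":", 1) for p in parts)


def roundup(value: float) -> float:
    """v3.1 Roundup: smallest one-decimal number >= value (float-safe form)."""
    int_input = round(value * 100_000)
    if int_input % 10_000 == 0:
        return int_input / 100_000.0
    return (math.floor(int_input / 10_000) + 1) / 10.0


def cvss31_scores(vector: str) -> tuple:
    """(base, exploitability, impact) for a v3.1 vector with Scope:Unchanged."""
    m = parse_vector(vector)
    if m["S"] != "U":
        # Scope:Changed uses other PR weights and another impact formula.
        raise NotImplementedError("sketch covers Scope:Unchanged only")
    iss = 1 - (1 - V3_CIA[m["C"]]) * (1 - V3_CIA[m["I"]]) * (1 - V3_CIA[m["A"]])
    impact = 6.42 * iss
    exploitability = (
        8.22 * V3_AV[m["AV"]] * V3_AC[m["AC"]] * V3_PR_UNCHANGED[m["PR"]] * V3_UI[m["UI"]]
    )
    base = 0.0 if impact <= 0 else roundup(min(impact + exploitability, 10))
    # Subscores are rounded to one decimal place for display.
    return base, round(exploitability, 1), round(impact, 1)


def cvss2_scores(vector: str) -> tuple:
    """(base, exploitability, impact) for a CVSS v2 base vector."""
    m = parse_vector(vector)
    impact = 10.41 * (1 - (1 - V2_CIA[m["C"]]) * (1 - V2_CIA[m["I"]]) * (1 - V2_CIA[m["A"]]))
    exploitability = 20 * V2_AV[m["AV"]] * V2_AC[m["AC"]] * V2_AU[m["Au"]]
    f_impact = 0.0 if impact == 0 else 1.176
    # Python's round() stands in for the guide's round_to_1_decimal step.
    base = round((0.6 * impact + 0.4 * exploitability - 1.5) * f_impact, 1)
    return base, round(exploitability, 1), round(impact, 1)


print(cvss31_scores("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:N/I:N/A:H"))  # (7.5, 3.9, 3.6)
print(cvss2_scores("AV:N/AC:L/Au:N/C:N/I:N/A:P"))  # (5.0, 10.0, 2.9)
```

The v3.1 base score always rounds up to one decimal, which is why an unrounded total of about 7.48 becomes 7.5; the v2 base score uses ordinary rounding.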

Weakness enumeration for CVE-2021-43854

GitHub Security Advisory for CVE-2021-43854

GHSA-f8m6-h2c7-8h9x · Severity: high · Ecosystem: pip — Inefficient Regular Expression Complexity in nltk (word_tokenize, sent_tokenize)

OS Trackers for CVE-2021-43854

vendor priority summary link
debian not yet assigned CVE-2021-43854 has not yet been assigned a priority by Debian; 1 source package (nltk), 5 status rows across 5 suites (bookworm, bullseye, forky, sid, trixie): resolved 4, open 1. https://security-tracker.debian.org/tracker/CVE-2021-43854
ubuntu medium CVE-2021-43854 has medium priority in Ubuntu; 1 source package (nltk), 14 status rows across 14 suites (bionic, focal, hirsute, impish, jammy, kinetic, lunar, mantic, noble, oracular, plucky, trusty, upstream, xenial): not-affected 6, released 5, ignored 3. https://ubuntu.com/security/CVE-2021-43854

Affected software / configurations for CVE-2021-43854

Vendor Product Version Raw CPE
nltk nltk < 3.6.5 cpe:2.3:a:nltk:nltk:*:*:*:*:*:*:*:*
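The affected range in the CPE row (`< 3.6.5`) can be checked against an installed copy of NLTK. This is a minimal sketch that assumes a plain numeric version string; `release_tuple` and `is_affected` are illustrative helpers, not NLTK APIs, and pre-release suffixes are deliberately ignored (use `packaging.version.Version` for full PEP 440 ordering). A distribution package may carry a backported fix under an older version number, so distro builds should be checked against the OS trackers above rather than by version alone.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# First upstream release outside the affected range "< 3.6.5".
FIXED_IN = (3, 6, 5)


def release_tuple(ver: str) -> tuple:
    """Parse the leading numeric release, e.g. '3.6.4' -> (3, 6, 4).

    Simplification: suffixes such as 'rc1' or '.dev0' are ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", ver)
    if match is None:
        raise ValueError(f"unrecognized version string: {ver!r}")
    return tuple(int(piece) for piece in match.group(0).split("."))


def is_affected(ver: str) -> bool:
    # Tuple comparison treats (3, 6) as less than (3, 6, 5), matching 3.6 == 3.6.0.
    return release_tuple(ver) < FIXED_IN


if __name__ == "__main__":
    try:
        installed = version("nltk")
        status = "AFFECTED" if is_affected(installed) else "not affected"
        print(f"nltk {installed}: {status}")
    except PackageNotFoundError:
        print("nltk is not installed")
```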

References for CVE-2021-43854

URL Tags
https://github.com/nltk/nltk/commit/1405aad979c6b8080dbbc8e0858f89b2e3690341 Patch Third Party Advisory
https://github.com/nltk/nltk/issues/2866 Exploit Issue Tracking Patch Third Party Advisory
https://github.com/nltk/nltk/pull/2869 Exploit Patch Third Party Advisory
https://github.com/nltk/nltk/security/advisories/GHSA-f8m6-h2c7-8h9x Exploit Patch Third Party Advisory