In the Linux kernel, the following vulnerability has been resolved: bpf: Fix race in cpumap on...

Description

In the Linux kernel, the following vulnerability has been resolved:

bpf: Fix race in cpumap on PREEMPT_RT

On PREEMPT_RT kernels, the per-CPU xdp_bulk_queue (bq) can be accessed
concurrently by multiple preemptible tasks on the same CPU.

The original code assumes bq_enqueue() and __cpu_map_flush() run
atomically with respect to each other on the same CPU, relying on
local_bh_disable() to prevent preemption. However, on PREEMPT_RT,
local_bh_disable() only calls migrate_disable() (when
PREEMPT_RT_NEEDS_BH_LOCK is not set) and does not disable
preemption. The CFS scheduler can therefore preempt a task in the middle
of bq_flush_to_queue() and let another task on the same CPU enter
bq_enqueue() and operate on the same per-CPU bq concurrently.

This leads to two races:

  1. Double __list_del_clearprev(): after bq->count is reset in
    bq_flush_to_queue(), a preempting task can call bq_enqueue() ->
    bq_flush_to_queue() on the same bq when bq->count reaches
    CPU_MAP_BULK_SIZE. Both tasks then call __list_del_clearprev()
    on the same bq->flush_node; the second call dereferences the
    prev pointer that was already set to NULL by the first.

  2. bq->count and bq->q[] races: concurrent bq_enqueue() can corrupt
    the packet queue while bq_flush_to_queue() is processing it.
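
One form of the second race, a lost update, can be replayed in a few lines of user-space C. This is a simplified sketch under stated assumptions, not the kernel code: struct lost_bq, lost_enqueue() and replay_lost_update() are hypothetical names, CPU_MAP_BULK_SIZE is set to an assumed value of 8, and the flush-when-full path of bq_enqueue() is left out. It walks through one possible interleaving: the flushing task copies the frames it saw, a preempting task stores one more frame, and the flushing task then resets bq->count, so that frame never reaches the ring.

```c
/* Assumed batch size for this model only. */
#define CPU_MAP_BULK_SIZE 8
#define RING_CAP 64

/* Hypothetical stand-ins for the per-CPU bq and the ptr_ring. */
struct lost_bq {
	int q[CPU_MAP_BULK_SIZE];
	unsigned int count;
};

struct lost_ring {
	int slots[RING_CAP];
	unsigned int len;
};

/* Simplified bq_enqueue(): store the frame, no flush-when-full path. */
static void lost_enqueue(struct lost_bq *bq, int frame)
{
	bq->q[bq->count++] = frame;
}

/*
 * Replays one unsynchronized interleaving and returns the number of frames
 * that are neither in the ring nor still counted in the bulk queue.
 */
static unsigned int replay_lost_update(void)
{
	struct lost_bq bq = { .count = 0 };
	struct lost_ring ring = { .len = 0 };
	unsigned int enqueued = 0;

	/* Three frames are already queued. */
	for (int id = 0; id < 3; id++, enqueued++)
		lost_enqueue(&bq, id);

	/* Task A (flush): copy the frames it saw into the ring. */
	unsigned int seen = bq.count;
	for (unsigned int i = 0; i < seen; i++)
		ring.slots[ring.len++] = bq.q[i];

	/* <-- Task A preempted; Task B enqueues frame 3 into q[3] --> */
	lost_enqueue(&bq, 3);
	enqueued++;

	/* Task A resumes and marks the queue empty, dropping frame 3. */
	bq.count = 0;

	return enqueued - ring.len - bq.count;
}
```

In this replay, replay_lost_update() returns 1: four frames were enqueued, three reached the ring, and bq->count is 0.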

The race between task A (__cpu_map_flush -> bq_flush_to_queue) and
task B (bq_enqueue -> bq_flush_to_queue) on the same CPU:

    Task A (xdp_do_flush)          Task B (cpu_map_enqueue)
    ----------------------         ------------------------
    bq_flush_to_queue(bq)
      spin_lock(&q->producer_lock)
      /* flush bq->q[] to ptr_ring */
      bq->count = 0
      spin_unlock(&q->producer_lock)
                                   bq_enqueue(rcpu, xdpf)
      <-- CFS preempts Task A -->    bq->q[bq->count++] = xdpf
                                     /* ... more enqueues until full ... */
                                     bq_flush_to_queue(bq)
                                       spin_lock(&q->producer_lock)
                                       /* flush to ptr_ring */
                                       spin_unlock(&q->producer_lock)
                                       __list_del_clearprev(flush_node)
                                         /* sets flush_node.prev = NULL */
      <-- Task A resumes -->
      __list_del_clearprev(flush_node)
        flush_node.prev->next = ...
        /* prev is NULL -> kernel oops */
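
The double __list_del_clearprev() in this timeline can be replayed deterministically in user space. The sketch assumes a minimal copy of the kernel's struct list_head; init_list_head(), model_list_del_clearprev() and replay_double_unlink() are hypothetical helpers, and unlike the real __list_del_clearprev() the model checks for a NULL prev and reports it instead of dereferencing it, so the replay does not crash.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal user-space copy of the kernel's circular doubly linked list node. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

/* Insert node right after head, like the kernel's list_add(). */
static void list_add(struct list_head *node, struct list_head *head)
{
	node->next = head->next;
	node->prev = head;
	head->next->prev = node;
	head->next = node;
}

/*
 * Hypothetical stand-in for __list_del_clearprev(): unlink entry and set
 * entry->prev to NULL. Where the kernel helper would write through a NULL
 * prev, this version returns false instead.
 */
static bool model_list_del_clearprev(struct list_head *entry)
{
	if (entry->prev == NULL)
		return false;		/* kernel: NULL prev is dereferenced -> oops */
	entry->next->prev = entry->prev;
	entry->prev->next = entry->next;
	entry->prev = NULL;
	return true;
}

/* Replays the timeline; returns true if Task A's unlink would hit NULL. */
static bool replay_double_unlink(void)
{
	struct list_head flush_list, flush_node;

	init_list_head(&flush_list);
	list_add(&flush_node, &flush_list);	/* bq is on the flush list */

	/* Task A: flushed bq->q[], set bq->count = 0, then was preempted. */

	/* Task B: refilled bq, flushed it and unlinked flush_node. */
	bool b_unlinked = model_list_del_clearprev(&flush_node);

	/* Task A resumes and unlinks the same flush_node again. */
	bool a_unlinked = model_list_del_clearprev(&flush_node);

	return b_unlinked && !a_unlinked;
}
```

replay_double_unlink() returns true: Task B's unlink succeeds and leaves flush_node.prev as NULL, so Task A's second unlink is the one that would oops.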

Fix this by adding a local_lock_t to xdp_bulk_queue and acquiring it
in bq_enqueue() and __cpu_map_flush(). These paths already run under
local_bh_disable(), so use local_lock_nested_bh(), which on non-RT
kernels is a pure annotation with no overhead and on PREEMPT_RT provides
a per-CPU sleeping lock that serializes access to the bq.

To reproduce, insert an mdelay(100) between bq->count = 0 and
__list_del_clearprev() in bq_flush_to_queue(), then run the reproducer
provided by syzkaller.

Basic information

Type: unreviewed
Severity: medium
Advisory on GitHub: Repository advisory
Source code: Not specified
Published (advisory): 2026-03-25 12:30:23 UTC
Updated: 2026-04-23 21:32:26 UTC
NVD published: 2026-03-25 11:16:32 UTC

EPSS Score

Score: 0.01%
Percentile: 1.70%

CVSS Scores

Base score: 4.7
Version: 3.1
Severity: Medium
Vector: CVSS:3.1/AV:L/AC:H/PR:L/UI:N/S:U/C:N/I:N/A:H
Attack vector (AV:L)
They need local access to the box (a console or SSH login), or someone with that access has to run something for them; it’s not a remote drive-by.
Attack complexity (AC:H)
Even with access, the exploit needs extra luck, timing, or a fussy environment to actually work.
Privileges required (PR:L)
A normal user session is enough; they don’t have to be admin.
User interaction (UI:N)
Nobody has to click “OK” or open a trap file; it can work without a victim helping.
Scope (S:U)
Damage stays in the same “trust bubble” as the broken component, with no spill into unrelated systems.
Confidentiality (C:N)
Doesn’t really leak secrets in a meaningful way.
Integrity (I:N)
Data isn’t meaningfully altered or forged.
Availability (A:H)
Could take the system down hard (here, a kernel oops) or make it unusable for people who depend on it.
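
The 4.7 base score can be checked against the CVSS v3.1 base-score equations for an unchanged scope. The sketch hard-codes the v3.1 weights for this vector only (AV:L 0.55, AC:H 0.44, PR:L 0.62, UI:N 0.85, C:N and I:N 0, A:H 0.56) and the specification's integer-based Roundup; the function and macro names are illustrative, and it neither parses vector strings nor handles a changed scope.

```c
/* CVSS v3.1 weights for the metric values in this advisory's vector. */
#define W_AV_LOCAL	0.55
#define W_AC_HIGH	0.44
#define W_PR_LOW	0.62	/* value for an unchanged scope */
#define W_UI_NONE	0.85
#define W_CIA_NONE	0.0
#define W_CIA_HIGH	0.56

/*
 * v3.1 Roundup: the smallest value with one decimal place that is >= input.
 * Working on an integer scaled by 100000 avoids floating-point artifacts;
 * the +0.5 truncation matches round() for the non-negative inputs used here.
 */
static double cvss31_roundup(double input)
{
	long int_input = (long)(input * 100000.0 + 0.5);

	if (int_input % 10000 == 0)
		return int_input / 100000.0;
	return (double)(int_input / 10000 + 1) / 10.0;
}

/* Base score for Scope: Unchanged, following the v3.1 formula. */
static double cvss31_base_scope_unchanged(double av, double ac, double pr,
					  double ui, double c, double i,
					  double a)
{
	double iss = 1.0 - (1.0 - c) * (1.0 - i) * (1.0 - a);
	double impact = 6.42 * iss;
	double exploitability = 8.22 * av * ac * pr * ui;
	double sum;

	if (impact <= 0.0)
		return 0.0;
	sum = impact + exploitability;
	return cvss31_roundup(sum < 10.0 ? sum : 10.0);
}

/* AV:L/AC:H/PR:L/UI:N/S:U/C:N/I:N/A:H */
static double advisory_base_score(void)
{
	return cvss31_base_scope_unchanged(W_AV_LOCAL, W_AC_HIGH, W_PR_LOW,
					   W_UI_NONE, W_CIA_NONE, W_CIA_NONE,
					   W_CIA_HIGH);
}
```

With these weights, the impact subscore is 6.42 * 0.56 = 3.5952 and the exploitability subscore is 8.22 * 0.55 * 0.44 * 0.62 * 0.85, about 1.0483. Their sum, about 4.6435, rounds up to 4.7.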

Identifiers

CWEs

CWE-362: Concurrent Execution using Shared Resource with Improper Synchronization ('Race Condition')
