OpenTelemetry eBPF Instrumentation: Unbounded BPF internal metrics replay can exhaust CPU

Description

Summary

OBI replays BPF probe hits into histogram observations by looping once per recorded run count. On busy systems, the run-count delta can become very large, causing the metrics exporter to spend excessive CPU time in a tight loop every collection interval.

Details

The vulnerable loop is in pkg/export/prom/prom_bpf.go. During each metrics tick, OBI iterates through probeMetrics and then executes for range metric.count, invoking BpfProbeLatency(...) for each individual recorded hit.

The count comes from calculateStats() in the same file, where deltaCount := bp.runCount - bp.prevRunCount is calculated and returned without any cap before the per-hit replay loop.

If probe activity spikes between scrape intervals, deltaCount can be very large. The exporter then spends CPU time proportional to the number of probe hits rather than the number of metric series.

PoC

Local testing with a small reproducer confirmed the replay-loop behavior and showed CPU scaling with the recorded hit count rather than the number of metric series.

Use a vulnerable build and enable internal metrics export:

git checkout v0.0.0-rc.1+build
make build
export OTEL_EBPF_INTERNAL_METRICS_PROMETHEUS_PORT=9090
sudo ./bin/obi

Create a high-rate workload that repeatedly exercises traced probes. For example, generate HTTP traffic against an instrumented service:

python3 -m http.server 18081

Then drive it:

seq 1 500000 | xargs -P 128 -I{} curl -s http://127.0.0.1:18081 >/dev/null

At the same time, scrape metrics repeatedly:

while true; do curl -s http://127.0.0.1:9090/metrics >/dev/null; done

On a vulnerable build, OBI CPU consumption rises sharply during the metrics loop because histogram updates are replayed once per counted probe execution. The effect is visible in top or pidstat and is most pronounced under sustained high request volume.

Impact

This is an availability issue in the internal metrics path. Any deployment that enables BPF internal metrics and traces busy workloads is affected. Attackers can indirectly consume CPU in the privileged agent by driving enough activity through instrumented services.

Basic information

Type
reviewed
Severity
medium
Advisory on GitHub
Open advisory ↗
Repository advisory
Open repository advisory ↗
Source code
Browse source ↗
Published (advisory)
2026-05-18 20:11:13 UTC
Updated
2026-06-09 10:58:33 UTC
GitHub reviewed
2026-05-18 20:11:13 UTC
NVD published
2026-06-02

EPSS Score

Score Percentile
0.05% 16.65%

CVSS Scores

Base score Version Severity Vector
5.9 3.1
CVSS:3.1/AV:N/AC:H/PR:N/UI:N/S:U/C:N/I:N/A:H Click to expand
Attack vector (AV:N)
Could be attacked over the internet or any normal routed network—not just someone sitting at the machine.
Attack complexity (AC:H)
Even with access, the exploit needs extra luck, timing, or a fussy environment to actually work.
Privileges required (PR:N)
No account or special rights needed—anonymous or random user is enough.
User interaction (UI:N)
Nobody has to click “OK” or open a trap file; it can work without a victim helping.
Scope (S:U)
Damage stays in the same “trust bubble” as the broken component—no big spill into unrelated systems.
Confidentiality (C:N)
Doesn’t really leak secrets in a meaningful way.
Integrity (I:N)
Data isn’t meaningfully altered or forged.
Availability (A:H)
Could take the service down hard or make it unusable for people who depend on it.

Identifiers

CWEs

CWE id Name
CWE-400 Uncontrolled Resource Consumption
CWE-834 Excessive Iteration

Credits

  • MrAlias (reporter)

Affected packages (1)

Vulnerable version ranges and first patched releases as published by GitHub.

Ecosystem Package Vulnerable range First patched Vulnerable functions
go go.opentelemetry.io/obi < 0.9.0 0.9.0

References

cvelogic Threat Intelligence