Hello team,
I’m facing an issue where the eBPF OTel agent repeatedly saturates its internal queue ("sending queue is full") on one Kubernetes node, even with:
- restricted namespaces in discovery.instrument
- additional exclude_services
- reduced instrumentation scope
- the OpenTelemetry Collector in front of Instana (not sending directly)
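For reference, this is roughly the shape of the restriction I'm running (a trimmed sketch with anonymized values; discovery.instrument, exclude_services and exe_path are the keys I'm actually using, while the k8s_namespace selector name and the globs below are illustrative):

```yaml
# Trimmed sketch of the agent discovery config (values anonymized).
# k8s_namespace is my reading of the selector name; globs are illustrative.
discovery:
  instrument:
    # only instrument our own binaries in the application namespaces
    - k8s_namespace: myapp-*
      exe_path: /myapp/app/bin/*
  exclude_services:
    # drop the noisy short-lived executables seen in the logs
    - exe_path: "*/bash"
    - exe_path: "*/coreutils"
    - exe_path: "*/conmon"
```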
Environment:
- Kubernetes cluster: OpenShift 4.x (IBM Cloud)
- eBPF instrumentation version: v0.2.0
- Image: custom registry (ghcr.io/open-telemetry/opentelemetry-ebpf-instrumentation/ebpf-instrument)
- Mode: DaemonSet (12 nodes)
- Only one node out of the 12 produces the issue
Observed behavior:
On one worker node, the agent repeatedly logs:
```
time=... level=ERROR msg="error sending trace to consumer" error="sending queue is full"
time=... level=ERROR msg="error sending trace to consumer" error="sending queue is full"
time=... level=ERROR msg="error sending trace to consumer" error="sending queue is full"
time=... level=ERROR msg="error sending trace to consumer" error="sending queue is full"
```
This happens continuously.
The logs also show very frequent process attachment:
```
instrumenting process cmd=/myapp/app/bin/tg_loader ...
instrumenting process cmd=/myapp/app/bin/tg_daemon ...
instrumenting process cmd=/usr/bin/coreutils ...
instrumenting process cmd=/usr/bin/bash ...
```
Even after tightening the configuration (namespace restrictions, exe_path exclusions, etc.), the issue persists on this node only.
Troubleshooting done:
- Restricted discovery.instrument to only the binaries of interest (/myapp/app/bin/*)
- Excluded noisy executables (bash, coreutils, conmon)
- Tried with and without OpenTelemetry Collector
- Added sampling on the collector side (see the sketch after this list)
- Enabled (or prepared to enable) k8s-cache component
- Verified node load (CPU/Memory OK)
- Verified that the OTLP receiver is not lagging
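For completeness, the Collector-side sampling mentioned above looks roughly like this (a sketch, not my exact pipeline; the exporter endpoint and the numeric values are placeholders, and probabilistic_sampler comes from the Collector contrib distribution):

```yaml
# Sketch of the Collector pipeline in front of Instana (names/values are placeholders).
receivers:
  otlp:
    protocols:
      grpc:

processors:
  probabilistic_sampler:
    sampling_percentage: 10      # keep ~10% of traces

exporters:
  otlp:
    endpoint: instana-backend:4317   # placeholder endpoint
    sending_queue:
      enabled: true
      queue_size: 5000               # larger buffer on the Collector side

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [probabilistic_sampler]
      exporters: [otlp]
```

Even with this in place, the "sending queue is full" errors are logged by the agent itself, i.e. before the data ever reaches the Collector.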
Questions:
- Is the internal eBPF queue size configurable?
- Is this behavior expected under high syscall churn?
- Could the process attach/detach loop cause queue saturation?
- Is this known / documented behavior in v0.2.0?
- Should the sampling or filtering also apply before the queue?
- Any recommended mitigations for a single-node hotspot?
Thanks!