ingero-io/ingero

v0.9.2 Breaking

This release includes breaking changes for platform teams planning a safe upgrade.

Published 1mo MCP Data & Storage
✓ No known CVEs patched

Topics

causal-tracing cuda cuda-graphs ebpf gpu gpu-monitoring gpu-observability incident-response kubernetes machine-learning mcp model-context-protocol nvidia observability pytorch sre distributed-tracing

Affected surfaces

auth

Summary

AI summary

The eBPF Python frame walker (--py-walker=ebpf) is now the headline feature, offering container support, distro-patched CPython compatibility, and enhanced multiprocessing handling.

Full changelog

The in-kernel eBPF Python frame walker (--py-walker=ebpf) is the headline of this release. After extensive validation
on container and bare-metal Linux, through fork storms and lifecycle edge cases, it now provides capabilities the
userspace walker cannot:

  • works in containers (Docker, K8s) without debug symbols or host mounts
  • works on distro-patched CPython (Ubuntu 24.04, etc.) via runtime offset harvesting
  • works at kernel.yama.ptrace_scope=3 with --pid X (PID-specific uprobes)
  • seeds state for dynamically-discovered PIDs (no --pid required)
  • inherits state across fork() for multiprocess workloads (DDP, Ray, torch dataloaders)
  • broadcasts state across multi-library libcudart setups
  • surfaces drop warnings under ringbuf pressure

The userspace walker remains the --py-walker=auto default in this release, but will be deprecated in favor of the eBPF
walker once remaining items settle. See the walker roadmap.

New deployments should prefer --py-walker=ebpf today.

The new Python frame walker

What it is

The eBPF walker runs inside the kernel uprobe, reading CPython's _PyRuntime, PyThreadState, and
_PyInterpreterFrame structures directly from process memory via bpf_probe_read_user. It emits Python source frames
(py_file, py_func, py_line) alongside native stack traces in a single event.
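
To make the "single event" idea concrete, here is a minimal sketch of what a combined record could look like once decoded in Go. Only the key names py_file, py_func, and py_line come from these notes; the PyFrame and CUDAEvent types, the other field names, and the sample values are assumptions for illustration and are not ingero's actual schema.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// PyFrame carries the three Python fields named in the release notes.
// The struct itself is hypothetical; only the JSON keys come from the notes.
type PyFrame struct {
	File string `json:"py_file"`
	Func string `json:"py_func"`
	Line int    `json:"py_line"`
}

// CUDAEvent is a hypothetical combined record: one CUDA call plus the
// native and Python stacks captured alongside it.
type CUDAEvent struct {
	PID         int       `json:"pid"`
	Op          string    `json:"op"`
	NativeStack []string  `json:"native_stack"`
	PyStack     []PyFrame `json:"py_stack"`
}

// encodeEvent serializes the record as a single JSON object.
func encodeEvent(e CUDAEvent) (string, error) {
	b, err := json.Marshal(e)
	return string(b), err
}

func main() {
	// Sample values are made up for the demo.
	evt := CUDAEvent{
		PID:         4242,
		Op:          "cudaLaunchKernel",
		NativeStack: []string{"libcudart.so", "libtorch_cuda.so"},
		PyStack:     []PyFrame{{File: "train.py", Func: "step", Line: 87}},
	}
	out, err := encodeEvent(evt)
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println(out)
}
```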

Why it matters

  • No /proc/<pid>/mem access needed. Works in hardened ptrace_scope=3 environments where the userspace walker is
    blocked.
  • No distro-matched dbgsym needed. The runtime offset harvester spawns the target Python and uses ctypes to discover
    struct offsets empirically. Works on Ubuntu's patched CPython 3.12 where upstream offsets are wrong.
  • Synchronous with the CUDA event. Frame capture happens in-kernel, in the same uprobe dispatch that emits the CUDA
    event. No async gap, no resolver lag.
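
Operators who want to know whether a host falls into the hardened case can read the Yama setting from procfs. The sketch below is a standalone check, not part of ingero. The helper names parseScope and userspaceBlocked are hypothetical, and the rule "scope 3 blocks the userspace walker" is taken directly from the bullet above.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// yamaPath is the standard procfs location of the Yama ptrace setting.
const yamaPath = "/proc/sys/kernel/yama/ptrace_scope"

// parseScope converts the file contents (for example "3\n") to an integer.
func parseScope(raw string) (int, error) {
	return strconv.Atoi(strings.TrimSpace(raw))
}

// userspaceBlocked is a hypothetical helper that encodes the rule from the
// notes: at ptrace_scope=3 the userspace walker cannot read target memory.
func userspaceBlocked(scope int) bool {
	return scope == 3
}

func main() {
	raw, err := os.ReadFile(yamaPath)
	if err != nil {
		// The file is absent when the Yama LSM is not enabled.
		fmt.Println("cannot read ptrace_scope:", err)
		return
	}
	scope, err := parseScope(string(raw))
	if err != nil {
		fmt.Println("unexpected ptrace_scope contents:", err)
		return
	}
	fmt.Printf("ptrace_scope=%d userspace walker blocked=%t\n", scope, userspaceBlocked(scope))
}
```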

What's new in this release

  • Containers: Python binary paths resolved via /proc/<pid>/root/; harvester chroots into the target's mount
    namespace before spawning so ld-linux resolves the target's libpython.
  • Dynamic PID discovery: walker state is pushed for PIDs discovered at runtime, not just those passed via --pid X
    at startup.
  • Multi-library libcudart: state is broadcast to every CUDA tracer's py_runtime_map, so workloads using any
    attached libcudart (system or PyTorch-bundled) get walker coverage.
  • Fork inheritance: children of traced Python processes inherit the parent's walker state synchronously on
    sched_process_fork via a new pytrace.CopyPID helper. The fork-event filter now uses tgid, fixing dropped fork events
    for torch, DDP, Ray, and dataloader workloads that call os.fork() from non-main threads.
  • PID-specific uprobes at ptrace_scope=3: when --pid X is set, uprobes attach with link.UprobeOptions{PID: X}
    instead of system-wide. This reduces kernel overhead and works at ptrace_scope=3, where system-wide uprobes can be
    gated. A startup WARN fires when ptrace_scope=3 is combined with trace-all mode and --py-walker=ebpf.
  • Drop warnings: a throttled WARN fires every 5 seconds if the Python extended record fails to reserve in the
    ringbuf. Previously, Python frames silently dropped to zero with no operator feedback.
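
The fork-inheritance item can be pictured as a table keyed by process ID (tgid) whose entries are copied from parent to child when a fork event arrives. The sketch below only illustrates that idea. It does not reproduce pytrace.CopyPID, whose signature these notes do not give, and walkerState, stateTable, and onFork are hypothetical names.

```go
package main

import "fmt"

// walkerState is a hypothetical stand-in for the per-process data the
// walker needs; the real contents are not described in these notes.
type walkerState struct {
	RuntimeAddr uint64 // illustrative: where _PyRuntime lives in the target
	Version     string // illustrative: CPython version of the target
}

// stateTable maps a process ID (tgid) to its walker state.
type stateTable map[uint32]walkerState

// onFork copies the parent's state to the child and reports whether the
// parent was traced. Looking up by tgid rather than by thread ID means a
// fork issued from a non-main thread still finds the parent's entry.
func (t stateTable) onFork(parentTGID, childPID uint32) bool {
	st, ok := t[parentTGID]
	if !ok {
		return false // parent is not a traced Python process
	}
	t[childPID] = st
	return true
}

func main() {
	table := stateTable{100: {RuntimeAddr: 0x7f00_0000_1000, Version: "3.12"}}

	// A worker thread of process 100 forks child 200; the event is
	// matched on the parent's tgid (100), not the thread's own ID.
	fmt.Println(table.onFork(100, 200), table[200].Version)

	// A fork from an untraced process leaves the table unchanged.
	fmt.Println(table.onFork(999, 300), len(table))
}
```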

Infrastructure improvements (supporting the walker)

Critical event delivery guaranteed

OOM, process exec, fork, and exit events are now guaranteed to be delivered: the critical-events reader blocks on a full
event channel instead of dropping. This unblocks fork inheritance under bursty loads, OOM causal chains, and orchestrator
remediation.
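
The difference between dropping and blocking on a full channel is small in Go. The sketch below contrasts the two patterns under assumed names (event, trySend, mustSend); it is not ingero's reader code.

```go
package main

import "fmt"

// event is a hypothetical minimal record for the demo.
type event struct {
	Kind string
	PID  int
}

// trySend is the lossy pattern: if the channel is full, the event is
// dropped and false is returned.
func trySend(ch chan<- event, e event) bool {
	select {
	case ch <- e:
		return true
	default:
		return false
	}
}

// mustSend is the blocking pattern: it waits until the consumer makes
// room, so the event is never lost.
func mustSend(ch chan<- event, e event) {
	ch <- e
}

func main() {
	ch := make(chan event, 1)

	fmt.Println("first send accepted:", trySend(ch, event{"cuda", 1}))
	fmt.Println("second send accepted:", trySend(ch, event{"cuda", 1}))

	// A critical event waits for space instead of being dropped.
	done := make(chan struct{})
	go func() {
		mustSend(ch, event{"fork", 1})
		close(done)
	}()

	fmt.Println("drained:", (<-ch).Kind)
	<-done
	fmt.Println("delivered:", (<-ch).Kind)
}
```

The trade-off is that a blocking producer applies backpressure to whatever feeds it, which is why this pattern suits a small set of critical events rather than the whole event stream.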

Single-instance enforcement

ingero trace takes a lock at /var/run/ingero-trace.lock (or /tmp as fallback). Concurrent invocations are refused
with a clear error. Stale locks left by SIGKILLed predecessors are detected and cleaned up, and the orphaned watchdog
map is unpinned.
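
One common way to get this behavior on Linux is an advisory flock on the lock file. The notes do not say which primitive ingero uses, so the sketch below is an assumption. acquireLock and releaseLock are hypothetical helpers, and the demo uses a path under the temp directory rather than the real lock path.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"syscall"
)

// acquireLock opens (creating if needed) the lock file and takes a
// non-blocking exclusive flock. The kernel drops the lock when the holder
// exits, even on SIGKILL, so a leftover file is not by itself a held lock.
func acquireLock(path string) (*os.File, error) {
	f, err := os.OpenFile(path, os.O_CREATE|os.O_RDWR, 0o644)
	if err != nil {
		return nil, err
	}
	if err := syscall.Flock(int(f.Fd()), syscall.LOCK_EX|syscall.LOCK_NB); err != nil {
		f.Close()
		return nil, fmt.Errorf("another instance holds %s: %w", path, err)
	}
	return f, nil
}

// releaseLock closes the descriptor, which releases the flock.
func releaseLock(f *os.File) error {
	return f.Close()
}

func main() {
	// Hypothetical demo path; the notes name /var/run/ingero-trace.lock.
	path := filepath.Join(os.TempDir(), "example-trace.lock")

	first, err := acquireLock(path)
	if err != nil {
		fmt.Println("unexpected:", err)
		return
	}
	if _, err := acquireLock(path); err != nil {
		fmt.Println("second attempt refused:", err)
	}
	if err := releaseLock(first); err != nil {
		fmt.Println("release failed:", err)
		return
	}
	again, err := acquireLock(path)
	if err != nil {
		fmt.Println("unexpected:", err)
		return
	}
	fmt.Println("re-acquired after release")
	releaseLock(again)
}
```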

Code maintainability

  • New internal/procpath package centralizes cross-namespace path resolution.
  • Extracted handlePyLifecycle(evt, pyMaps) helper; removes duplicated fork/exec/exit handling between table and JSON
    modes.
  • eventLoopConfig struct shrinks runTableMode from 23 parameters to 8 and runJSONMode from 19 to 4.
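
Cross-namespace path resolution of the kind internal/procpath centralizes generally means rewriting a path as the target process sees it into one the host can open through /proc/<pid>/root/. The sketch below shows that rewrite under an assumed helper name, hostPath; it is not the package's actual API.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strconv"
)

// hostPath rewrites a path as seen inside a process's mount namespace
// into a path the host can open, by going through /proc/<pid>/root.
func hostPath(pid int, inTarget string) string {
	// Clean with a leading slash so ".." elements cannot climb above
	// the target's root before the prefix is added.
	clean := filepath.Clean("/" + inTarget)
	return filepath.Join("/proc", strconv.Itoa(pid), "root", clean)
}

func main() {
	// PID and paths are made up for the demo.
	fmt.Println(hostPath(4242, "/usr/bin/python3"))
	fmt.Println(hostPath(4242, "../etc/os-release"))
}
```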

Known issues

See README Known Issues for details.

  1. Multiprocess CUDA via fork(): NVIDIA driver limitation, not an ingero bug. Use
    torch.multiprocessing.set_start_method('spawn').
  2. Ubuntu 24.04 + distro-patched CPython 3.12 on userspace walker: produces garbage frames. Use --py-walker=ebpf.
  3. Trace-all at kernel.yama.ptrace_scope=3: use --pid X or lower ptrace_scope to 1.

Walker roadmap

The userspace walker (current --py-walker=auto default) will be deprecated in an upcoming release. The eBPF walker
will be promoted to the auto default once the remaining limitations settle.

New deployments should prefer --py-walker=ebpf today. The userspace mode stays available via --py-walker=userspace
during the deprecation window.

Upgrade notes

No breaking changes to flags; flag semantics are unchanged.

One behavior change: if your deployment previously ran concurrent ingero trace invocations on the same host, those are
now refused with a clear error. Use a single instance per host.

Validation

Unit tests: all 11 Go packages pass. EC2 validation on g4dn.xlarge (Ubuntu 24.04, CPython 3.12.3, Tesla T4) covers
binary mode, Docker mode, trace-all mode, ptrace_scope=3 with --pid, fork bursts, and SIGKILL/recovery cycles.

About ingero-io/ingero

eBPF-based GPU causal observability agent with MCP server. Traces CUDA Runtime/Driver APIs and host kernel events to build causal chains explaining GPU latency.

Related context

Earlier breaking changes

  • v0.17.0 Dropped 'annotate --socket' option from CLI.