This release includes breaking changes for platform teams planning a safe upgrade.
✓ No known CVEs patched in this version
Summary
The eBPF Python frame walker (--py-walker=ebpf) is now the headline feature, offering container support, distro-patched CPython compatibility, and enhanced multiprocessing handling.
Full changelog
The in-kernel eBPF Python frame walker (--py-walker=ebpf) is the headline of this release. After extensive validation
on container and bare-metal Linux, through fork storms and lifecycle edge cases, it now delivers the advantages the
userspace walker cannot:
- works in containers (Docker, K8s) without debug symbols or host mounts
- works on distro-patched CPython (Ubuntu 24.04, etc.) via runtime offset harvesting
- works at kernel.yama.ptrace_scope=3 with --pid X (PID-specific uprobes)
- seeds state for dynamically-discovered PIDs (no --pid required)
- inherits state across fork() for multiprocess workloads (DDP, Ray, torch dataloaders)
- broadcasts state across multi-library libcudart setups
- surfaces drop warnings under ringbuf pressure
The userspace walker remains the --py-walker=auto default in this release, but will be deprecated in favor of the eBPF
walker once remaining items settle. See the walker roadmap.
New deployments should prefer --py-walker=ebpf today.
The new Python frame walker
What it is
The eBPF walker runs inside the kernel uprobe, reading CPython's _PyRuntime, PyThreadState, and
_PyInterpreterFrame structures directly from process memory via bpf_probe_read_user. It emits Python source frames
(py_file, py_func, py_line) alongside native stack traces in a single event.
Why it matters
- No /proc/<pid>/mem access needed. Works in hardened ptrace_scope=3 environments where the userspace walker is
blocked.
- No distro-matched dbgsym needed. The runtime offset harvester spawns the target Python and uses ctypes to discover
struct offsets empirically. Works on Ubuntu's patched CPython 3.12 where upstream offsets are wrong.
- Synchronous with the CUDA event. Frame capture happens in-kernel, in the same uprobe dispatch that emits the CUDA
event. No async gap, no resolver lag.
What's new in this release
- Containers: Python binary paths resolved via /proc/<pid>/root/; harvester chroots into the target's mount
namespace before spawning so ld-linux resolves the target's libpython.
- Dynamic PID discovery: walker state is pushed for PIDs discovered at runtime, not just those passed via --pid X
at startup.
- Multi-library libcudart: state is broadcast to every cuda tracer's py_runtime_map, so workloads using any
attached libcudart (system or PyTorch bundled) get walker coverage.
- Fork inheritance: children of traced Python processes inherit the parent's walker state synchronously on
sched_process_fork via a new pytrace.CopyPID helper. The fork-event filter now uses tgid, fixing dropped fork events
for torch, DDP, Ray, and dataloader workloads that call os.fork() from non-main threads.
- PID-specific uprobes at ptrace_scope=3: when --pid X is set, uprobes attach with link.UprobeOptions{PID: X}
instead of system-wide. Reduces kernel overhead and works at ptrace_scope=3 where system-wide uprobes can be gated. A
startup WARN fires for scope=3 + trace-all + --py-walker=ebpf.
- Drop warnings: a throttled WARN fires every 5 seconds if the Python extended record fails to reserve in the
ringbuf. Previously, Python frames silently dropped to zero with no operator feedback.
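To illustrate why keying the fork-event filter on tgid matters, here is a minimal Go sketch. It assumes the filter sees the 64-bit value produced by the BPF helper bpf_get_current_pid_tgid(), whose upper 32 bits are the tgid (the process ID as userspace sees it) and whose lower 32 bits are the thread ID. The release notes do not show the actual filter, so splitPidTgid, isTracedFork, and the traced map are hypothetical names used only for illustration.

```go
package main

import "fmt"

// splitPidTgid unpacks a value in the layout of bpf_get_current_pid_tgid():
// upper 32 bits = tgid (userspace "PID"), lower 32 bits = thread ID.
func splitPidTgid(v uint64) (tgid, tid uint32) {
	return uint32(v >> 32), uint32(v)
}

// isTracedFork reports whether a fork event belongs to a traced process.
// Looking up by tgid means a fork issued from any thread still matches.
// (Hypothetical helper; not the project's real filter.)
func isTracedFork(traced map[uint32]bool, pidTgid uint64) bool {
	tgid, _ := splitPidTgid(pidTgid)
	return traced[tgid]
}

func main() {
	traced := map[uint32]bool{4242: true} // hypothetical traced process

	// A worker thread (tid 4250) of process 4242 calls fork().
	fromWorker := uint64(4242)<<32 | 4250
	tgid, tid := splitPidTgid(fromWorker)

	fmt.Println("tgid lookup matches:", isTracedFork(traced, fromWorker))
	fmt.Println("tid lookup matches: ", traced[tid], "(tgid", tgid, "tid", tid, ")")
}
```

A lookup keyed on the thread ID misses here because 4250 was never registered, which is the kind of dropped fork event the tgid-based filter avoids.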
Infrastructure improvements (supporting the walker)
Critical event delivery guaranteed
OOM, process exec, fork, and exit events are now guaranteed delivery. The critical-events reader blocks on a full event
channel instead of dropping. This unblocks fork inheritance under bursty loads, OOM causal chains, and orchestrator
remediation.
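The difference between dropping and blocking on a full channel can be sketched in Go as follows. This is an illustration of the general pattern, not the project's reader code: the event type, its fields, and the deliver function are hypothetical.

```go
package main

import "fmt"

// event is a hypothetical stand-in for a traced event.
type event struct {
	kind     string
	critical bool // true for OOM, exec, fork, and exit in this sketch
}

// deliver sends evt on ch and reports whether it was delivered.
// Critical events block until the consumer makes room; all other
// events are dropped when the channel is full.
func deliver(ch chan<- event, evt event) bool {
	if evt.critical {
		ch <- evt // back-pressure instead of loss
		return true
	}
	select {
	case ch <- evt:
		return true
	default:
		return false // channel full: drop
	}
}

func main() {
	ch := make(chan event, 1)

	fmt.Println(deliver(ch, event{kind: "cuda"})) // buffer has room
	fmt.Println(deliver(ch, event{kind: "cuda"})) // buffer full, dropped

	go func() { <-ch }() // a consumer eventually frees a slot
	fmt.Println(deliver(ch, event{kind: "fork", critical: true}))
}
```

The trade-off is that a stalled consumer now stalls the producer of critical events, which is acceptable only if those events are rare compared with the high-volume ones that may still be dropped.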
Single-instance enforcement
ingero trace takes a lock at /var/run/ingero-trace.lock (or /tmp as fallback). Concurrent invocations are refused
with a clear error. Stale locks from SIGKILLed predecessors are detected and cleaned up (orphaned watchdog map
unpinned).
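One common way to refuse concurrent invocations is a non-blocking exclusive flock on a lock file, sketched below in Go for Linux. The release notes do not say which mechanism ingero uses, so this is an assumption: acquireLock, instanceLock, and the example path are hypothetical, and the stale-lock cleanup and watchdog map unpinning described above are not modeled.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"syscall"
)

// instanceLock holds an exclusive advisory lock until Release is called
// or the process exits.
type instanceLock struct{ f *os.File }

// acquireLock opens (or creates) path and takes a non-blocking exclusive
// flock on it. A second caller gets an error instead of waiting.
func acquireLock(path string) (*instanceLock, error) {
	f, err := os.OpenFile(path, os.O_CREATE|os.O_RDWR, 0o644)
	if err != nil {
		return nil, err
	}
	if err := syscall.Flock(int(f.Fd()), syscall.LOCK_EX|syscall.LOCK_NB); err != nil {
		f.Close()
		return nil, fmt.Errorf("another instance holds %s: %w", path, err)
	}
	return &instanceLock{f: f}, nil
}

// Release drops the lock by closing the underlying file descriptor.
func (l *instanceLock) Release() error { return l.f.Close() }

func main() {
	path := filepath.Join(os.TempDir(), "example-trace.lock") // hypothetical path

	first, err := acquireLock(path)
	if err != nil {
		fmt.Println("refused:", err)
		return
	}
	defer first.Release()

	if _, err := acquireLock(path); err != nil {
		fmt.Println("second invocation refused:", err)
	}
}
```

With flock the kernel releases the lock when the holder dies, including on SIGKILL, so a leftover lock file does not by itself block the next start.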
Code maintainability
- New internal/procpath package centralizes cross-namespace path resolution.
- Extracted handlePyLifecycle(evt, pyMaps) helper; removes duplicated fork/exec/exit handling between table and JSON
modes.
- eventLoopConfig struct shrinks runTableMode from 23 parameters to 8 and runJSONMode from 19 to 4.
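As a rough idea of what cross-namespace path resolution through /proc/<pid>/root/ looks like, here is a short Go sketch. It does not reproduce the internal/procpath API, which the release notes do not show; hostPath is a hypothetical function, and the sketch handles paths only lexically (symlinks inside the target are not resolved).

```go
package main

import (
	"fmt"
	"path/filepath"
	"strconv"
)

// hostPath maps a path as seen by process pid to a path the tracer can
// open from its own mount namespace, by going through /proc/<pid>/root.
// (Hypothetical helper for illustration.)
func hostPath(pid int, targetPath string) string {
	// Anchor at "/" first so ".." segments cannot climb above the
	// target's root directory.
	rooted := filepath.Join("/", targetPath)
	return filepath.Join("/proc", strconv.Itoa(pid), "root", rooted)
}

func main() {
	// A Python binary path as reported from inside a container.
	fmt.Println(hostPath(1234, "/usr/bin/python3.12"))
	// prints /proc/1234/root/usr/bin/python3.12
}
```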
Known issues
See README Known Issues for details.
- Multiprocess CUDA via fork(): NVIDIA driver limitation, not an ingero bug. Use
torch.multiprocessing.set_start_method('spawn').
- Ubuntu 24.04 + distro-patched CPython 3.12 on userspace walker: produces garbage frames. Use --py-walker=ebpf.
- Trace-all at kernel.yama.ptrace_scope=3: use --pid X or lower scope to 1.
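To check ahead of time whether a host falls into the last case, the current Yama setting can be read from /proc/sys/kernel/yama/ptrace_scope. The Go sketch below does that and applies the same scope=3 + trace-all + --py-walker=ebpf combination the startup WARN covers. The function names, the use of pid == 0 to mean "no --pid given", and the message text are assumptions for illustration, not ingero's code or output.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

const yamaScopePath = "/proc/sys/kernel/yama/ptrace_scope"

// parseScope converts the sysctl file contents (for example "3\n") to an int.
func parseScope(raw string) (int, error) {
	return strconv.Atoi(strings.TrimSpace(raw))
}

// needsPidWarning reports whether the combination described in the notes
// applies: ptrace_scope=3, trace-all (no --pid), and the eBPF walker.
// pid == 0 stands for "no --pid flag" in this sketch.
func needsPidWarning(scope, pid int, walker string) bool {
	return scope == 3 && pid == 0 && walker == "ebpf"
}

func main() {
	raw, err := os.ReadFile(yamaScopePath)
	if err != nil {
		fmt.Println("Yama ptrace_scope not readable:", err)
		return
	}
	scope, err := parseScope(string(raw))
	if err != nil {
		fmt.Println("unexpected ptrace_scope contents:", err)
		return
	}
	if needsPidWarning(scope, 0, "ebpf") {
		fmt.Println("ptrace_scope=3 with trace-all: pass --pid X")
	} else {
		fmt.Println("ptrace_scope =", scope)
	}
}
```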
Walker roadmap
The userspace walker (current --py-walker=auto default) will be deprecated in an upcoming release. The eBPF walker
will be promoted to the auto default once the remaining limitations settle.
New deployments should prefer --py-walker=ebpf today. The userspace mode stays available via --py-walker=userspace
during the deprecation window.
Upgrade notes
No breaking changes to the CLI. Flag semantics are unchanged.
One behavior change to plan for: if your deployment previously ran concurrent ingero trace invocations on the same
host, those are now refused with a clear error. Use a single instance.
Validation
Unit tests: all 11 Go packages pass. EC2 validation on g4dn.xlarge (Ubuntu 24.04, CPython 3.12.3, Tesla T4) covers
binary mode, Docker mode, trace-all mode, ptrace_scope=3 with --pid, fork bursts, and SIGKILL/recovery cycles.
About ingero-io/ingero
eBPF-based GPU causal observability agent with MCP server. Traces CUDA Runtime/Driver APIs and host kernel events to build causal chains explaining GPU latency.
Earlier breaking changes
- v0.17.0 Dropped 'annotate --socket' option from CLI.