ErenAri/Aegis-BPF

v0.4.0 Breaking

This release includes 2 breaking changes for platform teams planning a safe upgrade.

Published 1mo Network Security
✓ No known CVEs patched

Topics

bpf bpf-lsm cloud-native-security container-security ebpf helm
incident-response kubernetes-security linux-kernel linux-security observability policy-enforcement prometheus runtime-security workload-securi

Affected surfaces

auth rbac rce_ssrf breaking_upgrade

Summary

AI summary

kLayoutVersion bumped to 2 and policy format upgraded to v6.

Full changelog

Highlights

v0.4.0 lands the complete 3-phase roadmap to elevate AegisBPF to gold-standard eBPF runtime security, plus the Phase 0 production-audit fixes, the multi-tenant cgroup-scoped policy groundwork, the operator status-conditions upgrade, and an honesty pass on the competitive performance story.

Phase 1 — Kernel Enforcement Perimeter

  • OverlayFS copy-up propagation — new LSM inode_copy_up hook detects when denied lower-layer inodes are promoted to the overlay upper layer (containers, overlay-on-overlay). The userspace OverlayCopyUpPropagator re-stats the original path, discovers the new inode, and propagates deny flags into deny_inode_map, closing the classic OverlayFS inode-disassociation bypass. Mappings are persisted to deny_db for restart resilience.
  • Full socket lifecycle enforcement: socket_connect, socket_bind, port-oriented socket_listen, accepted-peer socket_accept, outbound socket_sendmsg, and inbound socket_recvmsg. recvmsg blocks data reception from denied sources even when the connection was established before the deny rule was loaded. kLayoutVersion bumped to 2 (NetBlockStats grew from 48 to 56 bytes).
  • Phase 0 safety fixes — network hooks now fail-open on parse errors (matching file hooks), hardcoded kernel constants replaced with CO-RE enum reads, exec-identity verification is container-compatible via EXEC_IDENTITY_FLAG_ALLOW_OVERLAYFS / SKIP_VERITY, attach_prog() deduplicated, and an atomic policy_generation commit marker prevents hooks from enforcing half-written rulesets.
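
The copy-up propagation step described above (re-stat the path, discover the new inode, carry the deny flags over) can be sketched in a few lines. This is only an illustration, not the project's OverlayCopyUpPropagator: the dictionary standing in for deny_inode_map, the (st_dev, st_ino) key shape, and the function names are all assumptions made for the example.

```python
import os

# Hypothetical userspace mirror of deny_inode_map: (st_dev, st_ino) -> deny flags.
InodeKey = tuple[int, int]


def inode_key(path: str) -> InodeKey:
    """Return the (device, inode) pair that identifies the file at `path`."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


def propagate_copy_up(deny_map: dict[InodeKey, int], old_key: InodeKey, path: str) -> bool:
    """Re-stat `path` and copy deny flags from `old_key` to the inode found there.

    Returns True when a mapping for a new inode was added or updated.
    """
    flags = deny_map.get(old_key)
    if flags is None:
        return False  # the original inode was never denied
    new_key = inode_key(path)
    if new_key == old_key:
        return False  # the path still resolves to the same inode
    # OR the flags in so an existing entry for the new inode is not weakened.
    deny_map[new_key] = deny_map.get(new_key, 0) | flags
    return True
```

The old entry is left in place on purpose in this sketch, since the lower-layer inode still exists and may be reachable through other mounts.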

Phase 2 — Telemetry & Cryptographic Integrity

  • Deep process lineage — BPF walks task_struct->real_parent up to 8 ancestors (bounded #pragma unroll loop satisfies the verifier). Ancestor PIDs are included in exec events for forensic correlation. dead_processes LRU map retains metadata about recently-exited processes for post-mortem inspection.
  • UID-to-username identity resolution — forensic block events now include thread-safe getpwuid_r/getgrgid_r resolved username/groupname fields, accelerating SOC triage.
  • IMA-backed exec trust (kernel 6.1+) — new handle_bprm_ima_check LSM program calls bpf_ima_file_hash() and looks the SHA-256 up in the trusted_exec_hash map (16,384 entries). Autoload gated on KernelFeatures.bpf_ima_helpers; opt-in via EXEC_IDENTITY_FLAG_USE_IMA_HASH. Provides cryptographic FIM without requiring fs-verity setup.
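
The bounded lineage walk can be modelled outside the kernel to show why a fixed trip count matters. The sketch below is a userspace analogy, not the BPF program: the `parent_of` dictionary stands in for following task_struct->real_parent, and the stop conditions (missing parent, self-parent, PID 0) are assumptions chosen for the example.

```python
MAX_ANCESTORS = 8  # bound stated in the release notes


def ancestor_pids(pid: int, parent_of: dict[int, int], limit: int = MAX_ANCESTORS) -> list[int]:
    """Collect up to `limit` ancestor PIDs, nearest parent first."""
    ancestors: list[int] = []
    current = pid
    for _ in range(limit):  # fixed trip count, analogous to the unrolled BPF loop
        parent = parent_of.get(current)
        if parent is None or parent == current or parent == 0:
            break  # reached the top of the known tree
        ancestors.append(parent)
        current = parent
    return ancestors
```

Because the loop can never run more than `limit` times, the work per exec event is bounded no matter how deep the process tree is, which is the property the verifier requires of the in-kernel version.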

Phase 3 — DevSecOps Orchestration

  • Validating admission webhook — operator-side webhook (--enable-webhook) rejects malformed AegisPolicy/AegisClusterPolicy specs before they reach the daemon. Ships with a ValidatingWebhookConfiguration manifest and Service definition.
  • Merged policy reconciler — watches all AegisPolicy and AegisClusterPolicy CRDs and produces a single merged aegis-merged-policy ConfigMap for the DaemonSet. Uses most-restrictive-wins semantics: any policy in enforce forces the merged mode to enforce.
  • Selector-based filtering: PolicySelector.matchNamespaces and matchLabels filter which policies apply, enabling gradual rollouts.
  • Structured policy status conditions: AegisPolicyStatus now exposes a standard []metav1.Condition slice with the canonical types Ready, PolicyValid, EnforceCapable, and Degraded, plus stable reason constants (PolicyApplied, TranslationFailed, ConfigMapWriteFailed, BPFLSMUnavailable, …). Dashboards and CI can now match on condition types instead of parsing the legacy Phase/Message strings (which are kept for back-compat). New printer columns Ready and EnforceCapable show on kubectl get ap/kubectl get acp. EnforceCapable is deliberately reported as Unknown (not False) until per-node posture is observed, surfacing uncertainty rather than guessing.
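
As a rough illustration of most-restrictive-wins merging combined with namespace selection, consider the sketch below. It is not the operator's reconciler: the `Policy` class, the "audit" mode name, and the rule that an empty namespace list matches everything are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Policy:
    name: str
    mode: str  # assumed values: "audit" or "enforce"
    match_namespaces: list[str] = field(default_factory=list)  # assumed: empty matches all


def applies(policy: Policy, namespace: str) -> bool:
    """Selector check: an empty list is treated as 'all namespaces' in this sketch."""
    return not policy.match_namespaces or namespace in policy.match_namespaces


def merged_mode(policies: list[Policy], namespace: str) -> Optional[str]:
    """Most-restrictive-wins: any selected policy in enforce forces enforce."""
    selected = [p for p in policies if applies(p, namespace)]
    if not selected:
        return None  # nothing applies to this namespace
    return "enforce" if any(p.mode == "enforce" for p in selected) else "audit"
```

With one cluster-wide audit policy and one enforce policy restricted to a single namespace, only that namespace resolves to enforce, which is how a selector supports a gradual rollout.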

Multi-tenant cgroup-scoped policy

  • deny_cgroup_inode / deny_cgroup_ipv4 / deny_cgroup_port BPF maps allow per-workload deny rules — the same binary or endpoint can be allowed for one cgroup and denied for another.
  • New CLI commands, v6 policy format, and capability-report surface for these maps.
  • deadman fail-static mode preserves enforcement under agent downtime.
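
One way to picture how a cgroup-scoped deny map sits alongside a global one is a two-step lookup. The sketch below assumes a (cgroup_id, inode) composite key and an OR between the two maps; both are illustrative guesses, and the real key layouts are the ones documented in docs/BPF_MAP_SCHEMA.md.

```python
def is_denied(inode: int, cgroup_id: int,
              global_deny: set[int],
              cgroup_deny: set[tuple[int, int]]) -> bool:
    """Deny when the inode is blocked globally or for this specific cgroup."""
    if inode in global_deny:
        return True  # global rules still apply to every workload
    return (cgroup_id, inode) in cgroup_deny  # per-workload rule
```

Under this model the same inode can be usable from one cgroup and blocked in another, as long as it is absent from the global set.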

Enhanced rule engine

  • Rich condition types: CommExact, CommPrefix, PathGlob, PathPrefix, PortEquals, AncestorComm, CgroupPath with AND composition.
  • Glob matching via fnmatch(3) with FNM_PATHNAME.
  • MITRE ATT&CK technique tagging propagated in match output.
  • RuleAction enum: Alert, Block, Kill.
  • Backward-compatible with legacy match_comm/match_path format.
  • 16 new unit tests covering conditions, composition, MITRE tags, JSON loading, lifecycle, and struct-layout assertions.
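
AND composition of conditions can be sketched with plain predicate functions. The names below mirror three of the condition types listed above, but the `Event` and `Rule` shapes, the first-match evaluation order, and the example technique tag are assumptions for illustration only. PathGlob is left out because Python's fnmatch module does not implement FNM_PATHNAME semantics.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Event:
    comm: str
    path: str = ""
    port: Optional[int] = None


Condition = Callable[[Event], bool]


def comm_exact(name: str) -> Condition:
    return lambda ev: ev.comm == name


def path_prefix(prefix: str) -> Condition:
    return lambda ev: ev.path.startswith(prefix)


def port_equals(port: int) -> Condition:
    return lambda ev: ev.port == port


def all_of(*conditions: Condition) -> Condition:
    """AND composition: every condition must match."""
    return lambda ev: all(cond(ev) for cond in conditions)


@dataclass(frozen=True)
class Rule:
    name: str
    when: Condition
    action: str  # "Alert", "Block", or "Kill"
    mitre: str = ""  # ATT&CK technique ID carried into the match output


def evaluate(rules: list[Rule], ev: Event) -> Optional[Rule]:
    """Return the first matching rule, or None (first-match order is assumed)."""
    for rule in rules:
        if rule.when(ev):
            return rule
    return None
```

A rule such as `Rule("shadow-read", all_of(comm_exact("cat"), path_prefix("/etc/")), "Block", "T1003")` then matches only when both the command name and the path condition hold.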

Honesty pass on the competitive performance story

  • docs/PERFORMANCE_COMPARISON.md rewritten. Removed all estimated peer-tool µs/MB tables (they were copied from third-party blog posts and were never measured on the same hardware as AegisBPF). Replaced with a verifiable-vs-architectural split, real build/aegisbpf_bench deny-map-lookup numbers (flat 4.2 ns from 100 → 10 000 entries, evidence for the O(1) policy-evaluation claim), and an explicit "What is not claimed" section covering the things the repo cannot prove.
  • scripts/compare_runtime_security.sh (new). Head-to-head comparison driver that runs the same perf_open_bench.sh workload under each agent (none, aegisbpf, falco, tetragon, tracee, kubearmor) in isolation, refuses to start if a peer agent is already alive, skips missing agents cleanly, and emits a single results.md with delta vs none.
  • docs/COMPETITIVE_BENCH_METHODOLOGY.md (new). Specifies reproducibility constraints, the open_close workload definition, per-agent baseline configurations, how to interpret the result table, and a "hall of shame" of forbidden moves (cross-run/cross-host number copying, hand-added rows, quoting upstream blog numbers, tuning one agent but not the others).
  • SECURITY.md drift fix. Refreshed the supported-versions table, removed two stale Known Limitations the codebase has since fixed (no live policy reload — superseded by atomic shadow-map swap; coverage limited to connect/bind — superseded by six socket hooks), and added two real ones (no third-party security review, no head-to-head performance evidence on identical hardware).

Breaking Changes

  • kLayoutVersion = 2: NetBlockStats grew from 48 to 56 bytes to accommodate recvmsg_blocks. Userspace and BPF must agree; deploy the new daemon and BPF object together.
  • Policy format v6: new [deny_cgroup_*] sections for cgroup-scoped rules (optional; v5 policies remain valid).
  • Capability contract 1.6.0: capabilities.json now includes bpf_ima_helpers, overlay_copy_up_propagation, cgroup_scoped_deny, policy_generation, and deadman_fail_static fields.
  • Operator CRD schema additions: AegisPolicyStatus gains the conditions array (additive, no removals). Existing controllers that read phase/message continue to work; new automation should prefer conditions[?(@.type=="Ready")].status.
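
The 48 to 56 byte growth is consistent with one extra 8-byte counter, and a layout check of that kind can be sketched as follows. The sketch assumes NetBlockStats is a packed run of 64-bit counters; the real field list lives in bpf/aegis_common.h, and the function name and error messages here are hypothetical.

```python
import struct

# Assumption: the struct is a sequence of unsigned 64-bit counters with no padding.
V1_FORMAT = "<6Q"  # layout version 1
V2_FORMAT = "<7Q"  # layout version 2, one extra counter (recvmsg_blocks)

EXPECTED_SIZE = {
    1: struct.calcsize(V1_FORMAT),
    2: struct.calcsize(V2_FORMAT),
}


def check_layout(userspace_version: int, bpf_version: int, blob: bytes) -> None:
    """Raise if the two sides disagree on the version or the struct size."""
    if userspace_version != bpf_version:
        raise RuntimeError(
            f"layout version mismatch: userspace={userspace_version} bpf={bpf_version}"
        )
    expected = EXPECTED_SIZE[userspace_version]
    if len(blob) != expected:
        raise RuntimeError(f"NetBlockStats size {len(blob)} != expected {expected}")
```

Failing hard on a mismatch is the safer choice: reading a 56-byte record through a 48-byte view would silently drop the newest counter.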

New BPF Maps

| Map | Purpose |
|-----|---------|
| trusted_exec_hash | SHA-256 hashes of trusted binaries for IMA hash verification |
| deny_cgroup_inode | Per-cgroup file inode deny rules |
| deny_cgroup_ipv4 | Per-cgroup IPv4 destination deny rules |
| deny_cgroup_port | Per-cgroup port deny rules with protocol/direction filtering |
| dead_processes | LRU cache of recently-exited processes for post-mortem forensics |
| policy_generation | Atomic policy commit marker preventing half-written ruleset enforcement |

All maps are documented in docs/BPF_MAP_SCHEMA.md.

CI & Quality

  • All 39 CI checks green on the underlying PR commit: build (x86_64/ARM64), test, lint, clang-tidy, cppcheck, sanitizers (asan/tsan/ubsan), coverage, kernel-bpf-test, veristat, BPF compiler matrix (clang-15/16/17/18), release-readiness, semgrep, CodeQL, gitleaks, SBOM, smoke/parser/fuzz suites, and every contract/posture gate.
  • Schema contract: 37 BPF maps documented in docs/BPF_MAP_SCHEMA.md (all in sync with bpf/aegis_common.h).
  • bpf_map_schema_contract, guarantees-contract, capability-contract, helm-posture-contract, k8s-rollout-contract, label-contract, ops-observability-contract, and required-checks-contract all pass.
  • Operator: go build ./..., go vet ./..., go test ./... -count=1, and gofmt -l . all clean after the conditions work, with 5 new unit tests pinning the condition-helper contract.

Upgrade Notes

  1. Audit mode first. Run aegisbpf run --audit for at least one week before enabling enforcement on the new hook surface (recvmsg, inode_copy_up, IMA).
  2. Kernel 6.1+ for IMA hash. EXEC_IDENTITY_FLAG_USE_IMA_HASH is a no-op on older kernels; bpf_ima_helpers in capabilities.json advertises availability.
  3. Cgroup-scoped rules are additive. Global deny_* maps still apply; cgroup-scoped maps narrow or extend per workload.
  4. Layout version mismatch is fatal. The daemon refuses to start if the BPF object and userspace disagree on kLayoutVersion.
  5. Validating webhook is opt-in. Start the operator with --enable-webhook and apply manifests/validating-webhook.yaml before enabling.
  6. Operator condition consumers. If you have alerting that scrapes .status.phase, it still works. New alerts should prefer .status.conditions[?(@.type=="Ready")].status == "True" and .status.conditions[?(@.type=="EnforceCapable")].status. EnforceCapable=Unknown is the expected initial state until per-node posture observation lands.
  7. No competitive perf claims without the script. Any "AegisBPF is N× faster than $tool" statement must be backed by a results.md produced by scripts/compare_runtime_security.sh on the same host in the same run. See docs/COMPETITIVE_BENCH_METHODOLOGY.md.
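
For consumers moving from .status.phase to conditions (note 6), the lookup amounts to finding a condition by type and keeping Unknown distinct from False. The sketch below works on an already-decoded status object (for example, parsed from kubectl get -o json); the helper names are hypothetical and not part of the operator.

```python
from typing import Optional


def condition_status(status: dict, cond_type: str) -> str:
    """Return "True", "False", or "Unknown" for the given condition type."""
    for cond in status.get("conditions", []):
        if cond.get("type") == cond_type:
            return cond.get("status", "Unknown")
    return "Unknown"  # a missing condition is treated as not yet observed


def is_ready(status: dict) -> bool:
    return condition_status(status, "Ready") == "True"


def enforce_capable(status: dict) -> Optional[bool]:
    """Tri-state result: True, False, or None while the status is Unknown."""
    value = condition_status(status, "EnforceCapable")
    if value == "Unknown":
        return None
    return value == "True"
```

Returning None for Unknown keeps an alert rule from firing on the expected initial state, while an explicit False can still page.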

Docs

  • README.md — refreshed Features list, architecture diagrams, Claim Taxonomy, Data Flow Diagram, and metrics table
  • docs/BPF_MAP_SCHEMA.md — new "Cgroup-Scoped Deny Rules" section + trusted_exec_hash entry
  • docs/PERFORMANCE_COMPARISON.md — honesty pass; estimated peer-tool numbers removed, replaced with verifiable architecture/hook coverage and real aegisbpf_bench numbers
  • docs/COMPETITIVE_BENCH_METHODOLOGY.md (new) — head-to-head comparison rules
  • docs/CHANGELOG.md — full commit history
  • SECURITY.md — version table refreshed, stale Known Limitations replaced with current ones

🤖 Release notes prepared with Claude Code
