This release includes breaking changes for platform teams planning a safe upgrade.
Published 28d
Virtualization
✓ No known CVEs patched in this version
Topics
agent-native
agentic-ai
cloud-cost-efficiency
dev-environment
lxc
lxc-container
multi-tenant
ssh
Affected surfaces
auth
breaking_upgrade
Summary
AI summary: Multi-pool architecture enables one sentinel to front multiple Containarium clusters with SNI routing.
Full changelog
Highlights
- Multi-pool architecture: One sentinel can now front multiple independent Containarium clusters, each with its own primary VM, peers, core stack, and subdomain. SNI-based routing on the sentinel transparently dispatches inbound TLS to the right pool. See `docs/MULTI-POOL.md`.
- GPU passthrough by PCI address: Containers are now pinned by stable PCI ID instead of DRM card minor index, surviving kernel-upgrade renumbering (which broke `fts-5900x` on `6.8.0-110` → `6.8.0-111`).
- Postgres restart policy on auto-detect: Closes a 12-day silent-outage path on prod where an OOM kill of `containarium-core-postgres` left it down indefinitely (Grafana went dark).
- Lab pool SSH access: Install script now provisions the `containarium-shell` wrapper + `/etc/motd`; daemon stops writing `~/.hushlogin`. SSH into containers on tunneled-primary pools (e.g. lab) now works end-to-end.
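The dispatch step behind SNI routing can be sketched as a hostname lookup with a fallback. This is a minimal illustration, not Containarium's code: the `Primary` and `Router` types, their fields, and the addresses are hypothetical, and the sketch assumes the SNI name has already been read from the ClientHello. The alias and fallback behavior follow the Added notes in this release.

```go
package main

import (
	"fmt"
	"strings"
)

// Primary is a hypothetical registry entry: one pool's public hostname,
// optional aliases, and the address the sentinel should dial.
type Primary struct {
	Pool     string
	Hostname string
	Aliases  []string
	Addr     string
}

// Router maps SNI hostnames to primaries and keeps a legacy fallback backend.
type Router struct {
	byHost map[string]Primary
	legacy string
}

// normalize lowercases a DNS name and strips a trailing dot,
// since DNS names compare case-insensitively.
func normalize(host string) string {
	return strings.TrimSuffix(strings.ToLower(strings.TrimSpace(host)), ".")
}

// NewRouter indexes every primary under its hostname and each alias.
func NewRouter(legacy string, primaries ...Primary) *Router {
	r := &Router{byHost: make(map[string]Primary), legacy: legacy}
	for _, p := range primaries {
		for _, name := range append([]string{p.Hostname}, p.Aliases...) {
			if n := normalize(name); n != "" {
				r.byHost[n] = p
			}
		}
	}
	return r
}

// Route returns the backend address for an SNI value. A missing or
// unregistered name falls back to the legacy single backend.
func (r *Router) Route(sni string) string {
	if p, ok := r.byHost[normalize(sni)]; ok {
		return p.Addr
	}
	return r.legacy
}

func main() {
	r := NewRouter("10.0.0.1:443",
		Primary{Pool: "prod", Hostname: "prod.example", Addr: "10.0.1.1:443"},
		Primary{Pool: "lab", Hostname: "lab.example", Aliases: []string{"foo.example"}, Addr: "10.0.2.1:443"},
	)
	for _, name := range []string{"prod.example", "FOO.example", "", "unknown.example"} {
		fmt.Printf("%q -> %s\n", name, r.Route(name))
	}
}
```

Keeping aliases in the same map as primary hostnames means an app domain resolves in a single lookup, and the no-SNI case needs no special branch because an empty name is never indexed.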
Added
- Multi-pool architecture (PR #97, slices 1-8):
  - Pool tag on peers (`--pool=<name>` propagates through tunnel handshake to `TunnelSpot`/`Backend`; `GET /sentinel/peers?pool=<name>` filters).
  - Pool-scoped peer discovery (`PeerPool.discover()` appends `?pool=`).
  - Primary self-registration (`POST /sentinel/primaries` at startup, 30s heartbeat, `DELETE` on shutdown; sentinel evicts after 90s of missed heartbeats).
  - SNI-based routing on the sentinel (peeks ClientHello, looks up primary in registry, falls back to legacy single-backend on no-SNI / unregistered hostname).
  - Hostname aliases for app domains (`--public-aliases foo.example,bar.example`).
  - Primary registration via tunnel handshake (`containarium tunnel --public-hostname=… --public-aliases=… --public-port=443`): lets a primary behind NAT/Tailscale register itself without direct HTTP access to the sentinel.
  - Token-bound pool authorization (`--tunnel-token-policy <token>=<pool1>,<pool2>`, repeatable; `*` = wildcard; legacy `--tunnel-token` keeps wildcard semantics).
  - SNI router uses yamux for tunneled primaries (avoids loopback-alias listener conflicts on the sentinel).
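The heartbeat and eviction rules for self-registered primaries can be illustrated with a small in-memory registry. This is a sketch under stated assumptions: the `Registry` type, its methods, and the `at` demo clock are hypothetical names, not the project's API. Only the 90s eviction window and the rule that tunnel-registered entries (non-empty `BackendID`) are exempt from TTL eviction come from this release's notes.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
	"time"
)

// evictAfter is the eviction window from the release notes.
// Primaries are expected to heartbeat every 30s.
const evictAfter = 90 * time.Second

// entry is a hypothetical registry record.
type entry struct {
	Pool      string
	BackendID string // non-empty when the primary registered through a tunnel
	LastSeen  time.Time
}

// Registry tracks primaries keyed by public hostname.
type Registry struct {
	mu      sync.Mutex
	entries map[string]entry
}

func NewRegistry() *Registry { return &Registry{entries: make(map[string]entry)} }

// Heartbeat registers a primary or refreshes its last-seen time.
func (r *Registry) Heartbeat(host, pool, backendID string, now time.Time) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.entries[host] = entry{Pool: pool, BackendID: backendID, LastSeen: now}
}

// Remove models the DELETE a primary sends on clean shutdown.
func (r *Registry) Remove(host string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.entries, host)
}

// Live evicts expired HTTP-registered entries and returns the sorted
// hostnames that remain. Tunnel-bound entries are never TTL-evicted here,
// because their lifetime follows the tunnel session instead.
func (r *Registry) Live(now time.Time) []string {
	r.mu.Lock()
	defer r.mu.Unlock()
	var live []string
	for host, e := range r.entries {
		if e.BackendID == "" && now.Sub(e.LastSeen) > evictAfter {
			delete(r.entries, host)
			continue
		}
		live = append(live, host)
	}
	sort.Strings(live)
	return live
}

// at returns a fixed demo instant plus sec seconds, so the example needs no real clock.
func at(sec int) time.Time {
	return time.Date(2025, 1, 1, 0, 0, 0, 0, time.UTC).Add(time.Duration(sec) * time.Second)
}

func main() {
	r := NewRegistry()
	r.Heartbeat("prod.example", "prod", "", at(0))
	r.Heartbeat("lab.example", "lab", "tunnel-1", at(0))
	fmt.Println(r.Live(at(30))) // both primaries are still within the window
	fmt.Println(r.Live(at(91))) // only the tunnel-bound primary remains
}
```

Passing the clock in as an argument keeps the eviction rule deterministic to test, which matters for a timing bug that only shows up after 90 seconds.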
Fixed
- Tunnel handshake over-read corrupts yamux: `json.NewDecoder` over-read swallowed bytes that arrived in the same TCP packet as the JSON handshake (notably the yamux SYN). Fix: line-delimited JSON read, leaving subsequent bytes for yamux. Latent since the original tunnel implementation; surfaced under load by the slice 8 deploy.
- Tunnel-promoted primaries decay after 90s TTL: `PrimaryRegistry.All()` evicted entries with `BackendID` set even though their lifetime is tied to the yamux session. Fix: skip TTL eviction when `BackendID != ""`.
- Lab pool bring-up (5 corner cases caught while standing up the first tunneled primary):
  - Subnet drift between `--network-subnet` flag and actual `incusbr0` (Incus' `EnsureNetwork` is idempotent and won't change a pre-existing bridge's subnet). Daemon now queries `GetNetworkSubnet("incusbr0")` after init and uses that as authoritative.
  - Port forwarder missing OUTPUT-chain DNAT (PREROUTING alone doesn't catch local-origin packets that tunneled primaries generate when forwarding to `127.0.0.1:443`).
  - `route_localnet=1` not enabled (kernel default refuses to route `127.0.0.0/8` out a non-loopback interface). Now set at runtime + persisted via `/etc/sysctl.d/99-containarium-route-localnet.conf`.
  - Caddy TLS app missing on first install (`apps.tls` is `null` on a fresh Caddy → PATCH returns 400). `ProvisionTLS` now calls `ensureTLSApp` first.
  - Port forwarder ran before Caddy spawned. Re-run `PortForwarder.SetupPortForwarding` after `EnsureCaddy` succeeds.
- Postgres restart policy missing on auto-detected containers (`internal/server/dual_server.go`): auto-detect path skipped `ensurePostgresRestartPolicy()`. Re-applied; idempotent.
- GPU device passthrough breaks across kernel upgrades (`internal/incus/client.go`, `internal/container/manager.go`): new `Client.ResolveGPUInputToPCI()` resolves `--gpu N` into a stable PCI address at container creation time. Existing containers with `id`-based config aren't auto-migrated; manual fix: `incus config device set <name> gpu pci=<addr>`.
- Lab pool SSH lands at host nologin instead of inside the container (`scripts/install-lab-phase-b.sh`, `internal/container/jump_server.go`): install script now installs `/usr/local/bin/containarium-shell` and writes `/etc/motd`; daemon stops writing `~/.hushlogin`. Existing per-container host users keep their stale `.hushlogin` until manually removed.
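The handshake over-read fix relies on consuming exactly one line and nothing more. A rough sketch of that idea follows, assuming a newline-terminated JSON handshake; the `Handshake` fields, the size cap, and the helper names are hypothetical, since the release notes do not describe the real message shape.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"strings"
)

// Handshake is a hypothetical message shape used only for illustration.
type Handshake struct {
	Token string `json:"token"`
	Pool  string `json:"pool"`
}

// maxHandshakeBytes is an assumed cap that bounds memory use on bad input.
const maxHandshakeBytes = 4096

// readHandshake consumes exactly one newline-terminated JSON line from r.
// It reads one byte at a time so that nothing past the newline leaves r.
func readHandshake(r io.Reader) (Handshake, error) {
	var hs Handshake
	var b [1]byte
	line := make([]byte, 0, 256)
	for {
		if _, err := io.ReadFull(r, b[:]); err != nil {
			return hs, fmt.Errorf("read handshake: %w", err)
		}
		if b[0] == '\n' {
			break
		}
		if len(line) >= maxHandshakeBytes {
			return hs, errors.New("handshake line too long")
		}
		line = append(line, b[0])
	}
	if err := json.Unmarshal(line, &hs); err != nil {
		return hs, fmt.Errorf("decode handshake: %w", err)
	}
	return hs, nil
}

// splitHandshake is a demo helper: it parses the handshake from wire and
// returns whatever bytes are still unread afterwards.
func splitHandshake(wire string) (Handshake, string, error) {
	conn := strings.NewReader(wire)
	hs, err := readHandshake(conn)
	if err != nil {
		return hs, "", err
	}
	rest, err := io.ReadAll(conn)
	return hs, string(rest), err
}

func main() {
	// One "segment" carrying the handshake plus the first bytes of the next protocol.
	hs, rest, err := splitHandshake("{\"token\":\"t\",\"pool\":\"lab\"}\nNEXT-PROTOCOL-BYTES")
	if err != nil {
		panic(err)
	}
	fmt.Printf("pool=%s remaining=%q\n", hs.Pool, rest)
}
```

For comparison, a `json.Decoder` reads from its source in chunks, so bytes that follow the first JSON value can end up in the decoder's internal buffer, reachable only through `Decoder.Buffered()`. Byte-at-a-time reads are slower, but the handshake is short and happens once per connection.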
Upgrade notes
- Existing GPU containers with `gpu: { id: "0" }` config aren't auto-migrated. After upgrade run: `sudo incus config device set <container> gpu pci=$(sudo containarium gpu list | awk '/<vendor>/ {print $1}')` (or pick the PCI address manually from `lspci -nn | grep -i nvidia`).
- Existing per-container host users on backends still have `~/.hushlogin`. To get the host MOTD back: `sudo rm /home/<user>/.hushlogin`.
- Lab-style pools standing up on this release will auto-install `containarium-shell` + `/etc/motd` via `scripts/install-lab-phase-b.sh`. No action needed on existing pools that already have the wrapper.
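When migrating a GPU container by hand, it can help to check which PCI address a DRM card index currently maps to. The sketch below follows the Linux sysfs symlink `/sys/class/drm/card<N>/device`, whose target ends in the device's PCI address on PCI-attached GPUs. It is an illustration only and makes no claim about how `Client.ResolveGPUInputToPCI()` works; the function names here are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"regexp"
)

// pciAddr matches a full PCI address (domain:bus:device.function), e.g. 0000:01:00.0.
var pciAddr = regexp.MustCompile(`^[0-9a-fA-F]{4,}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]$`)

// pciFromLink extracts the PCI address from a sysfs device symlink target.
func pciFromLink(target string) (string, error) {
	addr := filepath.Base(target)
	if !pciAddr.MatchString(addr) {
		return "", fmt.Errorf("%q does not end in a PCI address", target)
	}
	return addr, nil
}

// resolveCard maps a DRM card index to a PCI address by reading the
// symlink at <sysfsRoot>/class/drm/card<N>/device.
func resolveCard(sysfsRoot string, index int) (string, error) {
	link := filepath.Join(sysfsRoot, "class", "drm", fmt.Sprintf("card%d", index), "device")
	target, err := os.Readlink(link)
	if err != nil {
		return "", err
	}
	return pciFromLink(target)
}

func main() {
	addr, err := resolveCard("/sys", 0)
	if err != nil {
		fmt.Println("could not resolve card0:", err)
		return
	}
	fmt.Println("card0 is at PCI address", addr)
}
```

The card index is the unstable part: it is the number that was reassigned across the kernel upgrade described above, so the resolved address should be recorded once and used in the container config from then on.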
Known follow-ups
- `## [0.15.0]` heading was lost from `CHANGELOG.md` between the v0.15.0 bump commit and HEAD; some v0.15.0 content also got duplicated under `[0.16.0]`. Will be cleaned up in a follow-up.
- See `docs/MULTI-POOL.md` "What's still ahead" for: pool-namespaced SSH usernames (silent collision fix), sshpiper restart not refreshing state, defensive nologin filter on `/authorized-keys`.