last9/gpu-telemetry

Monitoring & Metrics

A vendor‑neutral, OpenTelemetry‑based GPU telemetry agent that attributes utilization and health metrics to Kubernetes pods or Slurm jobs, enabling per‑team accounting.

Python · Latest release: chart-v0.2.1 (1mo ago)

Features

  • Emits OTLP telemetry with built‑in workload attribution (Kubernetes pod/namespace/deployment or Slurm job/user)
  • Supports NVIDIA, AMD MI300X/MI325X and Intel Gaudi GPUs via unified collectors
  • Works as a Helm DaemonSet for Kubernetes or as a pip‑installable systemd service on bare metal
  • Provides pre‑built Grafana dashboards and Prometheus alert rules for fleet monitoring
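
The workload attribution described in the first feature can be illustrated with a minimal sketch. This is not the project's code or API: the `GpuSample` type, the `gpu_owner` mapping, and the `workload.kind` and `gpu.uuid` keys are hypothetical, and the sample data is made up. Only the `k8s.namespace.name` and `k8s.pod.name` keys follow OpenTelemetry's Kubernetes attribute naming. The sketch assumes attribution amounts to tagging each GPU reading with the workload that owns the device, then grouping by namespace for per-team numbers.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuSample:
    """One utilization reading for a single GPU (hypothetical shape)."""
    gpu_uuid: str
    utilization_pct: float


def attribute_samples(samples, gpu_owner):
    """Attach workload attributes to each sample.

    gpu_owner maps a GPU UUID to a dict of workload attributes.
    GPUs with no known owner are tagged as unattributed.
    """
    attributed = []
    for s in samples:
        attrs = dict(gpu_owner.get(s.gpu_uuid, {"workload.kind": "unattributed"}))
        attrs["gpu.uuid"] = s.gpu_uuid
        attributed.append((attrs, s.utilization_pct))
    return attributed


def mean_utilization_by(attributed, key):
    """Average utilization grouped by one attribute key, e.g. the namespace."""
    totals = defaultdict(lambda: [0.0, 0])
    for attrs, value in attributed:
        group = attrs.get(key, "unknown")
        totals[group][0] += value
        totals[group][1] += 1
    return {group: total / count for group, (total, count) in totals.items()}


# Made-up readings: two GPUs owned by pods in "team-a", one idle GPU with no owner.
samples = [
    GpuSample("GPU-0", 80.0),
    GpuSample("GPU-1", 40.0),
    GpuSample("GPU-2", 10.0),
]
gpu_owner = {
    "GPU-0": {"k8s.namespace.name": "team-a", "k8s.pod.name": "train-0"},
    "GPU-1": {"k8s.namespace.name": "team-a", "k8s.pod.name": "train-1"},
}

attributed = attribute_samples(samples, gpu_owner)
per_team = mean_utilization_by(attributed, "k8s.namespace.name")
print(per_team)  # {'team-a': 60.0, 'unknown': 10.0}
```

Keeping the attributes as a plain dictionary on each reading means the same grouping function would work for Slurm by swapping in job or user keys, which would also be hypothetical names here.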

Recent releases

3 releases in total.

About

Stars: 23
Forks: 2
Languages: Python, Go, Shell

Install & Platforms

Install via: pip, docker
