This release adds 3 notable features for engineering teams evaluating rollout.
Published 2mo
Model Serving & MLOps
✓ No known CVEs patched
Topics
inference
llm
model-compression
pytorch
quantization
Summary
AI summary: Add Triton fused Walsh-Hadamard kernel for 5x faster inference and a GPU benchmark script.
Full changelog
- Add Triton fused Walsh-Hadamard kernel for 5x faster inference (1.9 → 9.9 tok/s)
- Cache Wscale/inv_resid_scale as Python floats to eliminate GPU→CPU sync per forward pass
- Add GPU benchmark script (GLQ vs AWQ vs bf16): perplexity, tokens/sec, GPU memory, disk size
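For readers unfamiliar with the transform named in the first changelog entry, here is a minimal reference sketch of a fast Walsh-Hadamard transform in plain Python. It is not the Triton kernel from this release, whose source is not shown here. The function name `fwht` and the orthonormal 1/sqrt(n) scaling are assumptions made for illustration. A fused GPU kernel would presumably run the same butterfly passes in one launch instead of looping on the host.

```python
import math


def fwht(values):
    """Orthonormal fast Walsh-Hadamard transform (hypothetical reference helper).

    Illustrative only: this is an unfused, pure-Python version, not the
    Triton kernel described in the changelog.
    """
    n = len(values)
    # The butterfly structure requires a power-of-two length.
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")

    out = [float(v) for v in values]
    h = 1
    while h < n:
        # Each pass combines elements that are h apart: (a, b) -> (a + b, a - b).
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2

    # Assumed normalization: 1/sqrt(n) makes the transform its own inverse.
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in out]
```

With this scaling, applying `fwht` twice returns the original vector, which is a convenient check when comparing a fused kernel against a reference.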