Glq

v0.2.9 Feature

This release adds 3 notable features for engineering teams evaluating rollout.

Published 3mo Model Serving & MLOps

✓ No known CVEs patched in this version

Topics

inference llm model-compression pytorch quantization

Summary

Block‑diagonal FHT eliminates padding overhead and N‑stage RVQ enables true 2‑8 bpw quantization.

Full changelog

- Block-diagonal FHT: eliminates power-of-2 padding overhead (6.8 → 4.0 effective bpw for Nemotron-30B)
- N-stage RVQ: true 2-8 bpw quantization via multi-stage codebooks
- CUDA kernel support for block-diagonal (col_offset parameter)
- Default nsamples=128 with warning if <64
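
To illustrate the block-diagonal FHT entry above, here is a minimal pure-Python sketch of the general idea: rather than padding a dimension up to the next power of two before a fast Hadamard transform, the dimension is split into power-of-two blocks and each block is transformed independently. The binary-decomposition split and the helper names (`pow2_blocks`, `fwht`, `block_diag_fwht`, `padded_len`) are assumptions made for illustration, not GLQ's actual API or block layout. The running `offset` only hints at the role a parameter like `col_offset` might play in the CUDA kernel.

```python
import math

def pow2_blocks(n):
    """Split n into descending power-of-two block sizes (binary decomposition).
    Hypothetical choice; a real implementation may enforce a minimum block size."""
    return [1 << b for b in range(n.bit_length() - 1, -1, -1) if n & (1 << b)]

def fwht(x):
    """Orthonormal fast Walsh-Hadamard transform; len(x) must be a power of two."""
    x = list(x)
    n = len(x)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in x]

def block_diag_fwht(x):
    """Apply an independent FWHT to each power-of-two block; no padding is added."""
    out = []
    offset = 0  # start column of the current block
    for size in pow2_blocks(len(x)):
        out.extend(fwht(x[offset:offset + size]))
        offset += size
    return out

def padded_len(n):
    """Length after padding n up to the next power of two (the approach being avoided)."""
    return 1 << (n - 1).bit_length()

x = [float(i) for i in range(12)]
y = block_diag_fwht(x)           # 12 values in, 12 values out
roundtrip = block_diag_fwht(y)   # the orthonormal FWHT is its own inverse
```

In this toy case a length-12 vector is handled as blocks of 8 and 4, whereas padding would store 16 values. That ratio of padded to actual length is the kind of overhead that inflates effective bpw.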

| bpw | Stages | % of bf16 |
|-----|--------|-----------|
| 4 | 2 | 97.0% |
| 5 | 3 | 99.3% |
| 6 | 3 | 100.2% |
| 8 | 4 | 100.5% |
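
The stage counts in the table refer to residual vector quantization (RVQ), in which each stage quantizes the error left by the previous one. The sketch below shows that mechanism with toy scalar codebooks of 2 bits per stage. The codebook construction and the function names (`make_codebooks`, `rvq_encode`, `rvq_decode`) are assumptions for illustration only; these notes do not describe GLQ's actual codebooks or how bits are divided between stages.

```python
def nearest(codebook, value):
    """Index of the codebook entry closest to value."""
    return min(range(len(codebook)), key=lambda i: abs(codebook[i] - value))

def make_codebooks(stages, bits=2, span=1.0):
    """Toy uniform scalar codebooks (hypothetical): each stage covers the
    quantization cell left over by the previous stage."""
    levels = 1 << bits
    books = []
    width = span  # half-range covered by the current stage
    for _ in range(stages):
        step = 2 * width / levels
        books.append([-width + step * (j + 0.5) for j in range(levels)])
        width = step / 2
    return books

def rvq_encode(x, codebooks):
    """Quantize x stage by stage; each stage encodes the remaining residual."""
    residual = list(x)
    codes = []
    for cb in codebooks:
        stage = [nearest(cb, r) for r in residual]
        residual = [r - cb[i] for r, i in zip(residual, stage)]
        codes.append(stage)
    return codes

def rvq_decode(codes, codebooks):
    """Reconstruct by summing the selected entry from every stage."""
    out = [0.0] * len(codes[0])
    for stage, cb in zip(codes, codebooks):
        for k, i in enumerate(stage):
            out[k] += cb[i]
    return out

def max_error(x, stages):
    books = make_codebooks(stages)
    decoded = rvq_decode(rvq_encode(x, books), books)
    return max(abs(a - b) for a, b in zip(x, decoded))

weights = [-0.9, -0.33, 0.0, 0.41, 0.87]
errors = [max_error(weights, s) for s in (1, 2, 3)]
```

With these toy codebooks every extra stage adds 2 bits per value and shrinks the worst-case error by a factor of 4, which is why adding stages raises the bit budget and improves fidelity together.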

xv0y5ncu/SmolLM3-3B-GLQ-6bpw — 99.6% of bf16 at true 6.0 bpw (block-diagonal FHT, zero padding)
