Glq

v0.2.9 Feature

This release adds three notable features: block-diagonal FHT, N-stage RVQ, and CUDA kernel support for the block-diagonal layout.

Published 1mo Model Serving & MLOps
✓ No known CVEs patched

Topics

inference llm model-compression pytorch quantization

Summary

AI summary

Block-diagonal FHT (fast Hadamard transform) eliminates power-of-2 padding overhead, and N-stage RVQ (residual vector quantization) enables true 2-8 bpw (bits per weight) quantization.

Full changelog

Features

  • Block-diagonal FHT: eliminates power-of-2 padding overhead (6.8 → 4.0 effective bpw for Nemotron-30B)
  • N-stage RVQ: true 2-8 bpw quantization via multi-stage codebooks
  • CUDA kernel support for block-diagonal (col_offset parameter)
  • Default nsamples=128 with warning if <64
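
To illustrate the block-diagonal FHT bullet above, here is a minimal pure-Python sketch. It assumes the approach is to split a non-power-of-2 dimension into power-of-2 blocks and transform each block independently, rather than zero-padding to the next power of 2. The function names (`fht`, `power_of_two_blocks`, `block_diagonal_fht`, `effective_bpw`) are hypothetical and are not Glq's API. The binary decomposition and the padding model in `effective_bpw` are also assumptions, since the release notes do not describe the exact block layout.

```python
import math

def fht(x):
    """Orthonormal fast Walsh-Hadamard transform; len(x) must be a power of 2."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of 2")
    y = list(x)
    h = 1
    while h < n:
        # Butterfly step: combine pairs that are h apart.
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = y[j], y[j + h]
                y[j], y[j + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in y]

def power_of_two_blocks(n):
    """Split n into descending power-of-2 sizes (assumed layout: binary decomposition)."""
    if n <= 0:
        raise ValueError("n must be positive")
    sizes = []
    bit = 1 << (n.bit_length() - 1)
    while bit:
        if n & bit:
            sizes.append(bit)
        bit >>= 1
    return sizes

def block_diagonal_fht(x):
    """Apply an independent FHT to each power-of-2 block; output length equals input length."""
    out, offset = [], 0
    for size in power_of_two_blocks(len(x)):
        out.extend(fht(x[offset:offset + size]))
        offset += size
    return out

def padded_length(n):
    """Next power of 2 that is >= n, i.e. the size a padded transform would need."""
    if n <= 0:
        raise ValueError("n must be positive")
    return 1 << (n - 1).bit_length()

def effective_bpw(nominal_bpw, n):
    """Assumed model: padded storage inflates bits per real weight by padded/actual."""
    return nominal_bpw * padded_length(n) / n

# A toy dimension of 12 splits into blocks of 8 and 4 instead of padding to 16.
x = [float(i) for i in range(12)]
y = block_diagonal_fht(x)
```

Under this assumed model, a dimension of 12 at a nominal 4 bpw would cost 4 × 16 / 12 ≈ 5.33 effective bpw when padded, and exactly 4 bpw with the block-diagonal layout. The 6.8 → 4.0 figure quoted for Nemotron-30B depends on that model's actual layer shapes, which these notes do not list.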

Quality (SmolLM2-135M lm-eval 5-task)

| bpw | Stages | % of bf16 |
|-----|--------|-----------|
| 4 | 2 | 97.0% |
| 5 | 3 | 99.3% |
| 6 | 3 | 100.2% |
| 8 | 4 | 100.5% |
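
The Stages column reflects how multi-stage residual quantization builds up a bit budget: each stage quantizes the error left by the previous one, and the per-stage bits add up. The sketch below is a generic RVQ illustration in pure Python, not Glq's implementation. The toy scalar codebooks, the 2-bits-per-stage choice, and all function names are assumptions. The release notes do not state Glq's codebook sizes or how bits are split across stages (for example, how 5 bpw is reached with 3 stages).

```python
import math

def nearest(codebook, v):
    """Index of the codeword closest to v (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda k: sum((c - x) ** 2 for c, x in zip(codebook[k], v)))

def rvq_encode(v, codebooks):
    """Quantize v stage by stage; each stage encodes the residual left by earlier stages."""
    residual = list(v)
    indices = []
    for cb in codebooks:
        k = nearest(cb, residual)
        indices.append(k)
        residual = [r - c for r, c in zip(residual, cb[k])]
    return indices

def rvq_decode(indices, codebooks):
    """Reconstruct by summing the selected codeword from every stage."""
    out = [0.0] * len(codebooks[0][0])
    for k, cb in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, cb[k])]
    return out

def bits_per_weight(codebooks):
    """Total index bits across stages divided by the codeword dimension."""
    dim = len(codebooks[0][0])
    return sum(math.log2(len(cb)) for cb in codebooks) / dim

def toy_codebooks(stages):
    """Hypothetical 1-D codebooks: 4 levels (2 bits) per stage, each stage 4x finer."""
    levels = (-0.75, -0.25, 0.25, 0.75)
    return [[(lvl / 4 ** s,) for lvl in levels] for s in range(stages)]

two_stage = toy_codebooks(2)
three_stage = toy_codebooks(3)
v = [0.6]
approx2 = rvq_decode(rvq_encode(v, two_stage), two_stage)
approx3 = rvq_decode(rvq_encode(v, three_stage), three_stage)
```

In this toy setup, two stages give 4 bpw and three stages give 6 bpw, and the reconstruction error for `v` shrinks as stages are added. Real vector codebooks would be learned from calibration data rather than fixed by hand.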

New Model

Compatibility

  • Fully backward compatible: existing power-of-2 models load and run unchanged
  • Requires transformers >= 5.0 for small models
