Glq

v0.2.2 Feature

This release adds 5 notable features for engineering teams evaluating rollout.

Published 4 months ago in Model Serving & MLOps

✓ No known CVEs patched

Topics

inference llm model-compression pytorch quantization

Summary

Adds Hessian-based sensitivity profiling for mixed-precision, per-layer bit allocation, and rolls in the v0.2.1 fixes (KV cache bug, TC kernel autotune keys).

Full changelog

New features

Hessian-based sensitivity profiling for per-layer bit allocation
--bpw 2.5 auto-allocates 2 or 3 bpw per layer to hit the target average
--min-bpw 2 --max-bpw 4 enables full {2,3,4} bpw range per layer
Greedy marginal-gain optimizer considers all upgrade jumps (2→3, 2→4, 3→4)
New module: glq/sensitivity.py with allocate_bpw() + print_allocation_summary()
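
The greedy marginal-gain idea above can be sketched as follows. This is a minimal illustration, not the code in glq/sensitivity.py: the function name `greedy_allocate`, its arguments, and the per-layer error numbers are all hypothetical, and the per-bit-width error values stand in for whatever Hessian-based sensitivity score the tool actually computes. It assumes the goal is to keep the parameter-weighted average bit width at or below the target.

```python
from typing import Dict, List, Sequence


def greedy_allocate(
    sizes: List[int],
    errors: List[Dict[int, float]],
    target_bpw: float,
    choices: Sequence[int] = (2, 3, 4),
) -> List[int]:
    """Pick a bit width per layer with size-weighted average <= target_bpw.

    sizes[i]      number of weights in layer i
    errors[i][b]  assumed quantization error of layer i at b bits per weight
    """
    choices = sorted(choices)
    alloc = [choices[0]] * len(sizes)  # start every layer at the minimum
    # Bits left to spend above the all-minimum baseline.
    budget = (target_bpw - choices[0]) * sum(sizes)

    while True:
        best = None  # (gain_per_bit, layer, new_bpw, cost)
        for i, (n, err) in enumerate(zip(sizes, errors)):
            for b in choices:
                if b <= alloc[i]:
                    continue  # only upgrades are considered
                cost = n * (b - alloc[i])
                if cost > budget + 1e-9:
                    continue  # this jump does not fit in the budget
                gain = (err[alloc[i]] - err[b]) / cost
                if gain > 0 and (best is None or gain > best[0]):
                    best = (gain, i, b, cost)
        if best is None:
            return alloc  # no affordable upgrade improves the error
        _, i, b, cost = best
        alloc[i] = b
        budget -= cost


# Made-up example: four equal-sized layers, two of them sensitive.
sizes = [100, 100, 100, 100]
errors = [
    {2: 10.0, 3: 4.0, 4: 3.0},
    {2: 1.0, 3: 0.5, 4: 0.4},
    {2: 8.0, 3: 3.0, 4: 2.5},
    {2: 0.5, 3: 0.3, 4: 0.2},
]
alloc = greedy_allocate(sizes, errors, target_bpw=2.5)
print(alloc)  # [3, 2, 3, 2]
```

With these made-up numbers the two most sensitive layers are upgraded to 3 bpw and the average lands exactly at 2.5. Because every jump (2→3, 2→4, 3→4) is scored by error reduction per extra bit, a layer can skip straight from 2 to 4 bpw when that is the best use of the remaining budget.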

Results (SmolLM3-3B, WikiText-2, 128 nsamples, L40S)

| Model | Eff. BPW | Perplexity | vs bf16 |
|-------|----------|------------|---------|
| bf16 | 16.00 | 7.04 | 1.00x |
| GLQ 4bpw | 4.00 | 7.19 | 1.02x |
| GLQ 3.5bpw mixed | 3.50 | 7.20 | 1.02x |
| GLQ 3bpw | 3.00 | 7.64 | 1.09x |
| GLQ 3bpw mixed (2+4) | 3.00 | 7.65 | 1.09x |
| GLQ 2.5bpw mixed | 2.50 | 8.08 | 1.15x |
| GLQ 2bpw | 2.00 | 9.61 | 1.36x |
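
The Eff. BPW and vs bf16 columns can be reproduced with simple arithmetic. The sketch below assumes that effective bpw is the parameter-weighted mean of the per-layer bit widths and that "vs bf16" is the ratio of perplexities; both are readings of the table rather than definitions taken from GLQ, and the helper names and layer sizes are hypothetical.

```python
def effective_bpw(sizes, bits):
    """Parameter-weighted average bits per weight across layers."""
    return sum(n * b for n, b in zip(sizes, bits)) / sum(sizes)


def ppl_ratio(ppl, baseline):
    """Perplexity relative to a baseline (1.00 means no degradation)."""
    return ppl / baseline


# Equal-sized layers split evenly between 2 and 4 bpw average out to 3 bpw.
mix_bpw = effective_bpw([100, 100, 100, 100], [2, 4, 2, 4])

# Ratios for two rows of the table, using the bf16 perplexity of 7.04.
ratio_4bpw = round(ppl_ratio(7.19, 7.04), 2)
ratio_25_mixed = round(ppl_ratio(8.08, 7.04), 2)

print(mix_bpw, ratio_4bpw, ratio_25_mixed)  # 3.0 1.02 1.15
```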

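GLQ 3.5bpw mixed comes within 0.01 perplexity of uniform 4bpw (7.20 vs 7.19) while using 12.5% fewer bits per weight (3.50 vs 4.00). GLQ 3bpw mixed (2+4) comes within 0.01 perplexity of uniform 3bpw (7.65 vs 7.64) at the same 3.00 effective bpw.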

5-task lm-eval accuracy (SmolLM3-3B)

| Method | Avg | % of bf16 |
|--------|-----|-----------|
| bf16 | 0.709 | 100% |
| GLQ 4bpw | 0.699 | 98.6% |
| GLQ 2bpw | 0.623 | 87.9% |

Also in this release

Fix KV cache bug (v0.2.1): 0.6 → 14.0 tok/s decode
Remove B from TC kernel autotune keys (v0.2.1)
Fair perplexity re-measurement on L40S with 128 nsamples
