This release adds 5 notable features for engineering teams evaluating rollout.
New features
- Hessian-based sensitivity profiling for per-layer bit allocation
- `--bpw 2.5` auto-allocates 2/3bpw per layer to hit target average
- `--min-bpw 2 --max-bpw 4` enables full {2,3,4} bpw range per layer
- Greedy marginal-gain optimizer considers all upgrade jumps (2→3, 2→4, 3→4)
- New module: `glq/sensitivity.py` with `allocate_bpw()` + `print_allocation_summary()`
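
To illustrate the greedy marginal-gain idea, here is a minimal sketch. It is not the code in `glq/sensitivity.py`: the function name `greedy_allocate`, its arguments, and the toy error numbers are all hypothetical. It assumes sensitivity profiling has already produced an estimated error for every layer at every candidate bit width, and it ignores any storage overhead beyond the weights themselves.

```python
from typing import Dict, List, Sequence


def greedy_allocate(
    sizes: Sequence[int],
    errors: Sequence[Dict[int, float]],
    target_bpw: float,
    choices: Sequence[int] = (2, 3, 4),
) -> List[int]:
    """Hypothetical greedy allocator (illustration only).

    sizes[i]      number of weights in layer i
    errors[i][b]  assumed error estimate for layer i quantized at b bits
    target_bpw    desired parameter-weighted average bits per weight
    """
    choices = sorted(choices)
    alloc = [choices[0]] * len(sizes)        # start every layer at the minimum
    budget = target_bpw * sum(sizes)         # total bits we may spend
    used = float(choices[0] * sum(sizes))    # bits spent so far

    while True:
        best = None  # (gain per extra bit, layer index, new bpw, extra bits)
        for i, (n, err) in enumerate(zip(sizes, errors)):
            for b in choices:
                if b <= alloc[i]:
                    continue                 # only upgrades are considered
                extra = n * (b - alloc[i])
                if used + extra > budget + 1e-9:
                    continue                 # this jump would exceed the budget
                gain = (err[alloc[i]] - err[b]) / extra
                if gain > 0 and (best is None or gain > best[0]):
                    best = (gain, i, b, extra)
        if best is None:
            return alloc                     # no affordable, useful upgrade left
        _, i, b, extra = best
        alloc[i] = b
        used += extra


# Toy inputs: four equally sized layers, two of them far more sensitive.
sizes = [100, 100, 100, 100]
errors = [
    {2: 10.0, 3: 4.0, 4: 1.0},
    {2: 1.0, 3: 0.8, 4: 0.7},
    {2: 6.0, 3: 2.0, 4: 0.5},
    {2: 0.5, 3: 0.4, 4: 0.35},
]
print(greedy_allocate(sizes, errors, 3.0))  # [4, 2, 4, 2]
print(greedy_allocate(sizes, errors, 2.5))  # [3, 2, 3, 2]
```

Because every jump (2→3, 2→4, 3→4) is scored by error reduction per extra bit, the sensitive layers are upgraded first and the insensitive ones stay at 2 bits. The real optimizer may score or constrain upgrades differently.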
Results (SmolLM3-3B, WikiText-2, 128 nsamples, L40S)
| Model | Eff. BPW | Perplexity | vs bf16 |
|-------|----------|------------|---------|
| bf16 | 16.00 | 7.04 | 1.00x |
| GLQ 4bpw | 4.00 | 7.19 | 1.02x |
| GLQ 3.5bpw mixed | 3.50 | 7.20 | 1.02x |
| GLQ 3bpw | 3.00 | 7.64 | 1.09x |
| GLQ 3bpw mixed (2+4) | 3.00 | 7.65 | 1.09x |
| GLQ 2.5bpw mixed | 2.50 | 8.08 | 1.15x |
| GLQ 2bpw | 2.00 | 9.61 | 1.36x |
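GLQ 3.5bpw mixed nearly matches uniform 4bpw quality (perplexity 7.20 vs. 7.19) at 12.5% less weight storage (3.5 vs. 4 bits per weight). GLQ 3bpw mixed (2+4) matches uniform 3bpw quality (7.65 vs. 7.64) at the same 3.00 effective bpw.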
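
The effective-bpw and storage figures come down to simple arithmetic, sketched below. The helper names and the layer sizes are hypothetical, and the calculation counts weight bits only; it assumes no per-layer metadata overhead, which a real checkpoint may add.

```python
def effective_bpw(sizes, bits):
    """Parameter-weighted average bits per weight across layers."""
    total_bits = sum(n * b for n, b in zip(sizes, bits))
    return total_bits / sum(sizes)


def storage_saving(bpw, baseline_bpw):
    """Fraction of weight storage saved relative to a baseline bit width."""
    return 1.0 - bpw / baseline_bpw


# Hypothetical model with four equally sized layers, half at 2 bits, half at 4.
sizes = [100, 100, 100, 100]
print(effective_bpw(sizes, [4, 2, 4, 2]))  # 3.0
print(storage_saving(3.5, 4.0))            # 0.125
```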
5-task lm-eval accuracy (SmolLM3-3B)
| Method | Avg | % of bf16 |
|--------|-----|-----------|
| bf16 | 0.709 | 100% |
| GLQ 4bpw | 0.699 | 98.6% |
| GLQ 2bpw | 0.623 | 87.9% |
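
The "% of bf16" column can be reproduced from the averages in the table. This is a small sketch; the helper name `pct_of_baseline` is made up for illustration.

```python
def pct_of_baseline(score, baseline):
    """Score as a percentage of the baseline, rounded to one decimal place."""
    return round(100.0 * score / baseline, 1)


bf16_avg = 0.709  # bf16 average from the table above
averages = {"GLQ 4bpw": 0.699, "GLQ 2bpw": 0.623}

retained = {name: pct_of_baseline(avg, bf16_avg) for name, avg in averages.items()}
print(retained)  # {'GLQ 4bpw': 98.6, 'GLQ 2bpw': 87.9}
```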
Also in this release
- Fix KV cache bug (v0.2.1): 0.6 → 14.0 tok/s decode
- Remove B from TC kernel autotune keys (v0.2.1)
- Fair perplexity re-measurement on L40S with 128 nsamples