This release adds 2 notable features for engineering teams evaluating rollout.
Summary
Block-diagonal decode now beats the legacy power-of-2 path by ~10% on Blackwell RTX PRO 6000.
Full changelog
Highlights
Block-diagonal GLQ decode is now faster than the legacy power-of-2 fused path. On Blackwell RTX PRO 6000 with SmolLM2-135M 4bpw:
| Path | v0.2.9 | v0.2.10 |
|------|--------|---------|
| Block-diag eager | 51.1 tok/s | 53.6 tok/s |
| Block-diag + CUDA graph | 121.6 tok/s | 136.3 tok/s |
| Legacy pow2 + CUDA graph | 124.3 tok/s | 124.3 tok/s |
With CUDA graphs, block-diag now beats pow2 by ~10% (136.3 vs 124.3 tok/s): same butterfly work, 3 launches total, exact `in_features` (no padding).
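The percentages implied by the table can be checked with a few lines of arithmetic. The sketch below only reuses the throughput figures quoted above; the helper name `pct_gain` is illustrative and not part of glq.

```python
# Throughput figures (tok/s) copied from the table above: (v0.2.9, v0.2.10).
results = {
    "Block-diag eager": (51.1, 53.6),
    "Block-diag + CUDA graph": (121.6, 136.3),
    "Legacy pow2 + CUDA graph": (124.3, 124.3),
}


def pct_gain(old: float, new: float) -> float:
    """Relative throughput change from old to new, in percent."""
    return (new / old - 1.0) * 100.0


# Release-over-release change for each path.
for path, (v029, v0210) in results.items():
    print(f"{path}: {pct_gain(v029, v0210):+.1f}%")

# Head-to-head on v0.2.10: block-diag graph vs legacy pow2 graph.
graph_vs_pow2 = pct_gain(
    results["Legacy pow2 + CUDA graph"][1],
    results["Block-diag + CUDA graph"][1],
)
print(f"Block-diag graph vs pow2 graph: {graph_vs_pow2:+.1f}%")
```

The head-to-head figure comes out just under 10%, which matches the rounded "~10%" in the highlight.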
Features
- Phase A: CUDA-graph-safe forward for block-diagonal `E8RHTLinear`. Eager `_blocks_n/m_tensor` construction in `__init__`, cached empty placeholders, explicit `device="cpu"` pin so HF's `init_empty_weights` meta-default-device doesn't silently promote bookkeeping tensors to meta.
- Phase B: Fused multi-block FHT kernel (`glq_{input,output}_rht_multiblock_kernel`). Collapses `N` per-sub-block launches into one. `gridDim.y = num_blocks`, `blockIdx.y` selects the sub-block via packed `int4` device metadata. Dispatches via `max_bs ≤ 8192` gate; legacy per-block loop retained for larger blocks.
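To make the Phase B dispatch concrete, here is a CPU-only Python sketch of a block-diagonal fast Hadamard transform with a fused path and a per-block fallback. It is not the glq kernel. The function names, the binary-decomposition split in `pow2_blocks`, the unnormalized transform, and the "launch" counter are all assumptions made for illustration. Only the 8192 gate and the idea of indexing sub-blocks through a metadata table come from the notes above.

```python
def fwht_inplace(x, offset, size):
    """Unnormalized in-place fast Walsh-Hadamard transform on x[offset:offset+size].

    `size` must be a power of two.
    """
    h = 1
    while h < size:
        for start in range(offset, offset + size, 2 * h):
            for i in range(start, start + h):
                a, b = x[i], x[i + h]
                x[i], x[i + h] = a + b, a - b  # butterfly
        h *= 2


def pow2_blocks(n):
    """Assumed split: binary decomposition of n into power-of-2 sizes, largest first."""
    return [1 << b for b in range(n.bit_length() - 1, -1, -1) if n & (1 << b)]


def build_metadata(block_sizes):
    """Per-block (offset, size) table, a stand-in for the packed device metadata."""
    meta, offset = [], 0
    for size in block_sizes:
        meta.append((offset, size))
        offset += size
    return meta


def rht_legacy(x, block_sizes):
    """Per-block loop: counts one simulated launch per sub-block."""
    out, launches = list(x), 0
    for offset, size in build_metadata(block_sizes):
        fwht_inplace(out, offset, size)
        launches += 1
    return out, launches


def rht_multiblock(x, meta):
    """Fused path: one simulated launch; block_idx plays the role of blockIdx.y."""
    out = list(x)
    for block_idx in range(len(meta)):  # analogous to gridDim.y = num_blocks
        offset, size = meta[block_idx]
        fwht_inplace(out, offset, size)
    return out, 1


MAX_FUSED_BLOCK = 8192  # gate value quoted in the release notes


def rht(x, block_sizes):
    """Use the fused path when every block fits the gate, else the per-block loop."""
    if max(block_sizes) <= MAX_FUSED_BLOCK:
        return rht_multiblock(x, build_metadata(block_sizes))
    return rht_legacy(x, block_sizes)


# Example: a width of 576 splits into 512 + 64, so nothing is padded up to 1024.
blocks = pow2_blocks(576)
x = list(range(576))
fused, fused_launches = rht(x, blocks)
legacy, legacy_launches = rht_legacy(x, blocks)
print(blocks, fused == legacy, fused_launches, legacy_launches)
```

Integer inputs keep the comparison exact, so the fused and per-block results can be checked with plain equality. That check is the same multiblock↔legacy equivalence the new tests are described as covering.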
Fixes
- `fast_hadamard_transform` falls back to the PyTorch implementation when the input is on CPU (previously raised even with the CUDA pkg installed).
- `_process_model_before_weight_loading` no longer crashes on models without `.config` (e.g. bare `nn.Sequential`).
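The second fix is an instance of a common defensive pattern: treat `.config` as optional instead of assuming every model carries one. The sketch below shows that pattern in plain Python. The helper names, the `tie_word_embeddings` field, and the returned dictionary are hypothetical and do not reflect glq's actual hook.

```python
from types import SimpleNamespace


def get_model_config(model, default=None):
    """Return model.config if present, else default (hypothetical helper)."""
    return getattr(model, "config", default)


def process_model_before_weight_loading(model):
    """Hypothetical stand-in for the hook: never assumes .config exists."""
    config = get_model_config(model)
    # `tie_word_embeddings` is only an example of a field such a hook might read.
    tie = getattr(config, "tie_word_embeddings", False) if config is not None else False
    return {"has_config": config is not None, "tie_word_embeddings": tie}


class BareModel:
    """Stands in for a plain container such as nn.Sequential: no .config attribute."""


hf_like = SimpleNamespace(config=SimpleNamespace(tie_word_embeddings=True))
with_config = process_model_before_weight_loading(hf_like)
without_config = process_model_before_weight_loading(BareModel())
print(with_config)
print(without_config)
```

Using `getattr` with a default keeps the config-less case on the normal code path, so bare containers get default behaviour rather than an `AttributeError`.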
Testing
- 30 new CUDA fast-path tests for block-diagonal `E8RHTLinear` covering multiblock↔legacy equivalence, B=1 matvec vs B≥2 TC consistency, CUDA graph bit-exactness, and large-block fallback.
- 8 stale tests updated for block-diag-default and nsamples=128-default semantics.
- Full suite: 260 passed, 3 skipped, 0 failed.
Install
pip install glq==0.2.10