This release adds one notable feature for engineering teams evaluating a rollout.
Published 2 months ago
Model Serving & MLOps
✓ No known CVEs patched
Topics
inference
llm
model-compression
pytorch
quantization
Summary
AI summary: Added a Triton fused dequant+matmul inference kernel that supports 3 and 4 bits per weight (3/4 bpw).
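"Fused dequant+matmul" means weights stay in their packed low-bit form and are decoded inside the multiply loop, so a full-precision copy of the weight matrix is never materialized. The minimal pure-Python sketch below illustrates only that idea. It assumes a hypothetical storage format: scalar codes packed into a little-endian bitstream with one symmetric scale per output row. The helper names pack_codes, iter_codes and fused_dequant_matmul are also hypothetical. Glq's actual kernel is written in Triton, and its stored format (index buffers such as Qidxs) likely differs. A GPU kernel would typically perform the same decode per tile rather than per scalar.

```python
from typing import Iterator, List, Sequence


def pack_codes(codes: Sequence[int], bits: int) -> bytes:
    """Pack unsigned `bits`-wide codes into a little-endian bitstream (hypothetical format)."""
    out = bytearray()
    acc = 0   # pending bits, lowest bit first
    nacc = 0  # number of pending bits in acc
    for c in codes:
        if not 0 <= c < (1 << bits):
            raise ValueError(f"code {c} does not fit in {bits} bits")
        acc |= c << nacc
        nacc += bits
        while nacc >= 8:  # flush whole bytes
            out.append(acc & 0xFF)
            acc >>= 8
            nacc -= 8
    if nacc:  # flush the final partial byte
        out.append(acc & 0xFF)
    return bytes(out)


def iter_codes(packed: bytes, bits: int, count: int) -> Iterator[int]:
    """Lazily decode `count` codes, so no unpacked copy of the weights is built."""
    mask = (1 << bits) - 1
    acc = nacc = 0
    data = iter(packed)
    for _ in range(count):
        while nacc < bits:  # refill the accumulator from the next byte
            acc |= next(data) << nacc
            nacc += 8
        yield acc & mask
        acc >>= bits
        nacc -= bits


def fused_dequant_matmul(
    x: Sequence[Sequence[float]],
    packed: bytes,
    bits: int,
    out_features: int,
    in_features: int,
    scales: Sequence[float],
) -> List[List[float]]:
    """Compute y = x @ W.T, dequantizing each weight of W inside the inner loop."""
    if bits not in (3, 4):
        raise ValueError("this sketch only covers 3 or 4 bits per weight")
    n = out_features * in_features
    if len(packed) * 8 < n * bits or len(scales) != out_features:
        raise ValueError("packed data or scales do not match the weight shape")
    offset = 1 << (bits - 1)  # symmetric mapping (assumption): code c -> (c - offset) * scale
    codes = iter_codes(packed, bits, n)  # row-major over W[out_features][in_features]
    y = [[0.0] * out_features for _ in x]
    for o in range(out_features):
        s = scales[o]  # one scale per output row (assumption)
        for i in range(in_features):
            w = (next(codes) - offset) * s  # dequantize one weight, use it, discard it
            for b, row in enumerate(x):
                y[b][o] += row[i] * w
    return y
```

For example, the 4-bit codes [9, 7, 8] with scale 0.5 decode to [0.5, -0.5, 0.0]. With bits=3 the same loop applies, and only the code width and the offset (4 instead of 8) change. That is one plausible reason a single kernel can serve both widths.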
Full changelog
- Fix state_dict loading for two-stage buffers (Qidxs2, inv_resid_scale)
- Add Triton fused dequant+matmul inference kernel with 3/4bpw support
- Update README with Triton kernel docs, quantization and inference examples
- Update SmolLM2-360M 2bpw perplexity: 18.29 → 17.70
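Perplexity is conventionally the exponential of the mean per-token negative log-likelihood, so lower is better. By that measure, the change from 18.29 to 17.70 is roughly a 3.2% reduction. The sketch below shows only that standard definition. The helper name perplexity is hypothetical, and the release notes do not state the evaluation dataset, context length or tokenization behind the SmolLM2-360M figures, so this is not a way to reproduce them.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood per token), using natural-log probabilities."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


# Relative improvement reported in this release (about 3.2%).
relative_reduction = (18.29 - 17.70) / 18.29
```

A model that assigns probability 0.25 to every observed token has perplexity 4. It is as uncertain as a uniform choice among four options.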
About Glq