Glq

v0.1.4 Feature

This release adds 3 notable features for engineering teams evaluating rollout.

Published 4mo Model Serving & MLOps

✓ No known CVEs patched

✓ No known CVEs patched in this version

Topics

inference llm model-compression pytorch quantization

Summary

AI summary

CPU offloading for 7B+ model quantization prevents OOM on large models with limited VRAM.

Full changelog

CPU offloading for 7B+ model quantization: Model layers are now loaded to CPU and moved to GPU one at a time during quantization, preventing OOM on large models with 24GB VRAM GPUs.
Mistral-7B-v0.3 benchmark: GLQ 3-bit achieves 4.41 perplexity (1.05x vs bf16 4.20) using only 4.4 GB GPU memory vs 14.5 GB for bf16.
Llama-3.2-3B benchmarks: GLQ 3-bit (6.78 PPL, 1.10x) and 2-bit (8.49 PPL, 1.38x) results added.

Weekly OSS security release digest.

The CVE patches and breaking changes that affected production tools this week. One email, every Sunday.

No spam, unsubscribe anytime.

Share this release

Track Glq

Get notified when new releases ship.

About Glq

Beta — feedback welcome: [email protected]