Glq

v0.2.5 Feature

This release adds 3 notable features for engineering teams evaluating rollout.

Published 4mo Model Serving & MLOps

✓ No known CVEs patched

Topics

inference llm model-compression pytorch quantization

Summary

Adds CUDA C kernels for decode (B=1) and prefill (B>1), makes Triton optional, and fixes a ProcessPoolExecutor fork+CUDA deadlock.

Full changelog

CUDA C Kernels

Dequant split-K matvec (glq/csrc/glq_cuda.cu):

- 4 rows/warp with __shfl_xor_sync reduction, __launch_bounds__(256,2)
- Beats cuBLAS dense fp16 matmul on 2/3 benchmark shapes
- 2.7-3.0× faster than Triton kernels

| Shape | CUDA C | Triton | cuBLAS |
|-------|--------|--------|--------|
| 3072×3072 | 39μs | 104μs | 47μs |
| 3072×9216 | 51μs | 142μs | 39μs |
| 9216×3072 | 52μs | 158μs | 99μs |
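
The split-K idea and the XOR-shuffle reduction named above can be modeled in a few lines of plain Python. This is only an illustrative sketch, not the code in glq/csrc/glq_cuda.cu: the function names `xor_butterfly_sum` and `split_k_matvec` and the `num_splits` parameter are assumptions made for the example, and the dequantization step is omitted entirely.

```python
# Illustrative pure-Python model of a split-K matvec; not the glq CUDA kernel.

def xor_butterfly_sum(lanes):
    """Sum values across lanes in the pattern of a __shfl_xor_sync reduction.

    The lane count must be a power of two. After log2(n) steps every lane
    holds the full sum.
    """
    n = len(lanes)
    assert n > 0 and n & (n - 1) == 0, "lane count must be a power of two"
    vals = list(lanes)
    offset = n // 2
    while offset >= 1:
        # Each lane adds the value held by its partner lane (lane XOR offset).
        vals = [vals[i] + vals[i ^ offset] for i in range(n)]
        offset //= 2
    return vals


def split_k_matvec(W, x, num_splits=4):
    """Compute y = W @ x by splitting the K (column) dimension into chunks.

    num_splits must be a power of two so the butterfly reduction applies.
    """
    K = len(x)
    chunk = (K + num_splits - 1) // num_splits  # ceiling division
    y = []
    for row in W:
        # One partial dot product per K-chunk; on a GPU these run in parallel.
        partials = [
            sum(row[k] * x[k] for k in range(s * chunk, min((s + 1) * chunk, K)))
            for s in range(num_splits)
        ]
        # Combine the partial sums with the butterfly pattern.
        y.append(xor_butterfly_sum(partials)[0])
    return y
```

Splitting along K matters most at B=1, where a single output row offers too little parallelism on its own; the partial sums let more threads contribute to each row before the final reduction.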

Shared-memory FHT for input/output RHT:

- Double-buffered butterfly stages in shared memory
- 1.6-3.1× faster than Triton global-memory FHT (n_pad ≤ 8192)
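
As a rough illustration of double-buffered butterfly stages, the sketch below runs an unnormalized fast Hadamard transform with two ping-pong buffers, so each stage reads from one buffer and writes to the other. The function name `fht_double_buffered` is hypothetical, the buffers are ordinary Python lists standing in for shared memory, and any sign flips or scaling applied by the RHT in the release are not modeled.

```python
# Illustrative model of a double-buffered FHT; not the glq shared-memory kernel.

def fht_double_buffered(x):
    """Unnormalized fast Hadamard transform using two ping-pong buffers."""
    n = len(x)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    src = list(x)   # buffer read during the current stage
    dst = [0] * n   # buffer written during the current stage
    h = 1
    while h < n:
        for i in range(n):
            partner = i ^ h
            if i & h == 0:
                # Lower element of the butterfly pair: a + b
                dst[i] = src[i] + src[partner]
            else:
                # Upper element of the butterfly pair: a - b
                dst[i] = src[partner] - src[i]
        # Swap roles so the next stage reads what this stage wrote.
        src, dst = dst, src
        h *= 2
    return src
```

Because no element is overwritten while another element of the same stage still needs to read it, all `n` updates within a stage are independent, which is what allows one thread per element between synchronization points.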

Triton Now Optional

CUDA C handles all batch sizes:

- B=1: split-K matvec (decode)
- B>1: batched matvec (prefill)
- Dispatch: CUDA C > Triton > PyTorch fallback
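
The priority order above could be expressed as a small selection function. The sketch below is an assumption about the shape of such logic, not Glq's actual dispatch code: `choose_path`, its arguments, and the returned string labels are all hypothetical names.

```python
# Hypothetical dispatch sketch; names and return values are illustrative only.

def choose_path(batch_size, has_cuda_ext, has_triton):
    """Pick a backend in the order CUDA C > Triton > PyTorch fallback.

    Returns (backend, kernel). The kernel name is only filled in for the
    CUDA C backend, where the release notes describe the B=1 / B>1 split.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    if has_cuda_ext:
        # B=1 is the decode case, B>1 the prefill case.
        kernel = "split_k_matvec" if batch_size == 1 else "batched_matvec"
        return ("cuda_c", kernel)
    if has_triton:
        return ("triton", None)
    return ("pytorch", None)
```

With this ordering, an environment without Triton installed still reaches the fastest path whenever the compiled CUDA extension is available, which matches Triton being optional.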

Performance (SmolLM3-3B 3.5bpw, L40S)

| Metric | v0.2.2 (Triton) | v0.2.5 (CUDA C) | Speedup |
|--------|-----------------|------------------|---------|
| Decode (B=1) | 12.8 tok/s | 17.7 tok/s | +38% |
| Prefill (B=16) | — | 59 tok/s | new |
| Generate 128 | 14.0 tok/s | 17.1 tok/s | +22% |

Perplexity unchanged (7.20).
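
The Speedup column can be reproduced from the throughput figures in the table. The helper below is a small arithmetic check; `speedup_pct` is a name introduced here for illustration.

```python
def speedup_pct(old_tok_s, new_tok_s):
    """Relative throughput gain of new over old, as a rounded percentage."""
    return round((new_tok_s / old_tok_s - 1.0) * 100)


decode_gain = speedup_pct(12.8, 17.7)    # Decode (B=1) row
generate_gain = speedup_pct(14.0, 17.1)  # Generate 128 row
print(f"decode +{decode_gain}%, generate +{generate_gain}%")
```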

Other Changes

- Fix ProcessPoolExecutor fork+CUDA deadlock (mp_context='spawn')
- GLQ 3.5bpw mixed lm-eval results: 96.6% of bf16 accuracy
- 217 tests pass
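
For readers hitting the same fork+CUDA deadlock in their own code, the general pattern is to pass a spawn context to the executor, so workers start as fresh interpreters instead of forked copies of a parent that may already hold CUDA state. The sketch below uses only standard-library calls; the helper names `spawn_context` and `make_spawn_executor` are made up for the example and are not taken from Glq.

```python
import multiprocessing
from concurrent.futures import ProcessPoolExecutor


def spawn_context():
    """Return a multiprocessing context that starts workers with 'spawn'."""
    return multiprocessing.get_context("spawn")


def make_spawn_executor(max_workers=2):
    """Build a ProcessPoolExecutor whose workers are spawned, not forked."""
    return ProcessPoolExecutor(max_workers=max_workers, mp_context=spawn_context())
```

When using such an executor from a script, submit work inside an `if __name__ == "__main__":` guard and pass only importable, picklable functions, since spawned workers re-import the main module.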
