Skip to content

Release history

Glq releases

All releases

24 shown

No immediate action
v0.5.1 Breaking risk

Default inline-dequant E8 KV

No immediate action
v0.5.0 New feature

Inline-dequant E8 KV cache

No immediate action
v0.3.5 New feature

Auto‑PIECEWISE downgrade

No immediate action
v0.3.4 New feature

CUDA-graph capture size tuning

No immediate action
v0.3.3 Bug fix

E8 KV regression fix + GLQ ops

No immediate action
v0.3.2 Breaking risk

Faster decoding without flags

No immediate action
v0.3.1 Bug fix

GLQShardedParameter duplicate allocation fix

No immediate action
v0.2.13 Performance

Throughput +19 %

No immediate action
v0.2.12 New feature

HF integration + fused MoE + torch pin

No immediate action
v0.2.11 New feature

CUDA graph buckets + N-stage matmul

No immediate action
v0.2.10 New feature

Decode speed boost

No immediate action
v0.2.9 New feature

Block‑diagonal FHT + N‑stage RVQ

No immediate action
v0.2.8 Mixed

Kernel + Inference + Quantization

No immediate action
v0.2.7 New feature

CUDA Graph + INT8 KV cache

No immediate action
v0.2.6 New feature

Inline PTX performance boost

No immediate action
v0.2.5 Mixed

CUDA C kernel speedups + deadlock fix

No immediate action
v0.2.2 New feature

Sensitivity profiling + bit‑allocation

No immediate action
v0.1.6 New feature

Tiled Triton kernel speedup

No immediate action
v0.1.5 Bug fix

Quantization fix

No immediate action
v0.1.4 New feature

CPU offloading for 7B+ quantization

No immediate action
v0.1.3 Feature

Triton kernel + benchmark script

No immediate action
v0.1.2 Feature

Triton fused dequant+matmul

No immediate action
v0.1.1 Maintenance

Routine maintenance and dependency updates.

No immediate action
v0.1.0 Maintenance

Routine maintenance and dependency updates.

Beta — feedback welcome: [email protected]