Release history
GLQ releases
v0.5.1 · Breaking risk · Default inline-dequant E8 KV · No immediate action
v0.5.0 · New feature · Inline-dequant E8 KV cache · No immediate action
v0.3.5 · New feature · Auto‑PIECEWISE downgrade · No immediate action
v0.3.4 · New feature · CUDA-graph capture size tuning · No immediate action
v0.3.3 · Bug fix · E8 KV regression fix + GLQ ops · No immediate action
v0.3.2 · Breaking risk · Faster decoding without flags · No immediate action
v0.3.1 · Bug fix · GLQShardedParameter duplicate allocation fix · No immediate action
v0.2.13 · Performance · Throughput +19% · No immediate action
v0.2.12 · New feature · HF integration + fused MoE + torch pin · No immediate action
v0.2.11 · New feature · CUDA graph buckets + N-stage matmul · No immediate action
v0.2.10 · New feature · Decode speed boost · No immediate action
v0.2.9 · New feature · Block‑diagonal FHT + N‑stage RVQ · No immediate action
v0.2.8 · Mixed · Kernel + Inference + Quantization · No immediate action
v0.2.7 · New feature · CUDA Graph + INT8 KV cache · No immediate action
v0.2.6 · New feature · Inline PTX performance boost · No immediate action
v0.2.5 · Mixed · CUDA C kernel speedups + deadlock fix · No immediate action
v0.2.2 · New feature · Sensitivity profiling + bit‑allocation · No immediate action
v0.1.6 · New feature · Tiled Triton kernel speedup · No immediate action
v0.1.5 · Bug fix · Quantization fix · No immediate action
v0.1.4 · New feature · CPU offloading for 7B+ quantization · No immediate action
v0.1.3 · Feature · Triton kernel + benchmark script · No immediate action
v0.1.2 · Feature · Triton fused dequant+matmul · No immediate action
v0.1.1 · Maintenance · Routine maintenance and dependency updates. · No immediate action
v0.1.0 · Maintenance · Routine maintenance and dependency updates. · No immediate action