Glq

Model Serving & MLOps

Post‑training weight quantization for large language models that uses E8 lattice codebooks to reach 2–8 bits per weight, with quality comparable to state‑of‑the‑art methods.

Python · Latest v0.7.3 · 1d ago

Features

Encodes each group of 8 weights as a 16‑bit index into an E8 lattice codebook (16 / 8 = 2 bits per weight)
Uses a Randomized Hadamard Transform for near‑optimal Euclidean nearest‑neighbour search
Provides a fused CUDA kernel that performs matrix multiplication directly on the compressed indices
Supports bit widths from 2 to 8 bpw (including fractional values) and mixed‑precision allocation
Offers a pre‑built Docker image with PyTorch, vLLM, transformers and GPU support
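The E8 lattice is the set of even‑sum integer vectors in eight dimensions (D8) together with the same set shifted by ½ in every coordinate, and one 16‑bit index per 8 weights costs 16 / 8 = 2 bits per weight. The sketch below is not Glq's code: it applies the classic Conway and Sloane nearest‑point rule to the full, unbounded E8 lattice, whereas a 16‑bit codebook can hold only 65,536 entries, so a real quantizer must restrict and scale the lattice in ways this page does not describe. The names `nearest_d8`, `nearest_e8`, `is_e8_point`, `bits_per_weight` and `mixed_bpw` are hypothetical, and `mixed_bpw` merely assumes that a fractional bpw arises from averaging per‑layer bit widths weighted by parameter count.

```python
import math
from typing import List, Sequence, Tuple


def nearest_d8(x: Sequence[float]) -> List[float]:
    """Nearest point of D8: integer vectors whose coordinates sum to an even number."""
    f = [math.floor(v + 0.5) for v in x]  # round every coordinate to the nearest integer
    if sum(f) % 2 != 0:
        # Odd sum: re-round the coordinate with the largest rounding error the other way.
        i = max(range(len(x)), key=lambda k: abs(x[k] - f[k]))
        f[i] += 1 if x[i] > f[i] else -1
    return [float(v) for v in f]


def nearest_e8(x: Sequence[float]) -> List[float]:
    """Nearest point of E8 = D8 union (D8 + 1/2): solve both cosets, keep the closer one."""
    if len(x) != 8:
        raise ValueError("E8 rounding works on groups of exactly 8 values")
    a = nearest_d8(x)
    b = [v + 0.5 for v in nearest_d8([v - 0.5 for v in x])]
    dist_a = sum((p - q) ** 2 for p, q in zip(x, a))
    dist_b = sum((p - q) ** 2 for p, q in zip(x, b))
    return a if dist_a <= dist_b else b


def is_e8_point(p: Sequence[float]) -> bool:
    """All-integer or all-half-integer coordinates, with an even coordinate sum."""
    all_int = all(v == math.floor(v) for v in p)
    all_half = all(v - math.floor(v) == 0.5 for v in p)
    return (all_int or all_half) and sum(p) % 2 == 0


def bits_per_weight(index_bits: int = 16, group_size: int = 8) -> float:
    """Storage cost of one codebook index shared by a whole group of weights."""
    return index_bits / group_size


def mixed_bpw(allocation: Sequence[Tuple[int, float]]) -> float:
    """Hypothetical: size-weighted average bpw over (num_weights, bpw) pairs."""
    total = sum(n for n, _ in allocation)
    return sum(n * b for n, b in allocation) / total


if __name__ == "__main__":
    print(nearest_e8([0.1, 0.2, 0.3, 0.4, 0.45, 0.2, 0.1, 0.9]))
    # [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0]
    print(bits_per_weight())  # 2.0
    print(mixed_bpw([(1000, 2.0), (1000, 3.0)]))  # 2.5
```

Searching both cosets keeps the cost to two roundings per group rather than a scan over a large table, which is one reason E8 is attractive for 8‑weight groups.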
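A Randomized Hadamard Transform multiplies a vector by random ±1 signs and then by a normalized Walsh–Hadamard matrix. Because that product is orthogonal, it preserves Euclidean distances and can be inverted exactly; in lattice‑quantization pipelines it is typically used to spread outlier weights evenly across a group before rounding. The sketch below is a generic pure‑Python version, not Glq's implementation; the function names and the seed‑based sign generation are assumptions.

```python
import math
import random
from typing import List, Sequence


def fwht(v: Sequence[float]) -> List[float]:
    """Unnormalized fast Walsh-Hadamard transform; length must be a power of two."""
    out = list(v)
    n = len(out)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for j in range(start, start + h):
                a, b = out[j], out[j + h]
                out[j], out[j + h] = a + b, a - b  # butterfly step
        h *= 2
    return out


def random_signs(n: int, seed: int) -> List[float]:
    """Hypothetical: the random diagonal D, drawn from a seeded PRNG so it can be regenerated."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]


def rht(x: Sequence[float], signs: Sequence[float]) -> List[float]:
    """y = (H / sqrt(n)) * D * x, an orthogonal transform."""
    n = len(x)
    y = fwht([s * v for s, v in zip(signs, x)])
    return [v / math.sqrt(n) for v in y]


def inverse_rht(y: Sequence[float], signs: Sequence[float]) -> List[float]:
    """x = D * (H / sqrt(n)) * y, because H is symmetric, H*H = n*I and D*D = I."""
    n = len(y)
    z = fwht(y)
    return [s * v / math.sqrt(n) for s, v in zip(signs, z)]


if __name__ == "__main__":
    signs = random_signs(8, seed=0)
    outlier = [0.0] * 7 + [8.0]  # one large weight in a group of 8
    spread = rht(outlier, signs)
    print([round(v, 3) for v in spread])  # every entry has magnitude 2.828 (= 8 / sqrt(8))
    print([round(v, 6) for v in inverse_rht(spread, signs)])  # recovers the group up to rounding
```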
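Multiplying "directly on compressed indices" means decoding each 8‑weight group from its codebook entry inside the inner loop instead of first materializing the full weight matrix. The pure‑Python sketch below illustrates only that data flow, not Glq's CUDA kernel: the row‑major index layout, the single per‑matrix `scale`, and the toy 4‑entry codebook are all hypothetical, whereas a real 16‑bit codebook would have 65,536 entries.

```python
from typing import List, Sequence

GROUP = 8  # weights per codebook entry, matching the 8-weight groups described above


def dequantize(indices: Sequence[Sequence[int]],
               codebook: Sequence[Sequence[float]],
               scale: float) -> List[List[float]]:
    """Materialize the full weight matrix: the step a fused kernel avoids."""
    return [[scale * w for idx in row for w in codebook[idx]] for row in indices]


def matvec_dense(weights: Sequence[Sequence[float]], x: Sequence[float]) -> List[float]:
    """Plain W @ x on an already-decompressed matrix."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def matvec_on_indices(indices: Sequence[Sequence[int]],
                      codebook: Sequence[Sequence[float]],
                      scale: float,
                      x: Sequence[float]) -> List[float]:
    """W @ x computed group by group, decoding each index only when it is needed."""
    out = []
    for row in indices:
        if len(x) != GROUP * len(row):
            raise ValueError("x must have 8 entries per index in each row")
        acc = 0.0
        for g, idx in enumerate(row):
            entry = codebook[idx]  # look up the 8 weights stored under this index
            base = g * GROUP
            acc += sum(entry[k] * x[base + k] for k in range(GROUP))
        out.append(scale * acc)  # apply the hypothetical single scale after accumulating
    return out


if __name__ == "__main__":
    codebook = [[0.5] * 8, [1.0] + [0.0] * 6 + [-1.0], [-0.5] * 8, [0.0] * 8]  # toy codebook
    indices = [[0, 1], [2, 3], [1, 1]]  # 3 output rows, 16 input features
    x = [float(i) for i in range(16)]
    print(matvec_on_indices(indices, codebook, 0.25, x))  # [1.75, -3.5, -3.5]
    print(matvec_dense(dequantize(indices, codebook, 0.25), x))
```

A fused kernel aims to return the same result as `matvec_dense(dequantize(...), x)` while reading only 16 bits per 8 weights from memory.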

Recent releases

v0.7.3 · Bug fix · 1d ago: Trellis checkpoint fix (no immediate action)

v0.7.2 · Bug fix · 1d ago: Trellis codebook fix (no immediate action)

v0.7.1 · Performance · 1d ago: Decode performance boost (no immediate action)

v0.7.0 · Breaking risk · 1d ago: Breaking upgrade; 3INST trellis decode + OOB fix (review required)

v0.6.9 · Mixed · 14d ago: vLLM 0.25 support + profiling fix (no immediate action)

Releases

Cadence 2.0 releases / wk

Last release 2d ago

Tracked releases 30

Security

Security score 6.5/10

OpenSSF Scorecard not available

Open CVEs 0

Active maintainer

Community

GitHub stars 5

Forks 1

Contributors (last 90 days) 1

Open issues 2

Open PRs 3

Star velocity 0.0 / wk

About

Languages

Python, CUDA, C++

Install & Platforms

Install via

pip

Platforms

Linux, macOS, Windows, arm64

Similar tools

UQLM

Pgmig

Agmsg

Ratel

Dbtrail
