noonghunna/club-3090

Model Serving & MLOps

A collection of ready‑to‑run configurations and tooling for serving large language models locally on one or two RTX 3090 GPUs, supporting multiple inference engines (vLLM, llama.cpp, ik_llama) and model‑agnostic workflows.


Python · Latest v0.8.7 · 3h ago

Features

Multi‑engine support: vLLM, llama.cpp, and ik_llama with unified Docker Compose configs
Model‑agnostic design – curated configs for Qwen3.6‑27B plus a universal `pull` tool to add any HF safetensors model
Dedicated guides for single‑card (max context, robustness) and dual‑card (high throughput) setups on RTX 3090(s)
Built‑in benchmarking scripts (`bench.sh`) and sanity‑check utilities (`verify-full.sh`)
Extensive documentation covering hardware tips, quantization options, and inference engine comparison

Recent releases


Config change

v0.8.7 Breaking risk 3h

Breaking upgrade Dependencies

beellama DFlash, Qwen3.6‑35B‑A3B prod, vLLM v0.22.0, switch.sh overhaul


v0.8.6 Breaking risk 8d

⚠ Upgrade required

`launch.sh` and `switch.sh` now derive launcher tables from the registry (single source of truth) and accept `<engine>/default` and `<engine>/<topology>/default` variants.
Documentation and scripts have been migrated to the new compose path format across the whole repository.

Breaking changes

Compose paths moved to `models/<model>/<engine>/compose/<topology>/<quant>/<serving>.yml`; previous raw paths (e.g., `dual/turbo.yml`, `docker-compose.yml`) no longer resolve. Use `--variant` keys such as `bash scripts/launch.sh vllm/default`, `vllm/dual`, or `ik-llama/iq4ks-mtp`, or use the new `<quant>/` paths directly.
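As a rough illustration of how variant keys of this shape could be handled, the sketch below splits a key into its engine and variant parts and flags the `default` forms. The `VariantKey` type and `parse_variant_key` function are hypothetical names introduced here; the repository's `launch.sh` resolves keys from its registry and may work differently.

```python
from typing import NamedTuple


class VariantKey(NamedTuple):
    """Hypothetical container for a parsed variant key (not part of the repo)."""
    engine: str
    variant: str
    is_default: bool


def parse_variant_key(key: str) -> VariantKey:
    """Split '<engine>/<variant>' at the first slash.

    Assumption: everything after the first slash is treated as the variant,
    so '<engine>/<topology>/default' yields the variant '<topology>/default'.
    """
    engine, sep, variant = key.strip().partition("/")
    if not sep or not engine or not variant:
        raise ValueError(f"expected '<engine>/<variant>', got {key!r}")
    # A key is a 'default' key when its last path segment is 'default'.
    is_default = variant.split("/")[-1] == "default"
    return VariantKey(engine, variant, is_default)


if __name__ == "__main__":
    for key in ("vllm/default", "vllm/dual", "ik-llama/iq4ks-mtp"):
        print(parse_variant_key(key))
```

The keys in the usage loop are the ones named in the release notes; how each maps to a concrete compose file is decided by the project's registry and is not modelled here.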

Notable features

Added ik-llama PRISM-PRO-DQ and APEX-MTP presets that conform to the new compose path layout

Full changelog

⚠️ Breaking: compose paths moved. Compose files now live at `models/<model>/<engine>/compose/<topology>/<quant>/<serving>.yml` (e.g. `dual/autoround-int4/fp8-mtp.yml`). Raw pre-v0.8.6 paths (`dual/turbo.yml`, `docker-compose.yml`, …) no longer resolve. Use the `--variant` keys, such as `bash scripts/launch.sh vllm/default` (topology-autodetect), `vllm/dual`, `ik-llama/iq4ks-mtp`, …, or the new `<quant>/` paths directly. `launch.sh`/`switch.sh` now derive from the registry (single source of truth) and accept `<engine>/default` and `<engine>/<topology>/default`.
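The path template above can be sketched as a small helper that assembles a compose path from its five components. This is only an illustration: `compose_path` is a hypothetical function, and the model directory name `qwen3.6-27b` in the usage example is an assumed placeholder, not a name confirmed by the release notes.

```python
from pathlib import PurePosixPath


def compose_path(model: str, engine: str, topology: str,
                 quant: str, serving: str) -> PurePosixPath:
    """Build models/<model>/<engine>/compose/<topology>/<quant>/<serving>.yml.

    Hypothetical helper: the argument names mirror the placeholders in the
    changelog's template; the repo's registry code may be organised differently.
    """
    for name, value in (("model", model), ("engine", engine),
                        ("topology", topology), ("quant", quant),
                        ("serving", serving)):
        # Each component must be a single, non-empty path segment.
        if not value or "/" in value:
            raise ValueError(f"{name} must be one path segment, got {value!r}")
    return PurePosixPath("models", model, engine, "compose",
                         topology, quant, f"{serving}.yml")


if __name__ == "__main__":
    # topology/quant/serving come from the changelog's example;
    # the model directory name is an assumed placeholder.
    print(compose_path("qwen3.6-27b", "vllm", "dual", "autoround-int4", "fp8-mtp"))
```

Rejecting components that contain a slash keeps each placeholder mapped to exactly one directory level, which is what makes the layout predictable enough to derive from a registry.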

v0.8.6 — 2026-05-26

✨ Features

feat(ik-llama): PRISM-PRO-DQ + APEX-MTP presets, conformed to the / layout (458c473)

🐛 Bug fixes

fix(profiles): cover the new ik-llama PRISM/APEX presets in the compat catalog (aa2a965)
fix: post-PR-A compose-path fixes for gpu-mode.sh + 2 patch READMEs (b23846e)
fix(registry+bench): sync vision defaults to the 2026-05-25 re-tune (#438) (b116750)
fix(vision): re-tune single-card vision defaults to measured-safe (1M-px + 160K/150K) (c9b7dd3)
fix(vllm/dual): pin to stable v0.21.0, drop all source overlays (#407 pin-drift) (cf1f14f)

📝 Documentation

docs+scripts: finish / path migration across full repo sweep (eaa7a8c)
docs(switch): correct ik-llama/iq4ks-mtp usage comment 262K -> 200K (2135230)

🧹 Maintenance

refactor(launch): derive launcher tables from the registry + /default resolver (a0520e2)
refactor(compose): insert layer + make the registry the single source of truth (9821c94)

Pin: `git checkout v0.8.6`


No immediate action

v0.8.5 New feature 10d

kv-calc breakdown, ik-llama ctx increase, docs updates


No immediate action

v0.8.4 Mixed 11d

Features + bug fixes


No immediate action

v0.8.2 New feature 16d

Fit check, fixable pulls, more models, NVLink, docs



Releases


Releases per week (chart; x-axis weeks: May 9, May 16, May 23, May 30)

Cadence 4.0 / wk

Last release 0d

Churn +1442 / −26858 lines · 205 files · 86 commits

Tracked 28

Security


Security score 6.5/10

OpenSSF —

Open CVEs 0

Active maintainer

Community

GitHub stars 1,232

Forks 66

Open issues 13

Open PRs 4

Stars/wk velocity 0.0

About

Stars

1,232

Forks

Languages

Python Shell Jinja


Install & Platforms

Platforms

linux macos windows
