noonghunna/club-3090

Model Serving & MLOps

A collection of ready‑to‑run configurations and tooling for serving large language models locally on one or two RTX 3090 GPUs, supporting multiple inference engines (vLLM, llama.cpp, ik_llama) and model‑agnostic workflows.

Python · Latest: v0.8.7 · 3h ago

Features

  • Multi‑engine support: vLLM, llama.cpp, and ik_llama with unified Docker Compose configs
  • Model‑agnostic design – curated configs for Qwen3.6‑27B plus a universal `pull` tool to add any HF safetensors model
  • Dedicated guides for single‑card (max context, robustness) and dual‑card (high throughput) setups on RTX 3090(s)
  • Built‑in benchmarking scripts (`bench.sh`) and sanity‑check utilities (`verify-full.sh`)
  • Extensive documentation covering hardware tips, quantization options, and inference engine comparison

Recent releases

View all 28 releases →
Config change
v0.8.7 Breaking risk
Breaking upgrade Dependencies

beellama DFlash, Qwen3.6‑35B‑A3B prod, vLLM v0.22.0, switch.sh overhaul

v0.8.6 Breaking risk
⚠ Upgrade required
  • `launch.sh` and `switch.sh` now derive their launcher tables from the registry (single source of truth) and accept `<engine>/default` and `<engine>/<topology>/default` variants.
  • Docs and scripts across the whole repository have been migrated to the new compose path format.
Breaking changes
  • Compose paths moved to `models/<model>/<engine>/compose/<topology>/<quant>/<serving>.yml`; previous raw paths (e.g., `dual/turbo.yml`, `docker-compose.yml`) no longer resolve. Use `--variant` keys, as in `bash scripts/launch.sh vllm/default`, `vllm/dual`, or `ik-llama/iq4ks-mtp`, or use the new `<quant>/` paths directly.
Notable features
  • Added ik-llama PRISM-PRO-DQ and APEX-MTP presets conforming to the new / layout
Full changelog

⚠️ Breaking: compose paths moved. Composes now live at `models/<model>/<engine>/compose/<topology>/<quant>/<serving>.yml` (e.g. `dual/autoround-int4/fp8-mtp.yml`). Raw pre-v0.8.6 paths (`dual/turbo.yml`, `docker-compose.yml`, …) no longer resolve. Use the `--variant` keys, as in `bash scripts/launch.sh vllm/default` (topology-autodetect), `vllm/dual`, `ik-llama/iq4ks-mtp`, …, or the new `<quant>/` paths directly. `launch.sh`/`switch.sh` now derive from the registry (single source of truth) and accept `<engine>/default` + `<engine>/<topology>/default`.
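
The new layout can be illustrated with a small Python sketch that builds a compose path from its five segments and checks whether an existing path has the new shape. This is not code from the repository: the helper names `compose_path` and `is_new_layout` are hypothetical, and the model directory name `qwen3.6-27b` and the pairing of `vllm` with `dual/autoround-int4/fp8-mtp.yml` are assumptions chosen only to fill in the template.

```python
from pathlib import PurePosixPath

# Segment order taken from the changelog:
#   models/<model>/<engine>/compose/<topology>/<quant>/<serving>.yml


def compose_path(model, engine, topology, quant, serving):
    """Build a compose path in the v0.8.6 layout (hypothetical helper)."""
    return PurePosixPath(
        "models", model, engine, "compose", topology, quant, f"{serving}.yml"
    )


def is_new_layout(path):
    """Return True if `path` has the shape of the v0.8.6 layout.

    Only the shape is checked (segment count, fixed segments, extension);
    whether the file exists in the repository is not.
    """
    parts = PurePosixPath(path).parts
    return (
        len(parts) == 7
        and parts[0] == "models"
        and parts[3] == "compose"
        and parts[6].endswith(".yml")
    )


if __name__ == "__main__":
    # "qwen3.6-27b" and "vllm" are assumed values, not confirmed directory names.
    new_path = compose_path("qwen3.6-27b", "vllm", "dual", "autoround-int4", "fp8-mtp")
    print(new_path)
    print(is_new_layout(new_path))          # True
    print(is_new_layout("dual/turbo.yml"))  # False: raw pre-v0.8.6 path
```

A check like this could flag leftover pre-v0.8.6 paths in local scripts before upgrading. It says nothing about which new path replaces a given old one, since that mapping lives in the registry.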


v0.8.6 — 2026-05-26

✨ Features

  • feat(ik-llama): PRISM-PRO-DQ + APEX-MTP presets, conformed to the / layout (458c473)

🐛 Bug fixes

  • fix(profiles): cover the new ik-llama PRISM/APEX presets in the compat catalog (aa2a965)
  • fix: post-PR-A compose-path fixes for gpu-mode.sh + 2 patch READMEs (b23846e)
  • fix(registry+bench): sync vision defaults to the 2026-05-25 re-tune (#438) (b116750)
  • fix(vision): re-tune single-card vision defaults to measured-safe (1M-px + 160K/150K) (c9b7dd3)
  • fix(vllm/dual): pin to stable v0.21.0, drop all source overlays (#407 pin-drift) (cf1f14f)

📝 Documentation

  • docs+scripts: finish / path migration across full repo sweep (eaa7a8c)
  • docs(switch): correct ik-llama/iq4ks-mtp usage comment 262K -> 200K (2135230)

🧹 Maintenance

  • refactor(launch): derive launcher tables from the registry + /default resolver (a0520e2)
  • refactor(compose): insert layer + make the registry the single source of truth (9821c94)
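
As a rough illustration of what a registry-backed `/default` resolver might look like, the Python sketch below maps variant keys to compose paths. Everything in it is an assumption: the real registry format is not shown in these release notes, the `REGISTRY` entries are placeholders, the topology name `single` is a guess, and choosing the topology from the GPU count is only one plausible reading of "topology-autodetect".

```python
# Hypothetical registry: variant key -> compose path template.
# The placeholders in angle brackets are deliberately left unfilled.
REGISTRY = {
    "vllm/single/default": "models/<model>/vllm/compose/single/<quant>/<serving>.yml",
    "vllm/dual/default": "models/<model>/vllm/compose/dual/<quant>/<serving>.yml",
}


def resolve(key, gpu_count):
    """Resolve a variant key to a compose path via the registry.

    `<engine>/default` is expanded to `<engine>/<topology>/default`,
    with the topology assumed to follow from the number of GPUs.
    """
    parts = key.split("/")
    if len(parts) == 2 and parts[1] == "default":
        topology = "dual" if gpu_count >= 2 else "single"
        key = f"{parts[0]}/{topology}/default"
    if key not in REGISTRY:
        raise KeyError(f"unknown variant key: {key}")
    return REGISTRY[key]


if __name__ == "__main__":
    print(resolve("vllm/default", gpu_count=2))
    print(resolve("vllm/single/default", gpu_count=2))
```

Keeping every lookup behind one table is what makes the registry a single source of truth: a raw legacy path such as `dual/turbo.yml` is simply not a key, so it fails loudly instead of resolving to a stale file.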

Pin: `git checkout v0.8.6`

No immediate action
v0.8.5 New feature

kv-calc breakdown, ik-llama ctx increase, docs updates

No immediate action
v0.8.4 Mixed

Features + bug fixes

No immediate action
v0.8.2 New feature

Fit check, fixable pulls, more models, NVLink, docs

About

Stars: 1,232
Forks: 66
Languages: Python, Shell, Jinja

Install & Platforms

Platforms: Linux, macOS, Windows
