Release history
SLayer, a semantic layer maintained by your agent releases
All releases
22 shown
- `RecallHit.match_count: int` is renamed to `RecallHit.score: float` across all API surfaces (MCP, REST, CLI, SlayerClient).
- BM25 ranking for memory recall via new `bm25_rank` module and `rank_bm25.BM25Plus`.
- Two-mode reference semantics locked down: SQL mode accepts arbitrary functions; DSL mode enforces strict identifier resolution with a string‑hygiene allowlist (lower, upper, trim, replace, substr, instr, length, concat).
- MCP `query` tool now accepts inline `source_model` dicts (`str | ModelExtension | SlayerModel`).
Full changelog
SLayer 0.5.1 Release Notes
A maintenance release: six PRs since 0.5.0 tighten the agent-facing surface that the bird-interact-agents benchmark pounded on. The headline items are a BM25 ranker for the memory layer, a cleaned-up two-mode reference model that pins what is and isn't valid in Column.sql / Column.filter / SlayerModel.filters versus DSL formula and filter strings, SQL-mode model filters now properly accepting arbitrary SQL functions plus a small string-hygiene allowlist for DSL-mode query filters (lower, upper, trim, replace, substr, instr, length, concat, and ||), an MCP query tool that finally accepts inline source-model dicts, and clearer diagnostics for two filter-parser edge cases (path-qualified LIKE and subquery-shaped predicates). One small breaking change rides along: RecallHit.match_count: int is renamed to RecallHit.score: float -- no alias.
BM25 ranking for memory recall (DEV-1365, BREAKING)
MemoryService.recall_memories previously ranked candidates by raw entity-overlap count, which trivially favoured memories tagged with large entity sets -- a memory linked to 50 entities would out-overlap a precisely-tagged one of 2 regardless of relevance. The new slayer/memories/ranker.py module exposes bm25_rank(memories, query_entities) over canonical entity sets, using rank_bm25.BM25Plus (now a core dep). IDF / avgdl are computed across the full memory corpus (recall calls storage.list_memories(entities=None) rather than the intersection-filtered form); an explicit set-intersection pre-filter still enforces "must overlap on at least one entity" and BM25 is used purely to rank the eligible set. BM25Plus (rather than BM25Okapi) is chosen specifically because at small corpus sizes Okapi's IDF goes negative for terms appearing in even a moderate fraction of documents, which inverts BM25's length normalisation and reproduces the exact bug DEV-1365 was meant to fix; the rationale is documented in the module header. The empty-about recency fallback is unchanged. The breaking part: RecallHit.match_count: int is replaced by RecallHit.score: float across MCP, REST, CLI, and SlayerClient. Hard rename, no alias -- callers introspecting the field break loudly rather than silently misreading. inspect_model's Learnings section is unchanged (it is a per-model browsing view, not a retrieval query, and still pulls memories via the storage entity-intersection filter in insertion order).
Two-mode reference semantics (DEV-1369, BREAKING)
SLayer has always had two distinct expression layers but the implementation had drifted; this release pins the intended semantics and enforces them. SQL mode (Column.sql, Column.filter, SlayerModel.filters) is sqlglot-parsed free SQL: arbitrary function calls (json_extract, coalesce, nullif, CASE WHEN, ...), dialect operators, and __-delimited join paths (customers__regions.name) are all accepted; DSL constructs (aggregation colon syntax, transform calls, raw OVER (...) in filters) are rejected. DSL mode (ModelMeasure.formula, SlayerQuery.{measures, filters, dimensions, time_dimensions, order, main_time_dimension}) is the Python-AST DSL: bare names strict-resolve at enrichment time against the model's defined Column / ModelMeasure / custom-aggregation / query-level alias set, and raw SQL function calls or unknown names raise actionable errors. The DEV-1336 escape hatch -- auto-promoting a query filter against a Column whose sql contained a window function to a post-aggregation outer WHERE -- is removed; the rank-family transforms (DEV-1353) cover top-N filtering in pure DSL and the new error message points at them or at multi-stage source_queries. Opportunistic consolidation: the new slayer/core/refs.py is now the single source of truth for identifier-shape regexes (AGG_REF_RE, IDENT_OR_PATH_RE, DOTTED_IDENT_REF_RE, IDENTIFIER_RE), the agg-suffix parser (canonical_agg_name, strip_agg_suffix), and the user-input dunder helper -- four prior duplicates in formula.py, dbt/converter.py, engine/enrichment.py, and memories/resolver.py re-export from here. _walk_join_chain collapses the two near-duplicate join walkers in query_engine.py. Migration: stored YAML / SQLite models with aggregation colon syntax or transform calls inside Column.filter / SlayerModel.filters will fail at load (move the predicate to a query-level filter or a ModelMeasure.formula); stored SlayerQuery objects that filter on a windowed Column will raise (switch to filters=["rank(<measure>) <= N"] or factor the window column into a multi-stage source); stored queries that filter on bare names not declared as Column entries on the model (silent pass-through to underlying-table columns) will raise at enrichment (declare the column on the model first). The new docs/concepts/references.md is the single source of truth; CLAUDE.md, docs/concepts/queries.md, and docs/concepts/models.md point at it.
Filters referencing unjoined models now raise at enrichment (DEV-1367)
A query filter of the form <other_model>.<col> <op> <value>, where <other_model> is not in the source model's joins, used to silently render SQL with an unbound table reference in the WHERE clause and fail at execution with a cryptic "no such column" -- the agent received an opaque SQL-runtime error instead of a clear translation-time error. DEV-1369's strict-resolution check fired only on the bare-name branch of resolve_filter_columns; this PR closes the dotted-path gap. The dotted-path branch now raises ValueError at enrichment when the path's head segment doesn't match any ModelJoin.target_model on the source model (error: "Filter '' references model '' but it is not in joins for source model ''. Add it to source_model.joins or rewrite the filter to use a local derived column.") or when the path resolves through joins but the leaf column doesn't exist on the terminal model. The cross-model-measure rerooting CTE path (_build_rerooted_enriched) inherits the outer query's filter list and some of those filters reference models reachable from the outer source but not from the re-rooted source; a new drop_unreachable_filters: bool = False kwarg threads through enrich_query / _enrich onto resolve_filter_columns's drop_if_unresolved parameter. The rerooting call sets it True so unresolved dotted paths drop from the parsed-filter list rather than raising; outer-query callers keep the strict default.
MCP query tool accepts inline source_model (DEV-1372)
The MCP query tool's source_model parameter is now typed str | ModelExtension | SlayerModel, matching SlayerQuery.source_model's native polymorphism. Previously typed str, which forced agents to JSON-encode inline dicts -- SLayer then validated the JSON blob as a model name and rejected it with Invalid model name '{...}': must not contain '/'. Wire-level evidence from a single 15-task households / Haiku 4.5 benchmark run: 15 / 88 query-tool calls (17 %) failed this way. The run-by-name shortcut (engine.execute(str) with model-name strings) is preserved -- it layers model.query_variables -> stage -> runtime precedence that the regular SlayerQuery path doesn't reach -- and is gated on isinstance(source_model, str) so inline values fall through to SlayerQuery.model_validate. The variable-precedence asymmetry that forces this gate is filed separately as DEV-1373.
SQL-mode predicate routing and string-hygiene allowlist (DEV-1378)
Closes two implementation gaps left over from the DEV-1369 reference-semantics work. First, Column.filter and SlayerModel.filters (SQL mode) were authored as raw SQL but re-parsed at enrichment time through the DSL parser, so any model filter containing json_extract, coalesce, CASE WHEN, or any other arbitrary SQL function raised Unknown filter function 'json_extract' at runtime. The new slayer/sql/sql_predicate.py module exposes a dedicated parse_sql_predicate that rejects DSL constructs (colon syntax, transform calls, raw OVER (...)) up front, extracts column-shaped identifiers, and otherwise accepts any SQL function call; every enrichment site that handles an SQL-mode filter (measure_def.filter, the model-filter validation loop, _collect_needed_paths, and _resolve_joins) now routes through this parser. DSL-mode query filters (SlayerQuery.filters) continue through the DSL parser unchanged. Strictness is split per mode: model filters resolve with strict=False, query filters with strict=True. Second, DSL-mode query filters now accept a small lowercase string-hygiene allowlist -- lower, upper, trim, replace, substr, instr, length, concat -- exposed as the STRING_HYGIENE_OPS frozenset in slayer/core/formula.py; the SQL || operator is rewritten to concat(...) by a new _preprocess_concat pass so agents can write lower(name) || '_x' = 'foo_x' in filter strings. Names are lowercase only, matching SLayer's existing transform convention; sqlglot translates per-dialect at SQL-generation time. Also folded in: the SQL generator's predicate parser now wraps user-supplied fragments in SELECT 1 WHERE ... before parsing (_parse_predicate), which dodges a sqlglot trap on SQLite / MySQL where REPLACE at the start of a fragment was parsed as the REPLACE INTO statement keyword rather than the function call. Docs updated in docs/concepts/queries.md; CLAUDE.md and .claude/skills/slayer-query.md reflect the new allowlist.
Filter-parser fixes for path-qualified LIKE and subquery-in-filter (DEV-1376)
Two DSL filter-parser gaps surfaced by the bird-interact-agents benchmark. Path-qualified LIKE / NOT LIKE (<joined_model>.<col> like '...') was rejected with "Unsupported filter syntax" because _preprocess_like's LHS regex excluded .; the four-step preprocess collapses into a single regex (_LIKE_RE) whose LHS accepts dotted paths and whose pattern group captures the closing quote in one shot. Subquery-shaped predicates (housenum in (select ...), not in (select ...), exists (select ...)) used to surface Python's misleading "Perhaps you forgot a comma?" advice and send agents on a nonsense recovery path; a new _SUBQUERY_IN_FILTER_RE early-rejects in parse_filter next to the existing has_window_function check and points at source_queries / Column.sql / joins instead. Narrowly scoped to IN (SELECT...) / NOT IN (SELECT...) / EXISTS (SELECT...); existing UNION SELECT / ; SELECT 1 paths keep their original handling.
Schema versions
SlayerModel remains version 5, SlayerQuery remains version 3, DatasourceConfig remains version 1. No storage migrations were needed for this release.
- Automatic v4 → v5 storage migration runs on load; use `slayer storage migrate-types [--dry-run]` for explicit post‑migration refinement.
- The Jaffle Shop demo now depends on the upstream `jafgen` package (`^0.4.14`) as a core dependency, removing the need for manual git installs.
- DataType enum values are now uppercase sqlglot‑aligned names (TEXT, INT, DOUBLE, BOOLEAN, DATE, TIMESTAMP) instead of lowercase strings.
- Unused aggregation pseudo-types (`count`, `sum`, `avg`, `min`, `max`, `last`, `count_distinct`) have been removed from DataType.
- Schema‑drift detection and idempotent re‑ingestion with `engine.validate_models` and `ingest_datasource_idempotent()`.
- Agent memory layer for saving/retrieving free‑form learnings and example queries via MCP, REST, CLI, and async client methods.
Full changelog
SLayer 0.5.0 Release Notes
A feature release: five PRs since 0.4.3 add an agent-memory layer, a schema-drift validation and idempotent re-ingestion subsystem, three new ranking transforms, and type-aware CAST emission. Two breaking changes ride along, both covered by an automatic v4 -> v5 storage migration: DataType enum values are now uppercase sqlglot-aligned names (TEXT, INT, DOUBLE, BOOLEAN, DATE, TIMESTAMP), and the unused aggregation pseudo-types (count, sum, avg, min, max, last, count_distinct) have been dropped from DataType.
Schema-drift detection and idempotent re-ingestion (DEV-1356)
A new read-only diagnostic, engine.validate_models(data_source=...), diffs persisted models against live database schemas and returns the minimal list of deletes needed for SQL generation to remain valid (EditModelDelete for individual columns / measures / joins, WholeModelDelete when an entire backing table is gone). The same surface is exposed as the MCP validate_models tool, the REST POST /validate-models endpoint, and the CLI slayer validate-models [--data-source X] [--json]; --force-clean actually applies the deletes for batch repair workflows. The new top-level slayer/engine/schema_drift.py module owns the cascade walking and a two-pass invariant so dropping a column also prunes anything that referenced it. Query-time DBAPI errors that are caused by drift are now wrapped as the new SchemaDriftError (in slayer/core/errors.py) carrying a structured payload instead of opaque driver text. ingest_datasource_idempotent() in slayer/engine/ingestion.py walks the live datasource and only adds to existing sql_table-mode models -- it never overwrites user customizations and silently skips sql-mode and query-backed models. See docs/concepts/schema-drift.md and docs/examples/schema-drift.md.
Agent memory layer: learnings and saved queries (DEV-1357)
A new slayer.memories package (models.py, service.py, resolver.py) lets agents save free-form notes ("learnings") and example queries indexed by the schema entities they reference, then recall them before drafting new queries. Three MCP tools (save_memory, forget_memory, recall_memories), three REST endpoints (POST /memories, DELETE /memories/{memory_id}, POST /memories/recall), three CLI subcommands (slayer memory save|forget|recall), and matching SlayerClient async methods all front the same MemoryService orchestrator. Entity references are canonicalized through slayer.memories.resolver; unresolvable references raise the new EntityResolutionError (HTTP 422), and missing memory IDs raise the new MemoryNotFoundError (HTTP 404). inspect_model now renders a Learnings section listing memories that reference the inspected model. Storage adds a memories table (SQLite) plus an entity-overlap index, and a memories.yaml + counters.yaml pair (YAML backend). See docs/concepts/memories.md and specs/DEV-1357-memories.md.
Type-aware CAST emission and DataType realignment (DEV-1361, BREAKING)
DataType enum values were lowercase strings ("string", "number", "integer", "boolean", "date", "time") plus seven unused aggregation pseudo-types ("count", "sum", ...); they are now uppercase sqlglot-aligned names (TEXT, INT, DOUBLE, BOOLEAN, DATE, TIMESTAMP), and the pseudo-types are gone. Generated SQL now emits explicit CAST(... AS <type>) on column SQL and measure expressions (idempotent against pre-cast input, so CAST(... AS INT) is not double-wrapped) so result tuples match downstream comparators that care about driver-returned types. Auto-ingestion now narrows generic numeric SQL columns to INT vs DOUBLE based on live precision/scale via slayer/storage/type_refinement.py instead of always emitting DOUBLE. The SlayerModel schema bumps to v5: the v4 -> v5 converter in slayer/storage/v5_migration.py is a pure dict transform that rewrites legacy lowercase types and strips the dropped pseudo-types so the field falls through to its default after Pydantic loads; it recurses into source_queries[].source_model inline dicts so multi-stage models migrate cleanly. Migration is automatic on load via the converter chain in slayer/storage/migrations.py. The new slayer storage migrate-types [--dry-run] [--data-source X] CLI exposes the optional post-migration datasource introspection refinement step explicitly for batch workflows. Lenient before-validators on Column.type and ModelMeasure.type still accept legacy lowercase input from external callers without erroring.
Ranking transform family (DEV-1353)
Three new first-class transforms -- percent_rank, dense_rank, ntile -- join the existing rank. All four accept an optional partition_by= kwarg to rank within subsets of rows (a column name, a dotted join path, or a list of either) and require no time dimension; ntile additionally takes a positive-integer n= kwarg. The SQL generator emits native PERCENT_RANK(), DENSE_RANK(), NTILE(n) window functions; positional args beyond the wrapped measure are rejected with an actionable error pointing at the keyword form. See the rank-family entries in slayer/help/topics/04_transforms.md and docs/concepts/formulas.md.
Demo: upstream jafgen is now a core dependency
The Jaffle Shop demo dataset switched from a vendored, extras-only git-URL install to the upstream jafgen package pinned to ^0.4.14 as a core dependency, so pip install motley-slayer alone enables slayer datasources create demo (and the --demo flags on slayer serve / slayer mcp) without any manual git-based install. slayer/demo/jaffle_shop.py was rewritten to drive the upstream CLI rather than ship its own vendored DDL.
Schema versions
SlayerModel is now version 5 (was 4 in 0.4.3). SlayerQuery remains version 3 and DatasourceConfig remains version 1. All older versions are upgraded automatically on load via the converter chain in slayer/storage/migrations.py; saves always emit the current version.
- New `data_source` argument required on `engine.execute`, MCP model tools, HTTP `/models` endpoints (`?data_source=`), and `SlayerClient`. Existing bare‑name calls will raise `AmbiguousModelError` if multiple datasources match.
- Use new datasource‑priority APIs (`set_datasource_priority`, `get_datasource_priority`, CLI `slayer datasources priority`) to resolve ambiguous lookups without passing `data_source` on every call.
- Models are now keyed by `(data_source, name)` instead of bare `name`; schema bumps from v3 to v4 with automatic migration moving YAML files; bare-name lookups that match more than one datasource raise `AmbiguousModelError` (HTTP 409).
- Recursive expansion of derived columns allows any column on the same or joined model—including other derived columns—to be referenced in SQL expressions.
- Auto‑join discovery for filter‑only cross‑table references automatically adds implied joins when a filter uses a derived column that spans a join.
- Multi‑stage query `source_queries` can now declare `joins.target_model` against prior sibling stages, forming a true DAG instead of being limited to final stage joins.
Full changelog
SLayer 0.4.3 Release Notes
A bug-fix and quality release: thirteen PRs since 0.4.2 covering storage, the query planner, SQL generation, MCP introspection, and a large drop in test-suite wall time.
Datasource-scoped model storage
Models are now keyed by (data_source, name) instead of bare name, so two datasources sharing a table name no longer collide silently. The schema bumps from v3 to v4, with an automatic migration that moves YAML files from models/<name>.yaml into models/<data_source>/<name>.yaml and rebuilds the SQLite models table with a composite primary key. A new data_source argument flows through engine.execute, the MCP create_model / edit_model / inspect_model / delete_model tools, the HTTP /models endpoints (?data_source=), and SlayerClient. Bare-name lookups that match more than one datasource raise the new AmbiguousModelError (HTTP 409). New datasource-priority APIs -- set_datasource_priority / get_datasource_priority, GET/PUT /datasources/priority, and the slayer datasources priority CLI -- let you disambiguate without passing data_source on every call.
Recursive expansion of derived columns
A Column.sql may now reference any other column on the same model or on a joined model -- including derived columns whose own sql is itself an expression. Chains like "A.bar / B.foo_normalized" (where foo_normalized is itself derived) used to fail at execute with no such column; expansion now happens at enrichment time and applies uniformly to dimensions, time dimensions, measures, cross-model measures, and filters. Reference cycles raise a clear error.
Auto-join discovery for filter-only cross-table refs
When a filter references a bare-named local derived column whose Column.sql itself crosses a join (e.g., filters=["is_eu = 1"] where is_eu.sql reads customers.region), the planner now walks the column's reference chain, discovers the implied joins, and adds them. Previously the column also had to appear in dimensions to force the join.
Window functions in filters
Filter expressions that resolve to SQL containing OVER (...) no longer crash with near "OVER": syntax error. A model column whose sql contains a window function is auto-promoted to a computed column on the inner SELECT and referenced via an outer post-filter wrap, so it can appear in a query filter naturally. Raw OVER text inside a ModelMeasure.formula or an inline query filter is rejected at construction or parse time with an actionable error pointing at the rank() / first() / last() / lag() / lead() transforms.
Multi-stage query joins form a DAG
Non-final named stages in source_queries can now declare joins.target_model against prior named sibling stages, so a multi-stage query forms a true DAG instead of being limited to "final stage joins anything, inner stages join nothing". Forward references and self-references surface as clear errors instead of the generic Model 'X' not found.
Honor user-supplied measure names on inner stages
{"formula": "amount:sum", "name": "rev"} on an inner SlayerQuery stage now emits the alias rev instead of the canonical amount_sum. Downstream stages that reference the renamed measure no longer error or surface NULLs.
SQLite json_extract() semantics fix
sqlglot's default SQLite generator was emitting JSONExtract as col -> '$.path', which in SQLite returns the JSON-quoted form ('"Owned"' with literal quotes). Equality and CASE-WHEN matches against bare-string literals silently produced all-NULL or 0 -- wrong answers, no error. The generator now rewrites every JSONExtract to JSON_EXTRACT(...) on SQLite while leaving Postgres / DuckDB / MySQL paths and JSONExtractScalar (->>) untouched.
Preserve log10(x) / log2(x) literal form
sqlglot was canonicalizing log10(x) and log2(x) into LOG(10, x) / LOG(2, x), which broke benchmark agents reading back inspect_model.last_sql and triggered runtime errors on dialects without 2-arg LOG(b, x) (Oracle in some configs, embedded SQLite builds without log). The generator now rewrites the canonicalized form back to log10(...) / log2(...) on every dialect that natively supports the alias. A strict-error log2 UDF for SQLite is registered alongside the existing _log10 for parity with Postgres semantics.
__aggN__ placeholder leak fix
When a measure formula wrapped a colon-syntax aggregation reference inside a non-transform SQL call (nullif, coalesce, ...) and combined it with another aggregation in arithmetic, the embedded __aggN__ placeholder leaked all the way to emitted SQL, crashing execution with no such column: __aggN__. The formula AST walker now recurses into non-transform Call args and keywords so the placeholder is registered correctly.
Multi-hop derived-column qualification regression tests
Six regression tests pin the contract that derived Column.sql reached via a multi-hop join path emits correctly qualified SQL across dim refs, cross-model measure aggregations, filters, time dimensions, the verbatim solar-panels reproduction from the original issue, and the diamond-join arm-isolation case. The bug itself was incidentally fixed by the derived-column expansion work above; the tests pin the fix so it cannot silently regress.
meta field surfaced in inspect_model
The meta field on ModelMeasure and Aggregation (added in 0.4.2) round-tripped through storage but never appeared in inspect_model output, so callers concluded the value had been dropped. Markdown columns / measures / aggregations tables now include a meta cell (compact JSON, pruned when no entity in the section uses it); the model header gets a **meta:** bullet; the JSON form emits meta unconditionally on every entity.
Unit suite ~30x faster
_retry_with_backoff was retrying every OperationalError, including deterministic ones like no such table, syntax errors, and permission denied -- adding a 3-second backoff to every such failure. A new _is_transient_db_error filter narrows retries to DisconnectionError plus a known-transient DBAPI signal allowlist (database is locked, deadlock, lost connection, broken pipe, server closed, connection refused/reset). Combined with a deferred dbt.cli.main import (saving ~4s of CLI startup) and session-scope MCP and FastAPI test fixtures, the unit suite drops from ~243s to ~8.3s. --cov is also dropped from the default pytest addopts so coverage is now opt-in via pytest --cov=slayer --cov-report=term-missing.
Integration suite ~85s faster
Module-scope Postgres and DuckDB fixtures plus a cached Jaffle Shop DuckDB shared across tests cut the integration-suite wall time without weakening per-test isolation guarantees.
- Added meta field to ModelMeasure and Aggregation
Full changelog
What's Changed
- Add meta field to ModelMeasure and Aggregation by @ZmeiGorynych in https://github.com/MotleyAI/slayer/pull/83
Full Changelog: https://github.com/MotleyAI/slayer/compare/v0.4.1...v0.4.2
- Validates `percentile(p=…)` argument as numeric in [0,1] to prevent injection attacks such as `pg_sleep(10)`
- Seven statistical aggregations (stddev_samp, var_pop, corr, covar_samp, etc.) exposed via colon syntax
- Scalar math UDFs (ln, log10, log(b,x), exp, sqrt, pow/power) added to SQLite
Full changelog
SLayer 0.4.1
Bug-fix-and-capability release that brings SQLite to math/stat parity
with Postgres, DuckDB, MySQL, and ClickHouse, and exposes seven new
statistical aggregations as built-in colon-syntax aggregates. Motivated
by DEV-1317: a
BIRD-Interact KB-translation effort that hit ~30–40 formulas which were
unencodable on bare SQLite for want of basic math/stat operators.
Backwards-compatible. No schema migrations.
What's new
Scalar math UDFs on SQLite
Use these inside Column.sql, ModelMeasure.formula, or
Aggregation.formula. NULL inputs return NULL (matching cross-dialect
SQL semantics). Math-domain errors propagate as
sqlite3.OperationalError — matching Postgres's strict semantics, not
SQLite ≥3.35's silent-NULL log() built-in.
| Name | Args | Notes |
|---|---|---|
| ln(x) | 1 | Natural logarithm |
| log10(x) | 1 | Base-10 logarithm |
| log(B, X) | 2 | Returns log_B(X). Base first, value second — matches Postgres LOG(b, x), sqlglot's emission, and the SQLite ≥3.35 built-in. The UDF registers unconditionally and overrides any built-in so the strict-error policy is uniform across SQLite versions. |
| exp(x) | 1 | e^x |
| sqrt(x) | 1 | Square root |
| pow(x, n) / power(x, n) | 2 | Both spellings registered. Uses math.pow internally — rejects negative-base + non-integer exponent (Python's ** would silently return a complex), and bounds huge exponents to IEEE-754 (Python's ** would build unbounded big-ints). |
Statistical aggregations (colon syntax)
Available as built-in aggregations on every numeric column:
{"formula": "latency:stddev_samp"}
{"formula": "latency:var_pop"}
{"formula": "price:corr(other=quantity)"}
{"formula": "price:covar_samp(other=quantity)"}
| Aggregation | Postgres-matching edge cases |
|---|---|
| stddev_samp / var_samp / covar_samp | NULL when N ≤ 1 |
| stddev_pop / var_pop / covar_pop | NULL at N = 0; 0 at N = 1 |
| corr | NULL when fewer than 2 non-null pairs OR either side has zero variance |
The two-column aggregations (corr, covar_samp, covar_pop) take
the second column as a named other= parameter — same shape as
weighted_avg(weight=…). NULL pairs (either x or y NULL) are dropped
entirely from the calculation.
Memory profile: implemented with Welford online accumulators —
O(1) memory regardless of group size, and numerically more stable than
the naive two-pass formula. Safe for SQLite analytics workloads with
millions of rows per group.
Cross-dialect support
| Dialect | Stat aggregates | Math scalars |
|---|---|---|
| SQLite | All seven via Python UDFs (slayer/sql/sqlite_udfs.py); var_samp/var_pop aliased to variance/variance_pop to handle sqlglot's transpilation rewrite. | All seven UDFs above. |
| Postgres / DuckDB | All native: STDDEV_* / VAR_* / CORR / COVAR_*. | All native. |
| ClickHouse | All native (sqlglot may emit camelCase, e.g. varSamp). | All native. |
| MySQL | STDDEV_SAMP / STDDEV_POP / VAR_SAMP / VAR_POP are native. corr / covar_samp / covar_pop raise NotImplementedError at SQL generation time (no native function, no Python-UDF mechanism). Use MariaDB or compute client-side. | All native. |
Internal improvements
These don't change behaviour but are worth knowing about:
-
Unified aggregate builders. The three dialect-aware builders
(_build_percentile,_build_formula_agg,_build_stat_aggin
slayer/sql/generator.py) now share two helpers —_wrap_filterand
SQLGenerator._resolve_agg_param— and emit fully model-qualified
column references via_resolve_sql. Cognitive complexity of
_build_stat_aggdropped from 22 → ~6, addressing a Sonar
python:S3776finding. -
percentile(p=…)validation.pis now validated as a numeric
literal in[0, 1]after parameter resolution. Closes a security gap
where a malicious model-level default likep=pg_sleep(10)could
bypass the SQL-injection regex check (Codex review finding).
Verify your work
# Full unit suite
poetry run pytest -m "not integration"
# SQLite + DuckDB integration (no Docker needed)
poetry run pytest tests/integration/test_integration.py -m integration
poetry run pytest tests/integration/test_integration_duckdb.py -m integration
# Lint
poetry run ruff check slayer/ tests/ examples/
If you run the Docker-backed examples, the new aggregations are smoke-
tested against live ClickHouse and MySQL via
examples/{clickhouse,mysql}/verify.py (with check_stddev_var() and,
on ClickHouse, check_corr_covar() — MySQL skips the latter since
corr/covar_* aren't supported there).
Documentation
- CLAUDE.md — Key Conventions + Aggregation caveats sections updated.
docs/concepts/{formulas,queries,models,terminology}.md— new aggregations + scalar functions documented end-to-end, with thelog(B, X)arg-order pin called out explicitly.docs/database-support.md— per-dialect support matrix + SQLite UDF caveats.docs/examples/07_aggregations/aggregations.md— extended quick-reference table.slayer/help/topics/03_aggregations.md+.claude/skills/slayer-{query,models}.md— agent-facing summaries updated.
Known limitations & follow-ups
Filed as sibling Linear issues for future releases:
- DEV-1320 — SQLite string/regex UDFs (
regexp,regexp_replace,split_part, …). Not blocking any known KB pattern but rounds out parity. - DEV-1321 — ClickHouse stat aggregates may return
NaN(not SQL NULL) for single-row / empty / zero-variance groups, contradicting the documented Postgres-style semantics. Fix needsIF(isNaN(x), NULL, x)wrapping plus ClickHouse integration coverage. - DEV-1322 — Type-check kwarg-as-column args for
weighted_avg(weight=…)/corr(other=…)/covar_*(other=…). Todayamount:corr(other=string_column)slips past enrichment and fails at the database with a dialect-specific runtime error rather than a clear "not numeric" message.
Acknowledgements
CodeRabbit, SonarCloud, and Codex (via the codex-review skill) all
contributed review feedback that shaped the final shape of this
release.
- Run any existing model through `engine.save_model()` once to trigger the v1→v2→v3 migration; stored YAML/SQLite will be upgraded automatically.
- Update Python code constructing `SlayerQuery` objects: replace `fields=` with `measures=` and remove `dry_run`/`explain` from query bodies.
- When using CLI, prefer `--storage` over the deprecated `--models-dir`; default storage locations have changed per OS.
- `SlayerQuery.fields` → `SlayerQuery.measures` (all interfaces)
- `SlayerModel.dimensions` + `SlayerModel.measures` → unified `columns`; old `Dimension`/`Measure` classes removed
- `dry_run` and `explain` are no longer fields on `SlayerQuery`; pass them as kwargs to `engine.execute()`
- Query-backed models as a first-class source mode (`source_queries`, cached on save)
- Windowed aggregations (`sum(window='30d')`, `avg(window='1y2m')`) and `consecutive_periods` transform
- `median` and `percentile` support across Postgres, DuckDB, SQLite, ClickHouse
Full changelog
SLayer 0.4.0 — Schema overhaul, query-backed models, windowed aggregations
The 0.4.0 release lands the v1 → v2 → v3 schema migration: a unified column model, query-backed models as a first-class source mode, windowed sum/avg, cross-dialect median/percentile, and a richer MCP surface. The wire format changes are backward-compatible on read (a migration chain auto-upgrades v1/v2 YAML and SQLite stores at load time), but the Python types and JSON shapes have moved — see Breaking changes below.
Every change in this release is covered by the unit suite (1264 tests) and integration suites against SQLite, Postgres, and DuckDB.
Highlights
Unified columns and query-time aggregation (v2 schema)
SlayerModel no longer has separate dimensions and measures lists. Both have been merged into a single columns: List[Column]. Each column carries a data type and can be used either as a group-by dimension or as the input to an aggregation — the role is decided per query. SlayerModel.measures is repurposed to hold a library of named formulas (ModelMeasure) — saved metrics that queries can reference by bare name.
Aggregations are picked at query time using colon syntax: revenue:sum, *:count, price:weighted_avg(weight=quantity). Function-style calls (sum(revenue)) auto-correct to the colon form with a deprecation warning.
# v2/v3 model
columns:
- {name: id, sql: id, type: number, primary_key: true}
- {name: status, sql: status, type: string}
- {name: revenue, sql: amount, type: number}
- {name: quantity, sql: qty, type: number}
measures:
- {name: aov, formula: "revenue:sum / *:count", label: "Average Order Value"}
// v2/v3 query
{
"source_model": "orders",
"measures": ["revenue:sum", {"formula": "aov"}],
"dimensions": ["status"]
}
SlayerQuery.fields is now SlayerQuery.measures, and model is now source_model.
Query-backed models — first-class source mode
A SlayerModel can now be backed by one or more saved SlayerQuery stages:
await engine.create_model_from_query(
query={"source_model": "orders", "measures": ["amount:sum"], "dimensions": ["region"]},
name="monthly_revenue",
variables={"region": "US"}, # default placeholder values
)
# Run by name
result = await engine.execute("monthly_revenue", variables={"region": "EU"})
source_queries joins sql_table and sql as the third source mode (mutually exclusive). query_variables provides defaults for {var} placeholders inside the saved stages, and a save-time dry-run populates the columns + backing_query_sql cache. The cache refreshes only on save paths — engine.execute never writes back to storage (closes a lost-update race against parallel edit_model calls).
Surfaced across every interface: REST POST /query accepts {"name": "...", "variables": {...}}; MCP query tool accepts a model-name shortcut; CLI slayer query <model_name> --variables k=v.
Schema versioning + read-time migrations
Every persisted entity (SlayerModel, SlayerQuery, DatasourceConfig) now carries a version: int field. Older versions are upgraded transparently at load time via a converter chain:
v1 → v2: mergesdimensions+measuresintocolumns; renamesSlayerQuery.fields→measures; converts the deprecatedtype: sum/avgalias on legacy measures intoallowed_aggregations.v2 → v3: stripsdry_run/explainfromSlayerQuery(they're nowengine.execute()kwargs); walksSlayerModel.source_queriesrecursively.
The migration hook lives on the Pydantic class itself (@model_validator(mode="before")), so every storage backend — YAMLStorage, SQLiteStorage, third-party backends, plus the MCP/REST/dbt entry points — picks up migrations automatically with no backend-side changes. See docs/concepts/models.md#schema-versioning.
Windowed aggregations + consecutive_periods transform
sum and avg accept an optional window parameter for trailing-time aggregations:
{
"measures": [
{"formula": "revenue:sum(window='30d')", "name": "revenue_30d"},
{"formula": "revenue:avg(window='1y2m')", "name": "avg_14m"}
],
"time_dimensions": [{"dimension": "created_at", "granularity": "month"}]
}
Compact duration grammar: y, m, w, d, h, min, s — combinable as 1y2m3w5d6h7min8s. The window is computed over raw source rows ending at each output bucket, so it can be larger, equal to, or smaller than the query's time granularity.
consecutive_periods(predicate) is a new time-ordered transform returning the trailing run length where the predicate is true at the query grain — usable in selected measures, filters, and comparisons against integer literals.
Time-ordered transforms (cumsum, lag, lead, consecutive_periods) now partition by the query's non-time dimensions, so running calculations don't bleed across groups.
median and percentile across tier-1 databases
revenue:median and revenue:percentile(p=0.95) now work on Postgres, DuckDB, SQLite, and ClickHouse:
- Postgres: native
PERCENTILE_CONT(p) WITHIN GROUP (ORDER BY x). - DuckDB: sqlglot rewrites to
QUANTILE_CONT(x, p ORDER BY x). - SQLite: Python aggregate UDFs registered on every connection.
- ClickHouse: native
median(x)and parametricquantile(p)(x). - MySQL: not supported — raises
NotImplementedErrorat SQL generation time. Use MariaDB or compute client-side.
MCP inspect_model section budgeting
inspect_model is the heaviest MCP tool an agent uses on each new model. It now accepts three params so callers can drill in without paying for the full envelope every call:
sections: subset of["columns", "measures", "aggregations", "joins", "reachable_fields", "samples"]. Default = all six. Omitted sections collapse to a one-line backticked CSV of names (or are dropped entirely forreachable_fields/samples).descriptions_max_chars: trim long descriptions with a literal... [truncated]marker.reachable_fields_depth: configurable BFS depth for joined-model field expansion (default 5, range[0, 20]).
When any section is trimmed, a quoted-Markdown footer at the end of the response shows what was returned, what was names-only, and what was omitted, with a hint on how to re-fetch. JSON output mirrors the same structure with <section>_names siblings and top-level omitted_sections / names_only_sections / unknown_sections arrays.
Default behaviour is unchanged: inspect_model(model_name="orders") produces the same payload as before.
One-line setup commands
slayer serve --demoandslayer mcp --demospin up the bundled Jaffle Shop DuckDB datasource and ingest its models on startup. Idempotent. Requires theduckdbextra andjafgen.slayer datasources create postgresql://user:${DB_PASSWORD}@host/db --ingestcreates a datasource and ingests it in one command. Connection strings are parsed; secrets can be supplied via--password-stdin.
Auto-correct for function-style aggregations
SUM(amount), count(*), etc. inside a formula now auto-correct to the colon-aggregation form (amount:sum, *:count) with a one-shot warning. The colon syntax is canonical, but agents that emit SQL-flavoured aggregation calls no longer error out.
first() transform
Adds first(x) as the FIRST_VALUE-window mirror of the existing last(x) transform — broadcasts the earliest time bucket's value across all rows. Useful for "% growth from baseline" queries.
Breaking changes
The wire format is backward-compatible on read — old YAML and SQLite stores upgrade transparently via the v1 → v2 → v3 migration chain. But the Python types and JSON shapes have moved:
SlayerQuery.fields→SlayerQuery.measures. All four interfaces (REST body, MCP tool, CLI JSON, Python kwarg) now usemeasures. TheFieldclass no longer exists; use{"formula": "..."}dicts or pass strings.SlayerQuery.model→SlayerQuery.source_modelin JSON bodies. The Python kwarg has always beensource_model.SlayerModel.dimensions+SlayerModel.measures→SlayerModel.columns. The oldDimensionandMeasurerow-level classes are gone; a unifiedColumnreplaces both.SlayerModel.measuresstill exists but now holds named formulas (ModelMeasure), not row-level SQL.SlayerQueryv3 setsextra="forbid". Direct Python construction with an unknown field (SlayerQuery(typo=...)) now raisesValidationError. Persisted v1/v2 queries withdry_run/explainare auto-migrated with a one-shotDeprecationWarning.dry_runandexplainare no longer fields onSlayerQuery. Pass them as kwargs toengine.execute(query, dry_run=..., explain=...). The MCPquerytool, CLI--dry-run/--explainflags, and RESTPOST /querybody still accept them at the surface layer; the routing pops them before constructing the SlayerQuery.- Query-backed model caches refresh only on save paths (
engine.save_model,create_model_from_query(save=True)). Directstorage.save_model()calls bypass the cache refresh — go through the engine.engine.execute()no longer writes to storage as a side effect. - CLI:
--models-diris deprecated (still works); use--storage. Default storage path is now platform-appropriate (~/.local/share/slayeron Linux,~/Library/Application Support/slayeron macOS,%LOCALAPPDATA%\slayeron Windows) instead of./slayer_data. Override with$SLAYER_STORAGE.
Migration
For most users, no action is needed: load your v1/v2 store with 0.4.0, save any model once, and on-disk YAML/SQLite is upgraded to v3.
A few callers will need explicit fixes:
- Anything constructing
SlayerQueryin Python code: renamefields=→measures=. The oldmodel=kwarg was already aliased tosource_model=in v0.3 but is now removed; usesource_model=directly. - Anything reading
model.dimensionsormodel.measures(where measures was the row-level list): switch tomodel.columns. If you specifically want the named-formula library, that's stillmodel.measuresbut containsModelMeasureobjects ({formula, name, label, description}), not the oldMeasureshape. - Any code passing
dry_run=Trueinside a query body: move it to theengine.execute()kwarg.
What's new (full list)
Schema and migrations
- #57 — Add
version: intto every persisted entity + read-time migration chain inslayer/storage/migrations.py. Foundational; functionally a no-op until the v2 cut. - #61 (S3) — v2 schema: merge
dimensions+measuresintocolumns; repurposemeasuresasModelMeasureformulas; renameSlayerQuery.fields→measures; v1 → v2 converter inslayer/storage/v2_migration.py. - #63 (S3a) — Bare-name
ModelMeasureresolution ({"formula": "aov"}expands to the saved formula); 8 review follow-ups. - #66 (S4) — Auto-ingestion and dbt converter aligned with v2 schema (one
Columnper non-joined column,*:countalways available, no auto-generated measures). - #77 — v3 schema: drop
dry_run/explainfromSlayerQuery(now engine kwargs); addextra="forbid"; v2 → v3 converter inslayer/storage/v3_migration.py.
New features
- #67 (S2a) — Query-backed models as a first-class source mode (
SlayerModel.source_queries,query_variables,engine.create_model_from_query,engine.execute("name")shorthand). Surfaced through MCP, REST, and CLI. - #64 (S5) — Windowed sum/avg via
(window="..."); newconsecutive_periods(predicate)transform; partition time-ordered transforms by non-time dimensions. - #60 (S7) —
medianandpercentilework across Postgres, DuckDB, SQLite, ClickHouse. SQLite uses Python aggregate UDFs registered per-connection. - #70 (S8) —
inspect_modelsection budgeting (sections,descriptions_max_chars,reachable_fields_depth). - #43 — One-line
slayer datasources create <connection-string> [--ingest]. Supports--password-stdinfor secure credentials. - #44 —
slayer serve --demo/slayer mcp --demoboots the bundled Jaffle Shop DuckDB. - #45 — Auto-correct function-style aggregations (
sum(x)→x:sum); addfirst()transform; convert remaining dataclasses to Pydantic.
Bug fixes
- #46 — Skip
information_schemaPK fallback for SQLite (was triggering a silent error on auto-ingestion). - #56 — Log the underlying DBAPI exception + SQL excerpt on transient DB retries.
- #75 — Fix ClickHouse type ingestion mapping numerics to STRING.
- #76 —
sqlite:///:memory:now works across multiple async calls on a singleSlayerSQLClient(per-instanceStaticPoolengine withcheck_same_thread=False). Two separate clients on:memory:are still isolated; usemode=memory&cache=sharedto share. - #79 — Read paths (
engine.execute,inspect_model,get_column_types, MCPquery, REST/query) no longer write to storage. Eliminates a lost-update race against paralleledit_modelcalls.
Internal
- #78 — Windowed-CTE and
consecutive_periodsbuilders refactored from string concatenation to sqlglot AST; pure refactor, no behaviour change. - #80 (S9) — Docs, skills, help topics, READMEs, runnable examples, and the
slayer-queryskill all aligned with v2/v3 schema. Also corrected a long-standing spec drift:*:countproduces result keyorders._count, notorders.count.
Links
- Query‑time multi‑hop join resolution eliminates stale metadata storage
- INNER join support with automatic mirroring for bidirectional relationships
- Per‑measure CTE isolation enabling cross‑model filtered measures and arithmetic formulas
Full changelog
Join overhaul: query-time multi-hop resolution, INNER join support, per-measure CTE isolation
Motivation
The join system had fundamental limitations exposed by the ACME insurance benchmark (gpt-5.3-codex scored 1/11 correct). Baked-in multi-hop joins at ingestion created stale metadata when the graph changed. All joins were LEFT, causing fan-out on entity relationships that should be INNER (50-118x inflated sums on benchmark queries). Cross-model filtered measures sharing a single base query caused JOIN intersections that zeroed out results. This PR rewrites the join and SQL generation architecture to fix these issues.
What changed
1. Query-time multi-hop join resolution
Ingestion now emits only direct joins (one ModelJoin per FK). Multi-hop paths like orders.customers.regions are resolved at query time by walking each intermediate model's own joins. This eliminates redundant storage and keeps join metadata consistent when the graph changes.
Files: ingestion.py, enrichment.py (_collect_needed_paths, _resolve_joins)
2. LEFT/INNER join support
New JoinType enum (LEFT, INNER). dbt entity-based joins (both foreign->primary and peer) now use INNER, matching MetricFlow behavior. Inner joins are automatically mirrored: if A->B is INNER, B->A is created too.
Join synchronization is centralized in a new JoinSyncStorage decorator that wraps every storage backend, handling mirroring on save, orphan cleanup on delete, and one-time reconciliation of hand-edited YAML.
Files: enums.py, models.py, entities.py, converter.py, join_sync.py (new), base.py
3. Per-measure CTE isolation
Major SQL generation refactor. The generator now follows a multi-stage pattern:
- Stage 1: Base CTE (simple measures + dims) + per-measure CTEs for cross-model and cross-model-filtered measures
- Stage 2: Combined CTE (LEFT JOIN all CTEs on shared dims), expressions/transforms stacked on top
This fixes two classes of bugs: filtered measures needing different INNER JOINs no longer intersect in a single base query, and cross-model measures in arithmetic formulas (previously "not yet supported") now work.
CTE builders were also converted from f-string SQL to sqlglot AST building for dialect safety.
Files: generator.py (major rewrite), enrichment.py
4. Re-rooted subqueries for cross-model measures
Cross-model measure CTEs previously only joined source -> target, so filters on tables reachable from the target's join graph were silently dropped. Now each cross-model measure gets a re-rooted EnrichedQuery where the target model is the FROM table, giving it access to its full join graph for correct filter application. Unreachable dimensions/filters are dropped gracefully (scalar CROSS JOIN).
Files: query_engine.py (_build_rerooted_enriched), enriched.py, generator.py
5. dbt entity system fixes
- Peer joins: Models sharing a primary entity (e.g.,
claimandclaim_coverageboth declaringclaim_identifier) now get bidirectional joins, enabling paths likeclaim.claim_coverage.policy - Foreign entity fan-out: Resolution now iterates ALL matching primary models, not just the first
- Metric filter qualification: Filters referencing dimensions on peer models are now qualified correctly (e.g.,
loss_payment.has_loss_payment) - Deterministic resolution:
get_primary_modeland peer dimension selection use lexicographic ordering instead of depending on registration order - Peer join dedup: Uses
(model, local_expr, peer_expr)signature instead of model-name-only
Files: entities.py, filters.py, converter.py
6. LLM-friendliness improvements
- Auto-promote fields to dimensions: When an LLM puts a bare dimension name in
fields, it's automatically moved todimensions(with validation: no aggregation, resolves as dimension, not a measure) - Source model prefix stripping: New
SlayerQuery.strip_source_model_prefix()centralizes the removal of redundant prefixes likeorders.revenue:sumwhensource_model=orders, handling all reference types (dims, time dims, fields, filters, order, main_time_dimension) - Dimension aggregation:
claim_identifier:count_distinctnow works; dimensions are looked up when no measure matches, with type-safety checks (rejects sum/avg on strings) - Cross-model dimension aggregation: Formulas like
policies.id:count_distinctwork via a synthetic Measure wrapper - Constant-SQL filter fix: Dimensions with literal SQL (e.g.,
sql="1") no longer produce brokenpremium.1qualifications - inspect_model show_sql: Now controls ALL SQL visibility (model SQL, filters, dim/measure SQL columns), reducing noise in LLM context
Files: query_engine.py, query.py, enrichment.py, generator.py, server.py
7. Runtime column type inference and measure profiling
- New
get_column_types()probes measure types via the engine pipeline (properly joined SQL), supporting DuckDB, Postgres, MySQL, and SQLite with dialect-specific OID maps inspect_modelnow shows type (number/string/time/boolean) and value range (min..max or "all NULL") columns for measures, helping LLMs avoid errors likeSUM(VARCHAR)
Files: client.py, query_engine.py, server.py
8. Time shift rework
time_shiftnow shifts the time dimension column via SQLINTERVALinstead of computing shifted date range strings in Python. Eliminates_shift_date,_build_row_number_join, and the row-number vs calendar join branchingchange(x)andchange_pct(x)are desugared at enrichment time into a hiddentime_shift+ arithmetic expression, simplifying the generator
Files: enrichment.py, generator.py, enriched.py
9. Additional bug fixes
- ORDER BY with custom field names:
field_name_aliasesmap consulted by the ORDER BY resolver - ORDER BY in computed/transform path: Uses
_resolve_order_column()instead of manual f-string formatting - Nested WITH clause:
_build_combined()returns structured CTE tuples instead of assembled SQL - CTE name collisions: Same measure with different aggregations (e.g.,
revenue:sum+revenue:avg) gets distinct CTE names - CTE WHERE leaking: Filters narrowed to only conditions whose table aliases are available in each CTE
- Inline-SQL model joins: JOIN injection moved from string replacement to sqlglot AST level
- FK dedup: Uses
(target, src_col, tgt_col)signature, preserving multiple FKs to the same table - Self-referencing join mirroring: Skipped to prevent duplicates
- First/last isolated measures: Get their own ranked subquery within their CTE instead of referencing non-existent
_last_rncolumns from the base
Test coverage
~3,400 new test lines across 12 test files:
test_sql_generator.py: ~2,000 new lines covering CTE isolation, re-rooted subqueries, cross-model dimension aggregation, ORDER BY resolution, first/last isolation, SQL validity checks (_assert_valid_sqlvalidates every generated query with sqlglot)test_join_sync.py(new): 321 lines: mirroring, save/delete sync, reconciliationtest_models.py(new): 303 lines:strip_source_model_prefixexhaustive coveragetest_dbt_converter.py: 326 new lines: peer joins, foreign entity, inner join mirroringtest_dbt_filters.py: 100 new lines: peer dimension qualification, deterministic selectiontest_sql_client.py(new): 86 lines: type code mapping across driverstest_mcp_server.py: 154 new lines: show_sql gating, measure types, inner join mirroring- Integration tests: expanded with cross-model filter scenarios
Documentation
Updated across ingestion, joins, formulas, transforms, queries, terminology, and help topics to reflect direct FK joins at ingestion, query-time traversal, INTERVAL-based time shifting, and dimension aggregation.
- `SlayerQueryEngine.execute()` is now `async def`; use `execute_sync()` for synchronous contexts (CLI, notebooks, scripts)
- `slayer serve` and `slayer mcp` default to platform‑specific storage paths (`~/.local/share/slayer`, `~/Library/Application Support/slayer`, `%LOCALAPPDATA%\slayer`) when no `--storage` flag or `$SLAYER_STORAGE` is set
- dbt import requires installing the `dbt` extra: `pip install motley-slayer[dbt]`
- `datasource_summary` removed; replaced by `models_summary`
- `inspect_model` output format changed to complete single-model view with sampled values, measure metadata, join fields, and auto‑pruned markdown tables
- New async‑first engine, storage backends, and SQL client using native async drivers (`asyncpg`, `aiomysql`) and thread fallbacks
- Filter placeholders `{variable}` supporting reusable filter templates with substitution from a query's `variables` dict
- `models`, `dimensions`, and `measures` now support an optional `meta: Dict[str, Any]` field for arbitrary JSON metadata
Full changelog
SLayer 0.3 Release Notes
Redesigned MCP capabilities
The MCP server has been substantially reworked. datasource_summary is replaced by models_summary — a per-datasource brief overview — and inspect_model now returns a complete single-model view with sampled dimension values, measure metadata, reachable join fields, and auto-pruned markdown tables. A new help tool provides contextual explanations of SLayer concepts (queries, transforms, joins, etc.) directly inside the agent conversation. All MCP tools are now fully async, and query output supports format parameter ("markdown", "json", "csv").
Async-first architecture
The engine, storage backends, and SQL client are now async-first. SlayerQueryEngine.execute() is async def; use execute_sync() for CLI, notebooks, and scripts. Postgres and MySQL queries use native async drivers (asyncpg, aiomysql) with pooled connections. SQLite and DuckDB fall back to asyncio.to_thread. Storage backends (YAMLStorage, SQLiteStorage) are async, ready for true async implementations like Postgres-backed multi-tenant storage. All tests use pytest-asyncio with asyncio_mode = "auto".
Variables in filters
Filters now support {variable} placeholders, substituted from the query's variables dict. This keeps filter templates reusable across different parameter values without string concatenation: "status = '{status}' AND amount > {min}" with variables={"status": "completed", "min": 100}. Variable names must be alphanumeric, values must be strings or numbers (inserted as-is). {{/}} produce literal braces. Undefined variables raise clear errors.
Meta fields on models, measures, and dimensions
Models, dimensions, and measures now have an optional meta: Dict[str, Any] field for arbitrary user-defined JSON metadata (e.g., {"owner": "analytics", "pii": true}). Persisted in YAML and SQLite storage, editable via MCP edit_model (pass null to clear), HTTP API, and CLI. The MCP sentinel pattern distinguishes "not provided" from "explicitly clear."
Default storage path
slayer serve and slayer mcp now work with zero configuration. When no --storage flag or $SLAYER_STORAGE is set, SLayer uses a platform-appropriate default: ~/.local/share/slayer on Linux (XDG-compliant), ~/Library/Application Support/slayer on macOS, %LOCALAPPDATA%\slayer on Windows.
dbt Semantic Layer import
SLayer can now import models from a dbt project via slayer import-dbt. Semantic models and metrics from dbt_project.yml are converted into SLayer models with dimensions, measures, and joins. The --include-hidden-models flag additionally imports every regular dbt model not wrapped by a semantic_model as a hidden SLayer model (queryable by name but excluded from discovery endpoints). Requires the dbt extra: pip install motley-slayer[dbt].
- Models must validate `allowed_aggregations` lists at creation time; existing models without this field will be rejected until updated.
- Queries previously embedding aggregation functions directly in measure definitions need to be rewritten using the new colon syntax.
- Filters relying on legacy operator syntax should be migrated to the SQL‑style operators (`=`, `<>`, `IN`, `IS NULL`).
- Measures must be defined as pure row‑level expressions (`{name, sql}`) without aggregation; aggregation is now specified at query time using colon syntax (e.g., `"revenue:sum").
- Query dimensions, order, and fields can accept plain strings instead of object wrappers.
- Filters use SQL‑style operators (`=`, `<>`, `IN`, `IS NULL`) and support multi‑hop notation.
- Cross‑model measures with dot syntax (e.g., `"customers.score:avg"`), supporting multi‑hop aggregation.
- Multistage queries allowing a query to serve as the source for another query or be saved as a permanent model via `create_model_from_query`.
- Dynamic joins with automatic resolution of diamond inheritance and path‑based aliases for disambiguation.
Full changelog
SLayer 0.2 Release Notes
Aggregation separated from measures
Measures are now pure row-level expressions ({name: "revenue", sql: "amount"}); aggregation is chosen at query time with colon syntax ("revenue:sum", "*:count", "price:weighted_avg(weight=quantity)"). Built-in aggregations: sum, avg, min, max, count, count_distinct, first, last, weighted_avg, median, percentile. Custom aggregations with SQL templates can be defined at the model level. Per-measure allowed_aggregations whitelists are validated at model creation and query time.
Cross-model measures
Query measures from joined models with dot syntax and colon aggregation ("customers.*:count", "customers.score:avg", multi-hop: "customers.regions.population:sum"). Sub-query isolation prevents JOIN row multiplication. Transforms compose on cross-model measures ("cumsum(customers.score:avg)").
Multistage queries
Use a query as the source for another query, or save any query as a permanent model with create_model_from_query. ModelExtension extends models inline with extra dimensions, measures, or joins at query time.
Dynamic joins with diamond support
Joins are auto-resolved at query time by walking the join graph. Path-based aliases (customers__regions vs warehouses__regions) disambiguate when the same table is reachable via multiple FK paths.
Model filters
Always-applied WHERE conditions on models (e.g., "deleted_at IS NULL").
DuckDB support
New Tier 1 database, fully integration-tested, no Docker required.
Query introspection
dry_run previews generated SQL without executing; explain runs dialect-aware EXPLAIN (Postgres, SQLite, MySQL, DuckDB, ClickHouse, Snowflake, SQL Server, Oracle each use the correct syntax).
Simpler query syntax
Dimensions, time dimensions, and order accept plain strings ("status" instead of {"name": "status"}). Fields accept strings with colon syntax ("revenue:sum" instead of {"formula": "revenue_sum"}).
SQL-style filters
Operators =, <>, IN, IS NULL; multi-hop filters ("customers.regions.name = 'US'"); computed-column filters ("change(revenue:sum) > 0").
Extended CLI
- Full CRUD for datasources:
slayer datasources create,create-inline,delete,test(connectivity check) slayer ingestnow supports--includeand--excludetable filters--storageflag on all commands: pass a directory for YAML or a.dbfile for SQLite storage--password-stdinfor secure credential input- Rich help text with usage examples on every subcommand (
slayer query --help, etc.)
uvx / uv support
SLayer now works with uvx --from motley-slayer slayer serve and uv tool install motley-slayer out of the box. PEP 621 [project] metadata added to pyproject.toml.
Documentation overhaul
- Getting Started split into four audience-specific guides: MCP (AI agents), CLI (terminal), REST API (any language), Python SDK
- Interfaces → Reference — interface docs repositioned as reference, getting-started pages are task-oriented
- Each getting-started page has a "Verify it works" section
- Pre-commit hooks with ruff (lint + format check)
- CONTRIBUTING.md with dev setup, test matrix, and extension guides
Tutorials
Seven new tutorial chapters with Jupyter notebooks covering dynamic models, SQL vs DSL, auto-ingestion, time operations, joins, cross-model measures, multistage queries, and aggregations — all using the Jaffle Shop dataset on DuckDB.
- Agent-Native Interfaces: full MCP support via stdio, HTTP SSE, REST and CLI
- Nested Formula API for complex derived metrics (e.g., change(cumsum(revenue)))
- Time‑Aware Analytics with precise gap handling using CTEs
Full changelog
SLayer v0.1.0 – First Stable Release 🚀
SLayer is a lightweight, open-source semantic layer designed to give AI agents (and humans) a PhD in your data. It represents a shift toward "intent-based" data access – maximizing the reliability and minimizing SQL boilerplate while constructing complex real-world business reports.
Core Architecture
- Agent-Native Interfaces: Full support for the Model Context Protocol (MCP) via stdio and HTTP SSE, enabling seamless integration with AI environments like Claude Code and Cursor, as well as REST and CLI interfaces
- Flexible Deployment: Operates as a standalone service or an embeddable & extendable Python library. Its pluggable
StorageBackendallows for dynamic models updates without redeploys. - Dialect Agnostic: Leverages
sqlglotto provide consistent behavior across 10+ databases.
Advanced Query Engine
- Nested Formula API: Support for complex, derived metrics via arithmetic and nested transforms at query time (e.g.,
change(cumsum(revenue))), abstracting away the complexity of window functions. -
- Time-Aware Analytics: A robust implementation of
time_shift,change, andchange_pcttransforms, andlastas aggregation method. SLayer utilizes self-join CTEs to ensure precise gap handling and eliminate spurious NULLs during period-over-period comparisons.
- Time-Aware Analytics: A robust implementation of
- Smart Filter Routing: Logic-aware filtering that automatically extracts inline expressions and routes them to the appropriate
WHEREorHAVINGclauses based on aggregation context.
Automation & Modeling
- Auto-Ingestion: Intelligent schema introspection that auto-generates denormalized models and FK-based rollup joins, significantly reducing the time to first query.
- Flexible Storage & Auditability: Models and datasources are managed via a version-control friendly YAML layer, a single-file SQLite backend for embedded use, or any custom implementation via the pluggable
StorageBackendprotocol.