VAST releases - releaseport

No immediate action

v6.8.1 Bug fix 2d

Corrupt store file fixes

No immediate action

v6.8.0 Mixed 5d

Splunk input + ClickHouse enhancements

No immediate action

v6.7.0 Mixed 11d

fork_merge + force-stop + performance + Google SecOps

No immediate action

v6.6.0 Mixed 17d

ClickHouse column support + bugfixes

No immediate action

v6.5.0 Breaking risk 23d

Partition omission + map_keys + summarize speed

No immediate action

v6.4.0 Mixed 27d

Package constants + Platform token embedding + ClickHouse JSON +

No immediate action

v6.3.0 Mixed 1mo

Drain data + UDO diagnostics + ClickHouse source

No immediate action

v6.2.0 Mixed 1mo

Kinesis ops + Kafka regex + SecOps + null fields

No immediate action

v6.1.0 Breaking risk 1mo

Windowing, AI prompt, MSSQL source, Prometheus sink

v6.0.0 Breaking risk 1mo

Broad release touches 🚀 Features, 💥 Breaking changes, 🐞 Bug fixes, and 🔧 Changes.

Full changelog

Tenzir v6 ships with a rewritten execution engine that unlocks faster, more capable, and more scalable pipelines. Refer to the migration guide at https://docs.tenzir.com/guides/tenzir-v6-migration before upgrading.

💥 Breaking changes

`$file` let-binding for filesystem readers

The filesystem and cloud object reader operators (from_file, from_s3, from_azure_blob_storage, from_google_cloud_storage) no longer accept the path_field option. Instead, the parsing subpipeline now has access to a $file let-binding that describes the source file.

To attach the source path to each event:

// Before:
from_file "/data/*.json", path_field=source

// After:
from_file "/data/*.json" {
  read_json
  source = $file.path
}

This makes per-file metadata available throughout the parsing subpipeline rather than only on emitted events.

By @raxyte in #6001.

`from_http` is now a pure HTTP client

The from_http operator no longer doubles as an HTTP server. It is now a pure HTTP client that issues one request and returns the response.

For accepting incoming HTTP requests, use the dedicated accept_http operator instead:

// Before:
from_http "0.0.0.0:8080", server=true { read_json }

// After:
accept_http "0.0.0.0:8080" { read_json }

In the parsing subpipeline, the response metadata is now exposed as the $response let-binding instead of being written into a metadata_field:

from_http "https://api.example.com/status" {
  read_json
  status_code = $response.code
  server = $response.headers.Server
}

Additionally, the url and headers arguments are now resolved as secrets, so you can pass secret names instead of hardcoding tokens or sensitive URLs.

By @aljazerzen in #5953.

`to_kafka` defaults to NDJSON-encoded messages

The default message expression of the to_kafka operator is now this.print_ndjson() instead of this.print_json(). Kafka messages are single-line records by default, so each event is now emitted as a single NDJSON line:

{"timestamp":"2024-03-15T10:30:00.000000","source_ip":"192.168.1.100","alert_type":"brute_force"}

instead of pretty-printed multi-line JSON.

To restore the previous behavior, pass message=this.print_json() explicitly.
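
The difference between the two encodings can be illustrated outside of Tenzir. The following Python sketch uses the standard json module as a stand-in for print_ndjson and print_json. It only mirrors the single-line versus multi-line shape of the output and makes no claim about how Tenzir implements either function.

```python
import json

# Sample event taken from the NDJSON example above.
event = {
    "timestamp": "2024-03-15T10:30:00.000000",
    "source_ip": "192.168.1.100",
    "alert_type": "brute_force",
}

# Compact, single-line encoding: one record per line, as in NDJSON.
ndjson_line = json.dumps(event, separators=(",", ":"))

# Indented, multi-line encoding, comparable in shape to the previous default.
pretty = json.dumps(event, indent=2)

print(ndjson_line)
print(pretty)
```

Consumers that split Kafka payloads on newlines can treat the first form as exactly one record, which is not true of the second.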

By @lava in #5742.

`yara` requires finite input

The yara operator no longer accepts the blockwise argument. Instead, it buffers the entire input as one contiguous byte sequence and runs the YARA scan when the input ends. Matches can therefore span chunk boundaries, but yara is now only suitable for finite byte streams. Don't use it on never-ending inputs.
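
Why buffering matters for matches that span chunk boundaries can be shown with plain byte strings. This Python sketch does not use YARA at all: the pattern and the chunks are hypothetical, and a simple substring test stands in for a rule match.

```python
# Hypothetical byte pattern and input chunks, chosen for illustration only.
pattern = b"MALWARE"
chunks = [b"....MAL", b"WARE...."]

# Scanning each chunk on its own misses a pattern split across two chunks.
blockwise_hit = any(pattern in chunk for chunk in chunks)

# Buffering the whole input first lets the scan see one contiguous sequence.
buffered_hit = pattern in b"".join(chunks)

print(blockwise_hit, buffered_hit)
```

The buffered scan needs the complete input before it can run, which is the reason the operator only suits finite byte streams.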

The rule argument now also accepts a single string in addition to a list of strings:

from_file "evil.exe", mmap=true {
  yara "rule.yara"
}

Removed:

yara ["rule.yara"], blockwise=true

By @mavam in #6035.

Dedicated FTP source and sink operators

Two new operators provide first-class FTP and FTPS support with parsing and printing subpipelines:

from_ftp downloads bytes from an FTP or FTPS server and forwards them to the parsing subpipeline.
to_ftp uploads bytes produced by the printing subpipeline to an FTP or FTPS server.

from_ftp "ftp://user:[email protected]/path/to/file.ndjson" {
  read_ndjson
}

to_ftp "ftp://user:[email protected]/a/b/c/events.ndjson" {
  write_ndjson
}

The load_ftp and save_ftp operators have been removed, and the ftp:// and ftps:// URL schemes no longer dispatch via from and to. Use from_ftp and to_ftp directly.

By @mavam in #6044.

OpenSearch ingestion with `accept_opensearch`

The new accept_opensearch operator starts an OpenSearch-compatible HTTP server and turns incoming Bulk API requests into events:

accept_opensearch "0.0.0.0:9200"
publish "events"

The operator buffers each bulk request body up to max_request_size, optionally decompresses it based on the Content-Encoding header, parses the NDJSON payload, and emits the resulting records. Set keep_actions=true to also keep the OpenSearch action objects (e.g., {"create": ...}) in the stream.
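
The processing steps described above (decompress, parse NDJSON, optionally keep action objects) might look roughly like the following Python sketch. It is an illustration under assumptions, not Tenzir's implementation: the function name parse_bulk is hypothetical, only gzip is handled, the max_request_size limit is not modeled, and the rule that delete actions carry no document line comes from the general Bulk API format.

```python
import gzip
import json

# Bulk actions that are followed by a document line (delete has none).
ACTIONS_WITH_DOC = {"index", "create", "update"}

def parse_bulk(body: bytes, content_encoding: str = "", keep_actions: bool = False):
    """Turn a Bulk-API-style NDJSON body into a list of records."""
    if content_encoding == "gzip":
        body = gzip.decompress(body)
    records = []
    expect_doc = False
    for line in body.decode("utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        obj = json.loads(line)
        if expect_doc:
            records.append(obj)  # document line
            expect_doc = False
        else:
            if keep_actions:
                records.append(obj)  # action line such as {"create": ...}
            action = next(iter(obj), None)
            expect_doc = action in ACTIONS_WITH_DOC
    return records

payload = b'{"create":{"_index":"logs"}}\n{"message":"hello"}\n'
print(parse_bulk(gzip.compress(payload), "gzip"))
print(parse_bulk(payload, keep_actions=True))
```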

The from_opensearch operator has been removed. Use accept_opensearch instead. The elasticsearch:// and opensearch:// URL schemes now dispatch to accept_opensearch via from.

By @aljazerzen in #6066.

Removed `real_time` argument from `measure`

The measure operator no longer accepts the real_time argument. The operator's emission cadence is now governed entirely by the executor's backpressure, so the option no longer has a meaningful effect.

Remove real_time=true or real_time=false from your pipelines:

// Before:
measure real_time=true

// After:
measure

By @aljazerzen in #5880.

Renamed `from_gcs` to `from_google_cloud_storage`

The from_gcs operator has been renamed to from_google_cloud_storage so that its name matches the new to_google_cloud_storage writer:

// Before:
from_gcs "gs://my-bucket/data/**.json"

// After:
from_google_cloud_storage "gs://my-bucket/data/**.json"

Update test suites that reference from_gcs in requires.operators accordingly.

By @raxyte in #5766.

🚀 Features

`from_http` infers response parsers

The from_http operator now accepts requests without an explicit parser subpipeline when Tenzir can infer the response format from the Content-Type header or URL extension:

from_http "https://example.com/events.json"

Explicit parser subpipelines continue to take precedence over inferred formats.

By @mavam and @codex.

Add `auto_fill` option to `read_csv`, `read_tsv`, `read_ssv`, and `read_xsv`

The read_csv, read_tsv, read_ssv, and read_xsv operators now accept an auto_fill=true option. When set, the parser silently fills missing trailing columns with null instead of emitting a warning, which is useful when working with feeds that legitimately omit optional trailing fields.

By @jachris and @claude.

CloudWatch Logs operators

Tenzir now supports reading from and writing to CloudWatch Logs with the new from_amazon_cloudwatch and to_amazon_cloudwatch operators. The source can subscribe to live streams with mode="live", search historical log groups with mode="search", or replay one stream with mode="replay".

from_amazon_cloudwatch "/aws/lambda/api", mode="search", filter="ERROR"

The default sink can send events with PutLogEvents, including configurable batching, timestamp handling, parallel requests, and AWS IAM authentication via aws_iam. The sink can also write to the CloudWatch HTTP ingestion endpoints by setting method to json, ndjson, or hlc, with either SigV4 or bearer-token authentication.

to_amazon_cloudwatch "/tenzir/events",
  stream="default",
  payload=message,
  timestamp=ts

By @mavam and @codex in #6180.

Dedicated TCP source and sink operators

Tenzir now has dedicated TCP source and sink operators that follow the same client/server split as the HTTP operators:

from_tcp connects to a remote TCP or TLS endpoint as a client.
accept_tcp listens on a local endpoint and spawns a subpipeline per accepted connection. Inside the subpipeline, $peer.ip and $peer.port identify the connecting client; with resolve_hostnames=true, $peer.hostname is also available from reverse DNS.
to_tcp connects to a remote endpoint and writes serialized bytes.
serve_tcp listens for incoming connections and broadcasts pipeline output to all connected clients.

Each operator takes a parsing or printing subpipeline so connection management, framing, and serialization stay separate concerns:

accept_tcp "0.0.0.0:8090" {
  read_json
}

to_tcp "collector.example.com:5044" {
  write_json
}

from_tcp and to_tcp reconnect with exponential backoff on connection failure. All four operators support TLS via the tls option.
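
As a rough illustration of exponential backoff, the Python sketch below computes the wait before each reconnect attempt. The base delay, growth factor, and cap are assumptions picked for the example; the release notes do not state the values Tenzir uses.

```python
def backoff_delays(attempts: int, base: float = 1.0, factor: float = 2.0, cap: float = 60.0):
    """Return the wait time in seconds before each reconnect attempt."""
    # Each delay grows by `factor` until it reaches `cap`.
    return [min(cap, base * factor ** i) for i in range(attempts)]

print(backoff_delays(8))
# [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
```

Capping the delay keeps a long outage from pushing the retry interval out indefinitely.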

The legacy load_tcp and save_tcp operators are now deprecated. The tcp:// and tcps:// URL schemes still dispatch to them via from and to.

By @mavam in #5744 and #6017.

DNS result caching in `dns_lookup`

The dns_lookup operator now caches DNS results and reuses them across lookups. Forward-lookup results gain a ttl field that shows the remaining lifetime of the cached answer:

from {host: "example.com"}
dns_lookup host

If Tenzir cannot initialize DNS resolution at all, the operator now emits an error and stops instead of writing null results for every event. Individual failed or timed-out lookups still produce null, as before.
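
The idea of a cached answer with a remaining ttl can be sketched in a few lines of Python. The TtlCache class, the injectable clock, and the example address are hypothetical and only illustrate the concept; they do not describe the operator's internals.

```python
import time

class TtlCache:
    """Cache lookup answers and report their remaining lifetime."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # host -> (answer, expiry time)

    def put(self, host, answer, ttl_seconds):
        self._entries[host] = (answer, self._clock() + ttl_seconds)

    def get(self, host):
        """Return (answer, remaining_ttl), or None if missing or expired."""
        entry = self._entries.get(host)
        if entry is None:
            return None
        answer, expiry = entry
        remaining = expiry - self._clock()
        if remaining <= 0:
            del self._entries[host]  # expired: force a fresh lookup
            return None
        return answer, remaining

# A fake clock makes the example deterministic.
now = [0.0]
cache = TtlCache(clock=lambda: now[0])
cache.put("example.com", "192.0.2.1", ttl_seconds=300)
now[0] = 120.0
print(cache.get("example.com"))  # ('192.0.2.1', 180.0)
```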

By @mavam in #6034.

Event throughput metrics for the new executor

Pipeline metrics now report event throughput alongside byte throughput for pipelines running on the new executor:

metrics "pipeline"
summarize ingress_events=sum(ingress.events), ingress_bytes=sum(ingress.bytes), egress_events=sum(egress.events), pipeline_id
sort -egress_events

This makes node metrics distinguish the amount of data transferred from the number of events processed.

By @mavam and @codex.

from_amqp queue arguments

from_amqp now accepts a queue_arguments record for RabbitMQ queue declaration arguments:

from_amqp "amqp://broker/vhost",
          queue="events",
          queue_arguments={
            "x-queue-type": "quorum",
            "x-quorum-initial-group-size": 3
          }

Use this to declare queues with broker-specific settings such as quorum queues, maximum lengths, message TTLs, single active consumers, and dead-letter exchanges.

By @mavam and @codex in #6139.

HEC metadata and raw endpoint support in `to_splunk`

The to_splunk operator gains three new options for richer HEC metadata.

Use time= to set the per-event Splunk timestamp from an expression that evaluates to a Tenzir time or a non-negative epoch in seconds:

from {message: "login succeeded", observed_at: 2026-04-24T08:30:00Z}
to_splunk "https://localhost:8088",
  hec_token=secret("splunk-hec-token"),
  time=observed_at

Use fields= to attach indexed HEC fields. The expression must evaluate to a flat record whose values are strings or lists of strings:

to_splunk "https://localhost:8088",
  hec_token=secret("splunk-hec-token"),
  event={message: message},
  fields={user: user, tags: tags}

Use raw= to send already-formatted text to the HEC raw endpoint (/services/collector/raw). The raw expression must evaluate to a string. Multiple events in one request are separated by newlines, and request-level metadata such as host, source, sourcetype, index, and time is sent as query parameters:

to_splunk "https://localhost:8088",
  hec_token=secret("splunk-hec-token"),
  raw=line,
  source=source,
  sourcetype="linux_secure"

raw and event are mutually exclusive; fields is not supported with raw.

By @mavam in #6074.

HEC queue selection in `to_splunk`

The neo to_splunk implementation now accepts queue="indexing" and queue="typing" for selecting the Splunk HEC processing queue. The default indexing path keeps Splunk's regular HEC behavior, while typing sends the Splunk parsingQueue hint in HEC event envelopes for receivers that support this non-standard HEC metadata.

The default is queue="indexing". The typing queue is rejected with raw=..., because Splunk's raw HEC endpoint sends raw requests to the indexer queue.

By @mavam and @codex.

High-level filesystem and object store writers

Four new high-level writer operators serialize events to local filesystems and cloud object stores with rotation, hive-style partitioning, and per-partition unique filenames:

to_file writes to a local filesystem.
to_s3 writes to Amazon S3.
to_azure_blob_storage writes to Azure Blob Storage.
to_google_cloud_storage writes to Google Cloud Storage.

Each takes a printing subpipeline, a URL with optional ** and {uuid} placeholders, and rotation parameters. The ** placeholder expands into a hive partitioning hierarchy based on partition_by, and {uuid} ensures each partition gets unique destination names:

subscribe "events"
to_s3 "s3://my-bucket/year=**/month=**/{uuid}.json",
  partition_by=[year, month] {
  write_ndjson
}

Files rotate automatically when the configured max_size or timeout is reached, so long-running pipelines do not produce single huge objects.

By @raxyte in #6053.

Internal memory size function

The new internal_memory_size function estimates the size of each event in bytes:

size = internal_memory_size(this)

This is useful for building pipelines that inspect or route events based on their approximate in-memory payload size.

By @IyeOnline and @codex.

Keyed routing and source mode for `parallel`

The parallel operator gains two enhancements:

The jobs argument is now optional and defaults to the number of available CPU cores:

subscribe "events"
parallel {
  parsed = data.parse_json()
}

The new route_by argument routes events to workers deterministically by key. Events with the same route_by value always go to the same worker, which is required for stateful subpipelines like deduplicate or summarize:

subscribe "events"
parallel route_by=src_ip {
  deduplicate src_ip, dst_ip, dst_port
}
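
Deterministic keyed routing is commonly done by hashing the key and taking the result modulo the worker count. The Python sketch below shows that general technique. The pick_worker helper and the use of CRC32 are assumptions for illustration, since the release notes do not say how parallel maps keys to workers.

```python
import zlib

def pick_worker(key: str, workers: int) -> int:
    """Map a routing key to a worker index deterministically."""
    return zlib.crc32(key.encode("utf-8")) % workers

events = [
    {"src_ip": "10.0.0.1", "dst_port": 443},
    {"src_ip": "10.0.0.2", "dst_port": 22},
    {"src_ip": "10.0.0.1", "dst_port": 443},
]

# Events that share a src_ip always land on the same worker index.
assignments = [pick_worker(e["src_ip"], workers=4) for e in events]
print(assignments)
```

Because the first and third event share a key, a stateful step such as deduplication sees both of them on one worker.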

Additionally, parallel may now be used as a source operator (without upstream input). This spawns multiple independent instances of the subpipeline, which is useful for running the same source pipeline with concurrent connections.

By @jachris in #5821.

Keyed subpipeline routing with `group`

The new group operator routes events with the same key through a shared subpipeline. Inside the subpipeline, $group refers to the key for that subpipeline:

group tenant {
  summarize count()
}

The subpipeline either emits events—which are forwarded as the operator's output—or ends with a sink, in which case group itself becomes a sink. Use group when you need keyed routing through a stateful subpipeline, such as a per-tenant sink or a per-session transformation. For grouped aggregations, keep using summarize.

By @jachris in #5980.

Live packet capture with `from_nic`

The new from_nic operator captures packets from a network interface and emits them as events directly:

from_nic "eth0"

Without an explicit subpipeline, from_nic parses the captured PCAP byte stream with read_pcap. Provide a subpipeline when you want to change how the byte stream is parsed. Use the filter option to apply a Berkeley Packet Filter (BPF) expression so libpcap drops unwanted traffic before parsing:

from_nic "eth0", filter="tcp port 443"

The companion read_pcap and write_pcap operators have been refreshed: read_pcap now also emits a pcap.file_header event when emit_file_headers=true, which write_pcap consumes to preserve the original timestamp precision and byte order. The pcap.packet schema's time.timestamp field is now a top-level timestamp field, and data is now a blob.

By @mavam in #6022.

Memory-mapped reads in `from_file`

The from_file operator now accepts an mmap=bool option that uses memory-mapped I/O for reading local files instead of regular reads. This can improve performance for large files:

from_file "/var/log/large.json", mmap=true {
  read_json
}

Defaults to false.

By @raxyte in #6036.

Microsoft Graph source operator

Tenzir now includes a Microsoft Graph source operator that reads from Microsoft Graph v1.0 and beta collections with app-only Microsoft Entra authentication and OData pagination.

For example, you can read Entra ID sign-in logs with client credentials and push down OData query options:

from_microsoft_graph "auditLogs/signIns",
  auth={
    tenant_id: "contoso.onmicrosoft.com",
    client_id: "00000000-0000-0000-0000-000000000000",
    client_secret: secret("ms-graph-client-secret"),
  },
  odata={
    filter: "createdDateTime ge 2026-04-24T00:00:00Z",
    select: ["id", "createdDateTime", "userPrincipalName", "status"],
    top: 1000,
  }

The operator emits each object from the response value array as a separate event and follows @odata.nextLink until the collection is exhausted.

The operator can also use Microsoft Graph delta queries with delta=true, storing the returned @odata.deltaLink in memory and polling it with a configurable poll_interval. OData query options apply to the initial delta request only, subject to Microsoft Graph's resource-specific support, and subsequent polls use the opaque delta link exactly as Microsoft Graph returned it.

It also retries throttled and transient Microsoft Graph requests, respecting Retry-After when present.

By @mavam and @codex in #6165, #6179, and #6182.

MySQL source operator

The from_mysql operator lets you read data directly from MySQL databases.

Read a table:

from_mysql table="users", host="localhost", port=3306, user="admin", password="secret", database="mydb"

List tables:

from_mysql show="tables", host="localhost", port=3306, user="admin", password="secret", database="mydb"

Show columns:

from_mysql table="users", show="columns", host="localhost", port=3306, user="admin", password="secret", database="mydb"

Or execute a custom SQL query:

from_mysql sql="SELECT id, name FROM users WHERE active = 1",
           host="localhost",
           port=3306,
           user="admin",
           password="secret",
           database="mydb"

The operator supports TLS/SSL connections for secure communication with MySQL servers. Use tls=true for default TLS settings, or pass a record for fine-grained control:

from_mysql table="users", host="db.example.com", database="prod", tls={
  cacert: "/path/to/ca.pem",
  certfile: "/path/to/client-cert.pem",
  keyfile: "/path/to/client-key.pem",
}

The operator supports MySQL's caching_sha2_password authentication method and automatically maps MySQL data types to Tenzir types.

Use live=true to continuously stream new rows from a table. The operator tracks progress using a watermark on an integer column, polling for rows above the last-seen value:

from_mysql table="events", live=true, host="localhost", database="mydb"

By default, the tracking column is auto-detected from the table's auto-increment primary key. To specify one explicitly:

from_mysql table="events", live=true, tracking_column="event_id",
           host="localhost", database="mydb"
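
Watermark-based polling can be demonstrated with any SQL database. The Python sketch below uses an in-memory SQLite table purely as a stand-in for MySQL. The poll helper and the table layout are hypothetical, and the query is only one plausible way to fetch rows above the last-seen value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (event_id INTEGER PRIMARY KEY AUTOINCREMENT, msg TEXT)"
)

def poll(conn, watermark):
    """Fetch rows above the watermark and return them with the new watermark."""
    rows = conn.execute(
        "SELECT event_id, msg FROM events WHERE event_id > ? ORDER BY event_id",
        (watermark,),
    ).fetchall()
    if rows:
        watermark = rows[-1][0]  # advance to the highest id seen so far
    return rows, watermark

conn.executemany("INSERT INTO events (msg) VALUES (?)", [("a",), ("b",)])
first, mark = poll(conn, 0)

conn.execute("INSERT INTO events (msg) VALUES (?)", ("c",))
second, mark = poll(conn, mark)

print(first, second, mark)
```

The second poll returns only the newly inserted row, which is why the tracking column has to increase monotonically.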

By @mavam and @claude in #5721 and #5738.

NATS JetStream operators

Tenzir can now consume from and publish to NATS JetStream subjects with from_nats and to_nats.

Use from_nats to receive one event per message. The raw payload appears in the message blob field, and metadata_field attaches NATS metadata:

from_nats "alerts", metadata_field=nats
parsed = string(message).parse_json()

Use to_nats to publish one message per event. By default, the operator serializes the whole event with this.print_ndjson():

from {severity: "high", alert_type: "suspicious-login"}
to_nats "alerts"

Both operators support configurable connection settings, authentication, and the standard Tenzir tls record.

By @mavam and @codex.

OData pagination for from_http

The from_http operator now supports paginate="odata" for OData collection responses such as Microsoft Graph:

from_http "https://graph.microsoft.com/v1.0/users",
  headers={"ConsistencyLevel": "eventual"},
  paginate="odata" {
  read_json
}

This mode emits the objects from the response body's top-level value array and follows top-level @odata.nextLink URLs until no next link is present. The next link can be absolute or relative to the current response URL.
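
The pagination loop described above can be sketched in Python as follows. The page contents, the api.example.com URLs, and the paginate_odata helper are hypothetical; a dictionary lookup stands in for the HTTP request so that the example is self-contained.

```python
from urllib.parse import urljoin

# Hypothetical in-memory responses standing in for an HTTP API.
PAGES = {
    "https://api.example.com/v1/users": {
        "value": [{"id": 1}, {"id": 2}],
        "@odata.nextLink": "users?page=2",  # relative next link
    },
    "https://api.example.com/v1/users?page=2": {
        "value": [{"id": 3}],
    },
}

def paginate_odata(url, fetch):
    """Yield each object from every page's value array, following next links."""
    while url is not None:
        page = fetch(url)
        yield from page.get("value", [])
        next_link = page.get("@odata.nextLink")
        # urljoin resolves relative links and leaves absolute ones unchanged.
        url = urljoin(url, next_link) if next_link else None

items = list(paginate_odata("https://api.example.com/v1/users", PAGES.__getitem__))
print(items)
```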

By @mavam and @codex.

Parse TQL records with `read_tql`

The new read_tql operator parses an incoming byte stream of TQL-formatted records into events. Each top-level record expression becomes one event:

load_file "events.tql"
read_tql

The input format matches the output of write_tql, so read_tql is useful for round-tripping data through TQL notation, reading TQL-formatted files, or processing data piped from other Tenzir pipelines.

By @mavam in #5707.

Per-event subpipelines with `each`

The new each operator runs a fresh subpipeline for every input event. The event is bound to $this inside the subpipeline so it can parametrize the nested logic on a per-event basis:

from [
  {file: "a.json"},
  {file: "b.json"},
]
each {
  from $this.file
}

The subpipeline takes no input from each. It either emits events, which are forwarded as the operator's output, or ends with a sink, in which case each itself becomes a sink.

Use each for per-event jobs such as a lookup, an export, or a sink whose source depends on the incoming event. For keyed streams that should keep one subpipeline alive per key, use group instead.

By @jachris in #5981.

Prometheus shape for `metrics`

The metrics operator now accepts shape="prometheus" to emit metrics from metrics plugins as canonical {metric, value, timestamp, labels, type, unit} records. The default remains shape="raw", which preserves the existing tenzir.metrics.* schemas.

By @mavam and @codex in #6190.

Raw byte output with write_all

The new write_all operator concatenates one selected string or blob field into raw bytes:

from_file "/tmp/report.pdf" {
  read_all binary=true
}
to_file "/tmp/report-copy.pdf" {
  write_all data
}

Use it to copy binary payloads, reconstruct byte streams after event processing, or write string fields without separators or escaping.

By @mavam and @codex.

Read from standard input with `from_stdin`

The new from_stdin operator reads bytes from standard input through a parsing subpipeline:

from_stdin {
  read_json
}

This is useful when piping data into the tenzir executable as part of a shell script or command chain.

By @raxyte in #5731.

Repeat string function

The new repeat function repeats a string a given number of times:

message = "na".repeat(8)

{
  message: "nananananananana",
}

By @mavam and @codex in #6181.

Request records for `from_http` pagination

The from_http operator now supports returning request records from paginate lambdas. This lets APIs keep pagination state in the next request body or headers instead of only in the next URL:

from_http "https://opensearch.example.com/logs/_search",
  method="post",
  body={size: 500, query: {match_all: {}}},
  paginate=(x => {
    body: {
      size: 500,
      query: {match_all: {}},
      search_after: x.hits.hits[-1].sort,
    },
  } if x.hits.hits != []) {
  read_json
}

Returned request records can patch url, method, headers, and body. Missing fields inherit from the current request, and body: null clears the body.
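
The inheritance rule can be modeled as a shallow overlay of the returned record onto the current request. The Python sketch below is an illustration under assumptions: next_request is a hypothetical helper, and it replaces headers wholesale because the release notes do not say whether headers are merged per key.

```python
PATCHABLE = ("url", "method", "headers", "body")

def next_request(current: dict, patch: dict) -> dict:
    """Apply a returned request record on top of the current request."""
    request = dict(current)
    for field in PATCHABLE:
        if field in patch:
            # An explicit None for body clears the body.
            request[field] = patch[field]
    return request

current = {
    "url": "https://opensearch.example.com/logs/_search",
    "method": "post",
    "headers": {},
    "body": {"size": 500},
}

patched = next_request(current, {"body": {"size": 500, "search_after": [42]}})
cleared = next_request(current, {"body": None})
print(patched)
print(cleared)
```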

By @mavam and @codex.

Send events to webhooks with `to_http`

The new to_http operator sends each input event as an HTTP request to a webhook or API endpoint. By default, it JSON-encodes the entire event as the request body and sends it as a POST:

subscribe "alerts"
to_http "https://example.com/webhook"

to_http shares its options with from_http and http: configure method, body, encode, headers, TLS, retries, and pagination per request. Use parallel to issue multiple concurrent requests when the target endpoint can handle more load than a single pipeline produces.

This is useful for pushing alerts to webhooks, forwarding events to SIEMs, and calling external APIs once per event.

By @aljazerzen in #6019.

SQS receive controls

The from_sqs operator now gives you explicit control over how messages are received from SQS. Use keep_messages=true to inspect or replay messages without removing them from the queue, batch_size=<1..10> to control how many messages each receive request may return, and visibility_timeout=<duration> to override the queue visibility timeout for received messages:

from_sqs "events", keep_messages=true, batch_size=10, visibility_timeout=30s

By default, from_sqs still deletes each received message after emitting it. With keep_messages=true, SQS makes the message visible again after the queue's visibility timeout.

By @mavam and @codex in #6167 and #6174.

Stream pipeline output with `serve_http`

The new serve_http operator starts an HTTP server and broadcasts the bytes produced by a nested pipeline to all connected clients:

from_file "example.yaml"
serve_http "0.0.0.0:8080" {
  write_ndjson
}

Clients connect with a GET request and receive a continuous HTTP response body. Pick the wire format with the nested pipeline: write_ndjson for NDJSON streams, write_lines for plain text, and so on. The operator does not buffer output for clients that connect later—each client receives the bytes produced after it connects. TLS, connection limits, and graceful disconnect are all configurable.

By @aljazerzen in #6070.

Synthetic event generation with `anonymize`

The new anonymize operator generates synthetic events that share the schemas of its input. The operator first samples a configurable number of input events to learn what schemas are present and to summarize their values, and then replaces the input with generated events that match those schemas:

subscribe "events"
anonymize count=1000

By default, generated values follow the aggregate statistics of the sampled input: null rates, list lengths, numeric ranges, time and duration ranges, boolean and enum frequencies, string and blob lengths and byte frequencies, IP address family frequencies, and subnet prefix length frequencies. Use fully_random=true to ignore those statistics and instead pick values uniformly from each type's full range. The optional seed argument makes output reproducible.

Use anonymize to share representative event traces without leaking the underlying values.

By @IyeOnline.

TQL match statements

TQL now supports statement-level match blocks for branching on patterns:

match action {
  "accept" | "allow" => { verdict = "allowed" }
  "deny" | "drop" => { verdict = "blocked" }
  _ => { verdict = "unknown" }
}

Patterns can be constants, exclusive ranges, alternatives separated by |, or the final wildcard _. Every match must include an unguarded final wildcard arm, so Tenzir can prove at compile time that all possible values are covered. This provides a concise alternative to long else if chains when routing events by field value.

By @mavam and @codex.

Uncompressed Feather output

The write_feather operator now supports compression_type="uncompressed" to disable compression entirely. Previously, only zstd and lz4 were accepted:

to_file "events.feather" {
  write_feather compression_type="uncompressed"
}

By @mavam in #6045.

🔧 Changes

Add `accept_http` operator for receiving HTTP requests

We added a new operator to accept data from incoming HTTP connections.

The server option of the from_http operator is now deprecated. Going forward, it should only be used for client-mode HTTP operations, and the new accept_http operator should be used for server-mode operations.

By @lava.

Per-schema buffering and default timeout for `batch`

The batch operator now maintains separate buffers for each distinct schema. Each buffer has independent timeout tracking and fills until reaching the limit, at which point it flushes immediately. Previously, mixed-schema streams could stall waiting for a single combined buffer to fill.

The timeout argument now defaults to 1min instead of an infinite duration, so buffered events are flushed at least once per minute when no new events arrive.
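
Per-schema buffering can be pictured as one buffer per distinct schema, each flushing on its own. The Python sketch below models only the size limit. The SchemaBatcher class is hypothetical, it identifies a schema by its sorted field names (an assumption), and it leaves out the timeout.

```python
from collections import defaultdict

class SchemaBatcher:
    """Buffer events per schema and flush a buffer once it reaches the limit."""

    def __init__(self, limit: int):
        self.limit = limit
        self.buffers = defaultdict(list)

    def push(self, event: dict):
        """Add an event; return a full batch if its buffer hit the limit."""
        schema = tuple(sorted(event))  # assumption: field names identify the schema
        buffer = self.buffers[schema]
        buffer.append(event)
        if len(buffer) >= self.limit:
            return self.buffers.pop(schema)  # flush immediately
        return None

batcher = SchemaBatcher(limit=2)
out = [batcher.push(e) for e in [{"a": 1}, {"b": 1}, {"a": 2}, {"b": 2}]]
print(out)
```

In this interleaved stream each schema fills and flushes its own buffer, so neither waits on the other.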

By @aljazerzen in #5878 and #5906.

Preserve categorical order in `chart_bar`

The chart_bar and chart_pie operators now preserve the incoming row order for categorical x-axis values such as strings, IP addresses, and subnets. This allows users to control bar order with regular TQL operators such as sort before charting.

By @mavam.

Region derivation and endpoint logging for SQS

The from_sqs and to_sqs operators now derive the AWS region from a queue URL when aws_region is not set, so passing a full URL such as https://sqs.us-west-2.amazonaws.com/123456789012/my-queue works without having to specify the region again:

from_sqs "https://sqs.us-west-2.amazonaws.com/123456789012/my-queue"

Previously, this would fall back to the SDK default region and fail with a SigV4 signature mismatch. Explicit aws_region, resolved IAM credentials, and the SDK default still apply in that order when the URL has no region (for example VPC endpoints, LocalStack, or an AWS_ENDPOINT_URL override).
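
Deriving the region from a queue URL amounts to parsing the hostname. The Python sketch below shows one way to do this. The region_from_queue_url helper is hypothetical, and its pattern only covers the standard sqs.<region>.amazonaws.com host form, so other endpoint styles fall through to None.

```python
import re
from urllib.parse import urlparse

# Matches hosts such as sqs.us-west-2.amazonaws.com.
_SQS_HOST = re.compile(r"^sqs\.([a-z0-9-]+)\.amazonaws\.com$")

def region_from_queue_url(url: str):
    """Return the region encoded in a standard SQS queue URL, else None."""
    host = urlparse(url).hostname or ""
    match = _SQS_HOST.match(host)
    return match.group(1) if match else None

print(region_from_queue_url("https://sqs.us-west-2.amazonaws.com/123456789012/my-queue"))
print(region_from_queue_url("http://localhost:4566/000000000000/my-queue"))
```

A None result corresponds to the case where the URL has no region and the remaining fallbacks apply.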

SQS API errors and HTTP failures now also include the endpoint URL in their log lines and diagnostic notes, which makes it easier to tell which queue produced an error when multiple SQS pipelines run side by side.

By @lava in #6168.

🐞 Bug fixes

Compaction resolves package UDOs at startup

The compaction plugin no longer fails to start with module <package> not found when a rule's pipeline references an operator defined by an installed package. Previously, depending on the order in which the node's components were initialized, the compactor's eager rule-pipeline parse could run before the package manager had published its operator modules to the global registry.

By @raxyte in #6210.

Faster drop_null_fields on heterogeneous data

The drop_null_fields operator is now much faster on heterogeneous input with many changing null patterns.

By @jachris, @mavam, and @codex in #5963.

Reduce disk I/O of time-based compaction

Time-based compaction rules no longer cause the node to reprocess data that has already been compacted in a previous run.

By @lava.

SentinelOne Data Lake sink support in the new executor

The to_sentinelone_data_lake operator now works in pipelines that run on the new executor. Previously, using it there failed before the pipeline could send events.

from {message: "hello"}
to_sentinelone_data_lake "https://example.com", token="TOKEN"

By @mavam and @codex in #6081.

Top-level package metadata

Packages can now include a top-level metadata field for data consumed by external tools. Unknown package keys still fail validation, and the error now points users to metadata for non-engine package data.

By @tobim and @codex in #6149.

v5.36.0 New feature 2mo

Notable features

OCSF `ocsf::derive` now supports list-valued enum fields for bidirectional normalization
Lossless merging of int64 and uint64 columns during parsing eliminates extraneous table-slice splits

Full changelog

This release makes int64/uint64 column merging lossless during parsing, so fields like flow_id that mix signed and unsigned values no longer cause unnecessary table-slice splits. It also extends ocsf::derive to handle list-valued enum fields for full bidirectional OCSF enum normalization.

🚀 Features

OCSF enum list derivation

ocsf::derive now derives OCSF enum sibling fields for lists, not just scalar enum fields. For example, DNS answers with flag_ids: [1, 3, 4] now also get flags: ["Authoritative Answer", "Recursion Desired", "Recursion Available"], and the reverse direction works for flags to flag_ids as well.
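
The bidirectional derivation for lists can be pictured with a small lookup table. The Python sketch below covers only the three values named in the example above. The derive function is a hypothetical illustration and does not reflect how ocsf::derive consults the OCSF schema.

```python
# Mapping limited to the three values named in the example above.
FLAG_NAMES = {
    1: "Authoritative Answer",
    3: "Recursion Desired",
    4: "Recursion Available",
}
FLAG_IDS = {name: ident for ident, name in FLAG_NAMES.items()}

def derive(event: dict) -> dict:
    """Fill in whichever of flag_ids / flags is missing from the other."""
    out = dict(event)
    if "flag_ids" in out and "flags" not in out:
        out["flags"] = [FLAG_NAMES[i] for i in out["flag_ids"]]
    elif "flags" in out and "flag_ids" not in out:
        out["flag_ids"] = [FLAG_IDS[n] for n in out["flags"]]
    return out

print(derive({"flag_ids": [1, 3, 4]}))
print(derive({"flags": ["Recursion Desired"]}))
```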

By @jachris, @mavam, and @codex in #5354.

🔧 Changes

Lossless int64/uint64 merging during parsing

Parsing data that mixes int64 and uint64 values in the same field no longer produces unnecessary table-slice splits, improving batching performance. Fields like flow_id that are always non-negative but occasionally exceed the signed integer limit of 2^63 − 1 are now merged into a single uint64 column where possible, instead of being emitted as separate slices.
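
One way to reason about "where possible" is to check whether all values of a field fit into a single 64-bit type. The Python sketch below encodes that check. The merged_column_type helper and its decision order are assumptions for illustration, not a description of Tenzir's parser.

```python
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1
UINT64_MAX = 2**64 - 1

def merged_column_type(values):
    """Pick one column type for a field, or None if no lossless merge exists."""
    lo, hi = min(values), max(values)
    if lo >= INT64_MIN and hi <= INT64_MAX:
        return "int64"
    if lo >= 0 and hi <= UINT64_MAX:
        return "uint64"
    # Negative values mixed with values above the int64 limit cannot share a column.
    return None

print(merged_column_type([1, 2**63]))   # uint64
print(merged_column_type([-1, 5]))      # int64
print(merged_column_type([-1, 2**63]))  # None
```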

By @IyeOnline and @claude.

🐞 Bug fixes

Empty if branches in the new executor

Empty if branches no longer crash when running pipelines with the new executor. For example, if false {} now behaves like an empty pass-through branch instead of triggering an internal assertion failure.

By @mavam and @codex in #6128.

v5.35.2 Bug fix 2mo

Fixed startup pipelines referencing static‑package operators and UDOs with slash‑delimited string defaults.

Full changelog

This release fixes two package-related bugs: startup pipelines can now reliably reference operators from static packages, and UDOs with slash-delimited string defaults (e.g. "/tmp-data/") load correctly without internal errors.

🐞 Bug fixes

Configured pipelines with package operators

Configured startup pipelines can now reference operators from static packages reliably. Previously, such pipelines could fail during node startup with module <package> not found, even though the same package operator worked when run manually after startup.

By @mavam and @codex.

Slash-delimited UDO defaults

Package UDOs now load correctly when a typed string default looks like a TQL pattern, such as default: "/tmp-data/".

Previously, loading such a package could abort with an unexpected internal error before any pipeline ran.

By @mavam and @codex in #6108.

v5.35.1 Bug fix 2mo

Fixed ClickHouse TLS mismatch diagnostics to include actionable hints and restored correct retention for mixed-age metrics partitions.

Full changelog

This release restores correct retention for metrics and diagnostics in mixed-age partitions and brings back actionable TLS hints when ClickHouse connections fail due to a TLS/plaintext mismatch.

🐞 Bug fixes

ClickHouse TLS mismatch diagnostics

ClickHouse connection errors caused by TLS/plaintext mismatches now include the TLS notes and hint again. This helps identify when to_clickhouse is using TLS against a plaintext ClickHouse endpoint and suggests setting tls=false when appropriate.

By @mavam and @codex in #6098.

Retention for mixed-age metrics partitions

Default retention policies now continue deleting metrics and diagnostics as their timestamps age into the retention window, even when older and newer events share a partition.

Previously, a partition that still contained newer events after retention could be skipped by later retention runs, leaving those events behind after they expired.

By @tobim and @codex in #6086.

v5.35.0 New feature 2mo

Notable features

New `from_nats` operator to consume events from NATS JetStream subjects, exposing raw payload and metadata
New `to_nats` operator to publish events to NATS JetStream subjects, defaulting to NDJSON serialization

Full changelog

Tenzir can now consume from and publish to NATS JetStream subjects with from_nats and to_nats. This release also fixes crashes in static musl builds when evaluating deeply nested generated TQL expressions.

🚀 Features

NATS JetStream operators

Tenzir can now consume from and publish to NATS JetStream subjects with from_nats and to_nats.

Use from_nats to receive one event per message. The raw payload appears in the message blob field, and metadata_field attaches NATS metadata:

from_nats "alerts", metadata_field=nats
parsed = string(message).parse_json()

Use to_nats to publish one message per event. By default, the operator serializes the whole event with this.print_ndjson():

from {severity: "high", alert_type: "suspicious-login"}
to_nats "alerts"

Both operators support configurable connection settings, authentication, and the standard Tenzir tls record.

By @mavam and @codex.

🐞 Bug fixes

Static musl builds no longer crash on deep TQL expressions

Static musl builds of tenzir no longer crash on deeply nested generated TQL expressions.

This affected generated pipelines with deeply nested expressions, for example rules or transformations that expand into long left-associated operator chains.

The tenzir binary now links with a larger default thread stack size on musl, which brings its behavior in line with non-static builds for these pipelines.

By @tobim and @codex in #6082.

v5.34.0 New feature 3mo

Notable features

from_http operator now supports paginate="odata" to iterate over OData v4 collection responses (e.g., MS Graph) using value arrays and @odata.nextLink links

Full changelog

This release adds OData pagination support to the from_http operator, enabling seamless iteration over Microsoft Graph and other OData v4 collection responses.

🚀 Features

OData pagination for from_http

The from_http operator now supports paginate="odata" for OData collection responses such as Microsoft Graph:

from_http "https://graph.microsoft.com/v1.0/users",
  headers={"ConsistencyLevel": "eventual"},
  paginate="odata" {
  read_json
}

This mode emits the objects from the response body's top-level value array and follows top-level @odata.nextLink URLs until no next link is present. The next link can be absolute or relative to the current response URL.
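The pagination loop described here can be sketched in a few lines of Python. This is a sketch of the general OData pattern, not Tenzir's code: `fetch` is a hypothetical caller-supplied function that returns the parsed JSON body for a URL, so the example runs without network access.

```python
from urllib.parse import urljoin


def paginate_odata(fetch, url):
    """Yield objects from each page's `value` array, following @odata.nextLink."""
    while url is not None:
        body = fetch(url)  # hypothetical: returns the parsed JSON response
        yield from body.get("value", [])
        next_link = body.get("@odata.nextLink")
        # The next link may be absolute or relative to the current response URL.
        url = urljoin(url, next_link) if next_link else None


# In-memory stand-in for two pages of an OData collection (made-up data).
PAGES = {
    "https://example.test/v1.0/users": {
        "value": [{"id": 1}],
        "@odata.nextLink": "users?$skiptoken=abc",
    },
    "https://example.test/v1.0/users?$skiptoken=abc": {"value": [{"id": 2}]},
}
users = list(paginate_odata(PAGES.__getitem__, "https://example.test/v1.0/users"))
```

The loop ends on the first response without a top-level `@odata.nextLink`, which matches the stop condition described above.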

By @mavam and @codex.

v5.33.0 New feature 3mo

Notable features

subnet now accepts typed IPs, plain strings, and subnets with optional prefix length; defaults to /32 for IPv4 and /128 for IPv6

Full changelog

This release makes the subnet function work directly with typed and string IP addresses, which removes boilerplate in TQL pipelines. It also fixes several stability issues in where, unroll, files, context::enrich, and collection indexing.

🚀 Features

IP address support in subnet

The subnet function now accepts typed IP addresses, plain IP strings, and existing subnet values with an optional prefix length:

from {source_ip: 10.10.1.124}
net = subnet(source_ip, 24)

This returns 10.10.1.0/24 without converting the IP address to a string first. When you omit the prefix, IPv4 addresses become /32 host subnets and IPv6 addresses become /128 host subnets.
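For readers who want to check the arithmetic outside TQL, the same computation can be reproduced with Python's standard `ipaddress` module. This is only an analogy for the behavior described above; the Python function named `subnet` here is a stand-in and not the TQL function itself.

```python
import ipaddress


def subnet(ip, prefix=None):
    """Return the network containing `ip`; default to a host subnet."""
    addr = ipaddress.ip_address(ip)
    if prefix is None:
        prefix = addr.max_prefixlen  # 32 for IPv4, 128 for IPv6
    # strict=False masks off the host bits instead of raising an error.
    return str(ipaddress.ip_network(f"{addr}/{prefix}", strict=False))
```

Here `subnet("10.10.1.124", 24)` gives `"10.10.1.0/24"`, and omitting the prefix gives `"10.10.1.124/32"`.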

By @mavam and @codex.

🐞 Bug fixes

Crash fix for deep left-associated where expressions

Tenzir no longer segfaults on some very deep left-associated boolean expressions in where clauses due to source-location handling.

By @tobim and @codex in #6068.

Fixed unbounded memory growth in `context::enrich`

We fixed an issue in the context::enrich operator that caused unbounded memory growth.

By @IyeOnline.

Large unroll output stability

The unroll operator no longer crashes when expanding very large lists into output that exceeds Arrow's per-array capacity.

By @mavam and @codex.

Recursive files traversal of unreadable directories

The files operator now skips unreadable child directories during recursive traversal, emits a warning for each skipped directory by default, and continues listing accessible siblings. Set skip_permission_denied=true to ignore permission-denied paths silently: this suppresses warnings for skipped child directories and still makes an unreadable initial directory produce no events instead of an error. Non-permission filesystem errors continue to fail the pipeline.

By @mavam and @codex.

Unsigned integer indexing in TQL

Both list and record indexing in TQL now work with signed and unsigned integer indices. This also applies to record field-position indexing and to the get function for records and lists.

By @mavam and @codex.

v5.32.1 Bug fix 3mo

Fixed non-deterministic periodic summarize output for delayed or sparse streams.

Full changelog

This patch release fixes two correctness issues in stateful pipeline execution. Partition rebuilds now complete after writing replacement partitions, and periodic summarize output remains deterministic for delayed or sparse streams.

🐞 Bug fixes

Deterministic periodic summarize output

The summarize operator now starts frequency-based emission with the first input event and emits overdue periodic results before later events are aggregated. This makes periodic output deterministic in reset, cumulative, and update modes for delayed or sparse streams.

For example:

from {ts: 0ms.from_epoch(), x: 1},
     {ts: 90ms.from_epoch(), x: 1},
     {ts: 360ms.from_epoch(), x: 1}
delay ts
summarize count=count(), options={frequency: 300ms, mode: "cumulative"}

The first periodic result now consistently reports a count of 2 before the third event arrives.
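The emission rule can be simulated in plain Python to see why the count is 2. The sketch below models only cumulative mode and makes two assumptions that the text does not confirm: an event whose timestamp equals the emission time is counted after the emission, and nothing is flushed at end of input. The function name is hypothetical.

```python
def cumulative_counts(timestamps_ms, frequency_ms):
    """Return (emit_time_ms, count) pairs for overdue periodic results."""
    emitted = []
    count = 0
    next_emit = None
    for ts in timestamps_ms:
        if next_emit is None:
            next_emit = ts + frequency_ms  # the first event starts the clock
        # Emit every overdue result before aggregating the current event.
        while ts >= next_emit:
            emitted.append((next_emit, count))
            next_emit += frequency_ms
        count += 1
    return emitted
```

For the timestamps 0, 90, and 360 with a 300 ms frequency, the only emission during the stream is `(300, 2)`: the third event triggers the overdue result before it is counted.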

By @mavam and @codex.

Partition rebuild completion

Partition rebuilds now finish after persisting rebuilt partitions. Previously, rebuild jobs could remain stuck indefinitely even though the replacement partitions were written successfully.

By @tobim and @codex in #6059.

v5.32.0 Breaking risk 3mo

⚠ Upgrade required

The `server` option of the `from_http` operator is deprecated; use the new `accept_http` operator for server-mode HTTP operations.

Notable features

Tenzir nodes honor standard HTTP proxy environment variables (HTTPS_PROXY, NO_PROXY) for Platform websocket connections

Full changelog

Tenzir nodes now honor standard HTTP proxy environment variables when connecting to the Tenzir Platform, and hash functions produce correct checksums for binary values.

🚀 Features

Platform websocket proxy support

Tenzir nodes now honor standard HTTP proxy environment variables when connecting to Tenzir Platform:

HTTPS_PROXY=http://proxy.example:3128 tenzir-node

Use NO_PROXY to bypass the proxy for selected hosts. This helps deployments where outbound connections to the Platform websocket gateway must go through an HTTP proxy.

By @tobim and @codex in #6039.

🔧 Changes

Add `accept_http` operator for receiving HTTP requests

We added the accept_http operator, which accepts data from incoming HTTP connections.

The server option of the from_http operator is now deprecated. Going forward, use from_http only for client-mode HTTP operations and the new accept_http operator for server-mode operations.

By @lava.

🐞 Bug fixes

Raw-byte hashing for binary values

The hash_* functions now hash blob values by their raw bytes. This makes checksums computed from binary data match external tools such as md5sum and sha256sum.

For example:

from_file "trace.pcap" {
  read_all binary=true
}
md5 = data.hash_md5()

This is useful for verifying file contents and round-tripping binary formats without leaving TQL.
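To cross-check a checksum outside of TQL, the same raw-byte hashing can be done with Python's `hashlib`. This sketch only shows what "hashing the raw bytes" means; the helper names are hypothetical and unrelated to Tenzir's hash_* functions.

```python
import hashlib


def hash_bytes(data: bytes, algorithm: str = "md5") -> str:
    """Hash the raw bytes, which is also what md5sum and sha256sum do."""
    return hashlib.new(algorithm, data).hexdigest()


def hash_file(path, algorithm: str = "md5", chunk_size: int = 1 << 16) -> str:
    """Hash a file's contents in chunks, without loading it all into memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The result of `hash_file("trace.pcap")` should equal the first column of `md5sum trace.pcap`, since both operate on the file's bytes.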

By @mavam and @codex in #6022.

v5.31.0 New feature 3mo

Notable features

Unified live and retrospective context lookups via `context::lookup` operator
Diagnostics and metrics now include human-readable `pipeline_name` field

Full changelog

Tenzir now unifies live and retrospective context matching with the new context::lookup operator, and it adds pipeline names to diagnostics and metrics for easier operational correlation. This release also improves export reliability under load and fixes Azure transport errors, HTTP Host headers for non-standard ports, and rebuilt-partition export correctness.

🚀 Features

Include pipeline names in diagnostics and metrics

The metrics and diagnostics operators now include a pipeline_name field.

Previously, output from these operators only identified the source pipeline by its ID. Now the human-readable name is available too, making it straightforward to filter or group results by pipeline name without needing to look up IDs separately.

Please keep in mind that pipeline names are not unique.

By @IyeOnline and @claude in #5959.

Unified context lookups with `context::lookup` operator

The context::lookup operator enables unified matching of events against contexts by combining live and retrospective filtering in a single operation.

The operator automatically translates context updates into historical queries while simultaneously filtering all newly ingested data against any context updates.

This provides:

Live matching: Filter incoming events through a context with live=true
Retrospective matching: Apply context updates to historical data with retro=true
Unified operation: Use both together (default) to match all events—new and historical

Example usage:

context::lookup "feodo", field=src_ip
where @name == "suricata.flow"

By @IyeOnline in #5964.

🐞 Bug fixes

Fix crash on Azure SSL/transport errors during read and write operations

Bumped Apache Arrow from 23.0.0 to 23.0.1, which includes an upstream fix for unhandled Azure::Core::Http::TransportException in Arrow's AzureFileSystem methods. Previously, transport-level errors (e.g., SSL certificate failures) could crash the node during file listing, reading, or writing. Additionally, the direct Azure SDK calls in the blob deletion code paths now catch Azure::Core::RequestFailedException (the common base of both StorageException and TransportException) instead of listing specific exception types.

By @claude.

Fix HTTP Host header missing port for non-standard ports

The from_http and http operators now include the port in the Host header when the URL uses a non-standard port. Previously, the port was omitted, which caused requests to fail with HTTP 403 when the server validates the Host header against the full authority, such as for pre-signed URL signature verification.
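The rule for when the port belongs in the Host header can be sketched as follows. This Python snippet shows the general HTTP convention rather than Tenzir's implementation; `host_header` and `DEFAULT_PORTS` are names made up for the example.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def host_header(url: str) -> str:
    """Build a Host header value, keeping the port only when it is non-default."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if ":" in host:
        host = f"[{host}]"  # IPv6 literals are bracketed in the authority
    port = parts.port
    if port is None or port == DEFAULT_PORTS.get(parts.scheme):
        return host
    return f"{host}:{port}"
```

A URL such as `https://example.com:8443/object` produces `example.com:8443`, which is the full authority a signature check would compare against.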

Reliable export for null rows in rebuilt partitions

The export operator no longer emits partially populated events from rebuilt partitions when a row is null at the record level. Previously, some events could appear with most fields set to null while a few values, such as event_type or interface fields, were still present.

This makes exports from rebuilt data more reliable when investigating sparse or malformed-looking events.

By @tobim and @codex in #5988.

Reliable recent exports during partition flushes

The export command no longer fails or misses recent events when a node is flushing active partitions to disk under heavy load. Recent exports now keep the in-memory partitions they depend on alive until the snapshot completes, which preserves correctness for concurrent import and export workloads.

By @tobim and @codex.

v5.30.0 Mixed 3mo

Notable features

OIDC web identity authentication for AWS operators via AssumeRoleWithWebIdentity
30× faster evaluation of `and`, `or`, and `if`/`else` expressions in pipelines

Full changelog

This release adds OIDC web identity authentication for AWS operators, so you can assume AWS roles from external identity providers without long-lived credentials. It also speeds up logical and conditional expression evaluation and fixes several crashes and configuration diagnostics.

🚀 Features

OIDC web identity authentication for AWS operators

AWS operators now support OIDC-based authentication via the AssumeRoleWithWebIdentity API.

You can authenticate with AWS resources using OpenID Connect tokens from external identity providers like Azure, Google Cloud, or custom endpoints. This enables secure cross-cloud authentication without sharing long-lived AWS credentials.

Configure web identity authentication in any AWS operator by specifying a token source and target role:

from_s3 "s3://bucket/path", aws_iam={
  region: "us-east-1",
  assume_role: "arn:aws:iam::123456789012:role/cross-cloud-role",
  web_identity: {
    token_file: "/path/to/oidc/token"
  }
}

The web_identity option accepts three token sources: token_file (path to a token file), token_endpoint (HTTP endpoint that returns a token), or token (direct token value). For HTTP endpoints, you can extract tokens from JSON responses using path.

Credentials automatically refresh before expiration, with exponential backoff retry logic for transient failures. This is especially useful for long-running pipelines that need persistent authentication.

By @tobim and @codex in #5703.

🔧 Changes

Faster evaluation of logical and conditional expressions

Pipelines that use and, or, or if-else expressions run significantly faster in certain cases — up to 30× in our benchmarks. The improvement is most noticeable in pipelines with complex filtering or branching logic. No pipeline changes are needed to benefit.

By @jachris in #5954.

OCSF 1.8.0 support in ocsf::derive

The ocsf::derive operator now supports OCSF 1.8.0 events.

For example, you can now derive enum and sibling fields for events that declare metadata.version: "1.8.0":

from {metadata: {version: "1.8.0"}, class_uid: 1007}
ocsf::derive

This keeps OCSF normalization pipelines working when producers emit 1.8.0 events.

By @mavam and @codex in #5939.

Platform configuration error message

Platform configuration validation now provides clearer error messages when an invalid configuration is encountered, helping you quickly diagnose and fix configuration issues.

By @lava in #5341.

🐞 Bug fixes

Fix crash on Azure SSL/transport errors

The Azure Blob Storage connector now handles Azure::Core::Http::TransportException (e.g., SSL certificate errors) gracefully instead of crashing. Previously, a self-signed certificate in the certificate chain would cause an unhandled exception and terminate the node.

By @lava.

Fix crash when connecting to unresolvable host

Setting TENZIR_ENDPOINT to an unresolvable hostname no longer crashes the pipeline with a segfault.

By @lava in #5827.

Spurious warning for Other (99) enum sibling in ocsf::derive

ocsf::derive no longer emits a false warning when an _id field is set to 99 (Other) and the sibling string contains a source-specific value.

Per the OCSF specification, 99/Other is an explicit escape hatch: the integer signals that the value is not in the schema's enumeration and the companion string must hold the raw value from the data source. For example, the following is now accepted silently:

from {
  metadata: { version: "1.7.0" },
  type_uid: 300201,
  class_uid: 3002,
  auth_protocol_id: 99,
  auth_protocol: "Negotiate",
}
ocsf::derive

Previously this produced a spurious warning: found invalid value for 'auth_protocol' because "Negotiate" is not a named enum caption.

By @mavam and @claude in #5949.

v5.29.4 Security 4mo

Security fixes

Prevented exposure of registry secrets in GitHub Actions workflows by utilizing stdin‑based Docker login instead of command‑line arguments

Notable features

Switched reusable manifest workflow to use the workflow token with correctly paired registry credentials
Avoided exposing registry secrets on command line by using stdin‑based Docker login

Full changelog

This patch release hardens container manifest publishing in GitHub Actions by switching the reusable manifest workflow to the workflow token with correctly paired registry credentials. It also avoids exposing registry secrets on the command line by using stdin-based Docker logins.

v5.29.3 Maintenance 4mo

Minor fixes and improvements.

Full changelog

This patch release keeps the 5.29 line moving with a small maintenance update and validates the refreshed release automation. It ships as a clean follow-up release without additional user-facing changes.

v5.29.2 Bug fix 4mo

⚠ Upgrade required

AWS Marketplace ECR `tenzir-node` image now correctly ships; if you relied on the previous `tenzir` image behavior, set `tenzir` as a custom entrypoint in ECS task definitions.

Notable features

Store origin metadata (`TENZIR:store:origin`) in Feather files indicating `ingest`, `rebuild`, or `compaction` source
Install Tenzir via Homebrew on Apple Silicon macOS
Complete Suricata 8 schema coverage with IKE, HTTP/2, PostgreSQL, and Modbus event types

Full changelog

This patch release fixes several correctness and performance issues across parsing, querying, and storage, and completes Suricata 8 schema coverage.

🚀 Features

Add store origin metadata to feather files

Feather store files now include a TENZIR:store:origin key in the Arrow table schema metadata. The value is "ingest" for freshly ingested data, "rebuild" for partitions created by the rebuild command, and "compaction" for partitions created by the compaction plugin. This allows external tooling such as pyarrow to distinguish how a partition was produced.
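A possible way to inspect this key from Python is sketched below. It assumes the store file can be opened as a regular Feather file with pyarrow, which the text implies but does not spell out; the helper names are hypothetical, and pyarrow is imported lazily so the decoding helper works without it.

```python
ORIGIN_KEY = b"TENZIR:store:origin"  # Arrow schema metadata uses bytes keys


def store_origin(schema_metadata):
    """Return the origin string from Arrow schema metadata, or None if absent."""
    if not schema_metadata:
        return None
    value = schema_metadata.get(ORIGIN_KEY)
    return value.decode() if value is not None else None


def read_store_origin(path):
    """Read a Feather file and report how its partition was produced."""
    import pyarrow.feather as feather  # requires pyarrow to be installed

    table = feather.read_table(path)
    return store_origin(table.schema.metadata)
```

The returned value would be one of "ingest", "rebuild", or "compaction", or `None` for files written before this release.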

By @tobim.

Install Tenzir via Homebrew on macOS

You can now install Tenzir on Apple Silicon macOS via Homebrew:

brew tap tenzir/tenzir
brew install --cask tenzir

You can also install directly without tapping first:

brew install --cask tenzir/tenzir/tenzir

The release workflow keeps the Homebrew cask in sync with the signed macOS package so installs and uninstalls stay current across releases.

By @mavam in #5876.

🔧 Changes

Add Suricata schema types for IKE, HTTP2, PGSQL, and Modbus

The bundled Suricata schema now covers the remaining event types listed in the Suricata 8.0.3 EVE JSON format documentation: IKE (IKEv1/IKEv2), HTTP/2, PostgreSQL, and Modbus. This completes Suricata 8 schema coverage for Tenzir.

By @tobim in #5914.

Correct AWS Marketplace container image

The AWS Marketplace ECR repository tenzir-node was incorrectly populated with the tenzir image. It now correctly ships tenzir-node, which runs a Tenzir node by default.

If you relied on the previous behavior, you can restore it by setting tenzir as a custom entrypoint in your ECS task definition.

By @lava in #5925.

🐞 Bug fixes

Fix batch timeout to flush asynchronously

The batch timeout was only checked when a new event arrived, so a single event followed by an idle stream would never be emitted. The timeout now fires independently of upstream activity.

By @aljazerzen in #5906.

Fix over-reservation in partition_array for string/blob types

Splitting Arrow arrays for string and blob types no longer over-reserves memory. Previously both output builders reserved the full input size each, using up to twice the necessary memory.

By @jachris in #5899.

Fix parse_winlog batch splitting

The parse_winlog function no longer fragments output into thousands of tiny batches. The fragmentation came from type conflicts in RenderingInfo/Keywords, where events with one <Keyword> emitted a string but events with multiple emitted a list. Additionally, EventData with unnamed <Data> elements is now always emitted as a record with _0, _1, etc. as field names instead of a list.

By @jachris in #5901.

Fix pattern equality ignoring case-insensitive flag

Pattern equality checks now correctly consider the case-insensitive flag. Previously, two patterns that differed only in case sensitivity were treated as equal, violating the hash/equality contract.
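The hash/equality contract mentioned here says that values comparing equal must hash equally, so a hash that includes the flag requires an equality check that includes it too. A minimal Python illustration, using a hypothetical `Pattern` class rather than Tenzir's type:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    """Equality and hash are both derived from every field, including the flag."""

    source: str
    case_insensitive: bool = False


sensitive = Pattern("foo")
insensitive = Pattern("foo", case_insensitive=True)
# The two patterns differ only in the flag and are distinct set members.
distinct = {sensitive, insensitive}
```

With the flag included in both operations, the two patterns stay separate entries in hash-based containers.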

By @jachris in #5900.

Fix secret comparison bypass in `in` operator fast path

The in operator fast path now correctly prevents comparison of secret values. Previously, secret_value in [...] would silently compare instead of returning null with a warning, bypassing the established secret comparison policy.

By @jachris in #5899.

Optimize `in` operator and fix eq/neq null semantics

The in operator for list expressions is up to 33x faster. Previously it created and finalized entire Arrow arrays for every element comparison, causing severe overhead for expressions like EventID in [5447, 4661, ...].

Additionally, comparing a typed null value with == now returns false instead of null, and != returns true, fixing a correctness issue with null handling in equality comparisons.

By @jachris in #5899.

Support long syslog structured-data parameter names

The read_syslog operator and parse_syslog function now accept RFC 5424 structured-data parameter names longer than 32 characters, which some vendors emit despite the specification limit.

For example, this message now parses successfully instead of being rejected:

<134>1 2026-03-18T11:00:51.194137+01:00 HOSTNAME abc 9043 23003147 [F5@12276 thx_f5_for_ignoring_the_32_char_limit_in_structured_data="thx"] broken example

This improves interoperability with vendor syslog implementations that exceed the RFC limit for structured-data parameter names.

By @mavam and @codex.

v5.29.1 Bug fix 4mo

Fixed a scheduling bug causing nodes to become unresponsive when deploying many pipelines with detached operators.

Full changelog

This release fixes a scheduling issue introduced in v5.24.0 that could cause the node to become unresponsive when too many pipelines using detached operators were deployed simultaneously.

🐞 Bug fixes

Scheduling issue with detached operators

Fixed a scheduling issue introduced in v5.24.0 that could cause the node to become unresponsive when too many pipelines using detached operators like from_udp were deployed simultaneously.

By @lava in #5895.

v5.29.0 New feature 4mo

Notable features

Extract leading RFC 5424‑style structured data from RFC 3164 syslog messages (read_syslog/parse_syslog).
Bundled Suricata schema updated to match Suricata 8, adding POP3, ARP, BitTorrent DHT events and enhancing QUIC, DHCP, TLS fields.

Full changelog

This release improves log ingestion by extracting structured data from legacy syslog messages and aligning the bundled schema with Suricata 8. It also republishes the previous release after an error in the earlier release process.

🚀 Features

Extract structured data from legacy syslog content

read_syslog and parse_syslog now extract a leading RFC 5424-style structured-data block from RFC 3164 message content.

This pattern occurs in practice with some VMware ESXi messages, where components such as Hostd emit a legacy syslog record and prepend structured metadata before the human-readable message text.

For example, this raw syslog line:

<166>2026-02-11T18:01:45.587Z esxi-01.example.invalid Hostd[2099494]: [Originator@6876 sub=Vimsvc.TaskManager opID=11111111-2222-3333-4444-555555555555] Task Completed

now parses as:

{
  facility: 20,
  severity: 6,
  timestamp: "2026-02-11T18:01:45.587Z",
  hostname: "esxi-01.example.invalid",
  app_name: "Hostd",
  process_id: "2099494",
  structured_data: {
    "Originator@6876": {
      sub: "Vimsvc.TaskManager",
      opID: "11111111-2222-3333-4444-555555555555",
    },
  },
  content: "Task Completed",
}

Events without extracted structured data keep the existing syslog.rfc3164 schema. Events with extracted structured data use syslog.rfc3164.structured.
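The extraction step can be approximated with two regular expressions. The Python sketch below is a simplified illustration, not the parser Tenzir uses: it handles a single leading block, accepts quoted and unquoted values, ignores escaping, and all names in it are hypothetical. It also shows how the priority 166 decodes to facility 20 and severity 6.

```python
import re

# One leading [SD-ID key=value ...] block at the start of the message content.
SD_BLOCK = re.compile(
    r'^\[(?P<id>[^\s\]=]+)(?P<params>(?: [^\s\]=]+=(?:"[^"]*"|[^\s\]]+))*)\]\s*'
)
PARAM = re.compile(r'([^\s\]=]+)=(?:"([^"]*)"|([^\s\]]+))')


def decode_pri(pri: int):
    """Split a syslog PRI value into (facility, severity)."""
    return divmod(pri, 8)


def split_structured_data(content: str):
    """Return (structured_data, remaining_content) for RFC 3164 message content."""
    match = SD_BLOCK.match(content)
    if match is None:
        return {}, content
    params = {
        key: quoted or bare
        for key, quoted, bare in PARAM.findall(match.group("params"))
    }
    return {match.group("id"): params}, content[match.end():]
```

Applied to the Hostd message above, the helper returns the `Originator@6876` record and leaves `Task Completed` as the content.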

By @mavam and @codex in #5902.

Support for Suricata 8 schema

The bundled Suricata schema now aligns with Suricata 8, enabling proper parsing and representation of events from Suricata 8 deployments.

This update introduces support for new event types including POP3, ARP, and BitTorrent DHT, along with enhancements to existing event types. QUIC events now include ja4 and ja4s fields for fingerprinting, DHCP events include vendor_class_identifier, and TLS certificate timestamps now use the precise time type instead of string representation.

These schema changes ensure that Tenzir can reliably ingest and process telemetry from Suricata 8 without data loss or type mismatches.

By @IyeOnline and @satta in #5888.

🐞 Bug fixes

Fix pipeline startup timeouts

Fixed an issue where, in some situations, pipelines could not be started successfully, leading to timeouts and a non-responsive node, especially during node start.

By @jachris in #5893.

Graceful handling of Google Cloud Pub/Sub authentication errors

Invalid Google Cloud credentials in from_google_cloud_pubsub no longer crash the node. Authentication errors now surface as operator diagnostics instead.

By @mavam and @codex in #5877.

Prevent where/map assertion crash on sliced list batches

Pipelines using chained list transforms such as xs.where(...).map(...).where(...) no longer trigger an internal assertion on sliced input batches.

By @IyeOnline and @codex in #5886.

v5.28.0 New feature 4mo

Notable features

Support for parsing non‑RFC5424 compliant Check Point structured-data with normalization under `checkpoint_2620`
`load_tcp` operator now has opt‑in DNS hostname resolution via `resolve_hostnames` (default false)

Full changelog

This release adds support for parsing Check Point syslog structured-data dialects that deviate from RFC 5424, improving out-of-the-box interoperability with Check Point exports. It also makes DNS hostname resolution in the load_tcp operator opt-in and fixes several parser bugs related to schema changes between events.

🚀 Features

Check Point syslog structured-data dialect parsing

parse_syslog() and read_syslog now accept common Check Point structured-data variants that are not strictly RFC 5424 compliant. This includes key:"value" parameters, semicolon-separated parameters, and records that omit an SD-ID entirely.

For records without an SD-ID, Tenzir now normalizes the structured data under checkpoint_2620, so downstream pipelines can use a stable field path.

For example, the message <134>1 ... - [action:"Accept"; conn_direction:"Incoming"] now parses successfully and maps to structured_data.checkpoint_2620. This improves interoperability with Check Point exports and reduces ingestion-time preprocessing.
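As a rough illustration of the normalization, the following Python sketch parses one such block and files it under checkpoint_2620 when no SD-ID is present. It is a simplification with hypothetical names: it handles only quoted values without escapes, and its SD-ID detection (a leading token followed by whitespace) is an assumption for this example, not Tenzir's actual rule.

```python
import re

TOKEN = r'[^\s:=;"\]]+'
PARAM = re.compile(rf'\s*({TOKEN})\s*[:=]\s*"([^"]*)"\s*;?')
SD_ID = re.compile(rf'({TOKEN})\s+')


def parse_checkpoint_sd(block: str) -> dict:
    """Parse one [..] structured-data block with key:"value" or key="value" params."""
    inner = block.strip()
    if not (inner.startswith("[") and inner.endswith("]")):
        raise ValueError("expected a bracketed structured-data block")
    inner = inner[1:-1].strip()
    sd_id = "checkpoint_2620"  # fallback used when the record omits an SD-ID
    head = SD_ID.match(inner)
    if head is not None:
        sd_id = head.group(1)
        inner = inner[head.end():]
    return {sd_id: dict(PARAM.findall(inner))}
```

For the example block above, the result is a single `checkpoint_2620` record with the `action` and `conn_direction` fields.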

By @mavam and @codex in #5851.

🔧 Changes

DNS hostname resolution opt-in for load_tcp operator

The load_tcp operator now makes DNS hostname resolution opt-in with the resolve_hostnames parameter (defaults to false).

Previously, the operator always attempted reverse DNS lookups for peer endpoints, which could fail in environments without working reverse DNS configurations. Now you can enable this behavior by setting resolve_hostnames to true:

load_tcp endpoint="0.0.0.0:5555" resolve_hostnames=true {
  read_json
}

When enabled and DNS resolution fails, the operator emits a warning diagnostic (once) instead of failing. This allows the operator to continue functioning in environments where reverse DNS is unavailable or unreliable.

By @tobim and @codex in #5865.

JSON parse error context

JSON parsing errors now display the surrounding bytes at the error location. This makes it easier to diagnose malformed JSON in your data pipelines.

For example, if your JSON is missing a closing bracket, the error message shows you the bytes around that location and marks where the parser stopped expecting more input.
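The idea of showing the input around the failure point can be demonstrated with Python's `json` module, which reports the offset where parsing stopped. This is only an analogy with a hypothetical helper: it works on characters rather than bytes, and its message format is unrelated to the one Tenzir prints.

```python
import json


def parse_with_context(text: str, width: int = 10):
    """Parse JSON; on failure, raise an error that shows the surrounding input."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        start = max(0, err.pos - width)
        before = text[start:err.pos]
        after = text[err.pos:err.pos + width]
        raise ValueError(
            f"{err.msg} at offset {err.pos}: {before!r} <-- here --> {after!r}"
        ) from err
```

Parsing a document with a missing closing bracket, such as `{"a": [1, 2}`, raises an error that quotes the text on both sides of the position where the parser gave up.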

By @IyeOnline in #5805.

🐞 Bug fixes

Parser bug fixes for schema changes

Fixed multiple issues that could cause errors or incorrect behavior when the schema of parsed events changes between records. This is particularly important when ingesting data from sources that may add, remove, or modify fields over time.

Schema mismatch warnings for repeated fields in JSON objects (which Tenzir interprets as lists) now include an explanatory hint, making it clearer what's happening when a field appears multiple times where a single value was expected.

By @IyeOnline in #5805.

Uncaught exception reporting

We improved the reporting for unexpected diagnostics outside of operator execution, such as during startup. In these cases you will now get the diagnostic message.

By @IyeOnline in #5805.

v5.27.3 Bug fix 4mo

Fixed crash when reading JSON data and improved CEF parsing to handle unescaped equals characters.

Full changelog

This release fixes a crash that could occur when reading JSON data. It also improves CEF parsing to handle non-conforming unescaped equals characters.

🐞 Bug fixes

Fix CEF parsing for unescaped equals

The CEF parser now uses a heuristic to handle unescaped = characters, which do not conform to the specification.

By @jachris in #5841.

JSON reading crash fix

We fixed a bug that could cause a crash when reading JSON data.

By @IyeOnline in #5855.

v5.27.2 Bug fix 4mo

Notable features

Added hmac function for computing Hash-based Message Authentication Codes over strings and blobs

Full changelog

This release adds the hmac function for computing Hash-based Message Authentication Codes over strings and blobs. It also fixes an assertion failure in array slicing that was introduced in v5.27.0.
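These notes do not show the TQL signature of hmac, so the sketch below uses Python's standard `hmac` module to show what an HMAC over a string or blob computes. The helper name and its defaults are assumptions for illustration, not the TQL interface.

```python
import hashlib
import hmac


def hmac_hex(key: bytes, message: bytes, algorithm: str = "sha256") -> str:
    """Compute a hex-encoded HMAC of `message` under `key`."""
    return hmac.new(key, message, getattr(hashlib, algorithm)).hexdigest()


def verify(key: bytes, message: bytes, expected_hex: str) -> bool:
    """Compare in constant time to avoid leaking how many characters matched."""
    return hmac.compare_digest(hmac_hex(key, message), expected_hex)
```

Strings need to be encoded to bytes first, for example `hmac_hex(b"secret", "payload".encode())`.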

🐞 Bug fixes

Fixed an assertion failure in slicing

We fixed a bug that could trigger the assertion failure "Index error: array slice would exceed array length". The bug was introduced as part of an optimization in Tenzir Node v5.27.0.

By @IyeOnline in #5842.

v5.27.1 Bug fix 5mo

Fixed platform plugin to respect certfile, keyfile, and cafile options for client certificate authentication.

Full changelog

This release fixes an issue where the platform plugin did not correctly use the configured certfile, keyfile, and cafile options for client certificate authentication.

🐞 Bug fixes

Fix platform plugin not respecting `certfile` and `keyfile` options

Fixed an issue where the platform plugin did not correctly use the configured certfile, keyfile, and cafile options for client certificate authentication, and improved the error messages for TLS issues during platform connection.

By @lava.

View release on GitHub

v5.27.0 New feature 5mo

Notable features

sort function gains `desc` parameter for descending order
sort function gains `cmp` parameter for custom comparator lambdas
slice function extended to operate on list types with begin, end, and stride support

Full changelog

This release enhances the sort function with custom comparators and descending order support, and extends the slice function to work with lists.

🚀 Features

Enhance `sort` function with `desc` and `cmp` parameters

The sort function now supports two new parameters: desc for controlling sort direction and cmp for custom comparison logic via binary lambdas.

Sort in descending order:

from {xs: [3, 1, 2]}
select ys = sort(xs, desc=true)

{ys: [3, 2, 1]}

Sort records by a specific field using a custom comparator:

from {xs: [{v: 2, id: "b"}, {v: 1, id: "a"}, {v: 2, id: "c"}]}
select ys = sort(xs, cmp=(left, right) => left.v < right.v)

{
  ys: [
    {v: 1, id: "a"},
    {v: 2, id: "b"},
    {v: 2, id: "c"},
  ],
}

The cmp lambda receives two elements and returns a boolean indicating whether the first element should come before the second. Both parameters can be combined to reverse a custom comparison.
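The comparator contract described above (a boolean "should left come before right") can be modeled in Python as follows. This is a sketch of the semantics, not Tenzir's implementation; sort_with is a hypothetical name, and the stable handling of ties is an assumption.

```python
from functools import cmp_to_key


def sort_with(xs, cmp=None, desc=False):
    """Sort a list; cmp(left, right) returns True if left should come first."""
    if cmp is None:
        return sorted(xs, reverse=desc)

    def three_way(a, b):
        # Turn the boolean "comes before" predicate into a three-way comparison.
        if cmp(a, b):
            return -1
        if cmp(b, a):
            return 1
        return 0  # neither precedes the other: treat as equal

    return sorted(xs, key=cmp_to_key(three_way), reverse=desc)


xs = [{"v": 2, "id": "b"}, {"v": 1, "id": "a"}, {"v": 2, "id": "c"}]
ys = sort_with(xs, cmp=lambda left, right: left["v"] < right["v"])
zs = sort_with([3, 1, 2], desc=True)
```

Here ys reproduces the record order from the example above, and zs matches the descending example.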

By @mavam and @codex in #5767.

Slice function extended to support lists

The slice function now supports list types in addition to string. You can slice lists using the same begin, end, and stride parameters. Negative stride values are now supported for lists, letting you reverse or step backward through list data. String slicing continues to require a positive stride.

Example usage with lists:

[1, 2, 3, 4, 5].slice(begin=1, end=4) returns [2, 3, 4]
[1, 2, 3, 4, 5].slice(stride=-1) returns the list in reverse order
[1, 2, 3, 4, 5].slice(begin=1, end=5, stride=-2) returns [5, 3]
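The three examples above are consistent with first selecting the window from begin to end and then stepping through that window with the stride. The following Python sketch models that reading; it is inferred from the examples only, slice_list is a hypothetical name, and the handling of negative begin or end values is not covered by the source.

```python
def slice_list(xs, begin=None, end=None, stride=1):
    """Select xs[begin:end], then step through the window by stride."""
    if stride == 0:
        raise ValueError("stride must be non-zero")
    window = xs[begin:end]
    # A negative stride walks the selected window from its last element backward.
    return window[::stride]


data = [1, 2, 3, 4, 5]
a = slice_list(data, begin=1, end=4)
b = slice_list(data, stride=-1)
c = slice_list(data, begin=1, end=5, stride=-2)
```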

By @mavam and @codex in #5819.

🐞 Bug fixes

Fix `read_lines` operator for old executor

The read_lines operator was accidentally broken while it was ported to the new execution API. This change restores its functionality.

By @tobim.

HTTP header values can contain colons

HTTP header values containing colons are now parsed correctly.

By @lava in #5693.

View release on GitHub

v5.26.0 New feature 5mo

Notable features

from_mysql operator for reading tables, listing schemas, executing custom SQL and streaming live updates with TLS support
Link‑header based pagination (`paginate="link"`) for `from_http` and `http` operators following RFC 8288
Optional field parameters in user‑defined operators allowing null defaults

Full changelog

This release introduces the from_mysql operator for reading data directly from MySQL databases, with support for live streaming, custom SQL queries, and TLS connections. It also adds link-based HTTP pagination and optional field parameters for user-defined operators.

🚀 Features

Link header pagination for HTTP operators

The paginate parameter for the from_http and http operators now supports link-based pagination via the Link HTTP header.

Previously, pagination was only available through a lambda function that extracted the next URL from response data. Now you can use paginate="link" to automatically follow pagination links specified in the response's Link header, following RFC 8288. This is useful for APIs that use HTTP header-based pagination instead of embedding next URLs in the response body.

The operator parses the Link header and follows the rel=next relation to automatically fetch the next page of results.

Example:

from_http "https://api.example.com/data", paginate="link"

If an invalid pagination mode is provided (neither a lambda nor "link"), the operator now reports a clear error message.
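To make the Link header mechanics concrete, here is a minimal Python sketch that extracts the rel=next target from a header value. It is a simplified parser written for illustration, not the operator's implementation: next_link is a hypothetical helper, and the regular expression does not handle commas or semicolons inside quoted parameter values.

```python
import re

# One link-value: <target> followed by zero or more ";param" segments.
_LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[^;,]+)*)')


def next_link(header):
    """Return the target of the first link whose rel contains "next", else None."""
    for match in _LINK_RE.finditer(header):
        target, params = match.group(1), match.group(2)
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "rel":
                # rel may carry several space-separated relation types.
                rels = value.strip().strip('"').lower().split()
                if "next" in rels:
                    return target
    return None


header = (
    '<https://api.example.com/data?page=2>; rel="next", '
    '<https://api.example.com/data?page=5>; rel="last"'
)
url = next_link(header)
```

A client would keep requesting the returned URL until no rel=next link is present.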

By @mavam and @claude.

MySQL source operator

The from_mysql operator lets you read data directly from MySQL databases.

Read a table:

from_mysql table="users", host="localhost", port=3306, user="admin", password="secret", database="mydb"

List tables:

from_mysql show="tables", host="localhost", port=3306, user="admin", password="secret", database="mydb"

Show columns:

from_mysql table="users", show="columns", host="localhost", port=3306, user="admin", password="secret", database="mydb"

Execute a custom SQL query:

from_mysql sql="SELECT id, name FROM users WHERE active = 1",
           host="localhost",
           port=3306,
           user="admin",
           password="secret",
           database="mydb"

The operator supports TLS/SSL connections for secure communication with MySQL servers. Use tls=true for default TLS settings, or pass a record for fine-grained control:

from_mysql table="users", host="db.example.com", database="prod", tls={
  cacert: "/path/to/ca.pem",
  certfile: "/path/to/client-cert.pem",
  keyfile: "/path/to/client-key.pem",
}

The operator supports MySQL's caching_sha2_password authentication method and automatically maps MySQL data types to Tenzir types.

Use live=true to continuously stream new rows from a table. The operator tracks progress using a watermark on an integer column, polling for rows above the last-seen value:

from_mysql table="events", live=true, host="localhost", database="mydb"

By default, the tracking column is auto-detected from the table's auto-increment primary key. To specify one explicitly:

from_mysql table="events", live=true, tracking_column="event_id",
           host="localhost", database="mydb"
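The watermark approach can be illustrated with a small Python sketch. It uses the standard sqlite3 module as a stand-in database so that it runs anywhere; it is not how the operator talks to MySQL, and poll_new_rows is a hypothetical helper.

```python
import sqlite3


def poll_new_rows(conn, table, tracking_column, watermark):
    """Fetch rows whose tracking column exceeds the watermark, oldest first."""
    # Identifiers cannot be bound as parameters, so this sketch trusts them.
    query = (
        f"SELECT * FROM {table} "
        f"WHERE {tracking_column} > ? ORDER BY {tracking_column}"
    )
    cursor = conn.execute(query, (watermark,))
    columns = [col[0] for col in cursor.description]
    rows = [dict(zip(columns, row)) for row in cursor.fetchall()]
    if rows:
        watermark = rows[-1][tracking_column]  # advance to the last-seen value
    return rows, watermark


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (event_id INTEGER PRIMARY KEY AUTOINCREMENT, msg TEXT)"
)
conn.executemany("INSERT INTO events (msg) VALUES (?)", [("a",), ("b",)])

rows, mark = poll_new_rows(conn, "events", "event_id", 0)     # both rows
conn.execute("INSERT INTO events (msg) VALUES ('c')")
rows, mark = poll_new_rows(conn, "events", "event_id", mark)  # only the new row
```

Each poll returns only rows above the previous watermark, so a loop around poll_new_rows never re-reads rows it has already emitted.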

By @mavam and @claude in #5721 and #5738.

Optional field parameters for user-defined operators

User-defined operators in packages can now declare optional field-type parameters with null as the default value. This allows operators to accept field selectors that are not required to be provided.

When a field parameter is declared with type: field and default: null, you can omit the argument when calling the operator, and the parameter will receive a null value instead. You can then check whether a field was provided by comparing the parameter to null within the operator definition.

Example:

In your package's operator definition, declare an optional field parameter:

args:
  named:
    - name: selector
      type: field
      default: null

In the operator implementation, check if the field was provided:

set result = if $selector != null then "field provided" else "field omitted"

When calling the operator, the field argument becomes optional:

my_operator                    # field is null
my_operator selector=x.y       # field is x.y

Only null is allowed as the default value for field parameters. Non-null defaults are rejected with an error during package loading.

By @mavam and @claude in #5753.

🐞 Bug fixes

Improve write_lines operator performance

We have significantly improved the performance of the write_lines operator.

By @IyeOnline.

merge() function recursive deep merge for nested records

The merge() function now performs a recursive deep merge when merging two records. Previously, nested fields were dropped when merging, so merge({hw: {sn: "XYZ123"}}, {hw: {model: "foobar"}}) would incorrectly produce {hw: {model: "foobar"}} instead of recursively merging the nested fields. The function now correctly produces {hw: {sn: "XYZ123", model: "foobar"}} by materializing both input records and performing a deep merge on them.
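The difference between a shallow and a recursive merge can be sketched in Python with plain dictionaries. This models the behavior described above rather than Tenzir's implementation; deep_merge is a hypothetical name, and letting the right-hand value win on non-record conflicts is an assumption.

```python
def deep_merge(left, right):
    """Recursively merge two records; nested records are merged field by field."""
    merged = dict(left)
    for key, value in right.items():
        if isinstance(merged.get(key), dict) and isinstance(value, dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value  # assumption: the right-hand side wins
    return merged


shallow = {**{"hw": {"sn": "XYZ123"}}, **{"hw": {"model": "foobar"}}}
deep = deep_merge({"hw": {"sn": "XYZ123"}}, {"hw": {"model": "foobar"}})
```

The shallow variant loses hw.sn, which is the old behavior; the deep variant keeps both nested fields.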

By @mavam and @claude in #5728.

Secret type support for user-defined operator parameters

User-defined operators in packages can now declare parameters with the secret type to ensure that secret values are properly handled as secret expressions:

args:
  positional:
    - name: api_key
      type: secret
      description: "API key to use for authentication"

By @mavam and @claude in #5752.

View release on GitHub

v5.25.2 Bug fix 5mo

Fixed sigma operator to load all Sigma rule files from a directory and its subdirectories.

Full changelog

This release fixes the sigma operator to correctly load all rule files from a directory.

🐞 Bug fixes

Fix sigma operator directory handling to load all rules

The sigma operator now correctly loads all rules when given a directory containing multiple Sigma rule files. Previously, only the last processed rule file would be retained because the rules collection was being cleared on every recursive directory traversal.

sigma "/path/to/sigma/rules"

All rules found in the directory and its subdirectories will now be loaded and used to match against input events.

By @mavam and @claude in #5715.

View release on GitHub

v5.25.1 Bug fix 5mo

Notable features

read_syslog operator gains `raw_message` parameter to preserve unparsed syslog input
Kafka operators now support decompression for zstd, lz4, and gzip

Full changelog

This release includes several bug fixes for the JSON parser, where, replace, and if operators, along with Kafka decompression support and a new raw_message option for the read_syslog operator.

🚀 Features

Raw message field support for read_syslog operator

The read_syslog operator now supports a raw_message parameter that preserves the original, unparsed syslog message in a field of your choice. This is useful when you need to retain the exact input for auditing, debugging, or compliance purposes.

When you specify raw_message=<field>, the operator stores the complete input message (including all lines for multiline messages) in the specified field. This works with all syslog formats, including RFC 5424, RFC 3164, and octet-counted messages.

For example:

read_syslog raw_message=original_input

This stores the unparsed message in the original_input field alongside the parsed structured fields like hostname, app_name, message, and others.

By @mavam and @claude in #5687.

🐞 Bug fixes

Fix assertion failure in replace operator when replacing with null

The replace operator no longer triggers an assertion failure when using with=null on data processed by operators like ocsf::cast.

load_file "dns.json"
read_json
ocsf::cast "dns_activity"
replace what="", with=null

By @mavam and @claude in #5696.

Fix intermittent UTF-8 errors in JSON parser

The JSON parser no longer intermittently fails with "The input is not valid UTF-8" when parsing data containing multi-byte UTF-8 characters such as accented letters or emojis.

By @jachris and @claude in #5698.

Fix overzealous constant evaluation in `if` statements

The condition of if statements is no longer erroneously evaluated early when it contains a lambda expression that references runtime fields.

By @jachris in #5701.

Support decompression for Kafka operators

Kafka connectors now support decompressing messages with zstd, lz4 and gzip.

By @raxyte and @claude in #5697.

Where operator optimization for optional fields

The where operator optimization now correctly handles optional fields marked with ?. Previously, the optimizer didn't account for the optional marker, which could result in incorrect query optimization. This fix ensures that optional field accesses are handled properly without affecting the optimization of regular field accesses.

By @jachris and @claude.

View release on GitHub

v5.25.0 New feature 6mo

Notable features

AWS IAM authentication option (aws_iam) added to load_sqs, save_sqs, from_s3, to_s3, from_kafka, and to_kafka operators with credential fields and region support.
Per‑actor/per‑thread memory allocation tracking feature (opt-in).
RFC 6587 octet‑counting support in parse_syslog via `octet_counting` parameter.

Full changelog

This release adds periodic emission to the summarize operator, enabling real-time streaming analytics with configurable intervals and accumulation modes. It also introduces AWS IAM authentication across SQS, S3, and Kafka operators, and fixes memory instability in from_http when used with slow downstream consumers.

🚀 Features

AWS IAM authentication for load_sqs, save_sqs, from_s3, to_s3, from_kafka, and to_kafka

The load_sqs, save_sqs, from_s3, to_s3, from_kafka, and to_kafka operators now support AWS IAM authentication through a new aws_iam option. You can configure explicit credentials, assume IAM roles, use AWS CLI profiles, or rely on the default credential chain.

The aws_iam option accepts these fields:

profile: AWS CLI profile name for credential resolution
access_key_id: AWS access key ID
secret_access_key: AWS secret access key
session_token: AWS session token for temporary credentials
assume_role: IAM role ARN to assume
session_name: Session name for role assumption
external_id: External ID for role assumption

Additionally, the SQS and Kafka operators accept a top-level aws_region option:

For load_sqs and save_sqs: Configures the AWS SDK client region for queue URL resolution
For from_kafka and to_kafka: Required for MSK authentication (used to construct the authentication endpoint URL)

You can also combine explicit credentials with role assumption. This uses the provided credentials to call STS AssumeRole and obtain temporary credentials for the assumed role:

load_sqs "my-queue", aws_iam={
  access_key_id: "AKIAIOSFODNN7EXAMPLE",
  secret_access_key: "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
  assume_role: "arn:aws:iam::123456789012:role/my-role"
}

For example, to load from SQS with a specific region:

load_sqs "my-queue", aws_region="us-east-1", aws_iam={
  access_key_id: "AKIAIOSFODNN7EXAMPLE",
  secret_access_key: "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
}

To use an AWS CLI profile:

load_sqs "my-queue", aws_iam={
  profile: "production"
}

To assume an IAM role:

from_s3 "s3://bucket/path", aws_iam={
  assume_role: "arn:aws:iam::123456789012:role/my-role",
  session_name: "tenzir-session",
  external_id: "unique-id"
}

For Kafka MSK authentication, the aws_region option is required:

from_kafka "my-topic", aws_region="us-east-1", aws_iam={
  profile: "production"
}

When no explicit credentials or profile are configured, operators use the AWS SDK's default credential provider chain, which checks environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY), AWS configuration files (~/.aws/credentials), EC2/ECS instance metadata, and other standard sources. This applies both when aws_iam is omitted entirely and when aws_iam is specified without access_key_id, secret_access_key, or profile.
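The resolution rules in the paragraph above can be summarized as a small decision function. This Python sketch only models which credential source is selected; it calls no AWS API, credential_source is a hypothetical name, and the precedence of explicit keys over a profile when both are given is an assumption the source does not state.

```python
def credential_source(aws_iam=None):
    """Describe where credentials would come from for a given aws_iam record."""
    aws_iam = aws_iam or {}
    if "access_key_id" in aws_iam or "secret_access_key" in aws_iam:
        base = "explicit"
    elif "profile" in aws_iam:
        base = "profile"
    else:
        # Covers both an omitted aws_iam and one without keys or a profile.
        base = "default-chain"
    if "assume_role" in aws_iam:
        # The base credentials are used to obtain temporary role credentials.
        return f"assume-role via {base}"
    return base


omitted = credential_source()
role_only = credential_source({"assume_role": "arn:aws:iam::123456789012:role/my-role"})
```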

By @tobim and @claude in #5675.

Per-actor memory allocation tracking

We have added support for per-actor/per-thread allocation tracking. When enabled, these stats will track which actor (or thread) allocated how much memory. This gives much more detailed insights into where memory is allocated. By default these detailed statistics are not collected, as they introduce a cost to every allocation.

By @IyeOnline in #5646.

Periodic emission for summarize operator

The summarize operator now supports periodic emission of aggregation results at fixed intervals, enabling real-time streaming analytics and monitoring use cases.

Use the options named argument with frequency to emit results every N seconds:

summarize count(this), src_ip, options={frequency: 5s}

This emits aggregation results every 5 seconds, showing the count per source IP for events received during each interval:

{src_ip: 192.168.1.1, count: 42}
{src_ip: 192.168.1.2, count: 17}
// ... 5 seconds later ...
{src_ip: 192.168.1.1, count: 38}
{src_ip: 192.168.1.3, count: 9}

The mode parameter controls how aggregations behave across emissions:

Reset mode (default) resets aggregations after each emission, providing per-interval metrics:

summarize sum(bytes), options={frequency: 10s}
// Shows bytes per 10-second window

Cumulative mode accumulates values across emissions, providing running totals:

summarize sum(bytes), options={frequency: 10s, mode: "cumulative"}
// Shows total bytes seen so far

Update mode emits only when values change from the previous emission, reducing output noise in monitoring scenarios:

summarize count(this), severity, options={frequency: 1s, mode: "update"}
// Emits only when the count for a severity level changes

The operator always emits final results when the input stream ends, ensuring no data is lost.
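The three modes can be compared side by side with a small Python simulation of a count aggregation. It is a model of the described behavior, not the operator's implementation: periodic_counts is a hypothetical name, each inner list stands for the group keys seen during one interval, and treating update mode as cumulative counts that emit only changed groups is an assumption. The final emission at end of input is not modeled.

```python
from collections import Counter


def periodic_counts(intervals, mode="reset"):
    """Return one list of (key, count) rows per interval."""
    totals = Counter()
    last_emitted = {}
    out = []
    for keys in intervals:
        if mode == "reset":
            totals = Counter()  # start every interval from scratch
        totals.update(keys)
        if mode == "update":
            # Emit only groups whose value differs from the previous emission.
            rows = [
                (k, c) for k, c in sorted(totals.items())
                if last_emitted.get(k) != c
            ]
            last_emitted = dict(totals)
        else:
            rows = sorted(totals.items())
        out.append(rows)
    return out


intervals = [["a", "a", "b"], ["a"], []]
reset = periodic_counts(intervals, "reset")
cumulative = periodic_counts(intervals, "cumulative")
update = periodic_counts(intervals, "update")
```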

By @tobim and @claude in #5605.

RFC 6587 octet-counting support for syslog parsing

The parse_syslog function now supports RFC 6587 octet-counted framing, where syslog messages are prefixed with their byte length (for example, 65 <syslog-message>). This framing is commonly used in TCP-based syslog transport to handle message boundaries.

The new octet_counting parameter for parse_syslog offers three modes:

Not specified (default): Auto-detect. The parser strips a length prefix if present and valid, otherwise parses the input as-is. This prevents false positives where input coincidentally starts with digits and a space.
octet_counting=true: Require a length prefix. Emits a warning and returns null if the input lacks a valid prefix.
octet_counting=false: Never strip a length prefix. Parse the input as-is.
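The three modes can be sketched in Python as follows. This models the described behavior for a single framed message and is not the parser's implementation; strip_octet_count is a hypothetical name, and treating a prefix as valid only when it equals the byte length of the remaining input is an assumption.

```python
import re

# RFC 6587 length prefix: a non-zero digit, more digits, then one space.
_PREFIX = re.compile(rb"^([1-9][0-9]*) ")


def strip_octet_count(data: bytes, octet_counting=None):
    """Return the syslog payload, or None if a required prefix is missing.

    octet_counting=None  -> auto-detect
    octet_counting=True  -> require a valid prefix
    octet_counting=False -> never strip
    """
    if octet_counting is False:
        return data
    match = _PREFIX.match(data)
    if match:
        payload = data[match.end():]
        if len(payload) == int(match.group(1)):
            return payload
    # No valid prefix: fail in strict mode, pass through in auto mode.
    return None if octet_counting else data


msg = b"<34>1 2003-10-11T22:14:15.003Z host app - - - hi"
framed = str(len(msg)).encode() + b" " + msg
payload = strip_octet_count(framed)
```

The length check is what keeps auto-detection from stripping input that merely starts with digits and a space.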

By @mavam and @claude.

🔧 Changes

Cleanup of existing directory markers in from_s3 and from_abs

The from_s3 and from_azure_blob_storage operators now also delete existing directory marker objects along the glob path when remove=true. Directory markers are zero-byte objects with keys ending in / that some cloud storage tools create. These artifacts can accumulate over time, increasing API costs and slowing down listing operations.

By @jachris in #5670.

Preserve original field order in ocsf::derive

The ocsf::derive operator now preserves original field order instead of reordering alphabetically. Derived enum/sibling pairs are inserted at the position of the first field, ordered alphabetically within each pair (e.g., activity_id before activity_name). Non-OCSF fields remain at their original positions.

For example, given the input:

{foo: 1, class_uid: 1001}

The output is now:

{foo: 1, class_name: "...", class_uid: 1001}

Previously, the output was alphabetically sorted:

{class_name: "...", class_uid: 1001, foo: 1}

By @mavam and @claude in #5673.

🐞 Bug fixes

Correct multi-partition commits in `from_kafka`

The from_kafka operator now commits offsets per partition and tracks partition EOFs based on the current assignment, preventing premature exits and cross-partition replays after restarts.

By @raxyte and @codex in #5654.

No more directory markers for S3 and Azure

Deleting files from S3 or Azure Blob Storage via from_s3 or from_azure_blob_storage with the remove=true option no longer creates empty directory marker objects in the parent directory when the last file of the directory is deleted.

By @jachris in #5669.

Phantom pipeline entries with empty IDs

In rare cases, a phantom pipeline with an empty ID could appear in the pipeline list that couldn't be deleted through the API.

By @jachris and @claude in #5680.

Stable memory usage for `from_http` server

The from_http server now has stable memory usage when used with a slow downstream, especially in situations where the client times out and retries requests.

By @raxyte in #5677.

View release on GitHub

v5.24.0 Bug fix 6mo

⚠ Upgrade required

Duplicate diagnostics now resurface every 4 hours instead of being permanently suppressed.
Throttle operator rate-limits events (not bytes) with new options: `rate`, `weight`, and `drop`.

Notable features

Parallel operator for executing pipelines across multiple instances (configurable via `jobs`)
XML parsing functions `parse_xml` and specialized `parse_winlog` for Windows Event Logs
Per-pipeline memory consumption metrics via `tenzir.metrics.operator_buffers`

Full changelog

This release adds XML parsing functions (parse_xml and parse_winlog) for analyzing XML-formatted logs including Windows Event Logs. It also introduces the parallel operator for parallel pipeline execution, fixes a socket leak in from_http that could cause resource exhaustion, and includes several stability fixes for gRPC operators and the pipeline API.

🚀 Features

Easy parallel pipeline execution

The parallel operator executes a pipeline across multiple parallel pipeline instances to improve throughput for computationally expensive operations. It automatically distributes input events across the pipeline instances and merges their outputs back into a single stream.

Use the jobs parameter to specify how many pipeline instances to spawn. For example, to parse JSON in parallel across 4 pipeline instances:

from_file "input.ndjson"
read_lines
parallel 4 {
  this = line.parse_json()
}
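The fan-out and merge shape of this pipeline can be sketched in Python with a worker pool. This is only an analogy for the operator, not its implementation: parallel_map is a hypothetical name, and the sketch preserves input order, which the source does not promise for parallel. In CPython, threads illustrate the structure rather than a real speedup for CPU-bound parsing.

```python
import json
from concurrent.futures import ThreadPoolExecutor


def parallel_map(lines, fn, jobs=4):
    """Distribute lines across `jobs` workers and merge results into one list."""
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(fn, lines))


lines = ['{"id": 1}', '{"id": 2}', '{"id": 3}']
events = parallel_map(lines, json.loads, jobs=4)
```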

By @raxyte in #5632.

Per-pipeline memory consumption metrics

The new tenzir.metrics.operator_buffers metrics track the total bytes and events buffered across all operators of a pipeline. The metrics are emitted every second and include:

timestamp: The point in time when the data was recorded
pipeline_id: The pipeline's unique identifier
bytes: Total bytes currently buffered
events: Total events currently buffered (for events only)

Use metrics "operator_buffers" to access these metrics.

By @jachris in #5644.

XML parsing functions for TQL

The new parse_xml and parse_winlog functions parse XML strings into structured records, enabling analysis of XML-formatted logs and data sources.

The parse_xml function offers flexible XML parsing with XPath-based element selection, configurable attribute handling, namespace management, and depth limiting. It supports multiple match results as lists and handles both simple and complex XML structures.

The parse_winlog function specializes in parsing Windows Event Log XML format, automatically finding Event elements and transforming EventData/UserData sections into properly structured fields.

Both functions integrate with Tenzir's multi-series builder for schema inference and type handling.

By @mavam and @claude in #5640 and #5645.

🔧 Changes

Duplicate diagnostics only suppressed for 4 hours

Repeated warnings and errors now resurface every 4 hours instead of being suppressed forever. Previously, once a diagnostic was shown, it would never appear again even if the underlying issue persisted. This change helps users notice recurring problems that may require attention.
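The change amounts to giving the suppression a time-to-live. A minimal Python sketch of that idea follows; it is not Tenzir's implementation, DiagnosticDeduplicator is a hypothetical name, and measuring the 4 hours from the last time the diagnostic was shown is an assumption.

```python
class DiagnosticDeduplicator:
    """Suppress repeated diagnostics until a time-to-live has elapsed."""

    def __init__(self, ttl_seconds=4 * 60 * 60):
        self.ttl = ttl_seconds
        self.last_shown = {}  # message -> timestamp of the last emission

    def should_emit(self, message, now):
        last = self.last_shown.get(message)
        if last is not None and now - last < self.ttl:
            return False  # still within the suppression window
        self.last_shown[message] = now
        return True


dedup = DiagnosticDeduplicator()
first = dedup.should_emit("disk almost full", now=0)
repeat = dedup.should_emit("disk almost full", now=3600)
resurfaced = dedup.should_emit("disk almost full", now=4 * 3600)
```

Passing the clock in as an argument keeps the sketch deterministic; a real system would read the current time itself.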

By @raxyte and @claude in #5652.

Event-based rate limiting for throttle operator

The throttle operator now rate-limits events instead of bytes. Use the rate option to specify the maximum number of events per window, weight to assign custom per-event weights, and drop to discard excess events instead of waiting. The operator also emits metrics for dropped events.
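How rate, weight, and drop interact can be modeled with a fixed-window limiter in Python. This is a simplified sketch under stated assumptions, not the operator's algorithm: the function name, the window parameter and its 1-second default, and the fixed-window strategy are all hypothetical, and waiting is modeled by assigning a later release time instead of sleeping.

```python
def throttle(events, rate, window=1.0, weight=lambda event: 1, drop=False):
    """Rate-limit (timestamp, event) pairs ordered by time.

    Returns (passed, dropped), where passed holds (release_time, event) pairs.
    """
    passed, dropped = [], []
    window_start, used = None, 0
    for ts, event in events:
        release = ts if window_start is None else max(ts, window_start)
        if window_start is None or release >= window_start + window:
            window_start, used = release, 0  # a fresh window begins
        cost = weight(event)
        if used + cost > rate:
            if drop:
                dropped.append(event)
                continue
            # Wait for the next window instead of discarding the event.
            window_start, used = window_start + window, 0
            release = window_start
        used += cost
        passed.append((release, event))
    return passed, dropped


events = [(0.0, "a"), (0.1, "b"), (0.2, "c"), (1.5, "d")]
kept, lost = throttle(events, rate=2, drop=True)
delayed, none_lost = throttle(events, rate=2, drop=False)
```

With drop=True the third event is discarded; with drop=False it is released at the start of the next window.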

By @raxyte in #5642.

🐞 Bug fixes

Crashes during gRPC operator shutdown

We fixed bugs in several gRPC-based operators:

A potential crash in from_velociraptor on shutdown.
Potentially not publishing final messages in to_google_cloud_pubsub on shutdown.
A concurrency bug in from_google_cloud_pubsub that could cause a crash.

By @mavam and @claude in #5661.

Error propagation in every and cron operators

The every and cron operators now correctly propagate errors from their subpipelines instead of silently swallowing them.

By @raxyte in #5632.

Fixed `from_kafka` not producing events

We fixed a bug in from_kafka that would cause it to not produce events.

By @IyeOnline in #5659.

Missing events when using `in` with `export`

The export operator incorrectly skipped partitions when evaluating in predicates with uncertain membership. This caused queries like export | where field in [values...] to potentially miss matching events.

By @raxyte in #5660.

Socket leak in `from_http`

The from_http operator sometimes left sockets in CLOSE_WAIT state instead of closing them properly. This could lead to resource exhaustion on long-running nodes receiving many HTTP requests.

By @jachris and @claude in #5647.

Timezone handling in static binary

The format_time and parse_time functions in the static binary now correctly use the operating system's timezone database.

By @tobim and @claude in #5649.

Unresponsive pipeline API

Previously, it was possible for the node to enter a state where the internal pipeline API was no longer responding, thus rendering the platform unresponsive.

By @jachris in #5651.

View release on GitHub

v5.23.1 Bug fix 6mo

Fixed expression evaluation errors for heterogeneous data, an assertion failure in the deduplicate operator with count_field, and ensured graceful shutdown of the save_tcp connector.

Full changelog

This release fixes internal errors in expression evaluation for heterogeneous data, resolves an assertion failure in the deduplicate operator when using count_field, and ensures the save_tcp connector shuts down gracefully.

🐞 Bug fixes

Assertion failure in deduplicate with count_field

The deduplicate operator with count_field option could cause assertion failures when discarding events.

By @raxyte.

Graceful shutdown for save_tcp connector

The save_tcp connector now gracefully shuts down on pipeline stop and connection failures. Previously, the connector could abort the entire application on exit.

By @raxyte in #5637.

Length mismatch in expression evaluation for heterogeneous data

Expression evaluation could produce a length mismatch when processing heterogeneous data, potentially causing assertion failures. This affected various operations including binary and unary operators, field access, indexing, and aggregation functions.

By @raxyte and @codex.

View release on GitHub

All releases

💥 Breaking changes

$file let-binding for filesystem readers

from_http is now a pure HTTP client

to_kafka defaults to NDJSON-encoded messages

yara requires finite input

Dedicated FTP source and sink operators

OpenSearch ingestion with accept_opensearch

Removed real_time argument from measure

Renamed from_gcs to from_google_cloud_storage

🚀 Features

from_http infers response parsers

Add auto_fill option to read_csv, read_tsv, read_ssv, and read_xsv

CloudWatch Logs operators

Dedicated TCP source and sink operators

DNS result caching in dns_lookup

Event throughput metrics for the new executor

from_amqp queue arguments

HEC metadata and raw endpoint support in to_splunk

HEC queue selection in to_splunk

High-level filesystem and object store writers

Internal memory size function

Keyed routing and source mode for parallel

Keyed subpipeline routing with group

Live packet capture with from_nic

Memory-mapped reads in from_file

Microsoft Graph source operator

MySQL source operator

NATS JetStream operators

OData pagination for from_http

Parse TQL records with read_tql

Per-event subpipelines with each

Prometheus shape for metrics

Raw byte output with write_all

Read from standard input with from_stdin

Repeat string function

Request records for from_http pagination

Send events to webhooks with to_http

SQS receive controls

Stream pipeline output with serve_http

Synthetic event generation with anonymize

TQL match statements

Uncompressed Feather output

🔧 Changes

Add accept_http operator for receiving HTTP requests

Per-schema buffering and default timeout for batch

Preserve categorical order in chart_bar

Region derivation and endpoint logging for SQS

🐞 Bug fixes

Compaction resolves package UDOs at startup

Faster drop_null_fields on heterogeneous data

Reduce disk I/O of time-based compaction

SentinelOne Data Lake sink support in the new executor

Top-level package metadata

🚀 Features

OCSF enum list derivation

🔧 Changes

Lossless int64/uint64 merging during parsing

🐞 Bug fixes

Empty if branches in the new executor

🐞 Bug fixes

Configured pipelines with package operators

Slash-delimited UDO defaults

🐞 Bug fixes

ClickHouse TLS mismatch diagnostics

Retention for mixed-age metrics partitions

🚀 Features

NATS JetStream operators

🐞 Bug fixes

Static musl builds no longer crash on deep TQL expressions

🚀 Features

OData pagination for from_http

🚀 Features

IP address support in subnet

🐞 Bug fixes

Crash fix for deep left-associated where expressions

Fixed unbounded memory growth context::enrich

Large unroll output stability

Recursive files traversal of unreadable directories

Unsigned integer indexing in TQL
