pathway

Streaming & Message Queues

Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG

Python · Latest v0.31.1 · 1mo ago

Features

Wide range of connectors (Kafka, GDrive, PostgreSQL, etc.)
Unified batch‑and‑streaming engine with Python API
Scalable Rust execution for incremental computation

Recent releases

View all 20 releases →

No immediate action

v0.31.1 New feature 1mo

Elasticsearch read, ClickHouse write, Iceberg struct, MongoDB BSON, MySQL CDC

Review required

v0.31.0 Breaking risk 2mo

Breaking upgrade

Glue write DateTimeUtc breaking change

v0.30.1 New feature 3mo

Notable features

RabbitMQ Streams connectors (`pw.io.rabbitmq.read` and `pw.io.rabbitmq.write`) supporting JSON, plaintext, raw formats; streaming/static modes; offset recovery; dynamic topics; TLS; AMQP metadata.
`pathway spawn` now supports `--addresses` and `--process-id` for multi‑machine clusters.
`pw.xpacks.llm.parsers.AudioParser` for audio transcription via OpenAI Whisper.

Full changelog

Added

`pw.io.rabbitmq.read` and `pw.io.rabbitmq.write` connectors for reading from and writing to RabbitMQ Streams. They support JSON, plaintext, and raw formats; streaming and static modes; persistence with offset recovery; dynamic topics (writing to different streams per row); a `start_from` parameter (`"beginning"`, `"end"`, or `"timestamp"`); TLS configuration; and message metadata, including AMQP 1.0 properties and application properties. Header values are JSON-encoded for round-trip compatibility. Requires a Pathway Scale or Enterprise license.
`pw.io.mssql.read` connector, which reads data from a Microsoft SQL Server table. The connector first delivers a full snapshot of the table and then, in streaming mode, tracks incremental changes via SQL Server Change Data Capture (CDC).
`pw.io.mssql.write` connector, which writes a Pathway table to a Microsoft SQL Server table. Row additions and updates are applied as `MERGE` (upsert) statements keyed on the configured primary key columns, and row deletions are applied as `DELETE` statements.
`pw.io.milvus.write` connector, which writes a Pathway table to a Milvus collection. Row additions are sent as upserts, and row deletions are sent as deletes keyed on the configured primary key column. Requires a Pathway Scale license.
`pathway spawn` now supports the `--addresses` and `--process-id` flags for multi-machine deployments. Pass a comma-separated list of `host:port` addresses for all processes and the index of the local process; Pathway then connects the cluster over TCP, so the processes no longer need to run on the same machine.
`pw.xpacks.llm.parsers.AudioParser`, an audio transcription parser based on the OpenAI Whisper API. It accepts raw audio bytes and returns transcribed text, following the same interface as other Pathway document parsers.
`pw.io.leann.write` connector for writing Pathway tables to LEANN vector indices. LEANN uses graph-based selective recomputation to achieve 97% storage reduction compared to traditional vector databases.
`pw.iterate` now supports operator persistence. On restart, the iterate operator loads its previous input from an operator snapshot and reconverges inside the loop, allowing incremental processing of new data without replaying the full input stream.
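
The multi-machine `pathway spawn` flags described above can be sketched as follows. This is a minimal illustration, not taken from the Pathway documentation: the host addresses, the port, and the `python main.py` program are hypothetical placeholders, and the sketch assumes the program to run is passed after the flags. It only builds the command lines; it does not launch anything.

```python
import shlex


def spawn_command(addresses, process_id, program=("python", "main.py")):
    """Build the argv for one process of a multi-machine `pathway spawn` run.

    `addresses` lists host:port for every process in the cluster;
    `process_id` is the index of the local process in that list.
    The default `program` is a hypothetical placeholder.
    """
    if not 0 <= process_id < len(addresses):
        raise ValueError("process_id must be an index into addresses")
    return [
        "pathway", "spawn",
        "--addresses", ",".join(addresses),  # same list on every machine
        "--process-id", str(process_id),     # differs per machine
        *program,
    ]


# Hypothetical three-machine cluster.
addresses = ["10.0.0.1:9000", "10.0.0.2:9000", "10.0.0.3:9000"]
commands = [spawn_command(addresses, i) for i in range(len(addresses))]

for cmd in commands:
    print(shlex.join(cmd))
```

Every machine receives the identical `--addresses` list and only `--process-id` changes, which matches the description of passing all addresses plus the local index.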

v0.30.0 Breaking risk 4mo

Breaking changes

pw.io.mongodb.write/read now serialize/deserialize np.ndarray columns as nested BSON arrays preserving shape (previously flattened).
Dependencies for pw.io.pyfilesystem.read are no longer included in the default package; install with `pip install pathway[pyfilesystem]`.

Notable features

`pw.io.mongodb.read` connector added – provides full snapshot and real‑time change stream.
`pw.io.postgres.read` connector added – reads directly from PostgreSQL WAL.
`pw.io.postgres.read/write` now support serialization/deserialization of np.ndarray, homogeneous tuple, and list via Postgres ARRAY.

Full changelog

Added

`pw.io.mongodb.read` connector, which reads data from a MongoDB collection. The connector first delivers a full snapshot of the collection and then, in streaming mode, subscribes to the change stream to receive incremental updates in real time.
`pw.io.postgres.read` connector, which reads data from a PostgreSQL table directly by parsing the Write-Ahead Log (WAL).
`pw.io.postgres.write` and `pw.io.postgres.read` now support serialization and deserialization of `np.ndarray` (int/float elements), homogeneous `tuple`, and `list` values via Postgres `ARRAY`; multidimensional rectangular arrays are supported.
`pw.io.airbyte.read` now accepts a `dependency_overrides` parameter, allowing users to pin specific versions of transitive dependencies (e.g. `airbyte-cdk`) installed into the connector's virtual environment. This unblocks connectors broken by upstream dependency changes without waiting for upstream fixes.

Changed

BREAKING: `pw.io.mongodb.write` and `pw.io.mongodb.read` now serialize and deserialize `np.ndarray` columns as nested BSON arrays that preserve the array's shape. Previously, all ndarrays were flattened to a single BSON array regardless of dimensionality, making it impossible to reconstruct the original shape on read-back. For 1-D arrays the representation is identical to before (`[1, 2, 3]`); only multi-dimensional arrays are affected.
BREAKING: The dependencies for `pw.io.pyfilesystem.read` are no longer included in the default package installation. To install them, run `pip install pathway[pyfilesystem]`.
An asynchronous callback for `pw.io.python.write` is now available as `pw.io.OnChangeCallbackAsync`.
`pw.run` and `pw.run_all` now accept an `event_loop` parameter to support reusing async state across multiple graph runs.
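
To make the MongoDB ndarray change above concrete, here is a small sketch of the two layouts. It does not use Pathway or MongoDB: plain Python lists stand in for BSON arrays, and the helper names `flatten` and `shape` are illustrative only.

```python
def flatten(nested):
    """Flatten nested lists into one flat list (the old, shape-losing layout)."""
    flat = []
    for item in nested:
        if isinstance(item, list):
            flat.extend(flatten(item))
        else:
            flat.append(item)
    return flat


def shape(nested):
    """Recover the shape of a rectangular nested list."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0] if nested else None
    return tuple(dims)


matrix = [[1, 2, 3], [4, 5, 6]]  # stands in for a 2x3 np.ndarray
vector = [1, 2, 3]               # stands in for a 1-D np.ndarray

old_layout = flatten(matrix)     # one flat array: the 2x3 shape is lost
new_layout = matrix              # nested arrays: the shape survives

print(shape(old_layout), shape(new_layout))
print(flatten(vector) == vector)  # 1-D data looks the same in both layouts
```

With NumPy, the same contrast is `arr.ravel().tolist()` versus `arr.tolist()`. Documents written by an earlier version with multi-dimensional arrays would still hold the flat form, so readers expecting nested arrays may need to account for both.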

Fixed

`pathway web-dashboard` now waits for the metrics database to be created instead of terminating immediately.

v0.29.1 New feature 5mo

⚠ Upgrade required

`pw.io.postgres.write` properly supports TLS configuration through `sslmode` and `sslrootcert` connection string parameters
Worker autoscaling requires persistence to be enabled; configure via `worker_scaling_enabled` and `workload_tracking_window_ms` in `pw.persistence.Config`

Notable features

`pw.io.kafka.read`/`.write` now support OAUTHBEARER authentication
`pw.io.mongodb.write` introduces `output_table_type` with `snapshot` mode (maintains current state using `_id`) and retains `stream_of_changes` default
Workers can automatically scale up/down based on pipeline load via `worker_scaling_enabled` and `workload_tracking_window_ms` in `pw.persistence.Config`

Full changelog

Added

`pw.io.kafka.read` and `pw.io.kafka.write` connectors now support OAUTHBEARER authentication.
`pw.io.mongodb.write` connector now supports an `output_table_type` parameter with two modes: `stream_of_changes` (default) and `snapshot`. In `snapshot` mode, the connector maintains the current state of the Pathway table in MongoDB using the `_id` field as the primary key, while `stream_of_changes` preserves the existing behavior by writing all events with `time` and `diff` flags to reflect transactional minibatches and the nature of each change.
Workers can now automatically scale up or down based on pipeline load, using a configurable monitoring window. This feature requires persistence to be enabled and can be configured via `worker_scaling_enabled` and `workload_tracking_window_ms` in `pw.persistence.Config`. Please refer to the tutorial for more details.
`pw.io.postgres.write` now properly supports TLS configuration via the `sslmode` and `sslrootcert` connection string parameters.
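
The difference between the two `output_table_type` modes of `pw.io.mongodb.write` can be sketched without Pathway or MongoDB. The simulation below is an assumption-laden illustration: the event tuples, the `price` field, and the `to_snapshot` helper are hypothetical, and the exact document layout Pathway writes may differ. It assumes an update arrives as a retraction (`diff` of -1) and an insertion (`diff` of +1) at the same `time`.

```python
def to_snapshot(events):
    """Fold (key, value, time, diff) events into the current state keyed by _id."""
    state = {}
    # Process in time order; within one time, apply retractions before insertions.
    for key, value, time, diff in sorted(events, key=lambda e: (e[2], e[3])):
        doc = {"_id": key, **value}
        if diff > 0:
            state[key] = doc
        elif state.get(key) == doc:
            del state[key]
    return state


# Hypothetical change stream: insert a and b, update a, delete b.
events = [
    ("a", {"price": 10}, 2, +1),
    ("b", {"price": 7}, 2, +1),
    ("a", {"price": 10}, 4, -1),
    ("a", {"price": 12}, 4, +1),
    ("b", {"price": 7}, 6, -1),
]

# stream_of_changes: every event is kept, tagged with time and diff.
stream_of_changes = [
    {"key": k, **v, "time": t, "diff": d} for k, v, t, d in events
]

# snapshot: only the latest state per _id remains.
snapshot = to_snapshot(events)

print(len(stream_of_changes), "change documents")
print(snapshot)
```

In this sketch the change stream grows with every event, while the snapshot holds one document per live key, which is the trade-off between auditing history and querying current state.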