pathway
Streaming & Message QueuesFeatures
- Python ETL framework for stream processing and real‑time analytics
- Supports batch and streaming with the same codebase
- Powered by a scalable Rust engine enabling multithreading, multiprocessing, and distributed computation
Recent releases
View all 19 releases →
v0.30.1
New feature
Notable features
- RabbitMQ Streams connectors (`pw.io.rabbitmq.read` and `pw.io.rabbitmq.write`) supporting JSON, plaintext, raw formats; streaming/static modes; offset recovery; dynamic topics; TLS; AMQP metadata.
- `pathway spawn` now supports `--addresses` and `--process-id` for multi‑machine clusters.
- `pw.xpacks.llm.parsers.AudioParser` for audio transcription via OpenAI Whisper.
Full changelog
Added
pw.io.rabbitmq.readandpw.io.rabbitmq.writeconnectors for reading from and writing to RabbitMQ Streams. Supports JSON, plaintext, and raw formats; streaming and static modes; persistence with offset recovery; dynamic topics (writing to different streams per row);start_fromparameter ("beginning","end", or"timestamp"); TLS configuration; and message metadata including AMQP 1.0 properties and application properties. Header values are JSON-encoded for round-trip compatibility. Requires a Pathway Scale or Enterprise license.pw.io.mssql.readconnector, which reads data from a Microsoft SQL Server table. The connector first delivers a full snapshot of the table and then, if the streaming mode is used, tracks incremental changes via SQL Server Change Data Capture (CDC).pw.io.mssql.writeconnector, which writes a Pathway table to a Microsoft SQL Server table. Row additions and updates are applied as MERGE (upsert) statements keyed on the configured primary key columns, and row deletions are applied as DELETE statements.pw.io.milvus.writeconnector, which writes a Pathway table to a Milvus collection. Row additions are sent as upserts and row deletions are sent as deletes keyed on the configured primary key column. Requires a Pathway Scale license.pathway spawnnow supports the--addressesand--process-idflags for multi-machine deployments. Pass a comma-separated list ofhost:portaddresses for all processes and the index of the local process; Pathway will connect the cluster over TCP without requiring all processes to run on the same machine.pw.xpacks.llm.parsers.AudioParser, audio transcription parser based on OpenAI Whisper API. Accepts raw audio bytes and returns transcribed text, following the same interface as other Pathway document parsers.pw.io.leann.writeconnector for writing Pathway tables to LEANN vector indices. LEANN uses graph-based selective recomputation to achieve 97% storage reduction compared to traditional vector databases.pw.iteratenow supports operator persistence. On restart, the iterate operator loads its previous input from an operator snapshot and reconverges inside the loop, allowing incremental processing of new data without replaying the full input stream.
v0.30.0
Breaking risk
Breaking changes
- pw.io.mongodb.write/read now serialize/deserialize np.ndarray columns as nested BSON arrays preserving shape (previously flattened).
- Dependencies for pw.io.pyfilesystem.read are no longer included in the default package; install with `pip install pathway[pyfilesystem]`.
Notable features
- `pw.io.mongodb.read` connector added – provides full snapshot and real‑time change stream.
- `pw.io.postgres.read` connector added – reads directly from PostgreSQL WAL.
- `pw.io.postgres.read/write` now support serialization/deserialization of np.ndarray, homogeneous tuple, and list via Postgres ARRAY.
Full changelog
Added
pw.io.mongodb.readconnector, which reads data from a MongoDB collection. The connector first delivers a full snapshot of the collection and then, if the streaming mode is used, subscribes to the change stream to receive incremental updates in real time.pw.io.postgres.readconnector, which reads data from a PostgreSQL table directly by parsing the Write-Ahead Log (WAL).pw.io.postgres.writeandpw.io.postgres.readnow support serialization/deserialization ofnp.ndarray(int/floatelements), homogeneoustupleandlist(via PostgresARRAY; multidimensional rectangular arrays supported).pw.io.airbyte.readnow accepts adependency_overridesparameter, allowing users to pin specific versions of transitive dependencies (e.g.airbyte-cdk) installed into the connector's virtual environment. This unblocks connectors broken by upstream dependency changes without waiting for upstream fixes.
Changed
- BREAKING:
pw.io.mongodb.writeandpw.io.mongodb.readnow serialize and deserializenp.ndarraycolumns as nested BSON arrays that preserve the array's shape. Previously, all ndarrays were flattened to a single BSON array regardless of dimensionality, making it impossible to reconstruct the original shape on read-back. For 1-D arrays the representation is identical to before ([1, 2, 3]); only multi-dimensional arrays are affected. - BREAKING: The dependencies for
pw.io.pyfilesystem.readare no longer included in the default package installation. To install them, please usepip install pathway[pyfilesystem]. - Asynchronous callback for
pw.io.python.writeis now available aspw.io.OnChangeCallbackAsync. pw.runandpw.run_allnow have theevent_loopparameter to support reusing async state across multiple graph runs.
Fixed
pathway web-dashboardnow waits for the metrics database to be created instead of terminating instantly.
v0.29.1
New feature
⚠ Upgrade required
- `pw.io.postgres.write` properly supports TLS configuration through `sslmode` and `sslrootcert` connection string parameters
- Worker autoscaling requires persistence to be enabled; configure via `worker_scaling_enabled` and `workload_tracking_window_ms` in `pw.persistence.Config`
Notable features
- `pw.io.kafka.read`/`.write` now support OAUTHBEARER authentication
- `pw.io.mongodb.write` introduces `output_table_type` with `snapshot` mode (maintains current state using `_id`) and retains `stream_of_changes` default
- Workers can automatically scale up/down based on pipeline load via `worker_scaling_enabled` and `workload_tracking_window_ms` in `pw.persistence.Config`
Full changelog
Added
pw.io.kafka.readandpw.io.kafka.writeconnectors now support OAUTHBEARER authentication.pw.io.mongodb.writeconnector now supports anoutput_table_typeparameter with two modes:stream_of_changes(default) andsnapshot. Insnapshotmode, the connector maintains the current state of the Pathway table in MongoDB using the_idfield as the primary key, whilestream_of_changespreserves the existing behavior by writing all events withtimeanddiffflags to reflect transactional minibatches and the nature of each change.- Workers can now automatically scale up or down based on pipeline load, using a configurable monitoring window. This feature requires persistence to be enabled and can be configured via
worker_scaling_enabledandworkload_tracking_window_msinpw.persistence.Config. Please refer to the tutorial for more details. pw.io.postgres.writenow properly supports TLS configuration viasslmodeandsslrootcertconnection string parameters.
Changed
pw.xpacks.connectors.readnow retries initial connection requests.
v0.29.0
Breaking risk
⚠ Upgrade required
- Conditional import of Python dependencies based on usage; ensure required packages are installed if using related capabilities.
Breaking changes
- Output connectors no longer wrap string header values in double quotes when sending them to Kafka or NATS; None is serialized as an empty header in Kafka and as the literal string "None" in NATS.
Notable features
- Pathway Web Dashboard for real‑time pipeline monitoring with interactive graph plotting, latency, and memory metrics
- pw.io.kafka.read now includes message headers in top‑level metadata `headers` array (base64‑encoded values)
- Native AWS Bedrock chat integration via pw.xpacks.llm.llms.BedrockChat supporting multiple models
Full changelog
Added
- Pathway Web Dashboard providing user-friendly interface for monitoring Pathway pipelines in real time with interactive graph plotting and latency/memory metrics.
pw.io.kafka.readnow includes message headers in the parsed metadata. The headers are available at the top level of the metadata in theheadersarray. Each element of the array is a pair consisting of a string header name and a base64-encoded header value. If the header is null, the corresponding value is also null.pw.xpacks.llm.llms.BedrockChat- Native AWS Bedrock chat integration using the Converse API. Supports Claude, Llama, Titan, Mistral, and other Bedrock models.pw.xpacks.llm.embedders.BedrockEmbedder- Native AWS Bedrock embedding integration supporting Amazon Titan and Cohere embedding models.
Changed
- Most Python dependencies are now imported only if the related capabilities are used by a program.
- BREAKING: Output connectors no longer wrap string header values in double quotes when sending them to Kafka or NATS. The string values are forwarded as-is. The
Nonevalue is handled differently: in Kafka, it is serialized as a header without a value, while in NATS it becomes the string"None".
Weekly OSS security release digest.
The CVE patches and breaking changes that affected production tools this week. One email, every Sunday.
No spam, unsubscribe anytime.
Install & Platforms
Install via
pip