Skip to content

pathway

Streaming & Message Queues
Python Latest v0.31.0 · 9d ago Security brief →

Features

  • Python ETL framework for stream processing and real‑time analytics
  • Supports batch and streaming with the same codebase
  • Powered by a scalable Rust engine enabling multithreading, multiprocessing, and distributed computation

Recent releases

View all 19 releases →
Review required
v0.31.0 Breaking risk
Breaking upgrade

Glue write DateTimeUtc breaking change

v0.30.1 New feature
Notable features
  • RabbitMQ Streams connectors (`pw.io.rabbitmq.read` and `pw.io.rabbitmq.write`) supporting JSON, plaintext, raw formats; streaming/static modes; offset recovery; dynamic topics; TLS; AMQP metadata.
  • `pathway spawn` now supports `--addresses` and `--process-id` for multi‑machine clusters.
  • `pw.xpacks.llm.parsers.AudioParser` for audio transcription via OpenAI Whisper.
Full changelog

Added

  • pw.io.rabbitmq.read and pw.io.rabbitmq.write connectors for reading from and writing to RabbitMQ Streams. Supports JSON, plaintext, and raw formats; streaming and static modes; persistence with offset recovery; dynamic topics (writing to different streams per row); start_from parameter ("beginning", "end", or "timestamp"); TLS configuration; and message metadata including AMQP 1.0 properties and application properties. Header values are JSON-encoded for round-trip compatibility. Requires a Pathway Scale or Enterprise license.
  • pw.io.mssql.read connector, which reads data from a Microsoft SQL Server table. The connector first delivers a full snapshot of the table and then, if the streaming mode is used, tracks incremental changes via SQL Server Change Data Capture (CDC).
  • pw.io.mssql.write connector, which writes a Pathway table to a Microsoft SQL Server table. Row additions and updates are applied as MERGE (upsert) statements keyed on the configured primary key columns, and row deletions are applied as DELETE statements.
  • pw.io.milvus.write connector, which writes a Pathway table to a Milvus collection. Row additions are sent as upserts and row deletions are sent as deletes keyed on the configured primary key column. Requires a Pathway Scale license.
  • pathway spawn now supports the --addresses and --process-id flags for multi-machine deployments. Pass a comma-separated list of host:port addresses for all processes and the index of the local process; Pathway will connect the cluster over TCP without requiring all processes to run on the same machine.
  • pw.xpacks.llm.parsers.AudioParser, audio transcription parser based on OpenAI Whisper API. Accepts raw audio bytes and returns transcribed text, following the same interface as other Pathway document parsers.
  • pw.io.leann.write connector for writing Pathway tables to LEANN vector indices. LEANN uses graph-based selective recomputation to achieve 97% storage reduction compared to traditional vector databases.
  • pw.iterate now supports operator persistence. On restart, the iterate operator loads its previous input from an operator snapshot and reconverges inside the loop, allowing incremental processing of new data without replaying the full input stream.
v0.30.0 Breaking risk
Breaking changes
  • pw.io.mongodb.write/read now serialize/deserialize np.ndarray columns as nested BSON arrays preserving shape (previously flattened).
  • Dependencies for pw.io.pyfilesystem.read are no longer included in the default package; install with `pip install pathway[pyfilesystem]`.
Notable features
  • `pw.io.mongodb.read` connector added – provides full snapshot and real‑time change stream.
  • `pw.io.postgres.read` connector added – reads directly from PostgreSQL WAL.
  • `pw.io.postgres.read/write` now support serialization/deserialization of np.ndarray, homogeneous tuple, and list via Postgres ARRAY.
Full changelog

Added

  • pw.io.mongodb.read connector, which reads data from a MongoDB collection. The connector first delivers a full snapshot of the collection and then, if the streaming mode is used, subscribes to the change stream to receive incremental updates in real time.
  • pw.io.postgres.read connector, which reads data from a PostgreSQL table directly by parsing the Write-Ahead Log (WAL).
  • pw.io.postgres.write and pw.io.postgres.read now support serialization/deserialization of np.ndarray (int/float elements), homogeneous tuple and list (via Postgres ARRAY; multidimensional rectangular arrays supported).
  • pw.io.airbyte.read now accepts a dependency_overrides parameter, allowing users to pin specific versions of transitive dependencies (e.g. airbyte-cdk) installed into the connector's virtual environment. This unblocks connectors broken by upstream dependency changes without waiting for upstream fixes.

Changed

  • BREAKING: pw.io.mongodb.write and pw.io.mongodb.read now serialize and deserialize np.ndarray columns as nested BSON arrays that preserve the array's shape. Previously, all ndarrays were flattened to a single BSON array regardless of dimensionality, making it impossible to reconstruct the original shape on read-back. For 1-D arrays the representation is identical to before ([1, 2, 3]); only multi-dimensional arrays are affected.
  • BREAKING: The dependencies for pw.io.pyfilesystem.read are no longer included in the default package installation. To install them, please use pip install pathway[pyfilesystem].
  • Asynchronous callback for pw.io.python.write is now available as pw.io.OnChangeCallbackAsync.
  • pw.run and pw.run_all now have the event_loop parameter to support reusing async state across multiple graph runs.

Fixed

  • pathway web-dashboard now waits for the metrics database to be created instead of terminating instantly.
v0.29.1 New feature
⚠ Upgrade required
  • `pw.io.postgres.write` properly supports TLS configuration through `sslmode` and `sslrootcert` connection string parameters
  • Worker autoscaling requires persistence to be enabled; configure via `worker_scaling_enabled` and `workload_tracking_window_ms` in `pw.persistence.Config`
Notable features
  • `pw.io.kafka.read`/`.write` now support OAUTHBEARER authentication
  • `pw.io.mongodb.write` introduces `output_table_type` with `snapshot` mode (maintains current state using `_id`) and retains `stream_of_changes` default
  • Workers can automatically scale up/down based on pipeline load via `worker_scaling_enabled` and `workload_tracking_window_ms` in `pw.persistence.Config`
Full changelog

Added

  • pw.io.kafka.read and pw.io.kafka.write connectors now support OAUTHBEARER authentication.
  • pw.io.mongodb.write connector now supports an output_table_type parameter with two modes: stream_of_changes (default) and snapshot. In snapshot mode, the connector maintains the current state of the Pathway table in MongoDB using the _id field as the primary key, while stream_of_changes preserves the existing behavior by writing all events with time and diff flags to reflect transactional minibatches and the nature of each change.
  • Workers can now automatically scale up or down based on pipeline load, using a configurable monitoring window. This feature requires persistence to be enabled and can be configured via worker_scaling_enabled and workload_tracking_window_ms in pw.persistence.Config. Please refer to the tutorial for more details.
  • pw.io.postgres.write now properly supports TLS configuration via sslmode and sslrootcert connection string parameters.

Changed

  • pw.xpacks.connectors.read now retries initial connection requests.
v0.29.0 Breaking risk
⚠ Upgrade required
  • Conditional import of Python dependencies based on usage; ensure required packages are installed if using related capabilities.
Breaking changes
  • Output connectors no longer wrap string header values in double quotes when sending them to Kafka or NATS; None is serialized as an empty header in Kafka and as the literal string "None" in NATS.
Notable features
  • Pathway Web Dashboard for real‑time pipeline monitoring with interactive graph plotting, latency, and memory metrics
  • pw.io.kafka.read now includes message headers in top‑level metadata `headers` array (base64‑encoded values)
  • Native AWS Bedrock chat integration via pw.xpacks.llm.llms.BedrockChat supporting multiple models
Full changelog

Added

  • Pathway Web Dashboard providing user-friendly interface for monitoring Pathway pipelines in real time with interactive graph plotting and latency/memory metrics.
  • pw.io.kafka.read now includes message headers in the parsed metadata. The headers are available at the top level of the metadata in the headers array. Each element of the array is a pair consisting of a string header name and a base64-encoded header value. If the header is null, the corresponding value is also null.
  • pw.xpacks.llm.llms.BedrockChat - Native AWS Bedrock chat integration using the Converse API. Supports Claude, Llama, Titan, Mistral, and other Bedrock models.
  • pw.xpacks.llm.embedders.BedrockEmbedder - Native AWS Bedrock embedding integration supporting Amazon Titan and Cohere embedding models.

Changed

  • Most Python dependencies are now imported only if the related capabilities are used by a program.
  • BREAKING: Output connectors no longer wrap string header values in double quotes when sending them to Kafka or NATS. The string values are forwarded as-is. The None value is handled differently: in Kafka, it is serialized as a header without a value, while in NATS it becomes the string "None".

Weekly OSS security release digest.

The CVE patches and breaking changes that affected production tools this week. One email, every Sunday.

No spam, unsubscribe anytime.

About

Stars
63,112
Forks
1,678
Languages
Python Rust HTML

Install & Platforms

Install via
pip

Community & Support

Beta — feedback welcome: [email protected]