Release history

pathway releases

Review required
v0.31.0 Breaking risk
Breaking upgrade

Glue write DateTimeUtc breaking change

v0.30.1 New feature
Notable features
  • RabbitMQ Streams connectors (`pw.io.rabbitmq.read` and `pw.io.rabbitmq.write`) supporting JSON, plaintext, raw formats; streaming/static modes; offset recovery; dynamic topics; TLS; AMQP metadata.
  • `pathway spawn` now supports `--addresses` and `--process-id` for multi‑machine clusters.
  • `pw.xpacks.llm.parsers.AudioParser` for audio transcription via OpenAI Whisper.
Full changelog

Added

  • pw.io.rabbitmq.read and pw.io.rabbitmq.write connectors for reading from and writing to RabbitMQ Streams. Supports JSON, plaintext, and raw formats; streaming and static modes; persistence with offset recovery; dynamic topics (writing to different streams per row); start_from parameter ("beginning", "end", or "timestamp"); TLS configuration; and message metadata including AMQP 1.0 properties and application properties. Header values are JSON-encoded for round-trip compatibility. Requires a Pathway Scale or Enterprise license.
  • pw.io.mssql.read connector, which reads data from a Microsoft SQL Server table. The connector first delivers a full snapshot of the table and then, if the streaming mode is used, tracks incremental changes via SQL Server Change Data Capture (CDC).
  • pw.io.mssql.write connector, which writes a Pathway table to a Microsoft SQL Server table. Row additions and updates are applied as MERGE (upsert) statements keyed on the configured primary key columns, and row deletions are applied as DELETE statements.
  • pw.io.milvus.write connector, which writes a Pathway table to a Milvus collection. Row additions are sent as upserts and row deletions are sent as deletes keyed on the configured primary key column. Requires a Pathway Scale license.
  • pathway spawn now supports the --addresses and --process-id flags for multi-machine deployments. Pass a comma-separated list of host:port addresses for all processes and the index of the local process; Pathway will connect the cluster over TCP without requiring all processes to run on the same machine.
  • pw.xpacks.llm.parsers.AudioParser, an audio transcription parser based on the OpenAI Whisper API. Accepts raw audio bytes and returns transcribed text, following the same interface as other Pathway document parsers.
  • pw.io.leann.write connector for writing Pathway tables to LEANN vector indices. LEANN uses graph-based selective recomputation to achieve 97% storage reduction compared to traditional vector databases.
  • pw.iterate now supports operator persistence. On restart, the iterate operator loads its previous input from an operator snapshot and reconverges inside the loop, allowing incremental processing of new data without replaying the full input stream.
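
To make the multi-machine flags above concrete, here is a minimal sketch that builds one `pathway spawn` command line per process. The addresses, the port, and the trailing `python main.py` program are placeholders chosen for illustration; the exact invocation should be checked against `pathway spawn --help`.

```python
import shlex


def spawn_command(addresses, process_id, program=("python", "main.py")):
    """Build the argv for one process of a multi-machine cluster.

    addresses: host:port strings for ALL processes, in the same order everywhere.
    process_id: index of the local process within that list.
    program: placeholder for the pipeline to run (an assumption of this sketch).
    """
    if not 0 <= process_id < len(addresses):
        raise ValueError("process_id must index into addresses")
    return [
        "pathway", "spawn",
        "--addresses", ",".join(addresses),
        "--process-id", str(process_id),
        *program,
    ]


cluster = ["10.0.0.1:9000", "10.0.0.2:9000"]  # placeholder addresses
commands = [shlex.join(spawn_command(cluster, i)) for i in range(len(cluster))]
for line in commands:
    print(line)
```

Each machine would run the line matching its own index; the address list is identical on every machine.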
v0.30.0 Breaking risk
Breaking changes
  • pw.io.mongodb.write/read now serialize/deserialize np.ndarray columns as nested BSON arrays preserving shape (previously flattened).
  • Dependencies for pw.io.pyfilesystem.read are no longer included in the default package; install with `pip install pathway[pyfilesystem]`.
Notable features
  • `pw.io.mongodb.read` connector added – provides full snapshot and real‑time change stream.
  • `pw.io.postgres.read` connector added – reads directly from PostgreSQL WAL.
  • `pw.io.postgres.read/write` now support serialization/deserialization of np.ndarray, homogeneous tuple, and list via Postgres ARRAY.
Full changelog

Added

  • pw.io.mongodb.read connector, which reads data from a MongoDB collection. The connector first delivers a full snapshot of the collection and then, if the streaming mode is used, subscribes to the change stream to receive incremental updates in real time.
  • pw.io.postgres.read connector, which reads data from a PostgreSQL table directly by parsing the Write-Ahead Log (WAL).
  • pw.io.postgres.write and pw.io.postgres.read now support serialization/deserialization of np.ndarray (int/float elements), homogeneous tuple and list (via Postgres ARRAY; multidimensional rectangular arrays supported).
  • pw.io.airbyte.read now accepts a dependency_overrides parameter, allowing users to pin specific versions of transitive dependencies (e.g. airbyte-cdk) installed into the connector's virtual environment. This unblocks connectors broken by upstream dependency changes without waiting for upstream fixes.

Changed

  • BREAKING: pw.io.mongodb.write and pw.io.mongodb.read now serialize and deserialize np.ndarray columns as nested BSON arrays that preserve the array's shape. Previously, all ndarrays were flattened to a single BSON array regardless of dimensionality, making it impossible to reconstruct the original shape on read-back. For 1-D arrays the representation is identical to before ([1, 2, 3]); only multi-dimensional arrays are affected.
  • BREAKING: The dependencies for pw.io.pyfilesystem.read are no longer included in the default package installation. To install them, please use pip install pathway[pyfilesystem].
  • Asynchronous callback for pw.io.python.write is now available as pw.io.OnChangeCallbackAsync.
  • pw.run and pw.run_all now have the event_loop parameter to support reusing async state across multiple graph runs.
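
The effect of the MongoDB ndarray change above can be illustrated without Pathway or MongoDB. The sketch below uses plain nested lists as stand-ins for ndarrays and for BSON arrays; `flatten` and `shape_of` are helper names introduced here, not part of any library.

```python
def flatten(nested):
    """Flatten an arbitrarily nested list into a single flat list."""
    flat = []
    for item in nested:
        if isinstance(item, list):
            flat.extend(flatten(item))
        else:
            flat.append(item)
    return flat


def shape_of(nested):
    """Return the shape of a rectangular nested list, e.g. [[1, 2], [3, 4]] -> (2, 2)."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0] if nested else None
    return tuple(shape)


matrix = [[1, 2, 3], [4, 5, 6]]   # stands in for a 2x3 ndarray
vector = [1, 2, 3]                # stands in for a 1-D ndarray

old_matrix_doc = flatten(matrix)  # previous behavior: the shape is lost
new_matrix_doc = matrix           # new behavior: nesting mirrors the shape

print(old_matrix_doc, shape_of(old_matrix_doc))
print(new_matrix_doc, shape_of(new_matrix_doc))
print(flatten(vector) == vector)  # 1-D arrays look the same either way
```

Documents written before the upgrade keep the flat layout, so readers of multi-dimensional columns may need to handle both forms.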

Fixed

  • pathway web-dashboard now waits for the metrics database to be created instead of terminating instantly.
v0.29.1 New feature
⚠ Upgrade required
  • `pw.io.postgres.write` properly supports TLS configuration through `sslmode` and `sslrootcert` connection string parameters
  • Worker autoscaling requires persistence to be enabled; configure via `worker_scaling_enabled` and `workload_tracking_window_ms` in `pw.persistence.Config`
Notable features
  • `pw.io.kafka.read`/`.write` now support OAUTHBEARER authentication
  • `pw.io.mongodb.write` introduces `output_table_type` with `snapshot` mode (maintains current state using `_id`) and retains `stream_of_changes` default
  • Workers can automatically scale up/down based on pipeline load via `worker_scaling_enabled` and `workload_tracking_window_ms` in `pw.persistence.Config`
Full changelog

Added

  • pw.io.kafka.read and pw.io.kafka.write connectors now support OAUTHBEARER authentication.
  • pw.io.mongodb.write connector now supports an output_table_type parameter with two modes: stream_of_changes (default) and snapshot. In snapshot mode, the connector maintains the current state of the Pathway table in MongoDB using the _id field as the primary key, while stream_of_changes preserves the existing behavior by writing all events with time and diff flags to reflect transactional minibatches and the nature of each change.
  • Workers can now automatically scale up or down based on pipeline load, using a configurable monitoring window. This feature requires persistence to be enabled and can be configured via worker_scaling_enabled and workload_tracking_window_ms in pw.persistence.Config. Please refer to the tutorial for more details.
  • pw.io.postgres.write now properly supports TLS configuration via sslmode and sslrootcert connection string parameters.
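
The difference between the two `output_table_type` modes described above can be modelled in a few lines. This is a simplified, Pathway-free sketch: events are hypothetical `(row, time, diff)` tuples, and the functions only imitate what ends up in the target collection.

```python
def stream_of_changes(events):
    """Keep every event as its own document, tagged with time and diff."""
    return [dict(row, time=time, diff=diff) for row, time, diff in events]


def snapshot(events):
    """Maintain only the current state, keyed by _id."""
    state = {}
    # Within one minibatch (same time), apply deletions before insertions
    # so that an update (delete + insert of the same _id) keeps the new row.
    for row, time, diff in sorted(events, key=lambda e: (e[1], e[2])):
        if diff > 0:
            state[row["_id"]] = row
        else:
            state.pop(row["_id"], None)
    return state


events = [
    ({"_id": 1, "name": "alice"}, 2, +1),
    ({"_id": 2, "name": "bob"}, 2, +1),
    ({"_id": 1, "name": "alice"}, 4, -1),   # update of _id 1: retract the old row...
    ({"_id": 1, "name": "alicia"}, 4, +1),  # ...and insert the new one
    ({"_id": 2, "name": "bob"}, 6, -1),     # deletion of _id 2
]

changes = stream_of_changes(events)
current = snapshot(events)
print(len(changes))
print(current)
```

The change log grows with every event, while the snapshot holds one document per live `_id`.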

Changed

  • pw.xpacks.connectors.read now retries initial connection requests.
v0.29.0 Breaking risk
⚠ Upgrade required
  • Conditional import of Python dependencies based on usage; ensure required packages are installed if using related capabilities.
Breaking changes
  • Output connectors no longer wrap string header values in double quotes when sending them to Kafka or NATS; None is serialized as an empty header in Kafka and as the literal string "None" in NATS.
Notable features
  • Pathway Web Dashboard for real‑time pipeline monitoring with interactive graph plotting, latency, and memory metrics
  • pw.io.kafka.read now includes message headers in top‑level metadata `headers` array (base64‑encoded values)
  • Native AWS Bedrock chat integration via pw.xpacks.llm.llms.BedrockChat supporting multiple models
Full changelog

Added

  • Pathway Web Dashboard, providing a user-friendly interface for monitoring Pathway pipelines in real time with interactive graph plotting and latency/memory metrics.
  • pw.io.kafka.read now includes message headers in the parsed metadata. The headers are available at the top level of the metadata in the headers array. Each element of the array is a pair consisting of a string header name and a base64-encoded header value. If the header is null, the corresponding value is also null.
  • pw.xpacks.llm.llms.BedrockChat - Native AWS Bedrock chat integration using the Converse API. Supports Claude, Llama, Titan, Mistral, and other Bedrock models.
  • pw.xpacks.llm.embedders.BedrockEmbedder - Native AWS Bedrock embedding integration supporting Amazon Titan and Cohere embedding models.
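
Because the Kafka header values described above arrive base64-encoded, a consumer has to decode them before use. A minimal sketch follows; the metadata payload is made up, and only the layout of the `headers` array (name and base64 value pairs, with null preserved) follows the changelog entry.

```python
import base64
import json


def decode_headers(metadata):
    """Turn the `headers` array into a {name: bytes or None} mapping."""
    decoded = {}
    for name, value in metadata.get("headers", []):
        decoded[name] = None if value is None else base64.b64decode(value)
    return decoded


# Hypothetical metadata payload for a single message.
metadata = json.loads('{"headers": [["trace-id", "YWJjMTIz"], ["empty", null]]}')
headers = decode_headers(metadata)
print(headers)
```

A dictionary keeps only the last value if a header name repeats; keep the list of pairs instead when duplicates matter.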

Changed

  • Most Python dependencies are now imported only if the related capabilities are used by a program.
  • BREAKING: Output connectors no longer wrap string header values in double quotes when sending them to Kafka or NATS. The string values are forwarded as-is. The None value is handled differently: in Kafka, it is serialized as a header without a value, while in NATS it becomes the string "None".
v0.28.0 Breaking risk
Breaking changes
  • `pw.Json.__str__` and `dumps` methods no longer force the result to be an ASCII string.
Notable features
  • Connector groups support idle duration exclusion
  • Source priorities within connector groups for lag control
  • Connector groups usable in multiprocess runs
Full changelog

Added

  • pw.io.kafka.read and pw.io.redpanda.read now allow each schema field to be specified as coming from either the message key or the message value.
  • Connector groups now support the specification of an idle duration. When this is set, if a source does not provide any data for the specified period of time, it will be excluded from the group until it produces data again.
  • It is now possible to assign priorities to sources within a connector group. When a priority is set, it ensures that at any moment, the source is not lagging behind any other source with a higher priority in terms of the tracked column.
  • Connector groups can now be used in multiprocess runs.

Changed

  • BREAKING: The __str__ and dumps methods in pw.Json no longer enforce the result to be an ASCII string. This way, the behavior of pw.debug.compute_and_print is now consistent with other output connectors.
  • The window functions now internally use deterministic UDFs, where possible.
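
The practical effect of the pw.Json change above resembles the `ensure_ascii` switch in Python's standard `json` module. The sketch below uses the standard library only; treating the two as analogous is an assumption, not a statement about how pw.Json is implemented.

```python
import json

payload = {"city": "Kraków", "note": "naïve"}

# ASCII-only output: non-ASCII characters become \uXXXX escapes.
ascii_only = json.dumps(payload, ensure_ascii=True)

# Unescaped output: characters are emitted as-is.
as_is = json.dumps(payload, ensure_ascii=False)

print(ascii_only)
print(as_is)
```

Both strings parse back to the same object, so only code that compares or stores the serialized text byte-for-byte should be affected.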
v0.27.1 New feature
Notable features
  • pw.Table.filter_out_results_of_forgetting method added to revert forgetting effects
  • MCP server `tool` method now accepts optional `description` with docstring default
  • `pw.io.kafka.read` and `pw.io.redpanda.read` create a `key` column from message keys
Full changelog

[0.27.1] - 2025-12-08

Added

  • pw.Table.filter_out_results_of_forgetting method, which makes it possible to revert the effects of forgetting at a later stage.

Changed

  • The MCP server tool method now accepts an optional description; the default value remains the handler's docstring.
  • pw.io.kafka.read and pw.io.redpanda.read now create a key column storing the contents of the message keys.
v0.27.0 Breaking risk
Breaking changes
  • Iceberg connector APIs `pw.io.iceberg.read` and `pw.io.iceberg.write` now require a mandatory `catalog` parameter (type `RestCatalog` or `GlueCatalog`).
  • `paddlepaddle` is no longer a dependency of the Pathway package; install separately if needed.
Notable features
  • JetStream extension supported in NATS read and write connectors.
  • Iceberg connectors now support Glue as a catalog backend.
  • New `Table.add_update_timestamp_utc` function for tracking row update times.
Full changelog

Added

  • JetStream extension is now supported in both NATS read and write connectors.
  • The Iceberg connectors now support Glue as a catalog backend.
  • New Table.add_update_timestamp_utc function for tracking the update time of rows in the table.

Changed

  • BREAKING: The API for the Iceberg connectors has changed. The catalog parameter is now required in both pw.io.iceberg.read and pw.io.iceberg.write. This parameter can be either of type pw.io.iceberg.RestCatalog or pw.io.iceberg.GlueCatalog, and it must contain the connection parameters.
  • BREAKING: paddlepaddle is no longer a dependency of the Pathway package, because choosing a specific version for the target hardware is advantageous for performance. To install paddlepaddle, follow the instructions at https://www.paddlepaddle.org.cn/en/install/quick.
  • pw.xpacks.llm.question_answering.BaseRAGQuestionAnswerer now supports document reranking. This enables two-stage retrieval where initial vector similarity search is followed by reranking to improve document relevance ordering.

Fixed

  • Endpoints created by pw.io.http.rest_connector now accept requests both with and without a trailing slash. For example, /endpoint/ and /endpoint are now treated equivalently.
  • Schemas that inherit from other schemas now automatically preserve all properties from their parent schemas.
  • Fixed an issue where the persistence configuration failed when provided with a relative filesystem path.
  • Fixed unique name autogeneration for the Python connectors.
v0.26.4 Breaking risk
⚠ Upgrade required
  • Asynchronous UDFs for API‑based LLM and embedding models default to `pw.udfs.ExponentialRetryStrategy()`
  • `pw.io.deltalake.read` accepts `start_from_timestamp_ms` to replay history from a given timestamp atomically
Breaking changes
  • `pw.io.postgres.write_snapshot` method deprecated
Notable features
  • New Qdrant external integration
  • `pw.io.mysql.write` supports stream of changes and realtime-updated data snapshot
  • `pw.io.postgres.write` now supports two output table types via `output_table_type` parameter
Full changelog

Added

  • New external integration with Qdrant.
  • pw.io.mysql.write method for writing to MySQL. It supports two output table types: stream of changes and a realtime-updated data snapshot.

Changed

  • pw.io.deltalake.read now accepts the start_from_timestamp_ms parameter for non-append-only tables. In this case, the connector will replay the history of changes in the table version by version starting from the state of the table at the given timestamp. The differences between versions will be applied atomically.
  • Asynchronous UDFs for connecting to API-based LLM and embedding models now use pw.udfs.ExponentialRetryStrategy() as their default retry strategy.
  • pw.io.postgres.write method now supports two output table types: stream of changes and realtime-updated data snapshot. The output table type can be chosen with the output_table_type parameter.
  • pw.io.postgres.write_snapshot method has been deprecated.
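
For the `start_from_timestamp_ms` parameter above, a value can be derived from a timezone-aware datetime. The sketch assumes the parameter expects milliseconds since the Unix epoch, which the name suggests but the changelog does not spell out; `to_timestamp_ms` is a helper introduced here.

```python
from datetime import datetime, timezone


def to_timestamp_ms(dt):
    """Convert an aware datetime to integer milliseconds since the Unix epoch."""
    if dt.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime to avoid local-time ambiguity")
    return round(dt.timestamp() * 1000)


start_ms = to_timestamp_ms(datetime(2025, 1, 1, tzinfo=timezone.utc))
print(start_ms)
```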
v0.26.3 Feature
Notable features
  • New parser PaddleOCRParser supporting PDF, PPTX, and image parsing
Full changelog

Added

  • New parser pathway.xpacks.llm.parsers.PaddleOCRParser supporting parsing of PDF, PPTX and images.
v0.26.2 Bug fix
⚠ Upgrade required
  • Background operator snapshot compression limited to max(snapshot_interval, 30 minutes) when using S3 or Azure backends
  • Google Drive input connector performance improved for deeply nested directories
  • MCP server `tool` method now accepts optional data fields: `title`, `output_schema`, `annotations`, and `meta`
Notable features
  • `pw.io.gdrive.read` supports `"only_metadata"` format for metadata‑only updates
  • Detailed metrics export to SQLite via `PATHWAY_DETAILED_METRICS_DIR` or `pw.set_monitoring_config()`
  • `pw.io.kinesis.read` and `pw.io.kinesis.write` methods added
Full changelog

Added

  • pw.io.gdrive.read now supports the "only_metadata" format. When this format is used, the table will contain only metadata updates for the tracked directory, without reading object contents.
  • Detailed metrics can now be exported to SQLite. Enable this feature using the environment variable PATHWAY_DETAILED_METRICS_DIR or via pw.set_monitoring_config().
  • pw.io.kinesis.read and pw.io.kinesis.write methods for reading from and writing to AWS Kinesis.

Fixed

  • A bug leading to potentially unbounded memory consumption that could occur in Table.forget and Table.sort operators during multi-worker runs has been fixed.
  • Improved memory efficiency during cold starts by compacting intermediary structures and reducing retained memory after backfilling.

Changed

  • The frequency of background operator snapshot compression in data persistence is limited to the greater of the user-defined snapshot_interval or 30 minutes when S3 or Azure is used as the backend, in order to avoid frequent calls to potentially expensive operations.
  • The Google Drive input connector performance has been improved, especially when handling directories with many nested subdirectories.
  • The MCP server tool method now accepts the optional data fields title, output_schema, annotations, and meta to inform the LLM client.
  • Relaxed boto3 dependency to <2.0.0.
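
The snapshot compression rule above reduces to taking a maximum. As a small worked example (the function name is made up for this sketch, and the 30-minute floor is the value stated in the changelog):

```python
from datetime import timedelta

MIN_COMPRESSION_INTERVAL = timedelta(minutes=30)  # floor stated in the changelog


def effective_compression_interval(snapshot_interval):
    """Return the greater of the user-defined interval and the 30-minute floor."""
    return max(snapshot_interval, MIN_COMPRESSION_INTERVAL)


short = effective_compression_interval(timedelta(minutes=5))
long = effective_compression_interval(timedelta(hours=2))
print(short, long)
```

With an S3 or Azure backend, a 5-minute `snapshot_interval` would therefore still lead to compression at most every 30 minutes, while a 2-hour interval is left unchanged.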
v0.26.1 Breaking risk
Breaking changes
  • Removed `optimize_transaction_log` option from `pw.io.deltalake.TableOptimizer`.
Notable features
  • `pw.Table.forget` to remove old entries by event time
  • `pw.Table.buffer` stateful buffering operator delaying entries until `time_column <= max(time_column) - threshold`
  • `pw.Table.ignore_late` filters out old entries by event time
Full changelog

Added

  • pw.Table.forget to remove old (in terms of event time) entries from the pipeline.
  • pw.Table.buffer, a stateful buffering operator that delays entries until time_column <= max(time_column) - threshold condition is met.
  • pw.Table.ignore_late to filter out old (in terms of event time) entries.
  • Rows batching for async UDFs. It can be enabled with max_batch_size parameter.
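
The release condition of pw.Table.buffer quoted above can be traced with a small simulation. This is a simplified, Pathway-free model that processes one event time per step; the function name and the integer times are illustrative.

```python
def simulate_buffer(event_times, threshold):
    """Hold each entry until entry_time <= max_seen_time - threshold.

    Returns (released, still_held), where released is a list of
    (step, entry_time) pairs showing when each entry left the buffer.
    """
    held = []
    released = []
    max_seen = None
    for step, t in enumerate(event_times):
        held.append(t)
        max_seen = t if max_seen is None else max(max_seen, t)
        cutoff = max_seen - threshold
        ready = sorted(x for x in held if x <= cutoff)
        held = [x for x in held if x > cutoff]
        released.extend((step, x) for x in ready)
    return released, held


released, held = simulate_buffer([1, 2, 5, 3, 9], threshold=3)
print(released)
print(held)
```

Entries 1 and 2 are released when time 5 arrives (cutoff 2); entries 3 and 5 wait until time 9 arrives (cutoff 6); entry 9 stays buffered until a later time is seen.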

Changed

  • pw.io.subscribe and pw.io.python.write now work with async callbacks.
  • The diff column in tables automatically created by pw.io.postgres.write and pw.io.postgres.write_snapshot in replace and create_if_not_exists initialization modes now uses the smallint type.
  • optimize_transaction_log option has been removed from pw.io.deltalake.TableOptimizer.

Fixed

  • pw.io.postgres.write and pw.io.postgres.write_snapshot now respect the type optionality defined in the Pathway table schema when creating a new PostgreSQL table. This applies to the replace and create_if_not_exists initialization modes.
v0.26.0 Breaking risk
Breaking changes
  • Optimized implementation of pw.reducers.min, max, argmin, argmax, any for append-only tables – persisted state must be recomputed.
  • Optimized implementation of pw.reducers.sum on float and np.ndarray columns – persisted state must be recomputed.
  • Optimized data persistence for many small objects in filesystem and S3 connectors – persisted state must be recomputed.
Notable features
  • `path_filter` parameter added to pw.io.s3.read and pw.io.minio.read enabling post‑filtering with wildcard patterns.
  • Backpressure control via `max_backlog_size` added to input connectors limiting read events per connector.
  • `pw.reducers.count_distinct` and `pw.reducers.count_distinct_approximate` introduced for distinct element counting with adjustable precision.
Full changelog

Added

  • path_filter parameter in pw.io.s3.read and pw.io.minio.read functions. It enables post-filtering of object paths using a wildcard pattern (*, ?), allowing exclusion of paths that pass the main path filter but do not match path_filter.
  • Input connectors now support backpressure control via max_backlog_size, which limits the number of read events in processing per connector. This is useful when the data source emits a large initial burst followed by smaller, incremental updates.
  • pw.reducers.count_distinct and pw.reducers.count_distinct_approximate to count the number of distinct elements in a table. The pw.reducers.count_distinct_approximate allows you to save memory by decreasing the accuracy. It is possible to control this tradeoff by using the precision parameter.
  • pw.Table.join (and its variants) now has two additional parameters: left_exactly_once and right_exactly_once. If the elements from one side of a join should be joined exactly once, the *_exactly_once parameter for that side can be set to True. After a match, the entry is then removed from the join state, which reduces memory consumption.
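
The wildcard post-filtering described for `path_filter` above behaves much like shell-style matching. The sketch below uses Python's `fnmatch` as a stand-in; whether Pathway's `*` crosses `/` boundaries the way `fnmatch` does is an assumption, and `fnmatch` additionally supports `[seq]` patterns that the changelog does not mention.

```python
from fnmatch import fnmatchcase


def apply_path_filter(paths, path_filter):
    """Keep only the object paths matching a wildcard pattern (*, ?)."""
    return [p for p in paths if fnmatchcase(p, path_filter)]


paths = [
    "logs/2024/app.csv",
    "logs/2024/app.tmp",
    "logs/2025/db.csv",
    "logs/2025/a1.csv",
]

csv_only = apply_path_filter(paths, "*.csv")
single_char = apply_path_filter(paths, "logs/2025/a?.csv")
print(csv_only)
print(single_char)
```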

Changed

  • Delta table compression logging has been improved: logs now include table names, and verbose messages have been streamlined while preserving details of important processing steps.
  • Improved initialization speed of pw.io.s3.read and pw.io.minio.read.
  • pw.io.s3.read and pw.io.minio.read now limit the number and the total size of objects to be predownloaded.
  • BREAKING: Optimized the implementation of the pw.reducers.min, pw.reducers.max, pw.reducers.argmin, pw.reducers.argmax, and pw.reducers.any reducers for append-only tables. This is a breaking change for programs using operator persistence: the persisted state will have to be recomputed.
  • BREAKING: Optimized the implementation of the pw.reducers.sum reducer on float and np.ndarray columns. This is a breaking change for programs using operator persistence: the persisted state will have to be recomputed.
  • BREAKING: The implementation of data persistence has been optimized for the case of many small objects in filesystem and S3 connectors. This is a breaking change for programs using data persistence: the persisted state will have to be recomputed.
  • BREAKING: The data snapshot logic in persistence has been optimized for the case of big input snapshots. This is a breaking change for programs using data persistence: the persisted state will have to be recomputed.
  • Improved precision of pw.reducers.sum on float columns by introducing Neumaier summation.
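
For readers unfamiliar with the compensated summation mentioned above, here is a textbook sketch of the Neumaier (improved Kahan-Babuska) algorithm in plain Python. It illustrates the technique only and is not Pathway's implementation.

```python
def naive_sum(values):
    """Plain left-to-right floating-point summation."""
    total = 0.0
    for x in values:
        total += x
    return total


def neumaier_sum(values):
    """Compensated summation that tracks the low-order bits lost at each step."""
    total = 0.0
    compensation = 0.0
    for x in values:
        t = total + x
        if abs(total) >= abs(x):
            # Low-order digits of x were lost in the addition.
            compensation += (total - t) + x
        else:
            # Low-order digits of total were lost in the addition.
            compensation += (x - t) + total
        total = t
    return total + compensation


values = [1.0, 1e100, 1.0, -1e100]
print(naive_sum(values))
print(neumaier_sum(values))
```

On this input the naive loop returns 0.0 because both 1.0 terms are absorbed by 1e100, while the compensated version recovers the exact result 2.0.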
v0.25.1 New feature
Notable features
  • Added `pw.xpacks.llm.mcp_server.PathwayMcp` to serve DocumentStore and question_answering endpoints as MCP tools.
  • Added `pw.io.dynamodb.write` method for writing to DynamoDB.
Full changelog

Added

  • pw.xpacks.llm.mcp_server.PathwayMcp that allows serving pw.xpacks.llm.document_store.DocumentStore and pw.xpacks.llm.question_answering endpoints as MCP (Model Context Protocol) tools.
  • pw.io.dynamodb.write method for writing to DynamoDB.
v0.24.0 Breaking risk
⚠ Upgrade required
  • `pw.io.kafka.read_from_upstash` has been removed; migrate to an alternative Kafka client as Upstash's managed service is deprecated.
Breaking changes
  • Arguments `api_key` and `base_url` for `pw.xpacks.llm.llms.OpenAIChat` can no longer be set in the `__call__` method; they must be provided to the constructor.
  • Argument `api_key` for `pw.xpacks.llm.llms.OpenAIEmbedder` can no longer be set in the `__call__` method; it must be provided to the constructor.
Notable features
  • `pw.io.mqtt.read` and `pw.io.mqtt.write` methods added for MQTT interaction.
  • `pw.xpacks.llm.embedders.SentenceTransformerEmbedder` and `pw.xpacks.llm.llms.HFPipelineChat` now support batch computation with configurable `max_batch_size`.
Full changelog

Added

  • pw.io.mqtt.read and pw.io.mqtt.write methods for reading from and writing to MQTT.

Changed

  • pw.xpacks.llm.embedders.SentenceTransformerEmbedder and pw.xpacks.llm.llms.HFPipelineChat are now computed in batches. The maximum size of a single batch can be set in the constructor with the argument max_batch_size.
  • BREAKING: Arguments api_key and base_url for pw.xpacks.llm.llms.OpenAIChat can no longer be set in the __call__ method, and instead, if needed, should be set in the constructor.
  • BREAKING: Argument api_key for pw.xpacks.llm.llms.OpenAIEmbedder can no longer be set in the __call__ method, and instead, if needed, should be set in the constructor.
  • pw.io.postgres.write now accepts arbitrary types for the values of the postgres_settings dict. If a value is not a string, Python's str() method will be used.

Removed

  • pw.io.kafka.read_from_upstash has been removed, as the managed Kafka service in Upstash has been deprecated.
v0.24.1 New feature
Notable features
  • Confluent Schema Registry support in Kafka and Redpanda input and output connectors
Full changelog

Added

  • Confluent Schema Registry support in Kafka and Redpanda input and output connectors.

Changed

  • pw.io.airbyte.read will now retry the pip install command if it fails during the installation of a connector. It only applies when using the PyPI version of the connector, not the Docker one.
v0.25.0 Breaking risk
Breaking changes
  • Elasticsearch and BigQuery connectors now require the Scale license tier (available free at https://pathway.com/get-license).
  • `pw.io.fs.read` no longer accepts `format="raw"`; use `binary`, `plaintext_by_file`, or `plaintext`.
  • The `pw.io.s3_csv.read` connector has been removed; replace with `pw.io.s3.read` using `format="csv"`.
Notable features
  • `pw.io.questdb.write` method added for writing to QuestDB.
  • `pw.io.fs.read` now supports the `"only_metadata"` format, returning only metadata updates without reading file contents.
Full changelog

Added

  • pw.io.questdb.write method for writing to QuestDB.
  • pw.io.fs.read now supports the "only_metadata" format. When this format is used, the table will contain only metadata updates for the tracked directory, without reading file contents.

Changed

  • BREAKING: The Elasticsearch and BigQuery connectors have been moved to the Scale license tier. You can obtain the Scale tier license for free at https://pathway.com/get-license.
  • BREAKING: pw.io.fs.read no longer accepts format="raw". Use format="binary" to read binary objects, format="plaintext_by_file" to read plaintext objects per file, or format="plaintext" to read plaintext objects split into lines.
  • BREAKING: The pw.io.s3_csv.read connector has been removed. Please use pw.io.s3.read with format="csv" instead.

Fixed

  • pw.io.s3.read and pw.io.s3.write now also check the AWS_PROFILE environment variable for AWS credentials if none are explicitly provided.
v0.23.0 Breaking risk
Breaking changes
  • Installation requirement changed: `pw.sql` now requires installing `pathway[sql]`.
Full changelog

Changed

  • BREAKING: To use pw.sql you now have to install pathway[sql].

Fixed

  • pw.io.deltalake.read now correctly reads data from partitioned tables in all cases.
  • Added retries for all cloud-based persistence backend operations to improve reliability.
v0.22.0 Breaking risk
Breaking changes
  • Creating `pw.DateTimeUtc` now requires time zone information.
  • Passing time zone information when creating `pw.DateTimeNaive` is no longer allowed.
Notable features
  • Data persistence can be configured to use Azure Blob Storage via `pw.persistence.Backend.azure`.
  • UDFs support batching with the new `max_batch_size` argument.
Full changelog

Added

  • Data persistence can now be configured to use Azure Blob Storage as a backend. An Azure backend instance can be created using pw.persistence.Backend.azure and included in the persistence config.
  • Added batching to UDFs. It is now possible to make UDFs operate on batches of data instead of single rows. To do so, set the max_batch_size argument.
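
The idea behind `max_batch_size` above can be shown in plain Python. This sketch illustrates only the general contract, which is an assumption here: a batch-oriented function receives a list of at most `max_batch_size` inputs and returns a list of the same length. How Pathway actually forms batches is not described in the changelog, and `batches` and `double_all` are names invented for the example.

```python
def batches(rows, max_batch_size):
    """Split rows into consecutive chunks of at most max_batch_size items."""
    for start in range(0, len(rows), max_batch_size):
        yield rows[start:start + max_batch_size]


def double_all(batch):
    """A batch-oriented function: one output per input, in the same order."""
    return [2 * x for x in batch]


rows = [1, 2, 3, 4, 5]
chunks = list(batches(rows, max_batch_size=2))
results = [y for chunk in chunks for y in double_all(chunk)]
print(chunks)
print(results)
```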

Changed

  • BREAKING: when creating pw.DateTimeUtc it is now obligatory to pass the time zone information.
  • BREAKING: when creating pw.DateTimeNaive passing time zone information is not allowed.
  • BREAKING: expressions are now evaluated in batches. Generally, it speeds up the computations but might increase the memory usage if the intermediate state in the expressions is large.
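
The DateTimeUtc and DateTimeNaive rules above mirror the distinction between aware and naive datetimes in Python's standard library. The sketch below uses `datetime` as a stand-in and does not call the Pathway constructors; `is_aware` is a helper introduced here.

```python
from datetime import datetime, timezone


def is_aware(dt):
    """True if the datetime carries usable time zone information."""
    return dt.tzinfo is not None and dt.utcoffset() is not None


naive = datetime(2025, 1, 1, 12, 0)                       # no time zone attached
aware = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)  # explicit UTC

print(is_aware(naive), is_aware(aware))
```

Under the new rules, values like `aware` correspond to what pw.DateTimeUtc expects, and values like `naive` to what pw.DateTimeNaive expects.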

Fixed

  • Synchronization groups now correctly handle cases where the source file-like object is updated during the reading process.
