Release history
pathway releases
v0.30.1
New feature
Notable features
- RabbitMQ Streams connectors (`pw.io.rabbitmq.read` and `pw.io.rabbitmq.write`) supporting JSON, plaintext, raw formats; streaming/static modes; offset recovery; dynamic topics; TLS; AMQP metadata.
- `pathway spawn` now supports `--addresses` and `--process-id` for multi‑machine clusters.
- `pw.xpacks.llm.parsers.AudioParser` for audio transcription via OpenAI Whisper.
Full changelog
Added
- `pw.io.rabbitmq.read` and `pw.io.rabbitmq.write` connectors for reading from and writing to RabbitMQ Streams. They support JSON, plaintext, and raw formats; streaming and static modes; persistence with offset recovery; dynamic topics (writing to different streams per row); a `start_from` parameter (`"beginning"`, `"end"`, or `"timestamp"`); TLS configuration; and message metadata including AMQP 1.0 properties and application properties. Header values are JSON-encoded for round-trip compatibility. Requires a Pathway Scale or Enterprise license.
- `pw.io.mssql.read` connector, which reads data from a Microsoft SQL Server table. The connector first delivers a full snapshot of the table and then, if the streaming mode is used, tracks incremental changes via SQL Server Change Data Capture (CDC).
- `pw.io.mssql.write` connector, which writes a Pathway table to a Microsoft SQL Server table. Row additions and updates are applied as MERGE (upsert) statements keyed on the configured primary key columns, and row deletions are applied as DELETE statements.
- `pw.io.milvus.write` connector, which writes a Pathway table to a Milvus collection. Row additions are sent as upserts and row deletions are sent as deletes keyed on the configured primary key column. Requires a Pathway Scale license.
- `pathway spawn` now supports the `--addresses` and `--process-id` flags for multi-machine deployments. Pass a comma-separated list of `host:port` addresses for all processes and the index of the local process; Pathway will connect the cluster over TCP without requiring all processes to run on the same machine.
- `pw.xpacks.llm.parsers.AudioParser`, an audio transcription parser based on the OpenAI Whisper API. It accepts raw audio bytes and returns transcribed text, following the same interface as other Pathway document parsers.
- `pw.io.leann.write` connector for writing Pathway tables to LEANN vector indices. LEANN uses graph-based selective recomputation to achieve 97% storage reduction compared to traditional vector databases.
- `pw.iterate` now supports operator persistence. On restart, the iterate operator loads its previous input from an operator snapshot and reconverges inside the loop, allowing incremental processing of new data without replaying the full input stream.
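To make the `--addresses` / `--process-id` pairing more concrete, here is a minimal sketch of how a comma-separated `host:port` list and a local index can identify one process in a cluster. It is plain Python and not Pathway's implementation; the helper name `parse_cluster_addresses` and the sample addresses are hypothetical.

```python
def parse_cluster_addresses(addresses: str, process_id: int):
    """Split a comma-separated host:port list and pick the local entry.

    Hypothetical helper for illustration only; not part of Pathway.
    """
    peers = []
    for item in addresses.split(","):
        host, _, port = item.strip().rpartition(":")
        if not host or not port.isdigit():
            raise ValueError(f"expected host:port, got {item!r}")
        peers.append((host, int(port)))
    if not 0 <= process_id < len(peers):
        raise ValueError("process_id must index into the address list")
    return peers, peers[process_id]


# Every machine receives the same address list; only the index differs.
peers, local = parse_cluster_addresses(
    "10.0.0.1:7000,10.0.0.2:7000", process_id=1
)
print(local)
```

The point of the design is that the address list is identical on every machine, so only the index has to change per process.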
v0.30.0
Breaking risk
Breaking changes
- pw.io.mongodb.write/read now serialize/deserialize np.ndarray columns as nested BSON arrays preserving shape (previously flattened).
- Dependencies for pw.io.pyfilesystem.read are no longer included in the default package; install with `pip install pathway[pyfilesystem]`.
Notable features
- `pw.io.mongodb.read` connector added – provides full snapshot and real‑time change stream.
- `pw.io.postgres.read` connector added – reads directly from PostgreSQL WAL.
- `pw.io.postgres.read/write` now support serialization/deserialization of np.ndarray, homogeneous tuple, and list via Postgres ARRAY.
Full changelog
Added
- `pw.io.mongodb.read` connector, which reads data from a MongoDB collection. The connector first delivers a full snapshot of the collection and then, if the streaming mode is used, subscribes to the change stream to receive incremental updates in real time.
- `pw.io.postgres.read` connector, which reads data from a PostgreSQL table directly by parsing the Write-Ahead Log (WAL).
- `pw.io.postgres.write` and `pw.io.postgres.read` now support serialization/deserialization of `np.ndarray` (`int`/`float` elements), homogeneous `tuple` and `list` (via Postgres `ARRAY`; multidimensional rectangular arrays supported).
- `pw.io.airbyte.read` now accepts a `dependency_overrides` parameter, allowing users to pin specific versions of transitive dependencies (e.g. `airbyte-cdk`) installed into the connector's virtual environment. This unblocks connectors broken by upstream dependency changes without waiting for upstream fixes.
Changed
- BREAKING: `pw.io.mongodb.write` and `pw.io.mongodb.read` now serialize and deserialize `np.ndarray` columns as nested BSON arrays that preserve the array's shape. Previously, all ndarrays were flattened to a single BSON array regardless of dimensionality, making it impossible to reconstruct the original shape on read-back. For 1-D arrays the representation is identical to before (`[1, 2, 3]`); only multi-dimensional arrays are affected.
- BREAKING: The dependencies for `pw.io.pyfilesystem.read` are no longer included in the default package installation. To install them, please use `pip install pathway[pyfilesystem]`.
- Asynchronous callback for `pw.io.python.write` is now available as `pw.io.OnChangeCallbackAsync`.
- `pw.run` and `pw.run_all` now have the `event_loop` parameter to support reusing async state across multiple graph runs.
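The ndarray change is easiest to see with a small example. The sketch below uses plain nested Python lists as a stand-in for BSON arrays (an assumption for illustration; it does not call MongoDB or Pathway) and shows why a flattened representation cannot be mapped back to the original shape while a nested one can.

```python
def flatten(values):
    """Flatten arbitrarily nested lists into a single flat list."""
    flat = []
    for v in values:
        if isinstance(v, list):
            flat.extend(flatten(v))
        else:
            flat.append(v)
    return flat


def shape_of(values):
    """Return the shape of a rectangular nested list, e.g. (2, 2)."""
    shape = []
    while isinstance(values, list):
        shape.append(len(values))
        values = values[0] if values else None
    return tuple(shape)


matrix = [[1, 2], [3, 4]]  # 2x2 array
row = [[1, 2, 3, 4]]       # 1x4 array

# Flattened storage: both arrays collapse to the same value.
flat_matrix, flat_row = flatten(matrix), flatten(row)

# Nested storage: the shape can be read back from the stored value.
matrix_shape, row_shape = shape_of(matrix), shape_of(row)
print(flat_matrix == flat_row, matrix_shape, row_shape)
```

A 1-D array such as `[1, 2, 3]` has the same form in both representations, which matches the note that only multi-dimensional arrays are affected.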
Fixed
- `pathway web-dashboard` now waits for the metrics database to be created instead of terminating instantly.
v0.29.1
New feature
⚠ Upgrade required
- `pw.io.postgres.write` properly supports TLS configuration through `sslmode` and `sslrootcert` connection string parameters
- Worker autoscaling requires persistence to be enabled; configure via `worker_scaling_enabled` and `workload_tracking_window_ms` in `pw.persistence.Config`
Notable features
- `pw.io.kafka.read`/`.write` now support OAUTHBEARER authentication
- `pw.io.mongodb.write` introduces `output_table_type` with `snapshot` mode (maintains current state using `_id`) and retains `stream_of_changes` default
- Workers can automatically scale up/down based on pipeline load via `worker_scaling_enabled` and `workload_tracking_window_ms` in `pw.persistence.Config`
Full changelog
Added
- `pw.io.kafka.read` and `pw.io.kafka.write` connectors now support OAUTHBEARER authentication.
- `pw.io.mongodb.write` connector now supports an `output_table_type` parameter with two modes: `stream_of_changes` (default) and `snapshot`. In `snapshot` mode, the connector maintains the current state of the Pathway table in MongoDB using the `_id` field as the primary key, while `stream_of_changes` preserves the existing behavior by writing all events with `time` and `diff` flags to reflect transactional minibatches and the nature of each change.
- Workers can now automatically scale up or down based on pipeline load, using a configurable monitoring window. This feature requires persistence to be enabled and can be configured via `worker_scaling_enabled` and `workload_tracking_window_ms` in `pw.persistence.Config`. Please refer to the tutorial for more details.
- `pw.io.postgres.write` now properly supports TLS configuration via `sslmode` and `sslrootcert` connection string parameters.
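The relationship between the two MongoDB output modes can be sketched in a few lines: a stream of changes carries every event with a `time` and a `diff`, and a snapshot is what remains after folding those events by `_id`. The tuple layout `(time, diff, row)` and the sample rows below are assumptions made for illustration, not the connector's actual document format.

```python
def apply_changes(events):
    """Fold (time, diff, row) events into the current state keyed by `_id`.

    diff = +1 means the row was added, diff = -1 means it was removed.
    Within one time, removals are applied before additions so that an
    update (remove old row, add new row) leaves the new row in place.
    """
    snapshot = {}
    for _time, diff, row in sorted(events, key=lambda e: (e[0], e[1])):
        if diff > 0:
            snapshot[row["_id"]] = row
        else:
            snapshot.pop(row["_id"], None)
    return snapshot


events = [
    (2, 1, {"_id": "a", "value": 10}),
    (4, -1, {"_id": "a", "value": 10}),  # update: retract the old row ...
    (4, 1, {"_id": "a", "value": 11}),   # ... and insert the new one
    (4, 1, {"_id": "b", "value": 20}),
    (6, -1, {"_id": "b", "value": 20}),  # deletion
]
snapshot = apply_changes(events)
print(snapshot)
```

Here the stream holds five events, while the snapshot ends with the single current row for `_id` `"a"`.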
Changed
- `pw.xpacks.connectors.read` now retries initial connection requests.
v0.29.0
Breaking risk
⚠ Upgrade required
- Conditional import of Python dependencies based on usage; ensure required packages are installed if using related capabilities.
Breaking changes
- Output connectors no longer wrap string header values in double quotes when sending them to Kafka or NATS; None is serialized as an empty header in Kafka and as the literal string "None" in NATS.
Notable features
- Pathway Web Dashboard for real‑time pipeline monitoring with interactive graph plotting, latency, and memory metrics
- pw.io.kafka.read now includes message headers in top‑level metadata `headers` array (base64‑encoded values)
- Native AWS Bedrock chat integration via pw.xpacks.llm.llms.BedrockChat supporting multiple models
Full changelog
Added
- Pathway Web Dashboard providing user-friendly interface for monitoring Pathway pipelines in real time with interactive graph plotting and latency/memory metrics.
- `pw.io.kafka.read` now includes message headers in the parsed metadata. The headers are available at the top level of the metadata in the `headers` array. Each element of the array is a pair consisting of a string header name and a base64-encoded header value. If the header is null, the corresponding value is also null.
- `pw.xpacks.llm.llms.BedrockChat`: native AWS Bedrock chat integration using the Converse API. Supports Claude, Llama, Titan, Mistral, and other Bedrock models.
- `pw.xpacks.llm.embedders.BedrockEmbedder`: native AWS Bedrock embedding integration supporting Amazon Titan and Cohere embedding models.
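Since header values arrive base64-encoded, downstream code has to decode them before use. The following sketch decodes a `headers` array shaped as described above (name, base64 value or null). The sample metadata string and the helper name `decode_headers` are hypothetical, and the sketch assumes each pair is serialized as a two-element JSON array.

```python
import base64
import json

# Hypothetical metadata payload; only the `headers` layout follows the
# description above (string name, base64-encoded value or null).
metadata_json = '{"headers": [["trace-id", "YWJjMTIz"], ["empty", null]]}'


def decode_headers(metadata: dict) -> list:
    """Return (name, raw_bytes_or_None) pairs, keeping duplicates and order."""
    decoded = []
    for name, value in metadata.get("headers", []):
        raw = None if value is None else base64.b64decode(value)
        decoded.append((name, raw))
    return decoded


headers = decode_headers(json.loads(metadata_json))
print(headers)
```

A list of pairs is used rather than a dict because Kafka allows repeated header names.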
Changed
- Most Python dependencies are now imported only if the related capabilities are used by a program.
- BREAKING: Output connectors no longer wrap string header values in double quotes when sending them to Kafka or NATS. The string values are forwarded as-is. The `None` value is handled differently: in Kafka, it is serialized as a header without a value, while in NATS it becomes the string `"None"`.
v0.28.0
Breaking risk
Breaking changes
- `pw.Json.__str__` and `dumps` methods no longer enforce result to be an ASCII string.
Notable features
- Connector groups support idle duration exclusion
- Source priorities within connector groups for lag control
- Connector groups usable in multiprocess runs
Full changelog
Added
- `pw.io.kafka.read` and `pw.io.redpanda.read` now allow each schema field to be specified as coming from either the message key or the message value.
- Connector groups now support the specification of an idle duration. When this is set, if a source does not provide any data for the specified period of time, it will be excluded from the group until it produces data again.
- It is now possible to assign priorities to sources within a connector group. When a priority is set, it ensures that at any moment, the source is not lagging behind any other source with a higher priority in terms of the tracked column.
- Connector groups can now be used in multiprocess runs.
Changed
- BREAKING: The `__str__` and `dumps` methods in `pw.Json` no longer enforce the result to be an ASCII string. This way, the behavior of `pw.debug.compute_and_print` is now consistent with other output connectors.
- The window functions now internally use deterministic UDFs, where possible.
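The practical effect of dropping the ASCII requirement can be illustrated with the standard library's `json` module, used here only as an analogy (the sketch does not call `pw.Json`): with ASCII enforced, non-ASCII characters are written as `\u` escapes, and without it they are kept as-is.

```python
import json

payload = {"city": "Zürich"}

# ASCII-only output: non-ASCII characters become \uXXXX escapes.
ascii_only = json.dumps(payload)

# No ASCII enforcement: characters are emitted unchanged.
unicode_kept = json.dumps(payload, ensure_ascii=False)

print(ascii_only)    # {"city": "Z\u00fcrich"}
print(unicode_kept)  # {"city": "Zürich"}
```

Both strings parse back to the same object, so the change mainly affects code that compares or stores the serialized text.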
v0.27.1
New feature
Notable features
- pw.Table.filter_out_results_of_forgetting method added to revert forgetting effects
- MCP server `tool` method now accepts optional `description` with docstring default
- `pw.io.kafka.read` and `pw.io.redpanda.read` create a `key` column from message keys
Full changelog
[0.27.1] - 2025-12-08
Added
- `pw.Table.filter_out_results_of_forgetting` method, which allows reverting the effects of forgetting at a later stage.
Changed
- The MCP server `tool` method now accepts an optional `description`; by default, the handler's docstring is used.
- `pw.io.kafka.read` and `pw.io.redpanda.read` now create a `key` column storing the contents of the message keys.
v0.27.0
Breaking risk
Breaking changes
- Iceberg connector APIs `pw.io.iceberg.read` and `pw.io.iceberg.write` now require a mandatory `catalog` parameter (type `RestCatalog` or `GlueCatalog`).
- `paddlepaddle` is no longer a dependency of the Pathway package; install separately if needed.
Notable features
- JetStream extension supported in NATS read and write connectors.
- Iceberg connectors now support Glue as a catalog backend.
- New `Table.add_update_timestamp_utc` function for tracking row update times.
Full changelog
Added
- JetStream extension is now supported in both NATS read and write connectors.
- The Iceberg connectors now support Glue as a catalog backend.
- New `Table.add_update_timestamp_utc` function for tracking the update time of rows in the table.
Changed
- BREAKING The API for the Iceberg connectors has changed. The `catalog` parameter is now required in both `pw.io.iceberg.read` and `pw.io.iceberg.write`. This parameter can be either of type `pw.io.iceberg.RestCatalog` or `pw.io.iceberg.GlueCatalog`, and it must contain the connection parameters.
- BREAKING `paddlepaddle` is no longer a dependency of the Pathway package, because choosing a version that matches the target hardware gives better performance. To install `paddlepaddle`, follow the instructions at https://www.paddlepaddle.org.cn/en/install/quick.
- `pw.xpacks.llm.question_answering.BaseRAGQuestionAnswerer` now supports document reranking. This enables two-stage retrieval where initial vector similarity search is followed by reranking to improve document relevance ordering.
Fixed
- Endpoints created by `pw.io.http.rest_connector` now accept requests both with and without a trailing slash. For example, `/endpoint/` and `/endpoint` are now treated equivalently.
- Schemas that inherit from other schemas now automatically preserve all properties from their parent schemas.
- Fixed an issue where the persistence configuration failed when provided with a relative filesystem path.
- Fixed unique name autogeneration for the Python connectors.
v0.26.4
Breaking risk
⚠ Upgrade required
- Asynchronous UDFs for API‑based LLM and embedding models default to `pw.udfs.ExponentialRetryStrategy()`
- `pw.io.deltalake.read` accepts `start_from_timestamp_ms` to replay history from a given timestamp atomically
Breaking changes
- `pw.io.postgres.write_snapshot` method deprecated
Notable features
- New Qdrant external integration
- `pw.io.mysql.write` supports stream of changes and realtime-updated data snapshot
- `pw.io.postgres.write` now supports two output table types via `output_table_type` parameter
Full changelog
Added
- New external integration with Qdrant.
- `pw.io.mysql.write` method for writing to MySQL. It supports two output table types: stream of changes and a realtime-updated data snapshot.
Changed
- `pw.io.deltalake.read` now accepts the `start_from_timestamp_ms` parameter for non-append-only tables. In this case, the connector will replay the history of changes in the table version by version starting from the state of the table at the given timestamp. The differences between versions will be applied atomically.
- Asynchronous UDFs for connecting to API-based LLM and embedding models now use `pw.udfs.ExponentialRetryStrategy()` as the default retry strategy.
- `pw.io.postgres.write` method now supports two output table types: stream of changes and realtime-updated data snapshot. The output table type can be chosen with the `output_table_type` parameter.
- `pw.io.postgres.write_snapshot` method has been deprecated.
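For readers unfamiliar with exponential retries, the general idea is to wait longer after each failed attempt. The sketch below shows that idea with `asyncio` only. It is not the implementation of `pw.udfs.ExponentialRetryStrategy`, and the parameter names (`max_retries`, `initial_delay`, `factor`) and the `flaky_request` stand-in are hypothetical.

```python
import asyncio


async def retry_with_backoff(func, *, max_retries=3, initial_delay=0.01, factor=2.0):
    """Await `func()` until it succeeds, multiplying the wait after each failure."""
    delay = initial_delay
    for attempt in range(max_retries + 1):
        try:
            return await func()
        except Exception:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            await asyncio.sleep(delay)
            delay *= factor


calls = {"count": 0}


async def flaky_request():
    # Stand-in for an API call that fails twice before succeeding.
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


result = asyncio.run(retry_with_backoff(flaky_request))
print(result, calls["count"])
```

Production strategies usually also add random jitter to the delay so that many clients do not retry at the same moment.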
v0.26.3
Feature
Notable features
- New parser PaddleOCRParser supporting PDF, PPTX, and image parsing
Full changelog
Added
- New parser `pathway.xpacks.llm.parsers.PaddleOCRParser` supporting parsing of PDF, PPTX and images.
v0.26.2
Bug fix
⚠ Upgrade required
- Background operator snapshot compression limited to max(snapshot_interval, 30 minutes) when using S3 or Azure backends
- Google Drive input connector performance improved for deeply nested directories
- MCP server `tool` method now accepts optional data fields: `title`, `output_schema`, `annotations`, and `meta`
Notable features
- `pw.io.gdrive.read` supports `"only_metadata"` format for metadata‑only updates
- Detailed metrics export to SQLite via `PATHWAY_DETAILED_METRICS_DIR` or `pw.set_monitoring_config()`
- `pw.io.kinesis.read` and `pw.io.kinesis.write` methods added
Full changelog
Added
- `pw.io.gdrive.read` now supports the `"only_metadata"` format. When this format is used, the table will contain only metadata updates for the tracked directory, without reading object contents.
- Detailed metrics can now be exported to SQLite. Enable this feature using the environment variable `PATHWAY_DETAILED_METRICS_DIR` or via `pw.set_monitoring_config()`.
- `pw.io.kinesis.read` and `pw.io.kinesis.write` methods for reading from and writing to AWS Kinesis.
Fixed
- A bug leading to potentially unbounded memory consumption that could occur in `Table.forget` and `Table.sort` operators during multi-worker runs has been fixed.
- Improved memory efficiency during cold starts by compacting intermediary structures and reducing retained memory after backfilling.
Changed
- The frequency of background operator snapshot compression in data persistence is limited to the greater of the user-defined `snapshot_interval` or 30 minutes when S3 or Azure is used as the backend, in order to avoid frequent calls to potentially expensive operations.
- The Google Drive input connector performance has been improved, especially when handling directories with many nested subdirectories.
- The MCP server `tool` method now accepts the optional fields `title`, `output_schema`, `annotations` and `meta` to inform the LLM client.
- Relaxed boto3 dependency to <2.0.0.
v0.26.1
Breaking risk
Breaking changes
- Removed `optimize_transaction_log` option from `pw.io.deltalake.TableOptimizer`.
Notable features
- `pw.Table.forget` to remove old entries by event time
- `pw.Table.buffer` stateful buffering operator delaying entries until `time_column <= max(time_column) - threshold`
- `pw.Table.ignore_late` filters out old entries by event time
Full changelog
Added
- `pw.Table.forget` to remove old (in terms of event time) entries from the pipeline.
- `pw.Table.buffer`, a stateful buffering operator that delays entries until the `time_column <= max(time_column) - threshold` condition is met.
- `pw.Table.ignore_late` to filter out old (in terms of event time) entries.
- Rows batching for async UDFs. It can be enabled with the `max_batch_size` parameter.
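The buffering condition `time_column <= max(time_column) - threshold` can be traced by hand with a small simulation. The sketch below is plain Python that models only the release rule, not the `pw.Table.buffer` operator itself; the function name `simulate_buffer` and the sample entries are hypothetical.

```python
def simulate_buffer(entries, threshold):
    """Hold (time, payload) entries until time <= max_seen_time - threshold.

    Returns the entries released after each arrival and those still held.
    """
    held = []
    max_time = None
    released_per_step = []
    for time, payload in entries:
        held.append((time, payload))
        max_time = time if max_time is None else max(max_time, time)
        cutoff = max_time - threshold
        released_per_step.append([e for e in held if e[0] <= cutoff])
        held = [e for e in held if e[0] > cutoff]
    return released_per_step, held


steps, still_held = simulate_buffer(
    [(1, "a"), (2, "b"), (5, "c"), (3, "d"), (9, "e")], threshold=3
)
for step in steps:
    print(step)
print("held:", still_held)
```

With `threshold=3`, nothing is released until time 5 arrives (cutoff 2), and the out-of-order entry at time 3 waits until time 9 moves the cutoff to 6.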
Changed
- `pw.io.subscribe` and `pw.io.python.write` now work with async callbacks.
- The `diff` column in tables automatically created by `pw.io.postgres.write` and `pw.io.postgres.write_snapshot` in `replace` and `create_if_not_exists` initialization modes now uses the `smallint` type.
- `optimize_transaction_log` option has been removed from `pw.io.deltalake.TableOptimizer`.
Fixed
- `pw.io.postgres.write` and `pw.io.postgres.write_snapshot` now respect the type optionality defined in the Pathway table schema when creating a new PostgreSQL table. This applies to the `replace` and `create_if_not_exists` initialization modes.
v0.26.0
Breaking risk
Breaking changes
- Optimized implementation of pw.reducers.min, max, argmin, argmax, any for append-only tables – persisted state must be recomputed.
- Optimized implementation of pw.reducers.sum on float and np.ndarray columns – persisted state must be recomputed.
- Optimized data persistence for many small objects in filesystem and S3 connectors – persisted state must be recomputed.
Notable features
- `path_filter` parameter added to pw.io.s3.read and pw.io.minio.read enabling post‑filtering with wildcard patterns.
- Backpressure control via `max_backlog_size` added to input connectors limiting read events per connector.
- `pw.reducers.count_distinct` and `pw.reducers.count_distinct_approximate` introduced for distinct element counting with adjustable precision.
Full changelog
Added
- `path_filter` parameter in `pw.io.s3.read` and `pw.io.minio.read` functions. It enables post-filtering of object paths using a wildcard pattern (`*`, `?`), allowing exclusion of paths that pass the main `path` filter but do not match `path_filter`.
- Input connectors now support backpressure control via `max_backlog_size`, which limits the number of read events in processing per connector. This is useful when the data source emits a large initial burst followed by smaller, incremental updates.
- `pw.reducers.count_distinct` and `pw.reducers.count_distinct_approximate` to count the number of distinct elements in a table. `pw.reducers.count_distinct_approximate` lets you save memory at the cost of accuracy. This tradeoff is controlled with the `precision` parameter.
- `pw.Table.join` (and its variants) now has two additional parameters: `left_exactly_once` and `right_exactly_once`. If the elements from one side of a join should be joined exactly once, the `*_exactly_once` parameter of that side can be set to `True`. Then, after a match is found, the entry is removed from the join state, which reduces memory consumption.
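Post-filtering with a wildcard pattern can be pictured with the standard library's `fnmatch` module, used here as a stand-in. This is an assumption for illustration: the exact matching rules of `path_filter` (for example, whether `*` crosses `/`) are not stated above and may differ. The object paths and the helper name `post_filter` are hypothetical.

```python
from fnmatch import fnmatchcase

# Paths that already passed the main `path` prefix filter.
object_paths = [
    "logs/2024/app.csv",
    "logs/2024/app.tmp",
    "logs/2025/db.csv",
]


def post_filter(paths, path_filter):
    """Keep only paths matching a wildcard pattern (`*`, `?`)."""
    return [p for p in paths if fnmatchcase(p, path_filter)]


csv_only = post_filter(object_paths, "*.csv")
year_2024 = post_filter(object_paths, "logs/202?/app.*")
print(csv_only)
print(year_2024)
```

In `fnmatch`, `*` matches any run of characters including `/`, and `?` matches exactly one character.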
Changed
- Delta table compression logging has been improved: logs now include table names, and verbose messages have been streamlined while preserving details of important processing steps.
- Improved initialization speed of `pw.io.s3.read` and `pw.io.minio.read`.
- `pw.io.s3.read` and `pw.io.minio.read` now limit the number and the total size of objects to be predownloaded.
- BREAKING optimized the implementation of `pw.reducers.min`, `pw.reducers.max`, `pw.reducers.argmin`, `pw.reducers.argmax`, `pw.reducers.any` reducers for append-only tables. It is a breaking change for programs using operator persistence. The persisted state will have to be recomputed.
- BREAKING optimized the implementation of `pw.reducers.sum` reducer on `float` and `np.ndarray` columns. It is a breaking change for programs using operator persistence. The persisted state will have to be recomputed.
- BREAKING the implementation of data persistence has been optimized for the case of many small objects in filesystem and S3 connectors. It is a breaking change for programs using data persistence. The persisted state will have to be recomputed.
- BREAKING the data snapshot logic in persistence has been optimized for the case of big input snapshots. It is a breaking change for programs using data persistence. The persisted state will have to be recomputed.
- Improved precision of `pw.reducers.sum` on `float` columns by introducing Neumaier summation.
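The precision gain from compensated (Neumaier) summation shows up on inputs where large values cancel. The sketch below is a textbook version of the algorithm in plain Python, not Pathway's reducer code. A hand-written loop is used for the naive sum because recent Python versions already compensate inside the built-in `sum` for floats.

```python
def naive_sum(values):
    """Plain left-to-right floating-point accumulation."""
    total = 0.0
    for x in values:
        total += x
    return total


def neumaier_sum(values):
    """Compensated summation that tracks the low-order bits lost at each step."""
    total = 0.0
    compensation = 0.0
    for x in values:
        t = total + x
        if abs(total) >= abs(x):
            # Low-order digits of x were lost in the addition.
            compensation += (total - t) + x
        else:
            # Low-order digits of total were lost instead.
            compensation += (x - t) + total
        total = t
    return total + compensation


data = [1.0, 1e100, 1.0, -1e100]
print(naive_sum(data))     # 0.0
print(neumaier_sum(data))  # 2.0
```

The exact answer is 2.0: the naive loop loses both small terms next to `1e100`, while the compensation term recovers them.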
v0.25.1
New feature
Notable features
- Added `pw.xpacks.llm.mcp_server.PathwayMcp` to serve DocumentStore and question_answering endpoints as MCP tools.
- Added `pw.io.dynamodb.write` method for writing to DynamoDB.
Full changelog
Added
- `pw.xpacks.llm.mcp_server.PathwayMcp` that allows serving `pw.xpacks.llm.document_store.DocumentStore` and `pw.xpacks.llm.question_answering` endpoints as MCP (Model Context Protocol) tools.
- `pw.io.dynamodb.write` method for writing to DynamoDB.
v0.24.0
Breaking risk
⚠ Upgrade required
- `pw.io.kafka.read_from_upstash` has been removed; migrate to an alternative Kafka client as Upstash's managed service is deprecated.
Breaking changes
- Arguments `api_key` and `base_url` for `pw.xpacks.llm.llms.OpenAIChat` can no longer be set in the `__call__` method; they must be provided to the constructor.
- Argument `api_key` for `pw.xpacks.llm.llms.OpenAIEmbedder` can no longer be set in the `__call__` method; it must be provided to the constructor.
Notable features
- `pw.io.mqtt.read` and `pw.io.mqtt.write` methods added for MQTT interaction.
- `pw.xpacks.llm.embedders.SentenceTransformerEmbedder` and `pw.xpacks.llm.llms.HFPipelineChat` now support batch computation with configurable `max_batch_size`.
Full changelog
Added
- `pw.io.mqtt.read` and `pw.io.mqtt.write` methods for reading from and writing to MQTT.
Changed
- `pw.xpacks.llm.embedders.SentenceTransformerEmbedder` and `pw.xpacks.llm.llms.HFPipelineChat` are now computed in batches. The maximum size of a single batch can be set in the constructor with the argument `max_batch_size`.
- BREAKING Arguments `api_key` and `base_url` for `pw.xpacks.llm.llms.OpenAIChat` can no longer be set in the `__call__` method; if needed, set them in the constructor instead.
- BREAKING Argument `api_key` for `pw.xpacks.llm.llms.OpenAIEmbedder` can no longer be set in the `__call__` method; if needed, set it in the constructor instead.
- `pw.io.postgres.write` now accepts arbitrary types for the values of the `postgres_settings` dict. If a value is not a string, it is converted with Python's `str()`.
Removed
- `pw.io.kafka.read_from_upstash` has been removed, as the managed Kafka service in Upstash has been deprecated.
v0.24.1
New feature
Notable features
- Confluent Schema Registry support in Kafka and Redpanda input and output connectors
Full changelog
Added
- Confluent Schema Registry support in Kafka and Redpanda input and output connectors.
Changed
- `pw.io.airbyte.read` will now retry the `pip install` command if it fails during the installation of a connector. It only applies when using the PyPI version of the connector, not the Docker one.
v0.25.0
Breaking risk
Breaking changes
- Elasticsearch and BigQuery connectors now require the Scale license tier (available free at https://pathway.com/get-license).
- `pw.io.fs.read` no longer accepts `format="raw"`; use `binary`, `plaintext_by_file`, or `plaintext`.
- The `pw.io.s3_csv.read` connector has been removed; replace with `pw.io.s3.read` using `format="csv"`.
Notable features
- `pw.io.questdb.write` method added for writing to QuestDB.
- `pw.io.fs.read` now supports the `"only_metadata"` format, returning only metadata updates without reading file contents.
Full changelog
Added
- `pw.io.questdb.write` method for writing to QuestDB.
- `pw.io.fs.read` now supports the `"only_metadata"` format. When this format is used, the table will contain only metadata updates for the tracked directory, without reading file contents.
Changed
- BREAKING The Elasticsearch and BigQuery connectors have been moved to the Scale license tier. You can obtain the Scale tier license for free at https://pathway.com/get-license.
- BREAKING `pw.io.fs.read` no longer accepts `format="raw"`. Use `format="binary"` to read binary objects, `format="plaintext_by_file"` to read plaintext objects per file, or `format="plaintext"` to read plaintext objects split into lines.
- BREAKING The `pw.io.s3_csv.read` connector has been removed. Please use `pw.io.s3.read` with `format="csv"` instead.
Fixed
- `pw.io.s3.read` and `pw.io.s3.write` now also check the `AWS_PROFILE` environment variable for AWS credentials if none are explicitly provided.
v0.23.0
Breaking risk
Breaking changes
- Installation requirement changed: `pw.sql` now requires installing `pathway[sql]`.
Full changelog
Changed
- BREAKING: To use `pw.sql` you now have to install `pathway[sql]`.
Fixed
- `pw.io.deltalake.read` now correctly reads data from partitioned tables in all cases.
- Added retries for all cloud-based persistence backend operations to improve reliability.
v0.22.0
Breaking risk
Breaking changes
- Creating `pw.DateTimeUtc` now requires time zone information.
- Passing time zone information when creating `pw.DateTimeNaive` is no longer allowed.
Notable features
- Data persistence can be configured to use Azure Blob Storage via `pw.persistence.Backend.azure`.
- UDFs support batching with the new `max_batch_size` argument.
Full changelog
Added
- Data persistence can now be configured to use Azure Blob Storage as a backend. An Azure backend instance can be created using `pw.persistence.Backend.azure` and included in the persistence config.
- Added batching to UDFs. It is now possible to make UDFs operate on batches of data instead of single rows. To do so, the `max_batch_size` argument has to be set.
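Batching means a function receives a list of rows per call rather than one row. As a minimal sketch of the idea, the code below splits rows into chunks of at most `max_batch_size` and calls a batch function once per chunk. It is plain Python rather than a Pathway UDF, and the names `apply_in_batches` and `upper_batch` are hypothetical.

```python
def apply_in_batches(rows, batch_fn, max_batch_size):
    """Call `batch_fn` on consecutive chunks of at most `max_batch_size` rows."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be positive")
    results = []
    for start in range(0, len(rows), max_batch_size):
        batch = rows[start:start + max_batch_size]
        results.extend(batch_fn(batch))
    return results


batch_sizes = []


def upper_batch(batch):
    # A batch function takes a list of inputs and returns a list of outputs.
    batch_sizes.append(len(batch))
    return [s.upper() for s in batch]


out = apply_in_batches(["a", "b", "c", "d", "e"], upper_batch, max_batch_size=2)
print(out, batch_sizes)
```

Batching pays off when each call has a fixed overhead, such as a model invocation or a network round trip.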
Changed
- BREAKING: when creating `pw.DateTimeUtc`, it is now mandatory to pass the time zone information.
- BREAKING: when creating `pw.DateTimeNaive`, passing time zone information is not allowed.
- BREAKING: expressions are now evaluated in batches. Generally, it speeds up the computations but might increase the memory usage if the intermediate state in the expressions is large.
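The split between time-zone-aware and naive values mirrors the one in Python's standard `datetime` module. The sketch below uses only the standard library as an analogy and does not construct `pw.DateTimeUtc` or `pw.DateTimeNaive` objects; the helper `is_aware` is hypothetical.

```python
from datetime import datetime, timezone

aware = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)  # carries a time zone
naive = datetime(2024, 1, 1, 12, 0)                       # no time zone


def is_aware(dt: datetime) -> bool:
    """True if the datetime carries usable time zone information."""
    return dt.tzinfo is not None and dt.utcoffset() is not None


# Ordering comparisons between the two kinds are rejected outright.
try:
    aware < naive
    mixing_failed = False
except TypeError:
    mixing_failed = True

print(is_aware(aware), is_aware(naive), mixing_failed)
```

Keeping the two kinds strictly separate avoids silently interpreting a local wall-clock time as UTC.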
Fixed
- Synchronization groups now correctly handle cases where the source file-like object is updated during the reading process.