(DEPRECATED) Apache Flink User Mailing List archive.

Iterator Data Sync

Mikhail Pryakhin-2

Iterator Data Sync

Hello Flink community!

I've come across the "Iterator Data Sink"[1] approach for testing the output of a streaming pipeline. The pipeline consists of a single ProcessFunction that side-outputs some events, and I'd like to collect both the primary and the side-output streams in my test. I do so by calling DataStreamUtils#collect[2]. The problem is that DataStreamUtils#collect[2] itself calls StreamExecutionEnvironment#execute[3], so every collect call triggers a pipeline execution, which makes it impossible to collect output from both streams.

The preferable behaviour would be for collect not to trigger pipeline execution, leaving that to the user.

What do you think about that? I don't mind submitting a PR.
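To make the "leave execution to the user" idea concrete, here is a minimal sketch of a test that attaches its own collecting sinks to both outputs and then calls execute exactly once. It is not the DataStreamUtils implementation and not a proposed patch. The class name, the tag name "odd", the even/odd split and the static buffers are all hypothetical, and the sketch assumes local execution in a single JVM with the Flink streaming dependencies on the classpath.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.streaming.api.functions.sink.SinkFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

public class SideOutputCollectSketch {

    // Hypothetical side-output tag; the anonymous subclass keeps the generic type.
    static final OutputTag<String> ODD = new OutputTag<String>("odd") {};

    // Static, thread-safe buffers. Sink instances are serialized before the job
    // runs, so instance fields would not be visible to the test afterwards.
    // This only works when the job runs in the same JVM as the test.
    static final List<Integer> MAIN_OUT = new CopyOnWriteArrayList<>();
    static final List<String> SIDE_OUT = new CopyOnWriteArrayList<>();

    static class MainSink implements SinkFunction<Integer> {
        @Override
        public void invoke(Integer value, Context context) {
            MAIN_OUT.add(value);
        }
    }

    static class SideSink implements SinkFunction<String> {
        @Override
        public void invoke(String value, Context context) {
            SIDE_OUT.add(value);
        }
    }

    public static void run() throws Exception {
        MAIN_OUT.clear();
        SIDE_OUT.clear();

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1); // single task, so each buffer keeps input order

        // Even numbers go to the primary output, odd ones to the side output.
        SingleOutputStreamOperator<Integer> processed = env
            .fromElements(1, 2, 3, 4)
            .process(new ProcessFunction<Integer, Integer>() {
                @Override
                public void processElement(Integer value, Context ctx, Collector<Integer> out) {
                    if (value % 2 == 0) {
                        out.collect(value);
                    } else {
                        ctx.output(ODD, "odd-" + value);
                    }
                }
            });

        // Register a sink on each output first; nothing runs yet.
        processed.addSink(new MainSink());
        processed.getSideOutput(ODD).addSink(new SideSink());

        // The test, not the collecting helper, triggers the single execution.
        env.execute("side-output collection sketch");
    }

    public static void main(String[] args) throws Exception {
        run();
        System.out.println(MAIN_OUT + " " + SIDE_OUT);
    }
}
```

With parallelism 1, MAIN_OUT should end up holding the even inputs and SIDE_OUT the tagged odd ones after run() returns. Static buffers are a test-only shortcut and would not work against a remote cluster.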

[1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/datastream_api.html#iterator-data-sink

[2] https://github.com/apache/flink/blob/e07fc39d4bb15dabdedb2eb80b862646de32d82c/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/datastream/DataStreamUtils.java#L85

[3] https://github.com/apache/flink/blob/e07fc39d4bb15dabdedb2eb80b862646de32d82c/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/datastream/DataStreamUtils.java#L158

Kind Regards,

Mike Pryakhin

Andrey Zagrebin-3

Re: Iterator Data Sync

Hi Mikhail,

Could you create a JIRA issue to discuss the change?

Best,

Andrey

On Mon, Mar 18, 2019 at 3:10 PM Mikhail Pryakhin <[hidden email]> wrote:

> [...]