Iterator Data Sync

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

Iterator Data Sync

Mikhail Pryakhin-2
Hello Flink community!

I've come across of employing an "Iterator Data Sync"[1] approach to test output from a streaming pipeline. The pipeline consists of a single ProcessFunction which side-outputs some events. I'd like to collect both the primary and the side-output streams in my test. I do so by calling DataStreamUtils#collect[2]. The problem is that the implementation of DataStreamUtils#collect[2] method calls the StreamEnvironment#execute[3] method which makes it impossible to collect output from both streams. 
The preferable behaviour would be not to trigger a pipeline execution and leave it to a user. 
What do you think about that? I don't mind to submit a PR.


Kind Regards,
Mike Pryakhin


smime.p7s (2K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Iterator Data Sync

Andrey Zagrebin-3
Hi Mikhail,

could you create a JIRA issue to discuss the change?

Best,
Andrey

On Mon, Mar 18, 2019 at 3:10 PM Mikhail Pryakhin <[hidden email]> wrote:
Hello Flink community!

I've come across of employing an "Iterator Data Sync"[1] approach to test output from a streaming pipeline. The pipeline consists of a single ProcessFunction which side-outputs some events. I'd like to collect both the primary and the side-output streams in my test. I do so by calling DataStreamUtils#collect[2]. The problem is that the implementation of DataStreamUtils#collect[2] method calls the StreamEnvironment#execute[3] method which makes it impossible to collect output from both streams. 
The preferable behaviour would be not to trigger a pipeline execution and leave it to a user. 
What do you think about that? I don't mind to submit a PR.


Kind Regards,
Mike Pryakhin