Hi all,
I'm finding it hard to unit test my Flink application. Are there any guidelines / best practices for unit testing a Flink application, especially when programming against the streaming API with Scala? Having a few good examples would also help a lot.

I'm aware of flink-spector[1], and it looks great. Although the API is not ready to be used from Scala yet, I've created my tests in Java and managed to get them working for some very simple cases. But it still lacks documentation / examples, so I'm having trouble using it for most of the unit tests that I would like to create.

Thanks,

Filipe

[1] https://github.com/ottogroup/flink-spector
Hi!

Are you referring to testing streaming programs? What is the main obstacle for you? Generating test data streams?

Thanks,
Stephan

On Thu, Dec 31, 2015 at 12:43 PM, Filipe Correia <[hidden email]> wrote:
> Hi all,
> [...]
In reply to this post by Filipe Correia
Hi,
I'm currently updating and improving the documentation[1] of flink-spector. Regarding missing examples: I'm planning to include small examples showing the behaviour of output matchers, as the documentation already includes several demonstrating how to assemble test cases. Please let me know if you're having problems with a particular aspect of the framework or the documentation, or would like to have an example for a special case. I will try to work it into the next version.

At the moment the framework lacks support for testing the Scala API. As you've experienced, it's possible to create simple cases, but you'll soon discover problems with hamcrest and Scala types in general. I have written a test environment for the Scala API, and it should be easy to provide a nice trait for scalatest and some wrappers around the Java classes. What I'm struggling with at the moment is how to utilize the scalatest matchers for output verification.

Best,
Alex

[1] https://github.com/ottogroup/flink-spector/wiki
In reply to this post by Stephan Ewen
Hi Stephan,
Yes, both generating the data streams and verifying expectations. Is creating custom data sources and data sinks the recommended way? I've meanwhile started down this road, but I'm still hoping for a better way.

Filipe

On Thu, Dec 31, 2015 at 3:20 PM, Stephan Ewen <[hidden email]> wrote:
> Hi!
>
> Are you referring to testing streaming programs?
> What is the main obstacle for you? Generating test data streams?
> [...]
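One framework-independent way to cover both needs while better tooling matures is to factor the per-record logic out of the Flink operators into plain functions, feed them a fixed list of test records, and assert on the collected output. A minimal sketch, in which every name is hypothetical rather than Flink or flink-spector API:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: keep the per-record logic framework-free so it
// can be unit tested without starting a Flink cluster. The same
// function could later be wrapped in a Flink MapFunction.
public class RecordLogic {

    // Example transformation: normalize a record and measure its length.
    public static int normalizedLength(String record) {
        return record.trim().toLowerCase().length();
    }

    // The "source" is a fixed list of test records; the "sink" is the
    // returned list, which the test can assert on directly.
    public static List<Integer> runPipeline(List<String> testRecords) {
        return testRecords.stream()
                .map(RecordLogic::normalizedLength)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> collected = runPipeline(Arrays.asList(" Flink ", "TEST"));
        System.out.println(collected); // prints [5, 4]
    }
}
```

This does not exercise Flink's runtime behaviour (windowing, ordering, parallelism), but it keeps the bulk of the business logic testable with ordinary assertions.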
In reply to this post by lofifnc
Hello Alex,
Thanks for the reply.

On Sat, Jan 2, 2016 at 2:03 PM, lofifnc <[hidden email]> wrote:
> I'm currently updating and improving the documentation[1] of flink-spector.
> Regarding missing examples: I'm planning to include small examples showing
> the behaviour of output matchers, as the documentation already includes
> several demonstrating how to assemble test cases.

Sounds good, I would find those useful!

> Please let me know, if
> you're having problems with a particular aspect of the framework or the
> documentation or like to have an example for a special case. I will try to
> work it into the next version.

Ok. Like you said, I quickly found problems trying to test my Scala code, so I didn't get very far. Scalatest support will probably help with most issues that I've found. I will be sure to try flink-spector again when it arrives.

> At the moment the framework is lacking support to tests the scala api.

Thanks for your work on this. It's something that I'm interested in, and I'm sure that I'm not the only one!

Regards,
Filipe
In reply to this post by Filipe Correia
Hi list,
Here's a concrete example of an issue that I've found when trying to unit test a Flink app (scroll down to see the console output): https://gist.github.com/filipefigcorreia/fdf106eb3d40e035f82a

I am creating a custom data sink to collect the results, but the execution seems to finish before it has the chance to actually collect any results (a race condition?). Any ideas of what I may be doing wrong?

I have found this thread on the list that seems to describe a similar problem, although the solution of using "env.setParallelism(1)" didn't work for me: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Published-test-artifacts-for-flink-streaming-tp3379p3560.html

Thanks in advance,
Filipe

On Sat, Jan 2, 2016 at 2:26 PM, Filipe Correia <[hidden email]> wrote:
> Hi Stephan,
>
> Yes, both generating the datastreams and verifying expectations. Is
> the recommended way to create custom data sources and data sinks? I've
> meanwhile started down this road, but still hoping for a better way.
> [...]
Hi Filipe,
The problem you're encountering most likely stems from the fact that Flink serializes all operators before running them in the (local) cluster. During this process, all outside references inside your sink are lost.

The thread you've found contains two solutions for this. One is to use the collect sink, contrib/streaming/DataStreamUtils.java, which provides you with an iterator containing the results. I've no idea if this works with the current version of Flink, or even with the Scala API DataStream class.

The other is the solution from Nick, which directly uses a local static variable for the results: https://gist.github.com/ndimiduk/5f3b4757eb772feed6e6. You have to set the parallelism to 1, as you've discovered. (Haven't tested this either.)

The third solution would of course be to use flink-spector, which takes care of these issues, but I currently have no time to finish support for Scala. You can find an example and the current state here: https://github.com/lofifnc/flink-spector/blob/scala_api/flinkspector-datastream-scala_2.11/src/test/scala/org/flinkspector/scala/datastream/DataStreamSpec.scala (this works, but has also not really been tested).

Best,
Alex
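The loss of outside references can be demonstrated without Flink at all. The sketch below is plain Java, not Flink API (the sink class is hypothetical): it serializes a "sink" the way a cluster would before running it, and shows that the instance field of the deserialized copy no longer refers to the original list, while a static field stays shared within the same JVM:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch (not Flink code): why an instance field of a sink
// cannot collect results once the operator has been serialized, while a
// static field still works in a single-JVM local cluster.
public class SinkSerializationDemo {

    static class CollectingSink implements Serializable {
        // Lost across the serialization round trip: the deserialized
        // copy gets its own list, detached from the original one.
        final List<String> instanceResults = new ArrayList<>();

        // Shared per class per JVM: static fields are not part of the
        // serialized object state, so all copies see the same list.
        static final List<String> staticResults = new ArrayList<>();

        void invoke(String value) {
            instanceResults.add(value);
            staticResults.add(value);
        }
    }

    static CollectingSink roundTrip(CollectingSink sink) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(sink);
        }
        try (ObjectInputStream ois = new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray()))) {
            return (CollectingSink) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        CollectingSink original = new CollectingSink();
        // The cluster runs a deserialized copy, not the original object.
        CollectingSink deserialized = roundTrip(original);
        deserialized.invoke("a");

        System.out.println(original.instanceResults.size());     // prints 0
        System.out.println(CollectingSink.staticResults.size()); // prints 1
    }
}
```

With parallelism greater than 1 there would be several copies writing to the static list concurrently, which is one reason the static-variable approach is paired with setParallelism(1).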
Hi Alex,
Thanks for the summary! I've tried those options and couldn't make the 2nd and 3rd work as I expected, so I've settled on using DataStreamUtils.collect(), and it's working fine so far. I've made a pull request with an example of this to be added to the docs: https://github.com/apache/flink/pull/1487

Thanks,
Filipe

On Wed, Jan 6, 2016 at 2:37 PM, lofifnc <[hidden email]> wrote:
> Hi Filipe,
>
> The problem your encountering most likely stems from the fact that Flink
> serializes all operators before
> running them in the (local) cluster. During this process all outside
> references inside your sink are lost.
> [...]
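The collect-based approach boils down to draining a java.util.Iterator into a list that ordinary assertions can run against. A minimal sketch of that step (the drain helper is an assumption for illustration, not part of Flink's API, and a stub iterator stands in for the one DataStreamUtils.collect would return):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper: drain an Iterator, such as the one returned by
// DataStreamUtils.collect(stream), into a List for assertions.
public class CollectHelper {

    public static <T> List<T> drain(Iterator<T> it) {
        List<T> results = new ArrayList<>();
        while (it.hasNext()) {
            results.add(it.next());
        }
        return results;
    }

    public static void main(String[] args) {
        // Stub standing in for DataStreamUtils.collect(...).
        Iterator<Integer> fromStream = Arrays.asList(1, 2, 3).iterator();
        List<Integer> results = drain(fromStream);
        System.out.println(results); // prints [1, 2, 3]
    }
}
```

Note that draining the iterator blocks until the job produces its elements, so the test should only do this for streams that are known to terminate.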