Unit testing support for flink application?

classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

Unit testing support for flink application?

Filipe Correia
Hi all,

I'm finding it hard to unit test my Flink application. Are there any
guidelines / best practices for unit testing a Flink application,
especially for programming for the streaming API with Scala?

Having a few good examples would also help a lot.

I'm aware of flink-spector[1], and it looks great. Although the API is
not ready to be used from Scala yet, I've created my tests with Java
and managed to get it working for some very simple cases . But it does
still lack documentation / examples, so I'm having trouble using it
for most of the unit tests that I would like to create.

Thanks,

Filipe

[1] https://github.com/ottogroup/flink-spector
Reply | Threaded
Open this post in threaded view
|

Re: Unit testing support for flink application?

Stephan Ewen
Hi!

Are you referring to testing streaming programs?
What is the main obstacle for you? Generating test data streams?

Thanks,
Stephan


On Thu, Dec 31, 2015 at 12:43 PM, Filipe Correia <[hidden email]> wrote:
Hi all,

I'm finding it hard to unit test my Flink application. Are there any
guidelines / best practices for unit testing a Flink application,
especially for programming for the streaming API with Scala?

Having a few good examples would also help a lot.

I'm aware of flink-spector[1], and it looks great. Although the API is
not ready to be used from Scala yet, I've created my tests with Java
and managed to get it working for some very simple cases . But it does
still lack documentation / examples, so I'm having trouble using it
for most of the unit tests that I would like to create.

Thanks,

Filipe

[1] https://github.com/ottogroup/flink-spector

Reply | Threaded
Open this post in threaded view
|

Re: Unit testing support for flink application?

lofifnc
In reply to this post by Filipe Correia
Hi,

I'm currently updating and improving the documentation[1] of flink-spector. Regarding missing examples: I'm planning to include small examples showing the behaviour of output matchers, as the documentation already includes several demonstrating how to assemble test cases. Please let me know, if you're having problems with a particular aspect of the framework or the documentation or like to have an example for a special case. I will try to work it into the next version.

At the moment the framework is lacking support to tests the scala api. As you've experienced It's possible to  create simple cases but you'll soon discover problems with hamcrest and scala types in general. I have written a test environment for the scala api. And it should be easy to provide a nice trait for scalatest and some wrappers around the java classes. What I'm struggling with at the moment, is to utilize the scalatest matchers for output verification.

Best,
Alex

[1]https://github.com/ottogroup/flink-spector/wiki
Reply | Threaded
Open this post in threaded view
|

Re: Unit testing support for flink application?

Filipe Correia
In reply to this post by Stephan Ewen
Hi Stephan,

Yes, both generating the datastreams and verifying expectations. Is
the recommended way to create custom data sources and data sinks? I've
meanwhile started down this road, but still hoping for a better way.

Filipe

On Thu, Dec 31, 2015 at 3:20 PM, Stephan Ewen <[hidden email]> wrote:

> Hi!
>
> Are you referring to testing streaming programs?
> What is the main obstacle for you? Generating test data streams?
>
> Thanks,
> Stephan
>
>
> On Thu, Dec 31, 2015 at 12:43 PM, Filipe Correia <[hidden email]>
> wrote:
>>
>> Hi all,
>>
>> I'm finding it hard to unit test my Flink application. Are there any
>> guidelines / best practices for unit testing a Flink application,
>> especially for programming for the streaming API with Scala?
>>
>> Having a few good examples would also help a lot.
>>
>> I'm aware of flink-spector[1], and it looks great. Although the API is
>> not ready to be used from Scala yet, I've created my tests with Java
>> and managed to get it working for some very simple cases . But it does
>> still lack documentation / examples, so I'm having trouble using it
>> for most of the unit tests that I would like to create.
>>
>> Thanks,
>>
>> Filipe
>>
>> [1] https://github.com/ottogroup/flink-spector
>
>
Reply | Threaded
Open this post in threaded view
|

Re: Unit testing support for flink application?

Filipe Correia
In reply to this post by lofifnc
Hello Alex,

Thanks for the reply.

On Sat, Jan 2, 2016 at 2:03 PM, lofifnc <[hidden email]> wrote:
> I'm currently updating and improving the documentation[1] of flink-spector.
> Regarding missing examples: I'm planning to include small examples showing
> the behaviour of output matchers, as the documentation already includes
> several demonstrating how to assemble test cases.

Sounds good, I would find those useful!

> Please let me know, if
> you're having problems with a particular aspect of the framework or the
> documentation or like to have an example for a special case. I will try to
> work it into the next version.

Ok. Like you said, I quickly found problems trying to test my scala
code, so I didn't get very far. Scalatest support will probably help
with most issues that I've found. I will be sure to try flinkspector
again when it arrives

> At the moment the framework is lacking support to tests the scala api.

Thanks for your work on this. It's something that I'm interested in,
and I'm sure that I'm not the only one!

Regards,

Filipe
Reply | Threaded
Open this post in threaded view
|

Re: Unit testing support for flink application?

Filipe Correia
In reply to this post by Filipe Correia
Hi list,

Here's a concrete example of an issue that I've found when trying to
unit test a flink app (scroll down to see the console output):
https://gist.github.com/filipefigcorreia/fdf106eb3d40e035f82a

I am creating a custom datasink to collect the results, but the
execution seems to finish before having the chance of actually
collecting any results (race condition?). Any ideas of what I may be
doing wrong?

I have found this thread on the list that seems to describe a similar
problem, although the solution of using "env.setParallelism(1)" didn't
work for me:
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Published-test-artifacts-for-flink-streaming-tp3379p3560.html

Thanks in advance,

Filipe


On Sat, Jan 2, 2016 at 2:26 PM, Filipe Correia <[hidden email]> wrote:

> Hi Stephan,
>
> Yes, both generating the datastreams and verifying expectations. Is
> the recommended way to create custom data sources and data sinks? I've
> meanwhile started down this road, but still hoping for a better way.
>
> Filipe
>
> On Thu, Dec 31, 2015 at 3:20 PM, Stephan Ewen <[hidden email]> wrote:
>> Hi!
>>
>> Are you referring to testing streaming programs?
>> What is the main obstacle for you? Generating test data streams?
>>
>> Thanks,
>> Stephan
>>
>>
>> On Thu, Dec 31, 2015 at 12:43 PM, Filipe Correia <[hidden email]>
>> wrote:
>>>
>>> Hi all,
>>>
>>> I'm finding it hard to unit test my Flink application. Are there any
>>> guidelines / best practices for unit testing a Flink application,
>>> especially for programming for the streaming API with Scala?
>>>
>>> Having a few good examples would also help a lot.
>>>
>>> I'm aware of flink-spector[1], and it looks great. Although the API is
>>> not ready to be used from Scala yet, I've created my tests with Java
>>> and managed to get it working for some very simple cases . But it does
>>> still lack documentation / examples, so I'm having trouble using it
>>> for most of the unit tests that I would like to create.
>>>
>>> Thanks,
>>>
>>> Filipe
>>>
>>> [1] https://github.com/ottogroup/flink-spector
>>
>>
Reply | Threaded
Open this post in threaded view
|

Re: Unit testing support for flink application?

lofifnc
Hi Filipe,

The problem your encountering most likely stems from the fact that Flink serializes all operators before
running them in the (local) cluster. During this process all outside references inside your sink are lost.

In the thread you've found are two solutions for this: Use the collect sink:
contrib/streaming/DataStreamUtils.java
which provides you with an iterator containing the results. I've no idea if this works with the current version of Flink or even with the scala api DataStream class.

And the solution from nick who directly uses a local static variable for the results: https://gist.github.com/ndimiduk/5f3b4757eb772feed6e6 you have to set the parallelism to 1 as you've discovered. (Haven't tested this either)

The third solution would be of course to use flink-spector, which takes care of these issues. But I have currently no time to finish support for scala. You can find an example and the current state here: [DataStreamSpec]https://github.com/lofifnc/flink-spector/blob/scala_api/flinkspector-datastream-scala_2.11/src/test/scala/org/flinkspector/scala/datastream/DataStreamSpec.scala. (This is works but has also not really been tested)

Best,
Alex


Reply | Threaded
Open this post in threaded view
|

Re: Unit testing support for flink application?

Filipe Correia
Hi Alex,

Thanks for the summary! I've tried those options and couldn't make the
2nd and 3rd work as I expected, so I've settled on using
DataStreamUtils.collect(), and it's working fine so far. I've made a
pull request with an example of this to be added to the docs:
https://github.com/apache/flink/pull/1487

Thanks,
Filipe

On Wed, Jan 6, 2016 at 2:37 PM, lofifnc <[hidden email]> wrote:

> Hi Filipe,
>
> The problem your encountering most likely stems from the fact that Flink
> serializes all operators before
> running them in the (local) cluster. During this process all outside
> references inside your sink are lost.
>
> In the thread you've found are two solutions for this: Use the collect sink:
> contrib/streaming/DataStreamUtils.java
> <https://github.com/apache/flink/blob/master/flink-contrib/flink-streaming-contrib/src/main/java/org/apache/flink/contrib/streaming/DataStreamUtils.java>
> which provides you with an iterator containing the results. I've no idea if
> this works with the current version of Flink or even with the scala api
> DataStream class.
>
> And the solution from nick who directly uses a local static variable for the
> results:  https://gist.github.com/ndimiduk/5f3b4757eb772feed6e6
> <https://gist.github.com/ndimiduk/5f3b4757eb772feed6e6>   you have to set
> the parallelism to 1 as you've discovered. (Haven't tested this either)
>
> The third solution would be of course to use flink-spector, which takes care
> of these issues. But I have currently no time to finish support for scala.
> You can find an example and the current state here:
> [DataStreamSpec]https://github.com/lofifnc/flink-spector/blob/scala_api/flinkspector-datastream-scala_2.11/src/test/scala/org/flinkspector/scala/datastream/DataStreamSpec.scala.
> (This is works but has also not really been tested)
>
> Best,
> Alex
>
>
>
>
>
>
> --
> View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Unit-testing-support-for-flink-application-tp4130p4189.html
> Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.