Proper way to do Integration Testing?


Proper way to do Integration Testing?

Faye Pressly
Hello!

I have built a Flink pipeline that involves the following steps (a rough code sketch follows the list):

1) Reading events from a Kinesis stream

2) Creating two DataStreams (using .filter), with events of type 'A' going into stream1 and events of type 'B' going into stream2

3) Transforming stream1 into a Table and using the Table API to do a simple tumbling window and group-by to count the events

4) Interval-joining stream1 with stream2 in order to filter out the events in stream1 that have no match in stream2

5) Transforming the result of the interval join into a Table and using the Table API to do a simple tumbling window and group-by to count the events

6) Joining the results of 3) and 5) and transforming back to a stream that sinks to an output Kinesis stream
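
To make the structure concrete, here is a stripped-down sketch of steps 1, 2 and 4 (the event fields, window sizes and join bounds are simplified, and the Kinesis source is replaced by a placeholder in-memory source):

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.co.ProcessJoinFunction;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

import java.time.Duration;

public class PipelineSketch {

    // Simplified event POJO; the real schema has more fields.
    public static class Event {
        public String id;
        public String type;      // "A" or "B"
        public long eventTime;   // epoch millis
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // 1) In production this is a FlinkKinesisConsumer; a placeholder source here.
        DataStream<Event> events = env
                .fromElements(new Event())
                .assignTimestampsAndWatermarks(
                        WatermarkStrategy.<Event>forBoundedOutOfOrderness(Duration.ofSeconds(5))
                                .withTimestampAssigner((e, ts) -> e.eventTime));

        // 2) Split by event type.
        DataStream<Event> stream1 = events.filter(e -> "A".equals(e.type));
        DataStream<Event> stream2 = events.filter(e -> "B".equals(e.type));

        // 4) Keep only the stream1 events that have a matching stream2 event nearby.
        DataStream<Event> matched = stream1
                .keyBy(e -> e.id)
                .intervalJoin(stream2.keyBy(e -> e.id))
                .between(Time.minutes(-5), Time.minutes(5))
                .process(new ProcessJoinFunction<Event, Event, Event>() {
                    @Override
                    public void processElement(Event left, Event right, Context ctx, Collector<Event> out) {
                        out.collect(left);
                    }
                });

        // 3), 5) and 6): Table API tumbling-window counts, the final join and the Kinesis sink omitted here.

        env.execute("pipeline sketch");
    }
}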


I have read the documentation that shows some examples of unit testing, but I'm scratching my head over how to integration-test my pipeline to make sure all the computations are correct given an exact input dataset.

Is there a proper way of writing integration tests for my pipeline?

Or will I have to bring up a Flink cluster (with Docker, for example), fire events with a Python script, and then check the results by reading the output stream?

Thank you!

Re: Proper way to do Integration Testing?

Timo Walther
Hi Faye,

Flink does not officially provide testing tools at the moment. However,
you can use internal Flink tools if they solve your problem.

The `flink-end-to-end-tests` module [1] shows some examples of how we test
Flink together with other systems. Many tests still use plain Bash
scripts (in the `test-scripts` folder). The newer generation uses a test
base such as StreamingKafkaITCase [2] or SQLClientKafkaITCase [3].

Alternatively, you can also replace the connectors with testing
connectors and just run integration tests for your pipeline, as we do
in StreamTableEnvironmentITCase [4].
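
A minimal sketch of that approach (assuming you factor your pipeline
logic into a method that takes the input DataStream and returns the
result DataStream; `buildPipeline`, the String events and the assertion
below are only placeholders): the Kinesis source is replaced by
`env.fromElements(...)`, the job runs on a MiniClusterWithClientResource
from flink-test-utils, and a test sink collects the results.

import org.apache.flink.runtime.testutils.MiniClusterResourceConfiguration;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.SinkFunction;
import org.apache.flink.test.util.MiniClusterWithClientResource;
import org.junit.ClassRule;
import org.junit.Test;

import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

import static org.junit.Assert.assertEquals;

public class PipelineITCase {

    @ClassRule
    public static final MiniClusterWithClientResource FLINK =
            new MiniClusterWithClientResource(
                    new MiniClusterResourceConfiguration.Builder()
                            .setNumberTaskManagers(1)
                            .setNumberSlotsPerTaskManager(2)
                            .build());

    // Collects results into a static list so the test can assert on them.
    public static class CollectSink implements SinkFunction<String> {
        static final List<String> VALUES = new CopyOnWriteArrayList<>();

        @Override
        public void invoke(String value, Context context) {
            VALUES.add(value);
        }
    }

    @Test
    public void pipelineKeepsOnlyTypeAEvents() throws Exception {
        CollectSink.VALUES.clear();

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1);

        // Replace the Kinesis source with a bounded in-memory source ...
        DataStream<String> events = env.fromElements("A", "A", "B");

        // ... run the same logic as in production (buildPipeline is a placeholder
        // for your own factored-out pipeline code) ...
        DataStream<String> result = buildPipeline(events);

        // ... and replace the Kinesis sink with the collecting test sink.
        result.addSink(new CollectSink());

        env.execute();

        assertEquals(2, CollectSink.VALUES.size());
    }

    // Placeholder for the real pipeline; here it just keeps events of type 'A'.
    private static DataStream<String> buildPipeline(DataStream<String> events) {
        return events.filter("A"::equals);
    }
}

The important part is that the pipeline code itself does not hard-wire
the Kinesis connectors, so the test can inject its own sources and sinks.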

[1] https://github.com/apache/flink/tree/master/flink-end-to-end-tests
[2]
https://github.com/apache/flink/blob/master/flink-end-to-end-tests/flink-end-to-end-tests-common-kafka/src/test/java/org/apache/flink/tests/util/kafka/StreamingKafkaITCase.java
[3]
https://github.com/apache/flink/blob/master/flink-end-to-end-tests/flink-end-to-end-tests-common-kafka/src/test/java/org/apache/flink/tests/util/kafka/SQLClientKafkaITCase.java
[4]
https://github.com/apache/flink/blob/master/flink-table/flink-table-planner-blink/src/test/scala/org/apache/flink/table/planner/runtime/stream/sql/StreamTableEnvironmentITCase.scala

Regards,
Timo

