Data type serialization and testing

Dave Maughan
Hi,

I recently encountered a scenario where the data type being passed between operators in my streaming job was modified such that it broke serialization. This was due to a non-Avro top-level data type containing an Avro field. The existing integration test (mini cluster) continued to work and unit tests that attempted to cover Kryo serialization continued to work, but when deployed to a real cluster it failed. The problem was easily solved but in future I'd like to catch problems like this in my testing.

Is there a way to always force serialization between all operators in the mini cluster? Or is there another strategy I can apply to exercise the serialization of my data types?

Thanks,
Dave

Re: Data type serialization and testing

Timo Walther
Hi Dave,

Maybe it would be better to execute your tests against a local cluster
instead of the mini cluster. Also, object reuse and operator chaining
should both be disabled to force serialization between operators.
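
In Flink's Java API the two switches should correspond to `StreamExecutionEnvironment#disableOperatorChaining()` and `ExecutionConfig#disableObjectReuse()`. To illustrate why forcing serialization matters, here is a minimal plain-Java sketch, not Flink code. It assumes Java serialization as a stand-in for whatever serializer the job actually uses, and the `Event` type and helper names are hypothetical. A record handed over by reference always "works", while the same record pushed through bytes can fail.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class ForcedSerializationSketch {

    /** Hypothetical record type; payload stands in for a nested field of another kind. */
    public static final class Event implements Serializable {
        private static final long serialVersionUID = 1L;
        public final String id;
        public final Object payload;

        public Event(String id, Object payload) {
            this.id = id;
            this.payload = payload;
        }
    }

    /** Hands the record over by reference: no serialization boundary is crossed. */
    public static <T> T passByReference(T value) {
        return value;
    }

    /** Forces the record through bytes and back, as a serialization boundary would. */
    @SuppressWarnings("unchecked")
    public static <T extends Serializable> T roundTrip(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException e) {
            // NotSerializableException is an IOException and ends up here.
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns true if the value can be written and read back without an I/O error. */
    public static boolean survivesRoundTrip(Serializable value) {
        try {
            roundTrip(value);
            return true;
        } catch (UncheckedIOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Event good = new Event("a", "plain text payload");
        Event bad = new Event("b", new Object()); // java.lang.Object is not Serializable

        // Both records pass when handed over by reference.
        System.out.println(passByReference(good).id);
        System.out.println(passByReference(bad).id);

        // Only the forced round trip exposes the broken record.
        System.out.println(survivesRoundTrip(good));
        System.out.println(survivesRoundTrip(bad));
    }
}
```

The same idea carries over to a unit test per data type: round-trip a representative instance through the serializer the job will really use and assert on the result, so a field that cannot be serialized fails the build rather than the deployment.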

Maybe others have better ideas.

Regards,
Timo

On 30.04.21 10:25, Dave Maughan wrote:

> Hi,
>
> I recently encountered a scenario where the data type being passed
> between operators in my streaming job was modified such that it broke
> serialization. This was due to a non-Avro top-level data type containing
> an Avro field. The existing integration test (mini cluster) continued to
> work and unit tests that attempted to cover Kryo serialization continued
> to work, but when deployed to a real cluster it failed. The problem was
> easily solved but in future I'd like to catch problems like this in my
> testing.
>
> Is there a way to force serialization always between all operators in
> the mini-cluster? Or is there another strategy I can apply to exercise
> the serialization of my data types?
>
> Thanks,
> Dave

Re: Data type serialization and testing

Dave Maughan
Hi Timo,

Thanks for your suggestions. I did notice the chaining option. I'll give them a try.

Is there an established pattern for running tests against a local cluster? Or any examples you could point me to? I did notice a FlinkContainer (testcontainers) but it appears to be in a module that is not published.

Thanks,
Dave

On Fri, 30 Apr 2021 at 13:11, Timo Walther wrote:

> Hi Dave,
>
> maybe it would be better to execute your tests against a local cluster
> instead of the mini cluster. Also object reuse should be disabled and
> chaining should be disabled to force serialization.
>
> Maybe others have better ideas.
>
> Regards,
> Timo
