I know that for ADTs (sealed traits) there are some ongoing efforts to
overcome the performance degradation caused by the kryo fallback (see
https://github.com/apache/flink/pull/12929). E.g.,
```
sealed trait Event {
def id: Int
}
case class Pageview(id: Int, page: String) extends Event
case class Click(id: Int, url: String) extends Event
```
However, is there anything one can do for handling a similar situation but
for unsealed traits. That is, imagine a situation where Event is not sealed
because you don't know in advance which specific events you will be dealing
with, or maybe you are working on a generic framework that will be used to
develop specific applications afterwards, each one dealing with a set of
particular events. So, basically, the trait cannot be sealed. Instead of
simple case classes like Pageview and Click, within each specific
application one could be dealing with auto-generated case classes from
protocol buffer definitions, in a system which should be able to add more
types of events into the mix, so to speak.
How should one address this problem, if serialization performance wants to
be optimized? The general advice is not to use Flink with heterogenous types
to start with, because it will not be able to derive efficient serializers.
But the described use case sounds legit to me, so what would be the best way
to handle it, or, to put it another way, how to minimize performance
degradation due to serialization?
FYI: Posted also in SO:
https://stackoverflow.com/questions/65085335/best-way-to-handle-data-streams-for-non-sealed-trait-hierarchies.
--
Sent from:
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/