Are heterogeneous DataStreams possible?

Are heterogeneous DataStreams possible?

ljwagerfield
Our data's schema is defined by our users and is not known at compile time.

All data arrives via a single Kafka topic and is serialized using the same serialization technology (still to be decided).

We want to use King.com's RBEA technique to process this data in different ways at runtime (depending on its schema), using a single topology/DAG.

Therefore, each message passing through the DAG will have a different schema.

---

My question is: what's the best way to implement a system like this, where each message may have a different schema and none of the schemas are known at compile time, but all messages must flow through the same DAG?

I've tried using an 'array of heterogeneous tuples', which appears to work fine when playing around in the IDE, but before I continue too far down that route, I wanted to check whether there are any known methods for doing this.

Thanks!
Lawrence

Re: Are heterogeneous DataStreams possible?

ljwagerfield
I should add: the operators determine how to handle each message by inspecting the message's SCHEMA_ID field (every message has a SCHEMA_ID as its first field).
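
The dispatch described above could look roughly like the following. This is only a minimal sketch, not code from the thread: it assumes each record is an Object[] whose first element is an integer SCHEMA_ID, and the class name SchemaDispatcher and its methods are hypothetical. In a Flink job, logic like this would presumably sit inside an operator such as a flatMap function.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch: pick a handler at runtime from the record's SCHEMA_ID (field 0). */
class SchemaDispatcher {
    // Handlers are registered at runtime, keyed by schema id (assumed to be an int).
    private final Map<Integer, Function<Object[], String>> handlers = new HashMap<>();

    void register(int schemaId, Function<Object[], String> handler) {
        handlers.put(schemaId, handler);
    }

    /** Looks at field 0 and delegates to the matching handler, if one is registered. */
    String process(Object[] record) {
        int schemaId = (Integer) record[0];
        Function<Object[], String> handler = handlers.get(schemaId);
        if (handler == null) {
            // Unknown schemas are reported rather than dropped silently.
            return "unknown schema " + schemaId;
        }
        return handler.apply(record);
    }
}
```

Registering `d.register(1, r -> "user:" + r[1])` and then calling `d.process(new Object[]{1, "alice"})` runs the handler for schema 1; a record carrying an unregistered id falls through to the "unknown schema" branch.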

Re: Are heterogeneous DataStreams possible?

Aljoscha Krettek
You could try using JSON for all your data; this might be slow, however. The other route, which I would suggest, is to write your own custom TypeSerializers that can efficiently deal with different types and dynamic schemas.

Cheers,
Aljoscha
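
To illustrate the idea behind a schema-aware serializer, here is a minimal sketch that uses only java.io and is not a Flink TypeSerializer. It assumes an integer SCHEMA_ID in field 0 and a registry of field types that is populated at runtime; DynamicRecordCodec, FieldType and all method names are hypothetical. A real TypeSerializer would need to implement considerably more (copying, snapshots, and so on), but its serialize and deserialize methods could follow the same write-the-id-then-the-fields pattern.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: binary encoding driven by a schema registry filled at runtime. */
class DynamicRecordCodec {
    enum FieldType { INT, LONG, DOUBLE, STRING }

    // schema id -> types of the fields that follow SCHEMA_ID in the record
    private final Map<Integer, FieldType[]> schemas = new HashMap<>();

    void registerSchema(int schemaId, FieldType... fields) {
        schemas.put(schemaId, fields);
    }

    private FieldType[] lookup(int schemaId) throws IOException {
        FieldType[] fields = schemas.get(schemaId);
        if (fields == null) throw new IOException("unknown schema " + schemaId);
        return fields;
    }

    /** Writes SCHEMA_ID first, then each field in the compact form its type allows. */
    void write(Object[] record, DataOutput out) throws IOException {
        int schemaId = (Integer) record[0];
        FieldType[] fields = lookup(schemaId);
        if (record.length != fields.length + 1) throw new IOException("field count mismatch");
        out.writeInt(schemaId);
        for (int i = 0; i < fields.length; i++) {
            Object v = record[i + 1];
            switch (fields[i]) {
                case INT:    out.writeInt((Integer) v);   break;
                case LONG:   out.writeLong((Long) v);     break;
                case DOUBLE: out.writeDouble((Double) v); break;
                case STRING: out.writeUTF((String) v);    break;
            }
        }
    }

    /** Reads SCHEMA_ID, looks up the schema, then reads the fields it declares. */
    Object[] read(DataInput in) throws IOException {
        int schemaId = in.readInt();
        FieldType[] fields = lookup(schemaId);
        Object[] record = new Object[fields.length + 1];
        record[0] = schemaId;
        for (int i = 0; i < fields.length; i++) {
            switch (fields[i]) {
                case INT:    record[i + 1] = in.readInt();    break;
                case LONG:   record[i + 1] = in.readLong();   break;
                case DOUBLE: record[i + 1] = in.readDouble(); break;
                case STRING: record[i + 1] = in.readUTF();    break;
            }
        }
        return record;
    }

    // Convenience wrappers for trying the codec without any streaming framework.
    byte[] toBytes(Object[] record) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            write(record, new DataOutputStream(buffer));
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    Object[] fromBytes(byte[] bytes) {
        try {
            return read(new DataInputStream(new ByteArrayInputStream(bytes)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Compared with JSON, field names are never written per record; only the schema id and the raw values are, which is one reason this route may be faster.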

On Thu, 5 Jan 2017 at 07:02 ljwagerfield <[hidden email]> wrote:
I should add: the operators determine how to handle each message by
inspecting the message's SCHEMA_ID field (every message has a SCHEMA_ID as
its first field).

--
View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Are-heterogeneous-DataStreams-possible-tp10852p10853.html
Sent from the Apache Flink User Mailing List archive at Nabble.com.

Re: Are heterogeneous DataStreams possible?

Henri Heiskanen
Hi,

We have been using a HashMap, and it has been working fine so far.

Br,
Henkka
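
The post does not say how the HashMap is laid out, so the following is only a guess at the shape of such a record: a Map<String, Object> that carries SCHEMA_ID alongside the user-defined fields. MapRecords and its helper methods are hypothetical names used for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: a schemaless record represented as a HashMap. */
class MapRecords {
    static final String SCHEMA_ID = "SCHEMA_ID";

    /** Builds a record from alternating field names and values. */
    static Map<String, Object> record(int schemaId, Object... namesAndValues) {
        if (namesAndValues.length % 2 != 0) {
            throw new IllegalArgumentException("expected name/value pairs");
        }
        Map<String, Object> r = new HashMap<>();
        r.put(SCHEMA_ID, schemaId);
        for (int i = 0; i < namesAndValues.length; i += 2) {
            r.put((String) namesAndValues[i], namesAndValues[i + 1]);
        }
        return r;
    }

    /** Returns the field if it is present with the expected type, else the fallback. */
    static <T> T field(Map<String, Object> record, String name, Class<T> type, T fallback) {
        Object v = record.get(name);
        return type.isInstance(v) ? type.cast(v) : fallback;
    }
}
```

The trade-off against a schema-driven binary format is that every record carries its own field names, which costs space but needs no schema registry.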

On Mon, Jan 9, 2017 at 5:35 PM, Aljoscha Krettek <[hidden email]> wrote:
You could try using JSON for all your data; this might be slow, however. The other route, which I would suggest, is to write your own custom TypeSerializers that can efficiently deal with different types and dynamic schemas.
