Multiple Sources and Isolated Analytics

fms
I am new and just starting out with Flink, so please forgive me if the question doesn't quite make sense. I would like to evaluate Flink for a big data pipeline. The part I am confused about is that I have not seen any examples of multiple streams.

If I have multiple sources, say IoT, mobile, web logs, etc., sending messages into Kafka, which are then read in by Flink, and I want to query and aggregate each of those sources separately from the others, would I basically need to run a separate Flink job for each source? If not, how would I isolate one set of messages, and the analytics that I perform on them, from the other ones?

If each job is required to run separately, can they be run in parallel?

Please advise if I am missing something here.

Re: Multiple Sources and Isolated Analytics

Jonas Gröger
One job can have multiple data sources. You would then have one stream per source, like this:

    // Each ??? stands for a concrete source, e.g. a Kafka consumer per topic.
    val iotStream: DataStream[IOTEntity] = ???
    val mobileStream: DataStream[MobileEntity] = ???
    val weblogStream: DataStream[WebLogEntity] = ???

You can then build separate operator graphs from these, so that you have three independent computations on the data, for example:

    iotStream.writeToSocket(...)
    mobileStream.writeToSocket(...)
    weblogStream.writeToSocket(...)

They will be executed separately within the same job. However, if you never need to combine the streams, I would suggest creating three separate jobs instead, so that each pipeline is isolated and can be deployed and restarted independently.
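The isolation idea can be illustrated with a plain-Scala sketch (no Flink dependency; the `Message` class and source names are made up for illustration): when messages from several sources arrive interleaved, grouping by a source tag keeps each source's aggregate separate, much as keying or branching a stream per source does inside a single Flink job.

```scala
// A tagged message, as it might arrive from Kafka with a source identifier.
case class Message(source: String, value: Int)

// Interleaved messages from three hypothetical sources.
val mixed = List(
  Message("iot", 10), Message("mobile", 5),
  Message("iot", 20), Message("weblog", 7),
  Message("mobile", 15)
)

// Aggregate each source independently: here, a per-source sum.
// iot -> 30, mobile -> 20, weblog -> 7
val perSourceTotals: Map[String, Int] =
  mixed.groupBy(_.source).map { case (src, msgs) =>
    src -> msgs.map(_.value).sum
  }
```

The same shape carries over to a streaming job: the grouping key plays the role of the per-source branch, and each branch's aggregation never sees the other sources' messages.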

-- Jonas

Quoted from:
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Multiple-Sources-and-Isolated-Analytics-tp11059.html