multiple input streams

classic Classic list List threaded Threaded
2 messages Options
Reply | Threaded
Open this post in threaded view
|

multiple input streams

Eric L Goodman
If I have a standalone cluster running flink, what is the best way to ingest multiple streams of the same type of data?

For example, if I open a socket text stream, does the socket only get opened on the master node and then the stream is partitioned out to the worker nodes?
 DataStream<String> text = env.socketTextStream("localhost", port, "\n");

Is it possible to have each worker node be a sensor that receives a stream of data, where each stream is of the same type (e.g. a series of tuples)?

Thanks

Reply | Threaded
Open this post in threaded view
|

Re: multiple input streams

Hequn Cheng
Hi Eric,

> does the socket only get opened on the master node and then the stream is partitioned out to the worker nodes?
No, the socket are opened on the worker. As for the socket example, Flink starts a source task as a worker to ingest data. 
What's more, you can use setParallelism() method to control the source parallelism and to ingest multi-streams. However, the default socket source function(provided by flink) is not a ParallelSourceFunction, so the max parallelism should not bigger than 1. You can implement a user defined source function if you want.

> Is it possible to have each worker node be a sensor that receives a stream of data, where each stream is of the same type (e.g. a series of tuples)?
Yes, nearly all source connectors[1] works in this way.

I have no idea why you think the socket is opened on the master. Maybe this document[2] is helpful for you.

Best, Hequn


On Sat, Sep 1, 2018 at 11:47 AM Eric L Goodman <[hidden email]> wrote:
If I have a standalone cluster running flink, what is the best way to ingest multiple streams of the same type of data?

For example, if I open a socket text stream, does the socket only get opened on the master node and then the stream is partitioned out to the worker nodes?
 DataStream<String> text = env.socketTextStream("localhost", port, "\n");

Is it possible to have each worker node be a sensor that receives a stream of data, where each stream is of the same type (e.g. a series of tuples)?

Thanks