Data ingestion using a Flink TCP Server

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

Data ingestion using a Flink TCP Server

omaralvarez
I have been trying to send data from several processes to my Flink application. I want to use a single port that will receive data from multiple clients. I have implemented my own SourceFunction, but I have two doubts.

My TCP server has multiple threads receiving data from multiple clients, is calling ctx.collect() without synchronization safe?

Second, when I have a source function, how is the data processing of the arrival messages handled? Is a source function parallelized? For instance with the simple SocketTextStream, are messages received in parallel or received sequentially and then processed in parallel by mappers and so on?

Sorry if my questions seem too obvious, but I'm kind of new to the Streaming API, and thank you in advance!
Reply | Threaded
Open this post in threaded view
|

Re: Data ingestion using a Flink TCP Server

Aljoscha Krettek
Hi,
regarding your first question. I think it is in general not safe to call ctx.collect() from Threads other than the Thread that is invoking the run() method of your SourceFunction. What I would suggest is to have a queue that your reader threads put data into and then read from that queue in the main thread to call ctx.collect().

For your second question: it depends on the Source. Sources that implement the SourceFunction interface are not parallelized. One instance of the source will send data (round-robin) to downstream parallel mappers (or other operations). If you implement the ParallelSourceFunction or RichParallelSourceFunction then your source will be parallelized.

Cheers,
Aljoscha

On Tue, 24 May 2016 at 22:15 omaralvarez <[hidden email]> wrote:
I have been trying to send data from several processes to my Flink
application. I want to use a single port that will receive data from
multiple clients. I have implemented my own SourceFunction, but I have two
doubts.

My TCP server has multiple threads receiving data from multiple clients, is
calling ctx.collect() without synchronization safe?

Second, when I have a source function, how is the data processing of the
arrival messages handled? Is a source function parallelized? For instance
with the simple SocketTextStream, are messages received in parallel or
received sequentially and then processed in parallel by mappers and so on?

Sorry if my questions seem too obvious, but I'm kind of new to the Streaming
API, and thank you in advance!



--
View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Data-ingestion-using-a-Flink-TCP-Server-tp7134.html
Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.
Reply | Threaded
Open this post in threaded view
|

Re: Data ingestion using a Flink TCP Server

omaralvarez
Hi,

Thank you very much for your answer. There is one more doubt in my mind. How are not parallelized source funtions processed? For instance, lets say I have four streams that implement SourceFunction, will they be placed on different parallel instances or will they be processed sequentially by the same instance?

Could you direct me to an example of a ParallelSourceFunction, because looking at the code I don't see an interface or any implementation example in the documentation.

Best,

Omar.
Reply | Threaded
Open this post in threaded view
|

Re: Data ingestion using a Flink TCP Server

Stephan Ewen
Hi!

A typical example of a parallel source is the Kafka Source.

Actually, other threads than the main run() thread can call ctx.collect(), provided they use the checkpoint lock properly. The Kafka source does that.

Stephan


On Wed, May 25, 2016 at 11:50 AM, omaralvarez <[hidden email]> wrote:
Hi,

Thank you very much for your answer. There is one more doubt in my mind. How
are not parallelized source funtions processed? For instance, lets say I
have four streams that implement SourceFunction, will they be placed on
different parallel instances or will they be processed sequentially by the
same instance?

Could you direct me to an example of a ParallelSourceFunction, because
looking at the code I don't see an interface or any implementation example
in the documentation.

Best,

Omar.



--
View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Data-ingestion-using-a-Flink-TCP-Server-tp7134p7159.html
Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.

Reply | Threaded
Open this post in threaded view
|

Re: Data ingestion using a Flink TCP Server

omaralvarez
Thanks for your answers, this makes the implementation way easier, I don't have to worry about queues.

I will take a look at the kafka connector.

So my only remaining question is how serial stream sources are handled. If I have four independent streams, will the sources be handled by different parallel instances?
Reply | Threaded
Open this post in threaded view
|

Re: Data ingestion using a Flink TCP Server

Aljoscha Krettek
Hi,
yes, if you have 4 independent non-parallel sources they will be executed independently in different threads.

Cheers,
Aljoscha

On Wed, 25 May 2016 at 13:40 omaralvarez <[hidden email]> wrote:
Thanks for your answers, this makes the implementation way easier, I don't
have to worry about queues.

I will take a look at the kafka connector.

So my only remaining question is how serial stream sources are handled. If I
have four independent streams, will the sources be handled by different
parallel instances?



--
View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Data-ingestion-using-a-Flink-TCP-Server-tp7134p7164.html
Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.
Reply | Threaded
Open this post in threaded view
|

Re: Data ingestion using a Flink TCP Server

omaralvarez
Thanks to everybody, all my doubts are solved. I gotta give it to you guys, the answers were really fast!

Cheers,

Omar.