Streaming in Flink

Streaming in Flink

Sourav Mazumder
Hi,

Does Flink support push-based data streaming, where the data source pushes events/data to the Flink cluster over a socket (instead of Flink pulling the data at a given frequency)?

Regards,
Sourav

Re: Streaming in Flink

Tzu-Li Tai
Hi Sourav,

Flink streaming processes incoming data one record at a time (true streaming, as opposed to micro-batching), and streaming is inherently designed as a push model, where a topology of stream transformations "listens" to a data source.

You can configure the data source of a Flink streaming topology to be a socket or a message queue such as a Kafka topic.
Any event / data that is streamed to (in other words, "pushed" to) the socket or Kafka topic will be processed by the Flink topology in real time.

So the answer to your question is yes. Hope this helps.

BR,
Gordon

--
Tzu-Li Tai (Gordon Tai)
戴資力

National Cheng Kung University, Graduate Institute of Computer and Communication Engineering
High Performance Parallel and Distributed Systems Laboratory (HPDS Lab)
國立成功大學電機工程學系 - 電腦與通信工程研究所
高效能平行/分散系統實驗室 (HPDS Lab)

National Cheng Kung University, Engineering Science Dpt.
國立成功大學工程科學系

Contacts
+886981916890

Re: Streaming in Flink

Sourav Mazumder
Hi Gordon,

I need a little more clarification about reading data from Kafka.

As soon as any component behaves as a consumer of a topic/queue (of a messaging system), it essentially polls for data at a regular interval (though that interval may be small). Hence it captures, in a pull manner, all the data/events that accumulated in the queue between the last poll and the current one.

This pattern is very different from a real-time push of data, where a daemon process waits continuously for any data pushed to it.

So what I'm looking for clarification on is whether Flink supports a mechanism where a data source (actually any client application) can push data to a socket that a Flink daemon process listens on continuously.
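
The two patterns contrasted above can be sketched in plain Java. All names here (`PullVersusPushSketch`, `PullSource`, `PushSource`) are hypothetical and stand in for no real Kafka or Flink API; the sketch only shows which side initiates the transfer in each case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical illustration only; not Kafka or Flink code.
final class PullVersusPushSketch {

    // Pull: the source only buffers events; the consumer decides when to fetch.
    static final class PullSource {
        private final List<String> buffer = new ArrayList<>();

        void publish(String event) {
            buffer.add(event);
        }

        // Returns everything accumulated since the previous poll.
        List<String> poll() {
            List<String> batch = new ArrayList<>(buffer);
            buffer.clear();
            return batch;
        }
    }

    // Push: the source calls the registered listener for every event, immediately.
    static final class PushSource {
        private Consumer<String> listener = event -> { };

        void onEvent(Consumer<String> listener) {
            this.listener = listener;
        }

        void publish(String event) {
            listener.accept(event);
        }
    }

    static String demo() {
        PullSource pull = new PullSource();
        PushSource push = new PushSource();
        List<String> seenByPushConsumer = new ArrayList<>();
        List<String> seenByPullConsumer = new ArrayList<>();
        push.onEvent(seenByPushConsumer::add);

        for (String event : List.of("a", "b", "c")) {
            pull.publish(event);
            push.publish(event);
        }

        // The push consumer has already seen every event; the pull consumer has seen none.
        String beforePoll = "before poll: push=" + seenByPushConsumer.size()
                + " pull=" + seenByPullConsumer.size();

        seenByPullConsumer.addAll(pull.poll());
        return beforePoll + "; after poll: pull=" + seenByPullConsumer.size();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Running `main` prints `before poll: push=3 pull=0; after poll: pull=3`: the pushed events arrive one by one as they are published, while the pulled events arrive as a single batch at the next poll.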

Regards,
Sourav

Re: Streaming in Flink

Chiwan Park-2
Hi Sourav,

Basically, the Kafka consumer is pull-based [1]. If you want to build a push-based system, you should use other options.

Flink supports both the pull-based and the push-based paradigm. Which one applies depends on the implementation of the data source. As one example, Flink provides a streaming source function based on a socket [2].

[1] http://kafka.apache.org/documentation.html#design_pull
[2] https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/source/SocketTextStreamFunction.java
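
To make the socket option concrete, here is a minimal plain-Java sketch of the push pattern: a reader blocks on an open socket and handles each line the moment the other side writes it, with no polling interval. It is not Flink's code; the names (`PushSocketSketch`, `readLines`, `demo`) are made up for illustration, and it assumes the reader connects out to a server that writes newline-delimited text. Inside a Flink job, the corresponding entry point would be `StreamExecutionEnvironment.socketTextStream(host, port)`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical illustration only; not Flink code.
final class PushSocketSketch {

    // Connects to host:port and blocks on the stream. Each line is handed to
    // the sink as soon as the remote side writes it; there is no poll interval.
    static void readLines(String host, int port, Consumer<String> sink) {
        try (Socket socket = new Socket(host, port);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) { // blocks until data arrives or the peer closes
                sink.accept(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Self-contained demo: a local server thread plays the external data source
    // and pushes two events to the reader.
    static List<String> demo() {
        List<String> received = new ArrayList<>();
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread producer = new Thread(() -> {
                try (Socket client = server.accept();
                     PrintWriter out = new PrintWriter(
                             new OutputStreamWriter(client.getOutputStream(), StandardCharsets.UTF_8),
                             true)) {
                    out.println("event-1");
                    out.println("event-2");
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            producer.start();
            readLines(server.getInetAddress().getHostAddress(), server.getLocalPort(), received::add);
            producer.join();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return received;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Running `main` prints `[event-1, event-2]`. The reader returns only when the producer closes the connection, so a long-lived source would keep it waiting for further events.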

Regards,
Chiwan Park