Hi Flink Community Team,

This is a desperate request for your help on the below.

I am new to Flink and trying to use it with Kafka for event-based data stream processing in my project. I am struggling to find solutions in Flink for the following project requirements:

 1. Get all Kafka topic records at a given time point 't' (now or in the past). Also, how to pull only the latest record from Kafka using Flink?
 2. Get all records from Kafka for a given time interval in the past, between times t1 and t2.
 3. Continuously get data from Kafka starting at a given time point (now or in the past), where the client actively cancels/closes the data stream. Example: live dashboards. How to do this using Flink?

Please provide a sample Flink code snippet for pulling data from Kafka for the above three requirements. I have been stuck for the last month without much progress, and your timely help would be a savior for me!

Thanks & Regards,
Vinay Raichur
T-Systems India | Digital Solutions
Mail: [hidden email]
Mobile: +91 9739488992
Hi,
For your point 3 you can look at `FlinkKafkaConsumerBase.setStartFromTimestamp(...)`.

Points 1 and 2 will not work with the well-established `FlinkKafkaConsumer`. However, it should be possible to do them with the new `KafkaSource` that was introduced in Flink 1.12. It might be a bit rough around the edges, though. With the `KafkaSource` you can specify an `OffsetsInitializer` for both the starting and the stopping offset of the source. Take a look at `KafkaSource`, `KafkaSourceBuilder`, and `OffsetsInitializer` in the code.

I hope this helps.

Best,
Aljoscha
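P.S. For point 3 with the existing `FlinkKafkaConsumer`, a rough sketch of what I mean (untested; the broker address, topic, and group id are placeholders):

    import java.util.Properties;
    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    Properties props = new Properties();
    props.setProperty("bootstrap.servers", "kafka:9092"); // placeholder
    props.setProperty("group.id", "my-group");            // placeholder

    FlinkKafkaConsumer<String> consumer =
            new FlinkKafkaConsumer<>("my-topic", new SimpleStringSchema(), props);
    // Start reading at the given epoch-millis timestamp; the stream then
    // runs indefinitely until the job is cancelled.
    consumer.setStartFromTimestamp(1610064000000L); // e.g. 2021-01-08T00:00:00Z

    DataStream<String> stream = env.addSource(consumer);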
Thanks Aljoscha for your prompt response. It means a lot to me 😊
Could you also attach the code snippets for `KafkaSource`, `KafkaSourceBuilder`, and `OffsetsInitializer` that you were referring to in your previous reply, to make it clearer for me?

Kind regards,
Vinay
On 2021/01/08 16:55, [hidden email] wrote:
>Could you also attach the code snippets for `KafkaSource`, `KafkaSourceBuilder`, and `OffsetsInitializer` that you were referring to in your previous reply, to make it clearer for me?

Ah sorry, I was referring to the Flink code. You can start with `KafkaSource` [1], which has an example block in its Javadocs showing how to use it.

[1] https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-kafka/src/main/java/org/apache/flink/connector/kafka/source/KafkaSource.java
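P.S. Condensed from that Javadoc example, a sketch for your points 1 and 2 (untested, against the 1.12 API; `KafkaRecordDeserializer` was renamed in later releases, and the broker/topic/group names are placeholders):

    import org.apache.flink.api.common.eventtime.WatermarkStrategy;
    import org.apache.flink.connector.kafka.source.KafkaSource;
    import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
    import org.apache.flink.connector.kafka.source.reader.deserializer.KafkaRecordDeserializer;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.kafka.common.serialization.StringDeserializer;

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    long t1 = 1609459200000L; // e.g. 2021-01-01T00:00:00Z, epoch millis
    long t2 = 1610064000000L; // e.g. 2021-01-08T00:00:00Z, epoch millis

    KafkaSource<String> source = KafkaSource.<String>builder()
            .setBootstrapServers("kafka:9092") // placeholder
            .setTopics("my-topic")             // placeholder
            .setGroupId("my-group")            // placeholder
            // Start reading at the offsets for t1 ...
            .setStartingOffsets(OffsetsInitializer.timestamp(t1))
            // ... and make the source BOUNDED: it finishes at the offsets for t2.
            .setBounded(OffsetsInitializer.timestamp(t2))
            .setDeserializer(KafkaRecordDeserializer.valueOnly(StringDeserializer.class))
            .build();

    DataStream<String> stream =
            env.fromSource(source, WatermarkStrategy.noWatermarks(), "kafka-source");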
Hi Aljoscha,
Currently in our set-up we are using the Docker container version of Flink 1.11.2, along with a containerized version of Kafka, running on our Kubernetes cluster in our Vagrant VM (see the attached screenshot).

a) As you mentioned, `KafkaSource` was introduced in Flink 1.12, so I suppose we have to upgrade to this version of Flink. Can you share a link to a stable containerized Flink image to use in our set-up, keeping in mind that we want to use `KafkaSource`?

b) Also, kindly share the corresponding compatible containerized version of Kafka needed to make this work, as I understand there is a constraint on the Kafka version required.

Kind regards,
Vinay

Attachment: Flink-KafkaService-up&running.png (158K)
On 2021/01/11 14:12, [hidden email] wrote:
>a) As you mentioned, `KafkaSource` was introduced in Flink 1.12, so I suppose we have to upgrade to this version of Flink. Can you share a link to a stable containerized Flink image to use in our set-up, keeping in mind that we want to use `KafkaSource`?

Unfortunately, there is a problem with publishing those images. You can check out [1] to follow progress. In the meantime you can try to build the image yourself or use the one that Robert pushed to his private account.

>b) Also, kindly share the corresponding compatible containerized version of Kafka needed to make this work, as I understand there is a constraint on the Kafka version required.

I believe Kafka has been backwards compatible since at least version 1.0, so any recent enough version should work.

Best,
Aljoscha
Hi Team & Aljoscha
Any update on the below, please?

Kind regards,
Vinay
Hi Aljoscha
Not sure about your proposal regarding point 3:

* Firstly, how is it ensured that the stream is closed? If I understand the docs correctly, the stream will be established starting from the latest timestamp (hmm... is that not standard behaviour?) and will never finish (UNBOUNDED).
* Secondly, it is still not clear how to get the latest event at a given time point in the past.

Please help me with answers to the above.

Thanks & regards,
Vinay
On 2021/01/13 07:58, [hidden email] wrote:
>Not sure about your proposal regarding point 3:
>* Firstly, how is it ensured that the stream is closed? If I understand the docs correctly, the stream will be established starting from the latest timestamp (hmm... is that not standard behaviour?) and will never finish (UNBOUNDED).

On the first question of standard behaviour: the default is to start from the group offsets that are available in Kafka, using the configured consumer group. I think it's better to be explicit, though, and specify something like `EARLIEST` or `LATEST`. And yes, with this version of the Kafka connector the stream will start but never stop. Only when you use the new `KafkaSource` can you also specify an end timestamp that will make the Kafka source shut down eventually.

>* Secondly, it is still not clear how to get the latest event at a given time point in the past.

You are referring to getting a single record, correct? I don't think that is possible with Flink. All you can do is get a stream from Kafka that is potentially bounded by a start timestamp and/or an end timestamp.

Best,
Aljoscha
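P.S. To make that concrete, the start/stop options as a sketch (untested, 1.12 API; each `setStartingOffsets` call below is an alternative, and the last one wins):

    import org.apache.flink.connector.kafka.source.KafkaSource;
    import org.apache.flink.connector.kafka.source.KafkaSourceBuilder;
    import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;

    KafkaSourceBuilder<String> builder = KafkaSource.<String>builder();

    // The default: start from the committed offsets of the consumer group.
    builder.setStartingOffsets(OffsetsInitializer.committedOffsets());
    // Being explicit instead:
    builder.setStartingOffsets(OffsetsInitializer.earliest()); // or .latest()
    // Start from a point in time (epoch millis):
    builder.setStartingOffsets(OffsetsInitializer.timestamp(1609459200000L));

    // With an end timestamp the source becomes BOUNDED and eventually
    // shuts down; without setBounded() it is UNBOUNDED and runs until
    // the job is cancelled.
    builder.setBounded(OffsetsInitializer.timestamp(1610064000000L));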
OK. Attached is the PPT of what I am attempting to achieve w.r.t. time.

Hope I am all set to achieve the three bullets mentioned in the attached slide, i.e. to create reports with the `KafkaSource` and `KafkaSourceBuilder` approach. If you have any additional tips to share after going through the slide (an example for the live-dashboards use case), please do so.

Kind regards,
Vinay

Attachment: Positioning_Use_Cases_TrackingData_Past_Now.pptx (125K)
On 2021/01/13 12:07, [hidden email] wrote:
>OK. Attached is the PPT of what I am attempting to achieve w.r.t. time.
>
>Hope I am all set to achieve the three bullets mentioned in the attached slide, i.e. to create reports with the `KafkaSource` and `KafkaSourceBuilder` approach.
>
>If you have any additional tips to share after going through the slide (an example for the live-dashboards use case), please do so.

I think that should work with `KafkaSource`, yes. You just need to set the correct start and end timestamps, respectively. I believe that's all there is to it; off the top of my head I can't think of any additional tips.

Best,
Aljoscha
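P.S. For the live-dashboard bullet specifically, a last sketch of how the client side could cancel the stream (untested; the topic name, group id, and the print() sink are placeholders):

    import org.apache.flink.api.common.eventtime.WatermarkStrategy;
    import org.apache.flink.connector.kafka.source.KafkaSource;
    import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
    import org.apache.flink.connector.kafka.source.reader.deserializer.KafkaRecordDeserializer;
    import org.apache.flink.core.execution.JobClient;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.kafka.common.serialization.StringDeserializer;

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    KafkaSource<String> source = KafkaSource.<String>builder()
            .setBootstrapServers("kafka:9092") // placeholder
            .setTopics("positions")            // placeholder
            .setGroupId("dashboard")           // placeholder
            // Start "now"; no setBounded(), so the stream is UNBOUNDED.
            .setStartingOffsets(OffsetsInitializer.timestamp(System.currentTimeMillis()))
            .setDeserializer(KafkaRecordDeserializer.valueOnly(StringDeserializer.class))
            .build();

    env.fromSource(source, WatermarkStrategy.noWatermarks(), "positions").print();

    // executeAsync() returns a handle the client can use to stop the stream.
    JobClient client = env.executeAsync("live-dashboard");
    // ... later, when the user closes the dashboard:
    client.cancel(); // returns a CompletableFuture<Void>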