Flink Batch Processing

s_penakalapati@yahoo.com
Hi All,

I need your help with Flink batch processing. The scenario is described below.

We have multiple vehicles, and each vehicle sends data at a high rate: 1 record per minute.
The owner can set thresholds for each vehicle.

Say we have 3 vehicles, and thresholds are set for 2 of them:
Vehicle 1: threshold 20 hours, allowedPetrolConsumption=15
Vehicle 2: threshold 35 hours, allowedPetrolConsumption=28
Vehicle 3: no threshold set by the owner.

All the vehicle data is stored in HBase tables. We have a batch job scheduled every day at 12 pm that checks vehicle movement and petrol consumption against the thresholds and raises an alert (for example, vehicle 1 did not move for the past 20 hours, or vehicle 2 consumed more petrol than allowed).

Since it is a batch job, I loaded all the threshold data into one DataSet and the HBase data into another DataSet using HBaseInputFormat.

What I am failing to figure out is:
1> Vehicle 1 has a threshold of 20 hours whereas vehicle 2 has a threshold of 35 hours, so I need to fetch data from HBase for a different period per vehicle. Is there a better approach that gets all the data using one HBase connection?
2> How do I raise an alert on a DataSet? CEP patterns and MATCH_RECOGNIZE are allowed only on DataStream. Please help me with a simple example. (An alert can be raised if the count is zero, or if petrol consumption is too high.)

I could not find any example on Google of a DataSet job that raises an alert. Kindly guide me if there is a better approach.

Regards,
Sunitha.
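
For readers following the thread, the alert rule in the scenario above can be sketched in plain Java, independent of any Flink API. This is only an illustration under assumptions: the `Threshold` and `Reading` classes, their fields (including the `moved` flag) and the alert texts are hypothetical and do not come from the original post, and the unit of allowedPetrolConsumption is left unspecified. The sketch assumes the readings were fetched once for the longest threshold (35 hours here) and are then filtered against each vehicle's own window, which is one possible way to approach question 1.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class VehicleAlerts {

    /** Owner-defined limits for one vehicle (hypothetical shape). */
    public static final class Threshold {
        public final Duration window;       // e.g. 20 hours
        public final double allowedPetrol;  // e.g. 15, unit unspecified
        public Threshold(Duration window, double allowedPetrol) {
            this.window = window;
            this.allowedPetrol = allowedPetrol;
        }
    }

    /** One per-minute record of a vehicle (hypothetical shape). */
    public static final class Reading {
        public final String vehicleId;
        public final Instant timestamp;
        public final boolean moved;
        public final double petrolConsumed;
        public Reading(String vehicleId, Instant timestamp, boolean moved, double petrolConsumed) {
            this.vehicleId = vehicleId;
            this.timestamp = timestamp;
            this.moved = moved;
            this.petrolConsumed = petrolConsumed;
        }
    }

    /** Joins readings with thresholds by vehicle id and returns alert messages. */
    public static List<String> evaluate(Map<String, Threshold> thresholds,
                                        List<Reading> readings,
                                        Instant now) {
        Map<String, Integer> moves = new HashMap<>();
        Map<String, Double> petrol = new HashMap<>();
        for (Reading r : readings) {
            Threshold t = thresholds.get(r.vehicleId);
            if (t == null) {
                continue; // no threshold set by the owner: never alert
            }
            if (r.timestamp.isBefore(now.minus(t.window))) {
                continue; // older than this vehicle's own window
            }
            if (r.moved) {
                moves.merge(r.vehicleId, 1, Integer::sum);
            }
            petrol.merge(r.vehicleId, r.petrolConsumed, Double::sum);
        }
        List<String> alerts = new ArrayList<>();
        // TreeMap only to get a stable, sorted order of vehicles in the output.
        for (Map.Entry<String, Threshold> e : new TreeMap<>(thresholds).entrySet()) {
            String id = e.getKey();
            Threshold t = e.getValue();
            if (moves.getOrDefault(id, 0) == 0) {
                alerts.add(id + " did not move for the past " + t.window.toHours() + " hours");
            }
            if (petrol.getOrDefault(id, 0.0) > t.allowedPetrol) {
                alerts.add(id + " consumed more petrol than allowed");
            }
        }
        return alerts;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2020-09-28T12:00:00Z");
        Map<String, Threshold> thresholds = new HashMap<>();
        thresholds.put("vehicle1", new Threshold(Duration.ofHours(20), 15));
        thresholds.put("vehicle2", new Threshold(Duration.ofHours(35), 28));
        List<Reading> readings = Arrays.asList(
            new Reading("vehicle1", now.minus(Duration.ofHours(2)), false, 2.0),
            new Reading("vehicle2", now.minus(Duration.ofHours(30)), true, 20.0),
            new Reading("vehicle2", now.minus(Duration.ofHours(1)), true, 10.0),
            new Reading("vehicle3", now.minus(Duration.ofHours(1)), false, 100.0));
        for (String alert : evaluate(thresholds, readings, now)) {
            System.out.println(alert);
        }
    }
}
```

In a Flink job the same rule would roughly correspond to a join of the two inputs on the vehicle id followed by a per-vehicle aggregation and a filter, so it does not necessarily require CEP.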

Re: Flink Batch Processing

Piotr Nowojski-4
Hi Sunitha,

First and foremost, the DataSet API will be deprecated soon [1], so I would suggest trying to migrate to the DataStream API. Using the DataStream API does not mean that you cannot work with bounded inputs; you can. Flink SQL (Blink planner) in fact uses the DataStream API to execute both streaming and batch queries. Maybe this path would be easier?

As for answering your question using the DataSet API: sorry, I don't know it :( I will try to ping someone who could help here.

Piotrek

[1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=158866741


Re: Flink Batch Processing

s_penakalapati@yahoo.com
Hi Piotrek,

Thank you for the reply.

The changes in Flink are good. However, Flink is changing so quickly that we are unable to find good implementation examples, either in the Flink documentation or on any other website.

Using HBaseInputFormat I was able to read the data as a DataSet<>, but now I see that DataSet will be deprecated.

In the recent release, Flink 1.11.1, I see the Blink planner, but I was not able to find a single example of how to connect to HBase and read data. Is there any link I can refer to for an implementation that reads from HBase as bounded data using the Blink planner or the DataStream API?

Regards,
Sunitha.

Re: Flink Batch Processing

Till Rohrmann

Hi Sunitha,

here is some documentation about how to use the HBase sink with Flink [1, 2].

[1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/connectors/hbase.html
[2] https://docs.cloudera.com/csa/1.2.0/datastream-connectors/topics/csa-hbase-connector.html

Cheers,
Till

Re: Flink Batch Processing

Timo Walther
Hi Sunitha,

Currently, not every connector can be mixed with every API. I agree that
it is confusing from time to time. The HBase connector is an
InputFormat. The DataSet, DataStream and Table APIs can all work with
InputFormats. The current HBase input format might work best with the
Table API. If you would like to use the CEP API, you can use the Table
API (StreamTableEnvironment) to read from HBase and call `toAppendStream`
directly afterwards to continue processing in the DataStream API. This
also works for bounded streams, so you can do "batch" processing.

Regards,
Timo


On 29.09.20 09:56, Till Rohrmann wrote:

> Hi Sunitha,
>
> here is some documentation about how to use the Hbase sink with Flink
> [1, 2].
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/connectors/hbase.html
> [2]
> https://docs.cloudera.com/csa/1.2.0/datastream-connectors/topics/csa-hbase-connector.html
>
> Cheers,
> Till
>
> On Tue, Sep 29, 2020 at 9:16 AM [hidden email] wrote:
>
>     Hi Piotrek,
>
>     Thank you for the reply.
>
>     Flink changes are good, However Flink is changing so much that we
>     are unable to get any good implementation examples either on Flink
>     documents or any other website.
>
>     Using HBaseInputFormat I was able to read the data as a DataSet<>,
>     now I see that DataSet would be deprecated.
>
>     In recent release Flink 1.11.1 I see Blink planner, but I was not
>     able to get one example on how to connect to HBase and read data. Is
>     there any link I can refer to see some implementation of reading
>     from HBase as bounded data using Blink Planner/DataStream API.
>
>     Regards,
>     Sunitha.
>
>
>
>     On Monday, September 28, 2020, 07:12:19 PM GMT+5:30, Piotr Nowojski
>     <[hidden email]> wrote:
>
>
>     Hi Sunitha,
>
>     First and foremost, the DataSet API will be deprecated soon [1] so I
>     would suggest trying to migrate to the DataStream API. When using
>     the DataStream API it doesn't mean that you can not work with
>     bounded inputs - you can. Flink SQL (Blink planner) is in fact using
>     DataStream API to execute both streaming and batch queries. Maybe
>     this path would be easier?
>
>     And about answering your question using the DataSet API - sorry, I
>     don't know it :( I will try to ping someone who could help here.
>
>     Piotrek
>
>     [1]
>     https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=158866741