Realtime Data processing from HBase

5 messages

s_penakalapati@yahoo.com
Hi Team,

I recently encountered a use case in my project, described below:

My data source is HBase.
We receive a huge volume of data at very high speed into HBase tables from the source system.
We need to read from HBase, perform computations, and insert the results into PostgreSQL.

I would like a few inputs on the points below:
  • Using the Flink streaming API, is continuous streaming from an HBase database possible? I tried using RichSourceFunction with StreamExecutionEnvironment and was able to read data, but the job stops once all data has been read from HBase. My requirement is that the job keep running and read data as and when it arrives in the HBase table.
  • If continuous streaming from HBase is supported, how can checkpointing be done on HBase so that the job can be restarted from the point where it aborted? I tried googling but had no luck. Please help with a simple example or approach.
  • If continuous streaming from HBase is not supported, what should the alternative approach be - a batch job? (Our requirement is to process the real-time data from HBase, not to launch multiple ETL jobs.)

Happy Christmas to all  :)


Regards,
Sunitha.


Re: Realtime Data processing from HBase

s_penakalapati@yahoo.com
Hi Team,

Kindly help me with some inputs. I am using Flink 1.12.

Regards,
Sunitha.



Re: Realtime Data processing from HBase

Deepak Sharma
I would suggest another approach here:
1. Write a job that reads from HBase, checkpoints its progress, and pushes the data to a broker such as Kafka.
2. A Flink streaming job would then be the second job, reading from Kafka and processing the data.

With this separation of concerns, maintaining it would be simpler.

Thanks
Deepak
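The two-step design above can be sketched in plain Java. This is only an illustration of the pattern, not a real implementation: the class and variable names are hypothetical, and an in-memory sorted map stands in for the HBase table while a queue stands in for the Kafka topic. A real bridge job would use the HBase client's Scan API and a KafkaProducer, but the core idea is the same: remember the newest timestamp you have forwarded (the "checkpoint") and scan only rows beyond it on each cycle.

```java
import java.util.*;

// Sketch of step 1: poll a store, forward only new records to a broker,
// and keep a checkpoint so a restart does not re-send old data.
class HBaseToKafkaBridge {
    private long lastSeenTs = 0L;                   // the "checkpoint"
    private final NavigableMap<Long, String> hbase; // stand-in for an HBase table keyed by timestamp
    private final Queue<String> kafka;              // stand-in for a Kafka topic

    HBaseToKafkaBridge(NavigableMap<Long, String> hbase, Queue<String> kafka) {
        this.hbase = hbase;
        this.kafka = kafka;
    }

    // One polling cycle: scan only rows strictly newer than the checkpoint.
    void pollOnce() {
        for (Map.Entry<Long, String> e : hbase.tailMap(lastSeenTs, false).entrySet()) {
            kafka.add(e.getValue());
            lastSeenTs = e.getKey(); // advance the checkpoint after a successful push
        }
    }

    long checkpoint() { return lastSeenTs; }
}
```

Persisting `checkpoint()` somewhere durable (e.g. a small status table) is what lets the bridge resume after a crash without duplicating data.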

Re: Realtime Data processing from HBase

s_penakalapati@yahoo.com
Thanks, Deepak.

Does this mean streaming from HBase is not possible using the current streaming API?

I would also request you to shed some light on HBase checkpointing. I referred to the URL below to implement checkpointing; however, in the example I see a count is passed to the SourceFunction (SourceFunction<Long>). Is it possible to checkpoint based on the data we read from HBase?


Regards,
Sunitha.

Re: Realtime Data processing from HBase

Arvid Heise-3
Hi Sunitha,

The current HBase connector only works continuously with the Table API/SQL. If you use the input format, it only reads the data once, as you have found out.

What you can do is implement your own source that repeatedly polls data and uses pagination or filters to poll only new data. You would add the last read offset to the checkpoint data of your source.

If you are using Flink 1.12, I'd strongly recommend using the new source interface [1].
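The offset-tracking idea behind such a source can be sketched without any Flink or HBase dependencies. All names below are hypothetical and a TreeMap stands in for an HBase scan; in a real job this logic would live inside a SourceFunction that implements CheckpointedFunction, where snapshot() corresponds to snapshotState() writing the last row key into operator state, and restore() corresponds to initializeState() reading it back after a failure.

```java
import java.util.*;

// Flink-free sketch of a checkpointed polling source: remember the last
// row key emitted, poll only rows beyond it, and expose the key as the
// state that a checkpoint would persist and a restart would restore.
class PollingSourceState {
    private String lastRowKey = ""; // restored from the checkpoint on restart

    // Emit only rows whose key is strictly greater than the last checkpointed one.
    List<String> poll(NavigableMap<String, String> table) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : table.tailMap(lastRowKey, false).entrySet()) {
            out.add(e.getValue());
            lastRowKey = e.getKey();
        }
        return out;
    }

    // What snapshotState() would write into the checkpoint.
    String snapshot() { return lastRowKey; }

    // What initializeState() would restore after a failure.
    void restore(String checkpointedKey) { lastRowKey = checkpointedKey; }
}
```

This only works cleanly if row keys (or timestamps) are monotonically increasing for new data, which is a design assumption you would need to guarantee on the HBase side.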



--

Arvid Heise | Senior Java Developer


Follow us @VervericaData

--

Join Flink Forward - The Apache Flink Conference

Stream Processing | Event Driven | Real Time

--

Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--

Ververica GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji (Toni) Cheng