Realtime Data processing from HBase

5 messages

s_penakalapati@yahoo.com
Hi Team,

I recently encountered a use case in my project, described below:

My data source is HBase.
We receive a huge volume of data at very high speed into HBase tables from the source system.
We need to read from HBase, perform computations, and insert the results into PostgreSQL.

I would like a few inputs on the points below:
  • Using the Flink streaming API, is continuous streaming from an HBase database possible? I tried using RichSourceFunction with StreamExecutionEnvironment and was able to read data, but the job stops once all data has been read from HBase. My requirement is that the job keep running and read data as and when it arrives in the HBase table.
  • If continuous streaming from HBase is supported, how can checkpointing be done on HBase so that the job can be restarted from the point where it aborted? I tried googling but had no luck. Please help with a simple example or approach.
  • If continuous streaming from HBase is not supported, what should the alternative approach be - a batch job? (Our requirement is to process the real-time data from HBase, not to launch multiple ETL jobs.)

Happy Christmas to all  :)


Regards,
Sunitha.


Re: Realtime Data processing from HBase

s_penakalapati@yahoo.com
Hi Team,

Kindly help me with some inputs. I am using Flink 1.12.

Regards,
Sunitha.



Re: Realtime Data processing from HBase

Deepak Sharma
I would suggest another approach here:
1. Write a job that reads from HBase, checkpoints its progress, and pushes the data to a broker such as Kafka.
2. A Flink streaming job would then be the second job, reading from Kafka and processing the data.

With this separation of concerns, maintaining it would be simpler.

Thanks
Deepak
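The two-step design above can be sketched in plain Java. This is only an illustration of the pattern, not a real implementation: the class and variable names are hypothetical, and an in-memory sorted map stands in for the HBase table while a queue stands in for the Kafka topic. A real bridge job would use the HBase client's Scan API and a KafkaProducer, but the core idea is the same: remember the newest timestamp you have forwarded (the "checkpoint") and scan only rows beyond it on each cycle.

```java
import java.util.*;

// Sketch of step 1: poll a store, forward only new records to a broker,
// and keep a checkpoint so a restart does not re-send old data.
class HBaseToKafkaBridge {
    private long lastSeenTs = 0L;                   // the "checkpoint"
    private final NavigableMap<Long, String> hbase; // stand-in for an HBase table keyed by timestamp
    private final Queue<String> kafka;              // stand-in for a Kafka topic

    HBaseToKafkaBridge(NavigableMap<Long, String> hbase, Queue<String> kafka) {
        this.hbase = hbase;
        this.kafka = kafka;
    }

    // One polling cycle: scan only rows strictly newer than the checkpoint.
    void pollOnce() {
        for (Map.Entry<Long, String> e : hbase.tailMap(lastSeenTs, false).entrySet()) {
            kafka.add(e.getValue());
            lastSeenTs = e.getKey(); // advance the checkpoint after a successful push
        }
    }

    long checkpoint() { return lastSeenTs; }
}
```

Persisting `checkpoint()` somewhere durable (e.g. a small status table) is what lets the bridge resume after a crash without duplicating data.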

Re: Realtime Data processing from HBase

s_penakalapati@yahoo.com
Thanks, Deepak.

Does this mean streaming from HBase is not possible using the current streaming API?

I would also request you to shed some light on HBase checkpointing. I referred to the URL below to implement checkpointing; however, in the example I see a count is passed to the SourceFunction (SourceFunction<Long>). Is it possible to checkpoint based on the data we read from HBase?


Regards,
Sunitha.

Re: Realtime Data processing from HBase

Arvid Heise-3
Hi Sunitha,

The current HBase connector only works continuously with the Table API/SQL. If you use the input format, it only reads the data once, as you have found out.

What you can do is implement your own source that repeatedly polls data and uses pagination or filters to poll only new data. You would add the last read offset to the checkpoint data of your source.

If you are using Flink 1.12, I'd strongly recommend using the new source interface [1].
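The offset-tracking idea behind such a source can be sketched without any Flink or HBase dependencies. All names below are hypothetical and a TreeMap stands in for an HBase scan; in a real job this logic would live inside a SourceFunction that implements CheckpointedFunction, where snapshot() corresponds to snapshotState() writing the last row key into operator state, and restore() corresponds to initializeState() reading it back after a failure.

```java
import java.util.*;

// Flink-free sketch of a checkpointed polling source: remember the last
// row key emitted, poll only rows beyond it, and expose the key as the
// state that a checkpoint would persist and a restart would restore.
class PollingSourceState {
    private String lastRowKey = ""; // restored from the checkpoint on restart

    // Emit only rows whose key is strictly greater than the last checkpointed one.
    List<String> poll(NavigableMap<String, String> table) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : table.tailMap(lastRowKey, false).entrySet()) {
            out.add(e.getValue());
            lastRowKey = e.getKey();
        }
        return out;
    }

    // What snapshotState() would write into the checkpoint.
    String snapshot() { return lastRowKey; }

    // What initializeState() would restore after a failure.
    void restore(String checkpointedKey) { lastRowKey = checkpointedKey; }
}
```

This only works cleanly if row keys (or timestamps) are monotonically increasing for new data, which is a design assumption you would need to guarantee on the HBase side.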



--

Arvid Heise | Senior Java Developer


Follow us @VervericaData

--

Join Flink Forward - The Apache Flink Conference

Stream Processing | Event Driven | Real Time

--

Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--

Ververica GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji (Toni) Cheng