Hi Team, I recently encountered a use case in my project, described below: my data source is HBase. We receive a huge volume of data at very high speed into HBase tables from the source system. I need to read from HBase, perform a computation, and insert the results into PostgreSQL. I would like a few inputs on the points below:
Happy Christmas to all :) Regards, Sunitha.
Hi Team, kindly help me with some inputs. I am using Flink 1.12. Regards, Sunitha.
I would suggest another approach here. 1. Write a job that reads from HBase, checkpoints, and pushes the data to a broker such as Kafka. 2. A second Flink streaming job then reads from Kafka and processes the data. With this separation of concerns, the pipeline is simpler to maintain. Thanks, Deepak
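For illustration, a minimal sketch of the second job in Deepak's approach, using the Flink 1.12 Kafka consumer and JDBC sink. The class name, topic name, broker address, consumer group, SQL statement, connection URL, and the map computation are all hypothetical placeholders, not anything from this thread:

    import java.util.Properties;

    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.connector.jdbc.JdbcConnectionOptions;
    import org.apache.flink.connector.jdbc.JdbcSink;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

    public class KafkaToPostgresJob {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            env.enableCheckpointing(60_000); // checkpoint every minute

            Properties props = new Properties();
            props.setProperty("bootstrap.servers", "kafka:9092"); // placeholder broker address
            props.setProperty("group.id", "hbase-bridge");        // placeholder consumer group

            // "hbase-events" is a placeholder for the topic written by the first (HBase-reading) job.
            DataStream<String> events = env.addSource(
                    new FlinkKafkaConsumer<>("hbase-events", new SimpleStringSchema(), props));

            // Stand-in for the real computation, then sink to PostgreSQL via the JDBC connector.
            events.map(String::toUpperCase)
                  .addSink(JdbcSink.sink(
                          "INSERT INTO results (payload) VALUES (?)",   // placeholder table/SQL
                          (stmt, value) -> stmt.setString(1, value),
                          new JdbcConnectionOptions.JdbcConnectionOptionsBuilder()
                                  .withUrl("jdbc:postgresql://localhost:5432/mydb") // placeholder URL
                                  .withDriverName("org.postgresql.Driver")
                                  .build()));

            env.execute("kafka-to-postgres");
        }
    }

A side benefit of the Kafka hop is replayability: the second job can be restarted or reworked and simply re-consume from its last committed offset, independent of HBase.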
Thanks Deepak. Does this mean streaming from HBase is not possible using the current streaming API? Also, I request you to shed some light on HBase checkpointing. I referred to the URL below to implement checkpointing; however, in that example I see a count is passed to the SourceFunction (SourceFunction<Long>). Is it possible to checkpoint based on the data we read from HBase? Regards, Sunitha.
Hi Sunitha, the current HBase connector only works continuously with the Table API/SQL. If you use the input format, it only reads the data once, as you have found out. What you can do is implement your own source that repeatedly polls data and uses pagination or filters to poll only new data. You would add the last read offset to the checkpoint data of your source. If you are using Flink 1.12, I'd strongly recommend using the new source interface [1].
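A minimal sketch of such a polling source, written against the legacy SourceFunction/CheckpointedFunction API for brevity (a FLIP-27 source, as recommended above, would be structured differently). The table name, poll interval, and the choice of cell timestamp as the "last read offset" are assumptions for illustration only:

    import org.apache.flink.api.common.state.ListState;
    import org.apache.flink.api.common.state.ListStateDescriptor;
    import org.apache.flink.runtime.state.FunctionInitializationContext;
    import org.apache.flink.runtime.state.FunctionSnapshotContext;
    import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;
    import org.apache.flink.streaming.api.functions.source.RichSourceFunction;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.*;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PollingHBaseSource extends RichSourceFunction<String>
            implements CheckpointedFunction {

        private volatile boolean running = true;
        private long lastTimestamp = 0L;          // the "last read offset": max cell timestamp seen
        private transient ListState<Long> offsetState;

        @Override
        public void run(SourceContext<String> ctx) throws Exception {
            try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
                 Table table = conn.getTable(TableName.valueOf("events"))) { // placeholder table
                while (running) {
                    // Filter on cell timestamps so each poll only sees new data.
                    Scan scan = new Scan().setTimeRange(lastTimestamp + 1, Long.MAX_VALUE);
                    try (ResultScanner scanner = table.getScanner(scan)) {
                        for (Result r : scanner) {
                            // Emit and advance the offset under the checkpoint lock so a
                            // snapshot never separates an emitted record from its offset.
                            synchronized (ctx.getCheckpointLock()) {
                                ctx.collect(Bytes.toString(r.getRow()));
                                lastTimestamp = Math.max(lastTimestamp,
                                        r.rawCells()[0].getTimestamp());
                            }
                        }
                    }
                    Thread.sleep(1000); // poll interval; tune for your latency needs
                }
            }
        }

        @Override
        public void cancel() { running = false; }

        @Override
        public void snapshotState(FunctionSnapshotContext context) throws Exception {
            offsetState.clear();
            offsetState.add(lastTimestamp); // the last read offset goes into the checkpoint
        }

        @Override
        public void initializeState(FunctionInitializationContext context) throws Exception {
            offsetState = context.getOperatorStateStore().getListState(
                    new ListStateDescriptor<>("hbase-offset", Long.class));
            for (Long ts : offsetState.get()) {
                lastTimestamp = ts; // restore the offset after a failure
            }
        }
    }

Note that polling by cell timestamp can miss or re-read rows written with identical timestamps; a monotonically increasing row key or a dedicated "processed up to" marker may make a more robust offset, depending on how the source system writes.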
-- Arvid Heise | Senior Java Developer, Ververica