Flink application with HBase

Thomas Lamirault
Hello everybody,

I am using Flink (0.10.1) with a streaming source (Kafka), and I write the results of flatMap/keyBy/timeWindow/reduce to an HBase table.
I have tried both a class (Sinkclass) that implements SinkFunction<MyObject> and a class (HBaseOutputFormat) that implements OutputFormat<MyObject>. In your opinion, which of the two, Sinkclass or HBaseOutputFormat, is better for performance and cleaner code? (Or are they equivalent?)

Thanks,

B.R / Cordialement

Thomas Lamirault

Re: Flink application with HBase

Márton Balassi

Hi Thomas,

You can use either of the suggested solutions.

The benefit you might get from HBaseOutputFormat is that it is already tested and integrated with Flink, whereas with a general SinkFunction you have to handle the connection to HBase yourself.

Best,

Marton


Re: Flink application with HBase

Stephan Ewen
The OutputFormats (such as the HBaseOutputFormat) originally come from the DataSet API.

They work with DataStream as well, but the main difference from the SinkFunction is that they give you no way to implement custom checkpointing hooks. Since sinks interact with the outside world (side effects), they are by default not "exactly once" but only "at least once" in case of failures when you use checkpointing.

If that works for your case, feel free to use the HBaseOutputFormat.

If you plan on adding custom exactly-once sink checkpointing logic (such as buffering data in the sink and committing only upon successful checkpoints), I would go for the SinkFunction.
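To make the "buffer in the sink and commit only upon successful checkpoints" idea concrete, here is a minimal sketch in plain Java. It deliberately uses no Flink or HBase classes: BufferingSinkSketch and all of its method names are hypothetical stand-ins for the sink's per-record call, its checkpoint snapshot hook, and its checkpoint-completed notification, and the committed list stands in for the HBase table. It also assumes the pending batches would be stored as part of the checkpointed state in a real job.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of a buffering sink; not a Flink or HBase class. */
public class BufferingSinkSketch {
    // Records received since the last snapshot; lost (and replayed) on failure.
    private List<String> current = new ArrayList<>();
    // Batches sealed by a snapshot, keyed by checkpoint id, awaiting confirmation.
    // Assumption: in a real job these would be part of the checkpointed state.
    private final TreeMap<Long, List<String>> pending = new TreeMap<>();
    // Stand-in for the external HBase table.
    private final List<String> committed = new ArrayList<>();

    /** Called once per record: only buffer, never write to the outside world. */
    public void invoke(String record) {
        current.add(record);
    }

    /** Snapshot hook: seal the current buffer under this checkpoint id. */
    public void snapshotState(long checkpointId) {
        pending.put(checkpointId, current);
        current = new ArrayList<>();
    }

    /** Checkpoint confirmed: write every batch sealed up to and including it. */
    public void notifyCheckpointComplete(long checkpointId) {
        Map<Long, List<String>> ready = pending.headMap(checkpointId, true);
        for (List<String> batch : ready.values()) {
            committed.addAll(batch);
        }
        ready.clear(); // the view is backed by 'pending', so this removes them
    }

    /** Simulated failure: unsealed records are dropped; the source replays them. */
    public void discardUncheckpointed() {
        current.clear();
    }

    public List<String> committed() {
        return committed;
    }

    public static void main(String[] args) {
        BufferingSinkSketch sink = new BufferingSinkSketch();
        sink.invoke("a");
        sink.invoke("b");
        sink.snapshotState(1L);
        sink.notifyCheckpointComplete(1L); // "a" and "b" become visible
        sink.invoke("c");
        sink.discardUncheckpointed();      // failure before checkpoint 2
        sink.invoke("c");                  // the source replays "c"
        sink.snapshotState(2L);
        sink.notifyCheckpointComplete(2L);
        System.out.println(sink.committed()); // prints [a, b, c]
    }
}
```

Because nothing reaches the table before its checkpoint is confirmed, the replayed record shows up only once. A sink that writes directly in invoke would have written "c" twice in the same scenario, which is the "at least once" behaviour described above. A real implementation would also need the commit itself to be idempotent or atomic, since a failure can happen in the middle of a commit.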

Greetings,
Stephan




