(DEPRECATED) Apache Flink User Mailing List archive.

Flink application with HBase

Thomas Lamirault

Flink application with HBase

Hello everybody,

I am using Flink (0.10.1) with a streaming source (Kafka), and I write the results of flatMap/keyBy/timeWindow/reduce to an HBase table.

I have tried both a class (Sinkclass) that implements SinkFunction<MyObject> and a class (HBaseOutputFormat) that implements OutputFormat<MyObject>. In your opinion, is it better to use the Sinkclass or the HBaseOutputFormat, in terms of performance and cleaner code? (Or are they equivalent?)
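To make the two options concrete, here is a minimal sketch in plain Java with no Flink or HBase dependency. SimpleSink and SimpleOutputFormat are simplified stand-ins for the shapes of SinkFunction and OutputFormat (the real interfaces have more methods and declare checked exceptions), and InMemoryTableFormat and FormatBackedSink are hypothetical names used only for illustration. It shows that, for plain writes, an output format can be driven through a sink-shaped wrapper, so the two styles can do the same work.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for the shape of a sink function: one call per record.
interface SimpleSink<T> {
    void invoke(T value);
}

// Simplified stand-in for the shape of an output format: explicit lifecycle.
interface SimpleOutputFormat<T> {
    void open(int taskNumber, int numTasks);
    void writeRecord(T record);
    void close();
}

// Hypothetical in-memory "table" used in place of a real HBase connection.
class InMemoryTableFormat implements SimpleOutputFormat<String> {
    final List<String> rows = new ArrayList<>();
    boolean opened = false;

    @Override public void open(int taskNumber, int numTasks) { opened = true; }

    @Override public void writeRecord(String record) {
        if (!opened) throw new IllegalStateException("format not opened");
        rows.add(record);
    }

    @Override public void close() { opened = false; }
}

// Adapter that exposes an output format through the sink-shaped interface.
class FormatBackedSink<T> implements SimpleSink<T> {
    private final SimpleOutputFormat<T> format;

    FormatBackedSink(SimpleOutputFormat<T> format) {
        this.format = format;
        format.open(0, 1); // single task in this sketch
    }

    @Override public void invoke(T value) { format.writeRecord(value); }

    void close() { format.close(); }
}
```

With this wrapper, calling invoke("a") and invoke("b") on a FormatBackedSink around an InMemoryTableFormat leaves both rows in the table, exactly as if writeRecord had been called directly.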

Thanks,

Best regards,

Thomas Lamirault

Márton Balassi

Re: Flink application with HBase

Hi Thomas,

You can use both of the suggested solutions.

The benefit you might get from the HBaseOutputFormat is that it is already tested and integrated with Flink, as opposed to you having to connect to HBase yourself in a general SinkFunction.

Best,

Marton

In reply to Thomas Lamirault's message of Dec 22, 2015, 1:04 PM.

Stephan Ewen

Re: Flink application with HBase

The OutputFormats (such as the HBaseOutputFormat) come originally from the DataSet API.

They work with DataStream, but the main difference from the SinkFunction is that they give you no way to implement custom checkpointing hooks. Since sinks interact with the outside world (side effects), they are by default not "exactly once", but only "at least once" in case of failures when you use checkpointing.

If that works for your case, feel free to use the HBaseOutputFormat.
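To illustrate what "at least once" means for a sink that writes straight through, here is a small plain-Java simulation (no Flink involved). The class AtLeastOnceDemo, its run method, and its parameters are hypothetical names for this sketch. It assumes the source rewinds to the last checkpointed position after a failure, so records written after that checkpoint are written a second time.

```java
import java.util.ArrayList;
import java.util.List;

class AtLeastOnceDemo {
    // Writes `input` into a list that stands in for the external table.
    // The read position is checkpointed every `checkpointInterval` records
    // (must be > 0), and a single failure is simulated just before the record
    // at index `failAt` is processed (use a negative value for "no failure").
    static List<String> run(List<String> input, int checkpointInterval, int failAt) {
        List<String> external = new ArrayList<>(); // side effects land here
        int lastCheckpoint = 0;
        boolean failed = false;
        int pos = 0;
        while (pos < input.size()) {
            if (!failed && pos == failAt) {
                failed = true;
                pos = lastCheckpoint; // recovery: rewind to the last checkpoint
                continue;
            }
            external.add(input.get(pos)); // direct write, cannot be undone
            pos++;
            if (pos % checkpointInterval == 0) {
                lastCheckpoint = pos;
            }
        }
        return external;
    }
}
```

For example, run(Arrays.asList("a", "b", "c", "d", "e"), 2, 3) returns [a, b, c, c, d, e]: the record "c" was written after the checkpoint at position 2 and is written again during replay.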

If you plan on adding custom exactly-once sink checkpointing logic (such as buffering data in the sink and committing only upon successful checkpoints), I would go for the SinkFunction.
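The buffering idea mentioned above can be sketched as follows, again in plain Java without Flink. BufferingCommitSink and its methods are hypothetical; the names snapshot and notifyCheckpointComplete only loosely echo Flink's checkpoint callbacks and are not the real API. The sketch assumes that a completed checkpoint is always followed by its notification before any failure, and that committing a batch is atomic; a real sink would have to handle both cases.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class BufferingCommitSink<T> {
    private final List<T> external;              // stand-in for the external table
    private List<T> current = new ArrayList<>(); // records since the last snapshot
    // Batches that were snapshotted but whose checkpoint is not yet confirmed.
    private final TreeMap<Long, List<T>> pending = new TreeMap<>();

    BufferingCommitSink(List<T> external) {
        this.external = external;
    }

    // Records are only buffered, never written directly.
    void invoke(T value) {
        current.add(value);
    }

    // Checkpoint taken: freeze the current buffer under the checkpoint id.
    void snapshot(long checkpointId) {
        pending.put(checkpointId, current);
        current = new ArrayList<>();
    }

    // Checkpoint confirmed: commit every batch up to and including this id.
    void notifyCheckpointComplete(long checkpointId) {
        Map<Long, List<T>> ready = pending.headMap(checkpointId, true);
        for (List<T> batch : ready.values()) {
            external.addAll(batch);
        }
        ready.clear(); // the view is backed by `pending`, so this removes them
    }

    // Failure: drop everything uncommitted; the source is assumed to replay
    // all records after the last completed checkpoint.
    void restoreAfterFailure() {
        current.clear();
        pending.clear();
    }
}
```

In this model nothing reaches the external list until a checkpoint is confirmed, so records that arrive after the last completed checkpoint are dropped on failure and written exactly once when they are replayed. The cost is added latency, since output becomes visible only at checkpoint boundaries.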

Greetings,

Stephan

In reply to Márton Balassi's message of Tue, Dec 22, 2015, 1:45 PM.