Hello everybody,
I am using Flink (0.10.1) with a streaming source (Kafka), and I write the results of a flatMap/keyBy/timeWindow/reduce pipeline to an HBase table.
I have tried both a class (Sinkclass) that implements SinkFunction<MyObject> and a class (HBaseOutputFormat) that implements OutputFormat<MyObject>. In your opinion, which is the better choice, Sinkclass or HBaseOutputFormat, in terms of
performance and cleaner code? (Or are they equivalent?)
Thanks,
B.R / Cordialement
Thomas Lamirault
|
Hi Thomas,
You can use either of the suggested solutions. The benefit you might get from HBaseOutputFormat is that it is already tested and integrated with Flink, whereas with a general SinkFunction you have to handle the connection to HBase yourself.
Best,
Marton
On Dec 22, 2015 1:04 PM, "Thomas Lamirault" <[hidden email]> wrote:
|
The OutputFormats (such as the HBaseOutputFormat) originally come from the DataSet API. They work with DataStream, but the main difference from the SinkFunction is that they give you no way to implement custom checkpointing hooks.
Since sinks interact with the outside world (side effects), they are by default not "exactly once" but only "at least once" in case of failures when you use checkpointing: after a recovery, records processed since the last successful checkpoint may be written again. If that works for your case, feel free to use the HBaseOutputFormat.
If you plan on adding custom exactly-once sink checkpointing logic (such as buffering data in the sink and committing only upon successful checkpoints), I would go for the SinkFunction.
Greetings,
Stephan
On Tue, Dec 22, 2015 at 1:45 PM, Márton Balassi <[hidden email]> wrote:
|
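For reference, the "buffer in the sink and commit only upon successful checkpoints" idea Stephan mentions could look roughly like the plain-Java sketch below. It deliberately has no Flink dependency: the class name BufferingSinkSketch, the committer callback, and the three method names are assumptions chosen to mirror the shape of a sink with checkpoint hooks, not the actual Flink interfaces, so check the API of your Flink version before wiring it in. In a real job the committer would be something like a batched put into HBase.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.function.Consumer;

// Hypothetical sketch: records are buffered and only handed to the external
// system once the checkpoint that covers them is reported as complete.
class BufferingSinkSketch<T> {

    // Assumed callback that performs the real write (e.g. a batch put to HBase).
    private final Consumer<List<T>> committer;

    // Records received since the last snapshot.
    private List<T> current = new ArrayList<>();

    // Batches that belong to a snapshot which has not been confirmed yet,
    // keyed by checkpoint id.
    private final SortedMap<Long, List<T>> pending = new TreeMap<>();

    BufferingSinkSketch(Consumer<List<T>> committer) {
        this.committer = committer;
    }

    // Called once per record: buffer only, no external side effect.
    void invoke(T value) {
        current.add(value);
    }

    // Called when a checkpoint is taken: seal the current buffer under that id.
    void snapshotState(long checkpointId) {
        pending.put(checkpointId, current);
        current = new ArrayList<>();
    }

    // Called when a checkpoint has completed: commit every batch sealed
    // under this id or an earlier one, then forget those batches.
    void notifyCheckpointComplete(long checkpointId) {
        SortedMap<Long, List<T>> done = pending.headMap(checkpointId + 1);
        for (List<T> batch : done.values()) {
            committer.accept(batch);
        }
        done.clear(); // headMap is a view, so this removes them from pending
    }
}
```

Note that this only shows the control flow. For real exactly-once behaviour the pending batches would also have to be part of the checkpointed state, and the commit itself should be idempotent, since a failure can still occur between the checkpoint completing and the write finishing.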