Hi Experts,
When we implement a sink, we usually write in batches, flushing either when a record-count threshold is reached or when a time interval elapses. However, this can leave the data of the last batch unwritten, because the flush is only triggered by incoming records. I also tested the JDBCOutputFormat provided by Flink and found that it has the same problem: if the batch size is 50 and 49 items arrive, but the 50th item only comes an hour later, then the 49 items will not be written to the sink during that hour. This may cause data delay or data loss. Could anyone propose a solution to this problem?
Thanks a lot.
Best,
Henry
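(For illustration, a minimal sketch of this kind of size-only buffering, not Flink's actual JDBCOutputFormat code; the class name and the doInsert() helper are hypothetical:)

import java.util.ArrayList;
import java.util.List;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

// Flushes only when the buffer is full, so a partial batch (e.g. 49 of 50
// records) stays in memory until more records arrive.
public class NaiveBatchSink extends RichSinkFunction<String> {
    private static final int BATCH_SIZE = 50;
    private final List<String> buffer = new ArrayList<>();

    @Override
    public void invoke(String value, Context context) throws Exception {
        buffer.add(value);
        if (buffer.size() >= BATCH_SIZE) {
            doInsert(buffer);  // hypothetical batch write to the external system
            buffer.clear();
        }
    }

    private void doInsert(List<String> batch) {
        // write the batch via JDBC (details omitted)
    }
}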
|
Hi Henry,
Flushing of buffered data in sinks should occur on two occasions: 1) when a buffer size limit is reached or a fixed flush interval fires, and 2) on checkpoints. Flushing any pending data before a checkpoint completes gives the sink at-least-once guarantees, so that should answer your concern about data loss. For the data delay caused by buffering, my only suggestion is to add a time-interval-based flushing configuration. That is what currently happens, for example, in the Kafka / Kinesis producer sinks: records are buffered and flushed at fixed intervals or when the buffer is full. They are also flushed on every checkpoint.
Cheers,
Gordon
On 13 November 2018 at 5:07:32 PM, 徐涛 ([hidden email]) wrote:
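(A minimal sketch of that pattern, assuming a hand-written sink rather than the actual Kafka/Kinesis producer code; the class name and the doInsert() helper are hypothetical. The idea is to flush when the buffer is full and also in snapshotState(), so every checkpoint carries the pending data:)

import java.util.ArrayList;
import java.util.List;
import org.apache.flink.runtime.state.FunctionInitializationContext;
import org.apache.flink.runtime.state.FunctionSnapshotContext;
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

// Buffering sink that flushes on a size limit AND on every checkpoint,
// so a partial batch never has to wait for the next incoming record.
public class CheckpointedBatchSink extends RichSinkFunction<String>
        implements CheckpointedFunction {

    private static final int BATCH_SIZE = 50;
    private final List<String> buffer = new ArrayList<>();

    @Override
    public void invoke(String value, Context context) throws Exception {
        buffer.add(value);
        if (buffer.size() >= BATCH_SIZE) {
            flush();  // size-based flush
        }
    }

    @Override
    public void snapshotState(FunctionSnapshotContext ctx) throws Exception {
        // Flush pending records before the checkpoint completes; this is what
        // gives the sink at-least-once guarantees for buffered data.
        flush();
    }

    @Override
    public void initializeState(FunctionInitializationContext ctx) throws Exception {
        // Nothing to restore in this sketch; a production sink could also
        // start a scheduled task here for interval-based flushing.
    }

    private void flush() {
        if (!buffer.isEmpty()) {
            doInsert(buffer);  // hypothetical batch write (e.g. JDBC executeBatch)
            buffer.clear();
        }
    }

    private void doInsert(List<String> batch) {
        // write the batch to the external system (details omitted)
    }
}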
|
Hi Gordon,
Later I looked at the implementation of the Elasticsearch sink provided by Flink and found that it also flushes its buffered data when a checkpoint happens. I applied the same method and the problem is now solved. It uses exactly the mechanism you described.
Thanks a lot for your help.
Best,
Henry
|