Hello Flink expert, We have a pipeline that read both bounded and unbounded sources and our understanding is that when the bounded sources complete they should get a watermark of +inf and then we should be able to take a savepoint and safely restart the pipeline. However, we have source that never get watermarks and we are confused as to what we are seeing and what we should expect Cam Mach |
Hi Cam, could you share a bit more details about your job (e.g. which sources are you using, what are your settings, etc.). Ideally you can provide a minimal example in order to better understand the program. From a high level perspective, there might be different problems: First of all, Flink does not support checkpointing/taking a savepoint if some of the job's operator have already terminated iirc. But your description points rather into the direction that your bounded source does not terminate. So maybe you are reading a file via StreamExecutionEnvironment.createFileInput with FileProcessingMode.PROCESS_CONTINUOUSLY. But these things are hard to tell without a better understanding of your job. Cheers, Till On Mon, Jan 13, 2020 at 8:35 PM Cam Mach <[hidden email]> wrote:
|
Hi Till, Thanks for your response. Our sources are S3 and Kinesis. We have run several tests, and we are able to take savepoint/checkpoint, but only when S3 complete reading. And at that point, our pipeline has watermarks for other operators, but not the source operator. We are not running `PROCESS_CONTINUOUSLY`, so we should have watermark for the source as well, right? Attached is snapshot of our pipeline. Thanks On Tue, Jan 14, 2020 at 10:43 AM Till Rohrmann <[hidden email]> wrote:
|
Hi, Cam,
I think you might want to know why the web page does not show the watermark of the source. Currently, the web only shows the "input" watermark. The source only outputs the watermark so the web shows you that there is "No Watermark". Actually Flink has "output" watermark metrics. I think Flink should also show this information on the web. Would you mind open a Jira to track this? Best, Guowei Cam Mach <[hidden email]> 于2020年1月15日周三 上午4:05写道:
|
Hi Guowei, Thanks for your response. What I understand from you, one operator has two watermarks? If so, one operator's output watermark would be an input watermark of the next operator? Does it sounds redundant? And, yes we want to understand when we should expect to see watermarks for our "combined" sources (bounded and un-bounded) for our pipeline? If you can be more directly, it would be very helpful. Thanks, On Tue, Jan 14, 2020 at 5:42 PM Guowei Ma <[hidden email]> wrote:
|
>>What I understand from you, one operator has two watermarks? If so, one operator's output watermark would be an input watermark of the next operator? Does it sounds redundant? There are no two watermarks for an operator. What I want to say is "watermark metrics". >>Or do you mean the Web UI only show the input watermarks of every operator, but since the source doesn't have input watermark show it show "No Watermark" ? And we should have output watermark for source? Yes. But the web UI only shows the task level watermarks metrics, not the operator level. Yout could find more detail information about metrics in the link[1]. >>And, yes we want to understand when we should expect to see watermarks for our "combined" sources (bounded and un-bounded) for our pipeline? Do you try a topology with only Kinesis source and the web UI shows the Watermark of source? Actually, I think it might not be related to the "combined" source. Best, Guowei Cam Mach <[hidden email]> 于2020年1月15日周三 下午3:53写道:
|
Free forum by Nabble | Edit this page |