is it possible one task manager stuck and still fetching data from Kinesis?

is it possible one task manager stuck and still fetching data from Kinesis?

Terry Chia-Wei Wu
We are running Flink 1.10 on 900+ task managers with Kinesis as an input stream. The problem we are seeing is that only the max age across the Kinesis shards keeps growing, while the average age stays very low, meaning most shards have a very low age. We already checked for data skew, but the data is quite uniformly distributed. Any idea how this can happen and how to debug it? I'm wondering whether it is possible for one TM's operator to be stuck while the source keeps fetching data, so that the Kinesis age keeps going up.

Terry


Re: is it possible one task manager stuck and still fetching data from Kinesis?

Till Rohrmann
Hi Terry,

I am not a Kinesis expert, which is why I've pulled in Thomas and Max, who might know more about Flink's Kinesis behaviour. What could help, though, would be access to the Flink cluster logs to see whether something fishy is going on.

Cheers,
Till

On Fri, Jul 31, 2020 at 4:41 AM Terry Chia-Wei Wu <[hidden email]> wrote:
We are running Flink 1.10 on 900+ task managers with Kinesis as an input stream. The problem we are seeing is that only the max age across the Kinesis shards keeps growing, while the average age stays very low, meaning most shards have a very low age. We already checked for data skew, but the data is quite uniformly distributed. Any idea how this can happen and how to debug it? I'm wondering whether it is possible for one TM's operator to be stuck while the source keeps fetching data, so that the Kinesis age keeps going up.

Terry
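
A growing max age combined with a low average age suggests that only a few shards are falling behind. One way to narrow this down is to compare per-shard ages and pick out the outliers. Below is a minimal sketch, assuming the per-shard ages (in milliseconds) have already been collected into a dictionary, for example from the per-shard `millisBehindLatest` metric that the Flink Kinesis consumer reports. The function name `find_lagging_shards`, its `factor` and `floor_ms` parameters, and the sample values are hypothetical and only for illustration.

```python
from statistics import mean, median


def find_lagging_shards(age_ms_by_shard, factor=10.0, floor_ms=60_000):
    """Return (shard_id, age_ms) pairs whose age is far above the median.

    A shard counts as lagging if its age exceeds both `floor_ms` and
    `factor` times the median age of all shards. Both thresholds are
    arbitrary illustration values, not recommendations.
    """
    if not age_ms_by_shard:
        return []
    typical = median(age_ms_by_shard.values())
    threshold = max(floor_ms, factor * typical)
    lagging = [
        (shard, age)
        for shard, age in age_ms_by_shard.items()
        if age > threshold
    ]
    # Worst offenders first.
    return sorted(lagging, key=lambda item: item[1], reverse=True)


# Hypothetical sample: three healthy shards and one that is two hours behind.
ages = {
    "shardId-000000000000": 1_200,
    "shardId-000000000001": 900,
    "shardId-000000000002": 7_200_000,
    "shardId-000000000003": 1_500,
}

print("mean age (ms):", mean(ages.values()))
print("max age (ms):", max(ages.values()))
print("lagging shards:", find_lagging_shards(ages))
```

The shard IDs this returns can then be matched against the task manager logs to see which subtask, and therefore which TM, owns the lagging shards.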