Good tutorial troubleshoot and reading logs

Noah
Hi there,

Is there a good page or tutorial out there that explains Flink's
logging?  Specifically, I am seeing hung jobs and would like to
understand more about what is causing the jobs to hang.  I would also
like more details about the checkpoint logging.

Cheers

Re: Good tutorial troubleshoot and reading logs

rmetzger0
Hi Noah,

Sadly, there's no generic guide on how to approach Flink logs.
What exactly do you mean by "the job hangs"?
Did you verify via the metrics that it is not making any progress anymore at all? If so, are all operators affected, or just some?
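
One way to make the "is it still making progress, and for which operators" check concrete is to sample a cumulative record counter per operator twice, some interval apart, and compare the two samples. The sketch below assumes the counts have already been collected (for example by reading a counter such as numRecordsOut from the web UI or a metrics reporter); the function names and the dict-of-counts input shape are hypothetical, not part of Flink.

```python
from typing import Dict, List


def stalled_operators(before: Dict[str, int], after: Dict[str, int]) -> List[str]:
    """Return operators whose record counter did not advance between two samples.

    `before` and `after` map an operator name to a cumulative counter
    (hypothetical input shape), sampled some interval apart.
    """
    stalled = []
    for name, old_count in before.items():
        new_count = after.get(name)
        # An operator missing from the second sample is skipped rather than guessed at.
        if new_count is not None and new_count <= old_count:
            stalled.append(name)
    return sorted(stalled)


def describe(before: Dict[str, int], after: Dict[str, int]) -> str:
    """Summarize whether all, some, or none of the operators made progress."""
    stalled = stalled_operators(before, after)
    if not stalled:
        return "all operators made progress"
    if len(stalled) == len(before):
        return "no operator made progress"
    return "stalled: " + ", ".join(stalled)
```

A counter that never moves for a given operator by design (for instance an output counter on a sink) would show up as stalled here, so it helps to pick a counter that is expected to advance for each operator being compared.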

If your Flink cluster really is stuck, and you are certain that the sources are receiving data, then I'd suggest taking a thread dump of some TaskManagers to see where they are stuck.
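
Once a thread dump has been captured (for example with the JDK's `jstack <pid>` against a TaskManager process), a small script can show where most non-running threads are parked. This is only a sketch: it assumes the usual HotSpot text layout (a quoted thread name, a `java.lang.Thread.State:` line, then `at ...` frames), and the helper names `parse_thread_dump` and `top_frames` are made up for illustration.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

# Patterns for the assumed HotSpot thread dump layout.
_HEADER = re.compile(r'^"(?P<name>[^"]+)"')
_STATE = re.compile(r'^\s+java\.lang\.Thread\.State: (?P<state>[A-Z_]+)')
_FRAME = re.compile(r'^\s+at (?P<frame>[^(\s]+)')


def parse_thread_dump(text: str) -> List[Dict[str, str]]:
    """Extract name, state and top stack frame for each thread in the dump."""
    threads: List[Dict[str, str]] = []
    current = None
    for line in text.splitlines():
        header = _HEADER.match(line)
        if header:
            current = {"name": header.group("name"), "state": "UNKNOWN", "top_frame": ""}
            threads.append(current)
            continue
        if current is None:
            continue  # preamble before the first thread
        state = _STATE.match(line)
        if state:
            current["state"] = state.group("state")
            continue
        frame = _FRAME.match(line)
        # Only the first frame after the header is the top of the stack.
        if frame and not current["top_frame"]:
            current["top_frame"] = frame.group("frame")
    return threads


def top_frames(
    threads: List[Dict[str, str]],
    states: Tuple[str, ...] = ("BLOCKED", "WAITING", "TIMED_WAITING"),
) -> List[Tuple[str, int]]:
    """Count top frames of threads in the given states, most common first."""
    counts = Counter(
        t["top_frame"] for t in threads if t["state"] in states and t["top_frame"]
    )
    return counts.most_common()
```

Many idle pool threads legitimately sit in WAITING, so the counts are a starting point for reading the full stacks of the task threads, not a verdict on their own.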

Best,
Robert



On Mon, Nov 2, 2020 at 11:09 PM Noah <[hidden email]> wrote:
Hi there,

Is there a good page or tutorial out there that explains Flink's
logging?  Specifically, I am seeing hung jobs and would like to
understand more about what is causing the jobs to hang.  I would also
like more details about the checkpoint logging.

Cheers