Hi Dinesh,
Did updating to Flink 1.10 resolve the issue?
Thanks,
-- Ken

Hi Andrey,
Sure, we will try Flink 1.10 to see whether the HA issues we are facing are fixed, and we will post an update in this thread.
Thanks, Dinesh
On Thu, Apr 2, 2020 at 3:22 PM Andrey Zagrebin <[hidden email]> wrote:
Hi Dinesh,
Thanks for sharing the logs. There have been a couple of HA fixes since 1.7, e.g. [1] and [2]. I would suggest trying Flink 1.10. If the problem persists, could you also find the logs of the failed Job Manager from before the failover?
Best, Andrey
Hi Yang,
I am attaching one full jobmanager log for a job which I reran today. This is a job that tries to read from a savepoint. The same error message, "leader election ongoing", is displayed, and it stays that way even after 30 minutes. If I leave the job running without a yarn kill, it stays in that state forever. Based on your suggestions so far, I guess it might be a ZooKeeper problem. If that is the case, what should I look for in ZooKeeper to figure out the issue?
Thanks, Dinesh
[snip]
--------------------------
Ken Krugler
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr