Flink Jobs disappear

G.S.Vijay Raajaa
Hi,

I am running the Flink TaskManagers and JobManager as Docker containers. Strangely, the jobs disappear from the web portal after some time, and they do not move to the failed state either. I have not been able to get a clue from the logs, so any pointers would be really helpful.

Kindly let me know whether I need any specific tuning, and how to persist the uploaded jars.

Regards,
Vijay Raajaa G S

Re: Flink Jobs disappear

Chesnay Schepler
Hello,

Could you tell us a bit more about your setup? Which Flink version are you using, is HA enabled, and does this happen every time?

Regards,
Chesnay

Re: Flink Jobs disappear

G.S.Vijay Raajaa
Hi Chesnay,

I am currently using Flink 1.3 in Docker containers, and I am not running in HA mode. I have three TaskManagers and one JobManager. The problem happens randomly, not every time. Does that mean a TaskManager ran out of memory? I have configured more slots than there are available cores; I assume the compute is shared round-robin. Any pointers on tuning and HA setup would be greatly appreciated.

Regards,
Vijay Raajaa GS 

Re: Flink Jobs disappear

Chesnay Schepler
If a TaskManager ran out of memory, there should be something in the JobManager logs about an unreachable TaskManager.
That said, there should also be something in the JobManager logs about the job disappearing...

Could you set the logging level to DEBUG, run the job again, and provide us (or me directly) with the logs?
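
One way to act on this request is to raise the root logger level in the Log4j configuration that the Flink containers load. The sketch below is a hypothetical helper, not part of Flink. It assumes the configuration is a Log4j 1.x properties file containing a line of the form `log4j.rootLogger=INFO, file` (check the `conf` directory of your image to confirm), and it only transforms text; copying the result back into the container and restarting it is left to you.

```python
import re

# Hypothetical helper (not a Flink API): rewrite the level on the
# log4j.rootLogger line of a Log4j 1.x properties file.
# Assumed line format: log4j.rootLogger=<LEVEL>, <appender>[, ...]
_ROOT_LOGGER = re.compile(
    r"^([ \t]*log4j\.rootLogger[ \t]*=[ \t]*)(\w+)(.*)$", re.MULTILINE
)


def set_root_log_level(properties_text: str, level: str = "DEBUG") -> str:
    """Return properties_text with the root logger level replaced by `level`."""
    new_text, count = _ROOT_LOGGER.subn(
        lambda m: m.group(1) + level + m.group(3), properties_text
    )
    if count == 0:
        # Fail loudly instead of silently returning an unchanged file.
        raise ValueError("no log4j.rootLogger line found")
    return new_text


if __name__ == "__main__":
    sample = "log4j.rootLogger=INFO, file\nlog4j.logger.akka=INFO\n"
    print(set_root_log_level(sample))
```

Only the root logger line is touched; per-package levels such as the `log4j.logger.akka` line in the sample stay as they were, which keeps the DEBUG output focused on what the root level controls.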

Regards,
Chesnay

Re: Flink Jobs disappear

Joshua Griffith
Are your containers on separate nodes? Are you running in Kubernetes? Have you set hard resource limits?

When I’ve run into this issue, it’s been because the JobManager was restarted (I wasn’t running in HA mode). Your node could have been restarted, or Docker could have OOM-killed the process if the machine was low on memory. You might want to run `docker ps` to see if your containers are restarting. Exit code 137 probably means that they were OOM-killed.
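
To make the exit-code check concrete: an exit code above 128 conventionally means the process was terminated by signal number (code - 128), so 137 corresponds to signal 9, SIGKILL. A SIGKILL can also come from other sources (for example a forced `docker kill`), so the sketch below additionally reads the `State.OOMKilled` field from the JSON that `docker inspect <container>` prints. The function names are hypothetical and the sample JSON is a trimmed, made-up example, not output from a real container.

```python
import json

# Small fixed table so the result does not depend on the host platform.
_SIGNAL_NAMES = {1: "SIGHUP", 2: "SIGINT", 9: "SIGKILL", 15: "SIGTERM"}


def describe_exit_code(exit_code: int) -> str:
    """Explain a container exit code; codes above 128 mean 'killed by signal (code - 128)'."""
    if exit_code > 128:
        signum = exit_code - 128
        return "killed by " + _SIGNAL_NAMES.get(signum, f"signal {signum}")
    if exit_code == 0:
        return "exited normally"
    return f"exited with error {exit_code}"


def oom_killed(inspect_json: str) -> bool:
    """Read State.OOMKilled from the JSON array printed by `docker inspect <container>`."""
    state = json.loads(inspect_json)[0]["State"]
    return bool(state.get("OOMKilled", False))


if __name__ == "__main__":
    # Trimmed, illustrative sample of `docker inspect` output.
    sample = '[{"State": {"ExitCode": 137, "OOMKilled": true}}]'
    code = json.loads(sample)[0]["State"]["ExitCode"]
    print(describe_exit_code(code), "| OOM-killed:", oom_killed(sample))
```

If `OOMKilled` is false but the exit code is still 137, something other than the container's memory limit sent the SIGKILL, such as a host-level OOM kill or a manual kill, so the host logs are the next place to look.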

I wouldn’t run the JobManager on the same node as TaskManagers unless you’re using hard resource limits. Note: if you decide to go the hard resource limit route, know that Docker OOM-kills based on VIRT, not RSS (watch out for mmap).
