App gets stuck in Created State

Arpith P
Hi,

We have a Flink 1.8.0 cluster deployed in Hadoop distributed mode. I often see the job sit in the CREATED state even though Hadoop has enough resources. The job has 4 operators with parallelism 15, 1 operator with parallelism 40, and 2 operators with parallelism 10. At submission time I pass 4 GB of TaskManager memory, 2 GB of JobManager memory, and 2 slots per TaskManager. This request should only take 20 containers and 40 vcores, but I see Flink over-allocating: 65 containers and 129 cores. I've attached snapshots for reference.
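As a rough check of the expected figures, here is a minimal Python sketch of the arithmetic. It assumes that all operators share slots (so the job needs as many slots as its highest operator parallelism) and that each slot maps to one vcore. Neither assumption is confirmed by any logs, and the extra container for the JobManager is not counted. The variable and function names are illustrative only.

```python
import math

# Operator parallelisms from the post: 4 operators at 15, 1 at 40, 2 at 10.
parallelisms = [15] * 4 + [40] + [10] * 2
slots_per_taskmanager = 2  # the "2 slots" passed at submission


def taskmanager_containers(slots_needed, slots_per_tm):
    """Number of TaskManager containers needed to provide the given slots."""
    return math.ceil(slots_needed / slots_per_tm)


# Assumption: one shared slot group, so the highest parallelism sets the slot count.
slots_shared = max(parallelisms)
containers_shared = taskmanager_containers(slots_shared, slots_per_taskmanager)
# Assumption: one vcore per slot.
vcores_shared = containers_shared * slots_per_taskmanager

# Upper bound for comparison: no slot sharing at all, one slot per subtask.
slots_unshared = sum(parallelisms)
containers_unshared = taskmanager_containers(slots_unshared, slots_per_taskmanager)

print(f"shared slots:   {slots_shared} slots, {containers_shared} containers, {vcores_shared} vcores")
print(f"unshared slots: {slots_unshared} slots, {containers_unshared} containers")
```

Under the slot-sharing assumption this gives the 20 containers and 40 vcores I expected. The 65 containers actually observed are above even the no-sharing bound of 60.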

Right now I'm passing: -yD yarn.heartbeat.container-request-interval=1000 -yD taskmanager.network.memory.fraction=0.045 -yD taskmanager.memory.preallote=true

How do I control resource allocation?


Attachments: Allocation1.png (105K), YarnRequest.png (33K)
Re: App gets stuck in Created State

Zhu Zhu
Hi Arpith,

All tasks being in the CREATED state indicates that no task has been scheduled yet. It is strange if a job gets stuck in this state.
Could you share the JobManager log so we can check what is happening there?

Thanks,
Zhu

(In reply to the message from Arpith P <[hidden email]>, Mon, Sep 21, 2020 at 3:52 PM.)

Re: App gets stuck in Created State

Arpith P
All the JobManager logs have been deleted from the cluster. I'll have to work with the infra team to get them back; once I have them, I'll post them here.

Arpith

(In reply to the message from Zhu Zhu <[hidden email]>, Mon, Sep 21, 2020 at 5:50 PM.)