(DEPRECATED) Apache Flink User Mailing List archive.

App gets stuck in Created State

Arpith P

App gets stuck in Created State

Hi,

We have a Flink 1.8.0 cluster deployed on Hadoop in distributed mode. I often see the job sit in the CREATED state even though Hadoop has enough resources. We have 4 operators with parallelism 15, 1 operator with parallelism 40, and 2 operators with parallelism 10. At submission time I pass 4 GB of TaskManager memory, 2 GB of JobManager memory, and 2 slots. This request should take only 20 containers and 40 vcores, but I see Flink over-allocating: 65 containers and 129 cores. I've attached snapshots for reference.

Right now I'm passing: -yD yarn.heartbeat.container-request-interval=1000 -yD taskmanager.network.memory.fraction=0.045 -yD taskmanager.memory.preallote=true.

How do I control resource allocation?
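The container estimate in the message above can be checked with a short calculation. This is only a sketch under stated assumptions: it treats the 2 slots as slots per TaskManager, assumes every operator sits in the same slot sharing group (so the job needs as many slots as its highest parallelism), and assumes one vcore per slot. The function name `estimate_task_managers` is hypothetical and not part of Flink.

```python
import math


def estimate_task_managers(parallelisms, slots_per_tm, slot_sharing=True):
    """Estimate the TaskManager containers needed for one job.

    Assumption: with slot sharing, all operators share one slot sharing
    group, so the slot demand equals the highest operator parallelism.
    Without sharing, every parallel subtask needs its own slot.
    """
    slots = max(parallelisms) if slot_sharing else sum(parallelisms)
    return math.ceil(slots / slots_per_tm)


# Parallelisms from the message: 4 operators at 15, 1 at 40, 2 at 10.
parallelisms = [15] * 4 + [40] + [10] * 2
slots_per_tm = 2  # assumption: "2 slots" means 2 slots per TaskManager

shared = estimate_task_managers(parallelisms, slots_per_tm)
unshared = estimate_task_managers(parallelisms, slots_per_tm, slot_sharing=False)

# Assumption: each TaskManager container is given one vcore per slot.
print(shared, shared * slots_per_tm)      # 20 40
print(unshared, unshared * slots_per_tm)  # 60 120
```

With slot sharing the result matches the expected 20 containers and 40 vcores (not counting the JobManager container). Neither case reproduces the observed 65 containers, so this estimate alone does not explain the over-allocation.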

Attachments: Allocation1.png (105K), YarnRequest.png (33K)

Zhu Zhu

Re: App gets stuck in Created State

Hi Arpith,

All tasks being in the CREATED state indicates that no task has been scheduled yet. It is strange for a job to get stuck in this state.

Could you share the JobManager log so we can check what is happening there?

Thanks,

Zhu

Arpith P

Re: App gets stuck in Created State

All the JobManager logs have been deleted from the cluster. I'll have to work with the infra team to get them back; once I have them, I'll post them here.

Arpith
