(DEPRECATED) Apache Flink User Mailing List archive.

flink jobmanager HA zookeeper leadership election - RECEIVED SIGNAL 15: SIGTERM. Shutting down as requested.

Colin Williams

flink jobmanager HA zookeeper leadership election - RECEIVED SIGNAL 15: SIGTERM. Shutting down as requested.

Hi,

I've been trying to update my flink-docker JobManager configuration for Flink 1.4. I think the system is shutting down after a leadership election, but I'm not sure what the issue is. My JobManager configuration follows:

jobmanager.rpc.address: 10.16.228.150
jobmanager.rpc.port: 6123
jobmanager.heap.mb: 1024
blob.server.port: 6124
query.server.port: 6125

web.port: 8081
web.history: 10

parallelism.default: 1

state.backend: rocksdb
state.backend.rocksdb.checkpointdir: /tmp/flink/rocksdb
state.backend.fs.checkpointdir: file:///var/lib/data/checkpoints

high-availability: zookeeper
high-availability.cluster-id: /dev
high-availability.zookeeper.quorum: 10.16.228.190:2181
high-availability.zookeeper.path.root: /flink-1.4
high-availability.zookeeper.storageDir: file:///var/lib/data/recovery
high-availability.jobmanager.port: 50010

env.java.opts: -Dlog.file=/opt/flink/log/jobmanager.log

I'm also attaching some debugging output which shows the shutdown. Again, I'm not entirely sure it's caused by a leadership issue, because that isn't clear from the debug logs. Can anyone suggest changes I might make to the configuration to fix this? I've tried clearing the ZooKeeper root path in case it held some old session information, but that didn't seem to help.

Best,

Colin Williams

Attachment: out (32K)
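
The high-availability settings in the configuration above determine where Flink keeps its coordination data in ZooKeeper, which is the path one would clear when suspecting stale session information. The following is a minimal sketch, not part of the original post, that parses the flat `key: value` lines and derives that znode. The helper names `parse_flink_conf` and `ha_znode` are hypothetical, and the sketch assumes the cluster data sits under the configured root path joined with the cluster id.

```python
# Sketch only: the helpers below are illustrative, not a Flink API.

# HA-related lines copied from the configuration posted in the thread.
FLINK_CONF = """\
high-availability: zookeeper
high-availability.cluster-id: /dev
high-availability.zookeeper.quorum: 10.16.228.190:2181
high-availability.zookeeper.path.root: /flink-1.4
high-availability.zookeeper.storageDir: file:///var/lib/data/recovery
high-availability.jobmanager.port: 50010
"""


def parse_flink_conf(text):
    """Parse flat 'key: value' lines, skipping blank lines and # comments."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first ': ' so values such as host:port stay intact.
        key, sep, value = line.partition(": ")
        if sep:
            conf[key.strip()] = value.strip()
    return conf


def ha_znode(conf):
    """Join the ZooKeeper root path and the cluster id (assumed layout)."""
    root = conf["high-availability.zookeeper.path.root"].rstrip("/")
    cluster = conf["high-availability.cluster-id"].strip("/")
    return f"{root}/{cluster}"


if __name__ == "__main__":
    conf = parse_flink_conf(FLINK_CONF)
    for key in sorted(conf):
        print(f"{key} = {conf[key]}")
    print("znode to inspect or clear:", ha_znode(conf))
```

With the posted values the derived znode is `/flink-1.4/dev`, so clearing only a different path would leave the old data in place.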

Colin Williams

Re: flink jobmanager HA zookeeper leadership election - RECEIVED SIGNAL 15: SIGTERM. Shutting down as requested.

On Tue, Dec 19, 2017 at 7:29 PM, Colin Williams <[hidden email]> wrote:

Attachment: out.txt (32K)

Till Rohrmann

Re: flink jobmanager HA zookeeper leadership election - RECEIVED SIGNAL 15: SIGTERM. Shutting down as requested.

Hi Colin,

The log looks as if the Flink JobManager receives a SIGTERM signal and shuts down because of it. This is not something that Flink's leader election should trigger. Could you check whether the signal might have been sent by another process in your environment, or whether the container supervisor terminated the process?

Cheers,

Till
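
Till's point is that a SIGTERM arrives from outside the process, and the process merely logs it and exits. The sketch below is not Flink code: it is a small Python illustration, with hypothetical function names, that installs a SIGTERM handler, delivers the signal to itself the way a supervisor would, and produces a log line modeled on the one in the subject of this thread. It assumes a POSIX system where SIGTERM is signal 15.

```python
import os
import signal
import time


def exit_code_for_signal(signum):
    """Exit status shells conventionally report for a signal: 128 + number."""
    return 128 + signum


def simulate_external_sigterm(timeout=1.0):
    """Send SIGTERM to this process and return the message the handler logged."""
    received = []

    def handler(signum, frame):
        # The handler only learns which signal arrived, not why it was sent.
        name = signal.Signals(signum).name
        received.append(
            f"RECEIVED SIGNAL {signum}: {name}. Shutting down as requested."
        )

    previous = signal.signal(signal.SIGTERM, handler)
    try:
        # Stand-in for an external sender such as a container supervisor.
        os.kill(os.getpid(), signal.SIGTERM)
        deadline = time.monotonic() + timeout
        while not received and time.monotonic() < deadline:
            time.sleep(0.01)
    finally:
        # Restore the earlier handler so the caller's behavior is unchanged.
        restore = previous if previous is not None else signal.SIG_DFL
        signal.signal(signal.SIGTERM, restore)
    return received[0] if received else None


if __name__ == "__main__":
    print(simulate_external_sigterm())
    print("conventional exit status:", exit_code_for_signal(signal.SIGTERM))
```

Because the handler cannot see who sent the signal, the sender has to be found outside the process, for example in the container runtime or supervisor logs. A container that exits with status 143 (128 + 15) is consistent with termination by SIGTERM.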

On Wed, Dec 20, 2017 at 4:41 AM, Colin Williams <[hidden email]> wrote:
