flink jobmanager HA zookeeper leadership election - RECEIVED SIGNAL 15: SIGTERM. Shutting down as requested.


Colin Williams
Hi,

I've been trying to update my flink-docker JobManager configuration for Flink 1.4. I think the system is shutting down after a leader election, but I'm not sure what the issue is. My JobManager configuration follows:


jobmanager.rpc.address: 10.16.228.150
jobmanager.rpc.port: 6123
jobmanager.heap.mb: 1024
blob.server.port: 6124
query.server.port: 6125

web.port: 8081
web.history: 10

parallelism.default: 1

state.backend: rocksdb
state.backend.rocksdb.checkpointdir: /tmp/flink/rocksdb
state.backend.fs.checkpointdir: file:///var/lib/data/checkpoints

high-availability: zookeeper
high-availability.cluster-id: /dev
high-availability.zookeeper.quorum: 10.16.228.190:2181
high-availability.zookeeper.path.root: /flink-1.4
high-availability.zookeeper.storageDir: file:///var/lib/data/recovery
high-availability.jobmanager.port: 50010

env.java.opts: -Dlog.file=/opt/flink/log/jobmanager.log

I'm also attaching some debug output that shows the shutdown. Again, I'm not sure a leadership issue is the cause, because the debug logs don't make that clear. Can anyone suggest configuration changes that might fix this? I've tried clearing the ZooKeeper root path in case it held stale session information, but that didn't seem to help.

Best,

Colin Williams
Re: flink jobmanager HA zookeeper leadership election - RECEIVED SIGNAL 15: SIGTERM. Shutting down as requested.

Colin Williams


Attachment: out.txt (32K)
Re: flink jobmanager HA zookeeper leadership election - RECEIVED SIGNAL 15: SIGTERM. Shutting down as requested.

Till Rohrmann
Hi Colin,

The log looks as if the Flink JobManager receives a SIGTERM signal and shuts down because of it. Flink's leader election should not trigger this. Could you check whether another process in your environment sent this signal, or whether the container supervisor terminated the process?

Cheers,
Till
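
One way to narrow this down is to look at the exit code the container runtime recorded for the JobManager (for Docker, the `State.ExitCode` field of `docker inspect`). The sketch below is not part of Flink; it assumes the common convention that a process stopped by signal N is reported with exit code 128 + N, and the function name `describe_exit_code` is made up for illustration. It assumes a Linux host.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Interpret a container exit code, assuming the 128 + N signal convention."""
    if code == 0:
        return "clean exit"
    if code > 128:
        # Assumption: codes above 128 encode the signal number as code - 128.
        try:
            return f"terminated by {signal.Signals(code - 128).name}"
        except ValueError:
            return f"exit code {code} (no matching signal)"
    return f"application error (exit code {code})"


if __name__ == "__main__":
    # 143 = 128 + 15 (SIGTERM), 137 = 128 + 9 (SIGKILL)
    for code in (0, 1, 137, 143):
        print(code, "->", describe_exit_code(code))
```

An exit code of 143 would be consistent with the SIGTERM in the log, which points at something outside the JVM (an orchestrator, a supervisor script, or a manual stop) rather than at the HA settings. A 137 would instead suggest SIGKILL, which is often, though not always, the result of a memory limit being hit.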
