Hi,
For the first time I'm trying to set up a standalone cluster. My current configuration is 4 servers (1 jobmanager and 3 taskmanagers).

a) Starting the cluster:

    swissbib@sb-ust1:/swissbib_index/apps/flink/bin$ ./start-cluster.sh
    Starting cluster.
    Starting standalonesession daemon on host sb-ust1.
    Starting taskexecutor daemon on host sb-ust2.
    Starting taskexecutor daemon on host sb-ust3.
    Starting taskexecutor daemon on host sb-ust4.

On the taskmanager side I get the error:

    2019-05-01 21:16:32,794 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.ssl.tcp://flink@sb-ust1:6123] has failed, address is now gated for [50] ms. Reason: [class [B cannot be cast to class [C ([B and [C are in module java.base of loader 'bootstrap')]
    2019-05-01 21:16:41,932 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Could not resolve ResourceManager address akka.ssl.tcp://flink@sb-ust1:6123/user/resourcemanager, retrying in 10000 ms: Ask timed out on [ActorSelection[Anchor(akka.ssl.tcp://flink@sb-ust1:6123/), Path(/user/resourcemanager)]] after [10000 ms]. Sender[null] sent message of type "akka.actor.Identify"..
    2019-05-01 21:17:01,960 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Could not resolve ResourceManager address akka.ssl.tcp://flink@sb-ust1:6123/user/resourcemanager, retrying in 10000 ms: Ask timed out on [ActorSelection[Anchor(akka.ssl.tcp://flink@sb-ust1:6123/), Path(/user/resourcemanager)]] after [10000 ms]. Sender[null] sent message of type "akka.actor.Identify"..

Port 6123 is allowed on the jobmanager, but I haven't created a dedicated flink user.

- Is this necessary? If yes, is it possible to define another user for communication purposes?
I followed the documentation to set up SSL-based communication (https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/security-ssl.html#example-ssl-setup-standalone-and-kubernetes) and created a keystore as described:

    keytool -genkeypair -alias swissbib.internal -keystore internal.keystore -dname "CN=flink.internal" -storepass verysecret -keypass verysecret -keyalg RSA -keysize 4096

and deployed the flink-conf.yaml on the whole cluster (part of flink-conf.yaml):

    security.ssl.internal.enabled: true
    security.ssl.internal.keystore: /swissbib_index/apps/flink/conf/internal.keystore
    security.ssl.internal.truststore: /swissbib_index/apps/flink/conf/internal.keystore
    security.ssl.internal.keystore-password: verysecret
    security.ssl.internal.truststore-password: verysecret
    security.ssl.internal.key-password: verysecret

But this doesn't solve the problem: there is still no connection between the taskmanagers and the jobmanager.

- Another question: which ports have to be opened in the firewall for a standalone cluster?

Thanks for any hints!

Günter
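Editorial aside on the firewall question: whether the jobmanager's RPC port is reachable from a taskmanager host can be checked independently of Flink with a plain TCP connect. The following is only a minimal sketch; the function name `is_port_reachable` is hypothetical, and the host `sb-ust1` and port 6123 are taken from the log output above.

```python
import socket


def is_port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a plain TCP connection to host:port can be opened.

    Hypothetical helper for illustration; it is not part of Flink.
    """
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

Run on a taskmanager host, `is_port_reachable("sb-ust1", 6123)` returning `True` would suggest that the firewall is not blocking that port. It only shows TCP reachability and says nothing about the SSL layer or about any other ports the cluster may use.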
Which java version are you using?
On 01/05/2019 21:31, Günter Hipler wrote:
> For the first time I'm trying to set up a standalone cluster. [...]
    swissbib@sb-ust1:~$ java -version
    openjdk version "11.0.2" 2019-01-15
    OpenJDK Runtime Environment (build 11.0.2+9-Ubuntu-3ubuntu118.04.3)
    OpenJDK 64-Bit Server VM (build 11.0.2+9-Ubuntu-3ubuntu118.04.3, mixed mode, sharing)
    swissbib@sb-ust1:~$

Is version 8 more appropriate?

Günter

On 02.05.19 13:48, Chesnay Schepler wrote:
> Which java version are you using?
In reply to this post by Chesnay Schepler
Java version: "1.8.0_112"
Java(TM) SE Runtime Environment (build 1.8.0_112-b15)
Java HotSpot(TM) 64-Bit Server VM (build 25.112-b15, mixed mode)

On Thu, 2 May 2019 at 17:18, Chesnay Schepler <[hidden email]> wrote:
> Which java version are you using?

Warm Regards,
Abhishek Jain
In reply to this post by Günter Hipler-2
Flink still only works with Java 8 at the moment. It will be a while until we properly support Java 11.

On 02/05/2019 13:58, Günter Hipler wrote:
> Is version 8 more appropriate?
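Editorial aside: since the Java 8 requirement stated in this reply applies to every host in the cluster, it can help to check the Java major version on each machine before starting the daemons. The sketch below parses the first line of `java -version` output as shown in this thread. The function names are hypothetical, and it assumes `java` is on the `PATH` and prints its version banner to stderr.

```python
import re
import subprocess


def java_major_version(version_output: str) -> int:
    """Extract the Java major version from `java -version` output."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("no version string found in output")
    first = int(match.group(1))
    second = match.group(2)
    # Java 8 and earlier report themselves as 1.x (for example "1.8.0_191"),
    # while Java 9 and later put the major number first (for example "11.0.2").
    if first == 1 and second is not None:
        return int(second)
    return first


def local_java_major_version() -> int:
    """Run `java -version` on this host and return the major version."""
    # The version banner is written to stderr, not stdout.
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return java_major_version(result.stderr)
```

With the outputs quoted in this thread, `java_major_version('openjdk version "11.0.2" 2019-01-15')` yields 11 and `java_major_version('openjdk version "1.8.0_191"')` yields 8.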
In reply to this post by Günter Hipler-2
Thanks a lot for the hint - this seems to solve the problem
    openjdk version "1.8.0_191"
    OpenJDK Runtime Environment (build 1.8.0_191-8u191-b12-2ubuntu0.18.04.1-b12)
    OpenJDK 64-Bit Server VM (build 25.191-b12, mixed mode)

    2019-05-02 15:17:44,109 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Resolved ResourceManager address, beginning registration
    2019-05-02 15:17:44,109 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Registration at ResourceManager attempt 1 (timeout=100ms)
    2019-05-02 15:17:44,183 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Successful registration at resource manager akka.ssl.tcp://flink@sb-ust1:6123/user/resourcemanager under registration id 2068ab84444ebbc0d4868e1605dfde4f.

Günter

----Original Message----
From: [hidden email]
Date: 02/05/2019 - 14:20 (CEST)
To: [hidden email], [hidden email]
Cc: [hidden email]
Subject: Re: configuration of standalone cluster

Flink still only works with Java 8 at the moment. It will be a while until we properly support Java 11.
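Editorial aside: the difference between the failing and the working setup is visible in the taskmanager log, so the check can be automated across several hosts. The following is a rough sketch that classifies a log by the last relevant TaskExecutor message. The two marker strings are copied from the log lines quoted in this thread; the function name and the returned labels are hypothetical.

```python
from typing import Iterable

# Marker strings copied from the log lines quoted in this thread.
SUCCESS_MARKER = "Successful registration at resource manager"
RETRY_MARKER = "Could not resolve ResourceManager address"


def registration_status(log_lines: Iterable[str]) -> str:
    """Classify a taskmanager log as 'registered', 'retrying' or 'unknown'.

    The most recent matching line decides the result.
    """
    status = "unknown"
    for line in log_lines:
        if SUCCESS_MARKER in line:
            status = "registered"
        elif RETRY_MARKER in line:
            status = "retrying"
    return status
```

Passing the lines of a taskmanager log file to `registration_status` gives a quick indication of whether that host has joined the cluster. The location of the log file depends on the installation and is not assumed here.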