configuration of standalone cluster

Günter Hipler-2
Hi,

For the first time I'm trying to set up a standalone cluster. My current
configuration: 4 servers (1 JobManager and 3 TaskManagers).

a) starting the cluster
swissbib@sb-ust1:/swissbib_index/apps/flink/bin$ ./start-cluster.sh
Starting cluster.
Starting standalonesession daemon on host sb-ust1.
Starting taskexecutor daemon on host sb-ust2.
Starting taskexecutor daemon on host sb-ust3.
Starting taskexecutor daemon on host sb-ust4.


On the TaskManager side I get the following errors:
2019-05-01 21:16:32,794 WARN akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.ssl.tcp://flink@sb-ust1:6123] has failed, address is now gated for [50] ms. Reason: [class [B cannot be cast to class [C ([B and [C are in module java.base of loader 'bootstrap')]
2019-05-01 21:16:41,932 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor            - Could not resolve ResourceManager address akka.ssl.tcp://flink@sb-ust1:6123/user/resourcemanager, retrying in 10000 ms: Ask timed out on [ActorSelection[Anchor(akka.ssl.tcp://flink@sb-ust1:6123/), Path(/user/resourcemanager)]] after [10000 ms]. Sender[null] sent message of type "akka.actor.Identify"..
2019-05-01 21:17:01,960 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor            - Could not resolve ResourceManager address akka.ssl.tcp://flink@sb-ust1:6123/user/resourcemanager, retrying in 10000 ms: Ask timed out on [ActorSelection[Anchor(akka.ssl.tcp://flink@sb-ust1:6123/), Path(/user/resourcemanager)]] after [10000 ms]. Sender[null] sent message of type "akka.actor.Identify"..
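In the WARN entry above, [B and [C are the JVM's internal class names for byte[] and char[], so the message reports that a byte array was cast to a char array somewhere in the remote messaging layer. The minimal Java sketch below (the class name ArrayDescriptorDemo is hypothetical) only reproduces that kind of ClassCastException to make the notation concrete; it does not reproduce the actual code path inside Akka or Flink, and the exact message wording differs between JDK versions.

```java
public class ArrayDescriptorDemo {

    // Returns the JVM's internal class name of an array object,
    // e.g. "[B" for byte[] and "[C" for char[].
    static String internalName(Object array) {
        return array.getClass().getName();
    }

    // Tries to cast an arbitrary object to char[].
    // Returns a short success string, or the ClassCastException message on failure.
    static String tryCastToCharArray(Object value) {
        try {
            char[] chars = (char[]) value; // fails at runtime if value is a byte[]
            return "ok, " + chars.length + " chars";
        } catch (ClassCastException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        Object bytes = new byte[] {1, 2, 3};
        System.out.println(internalName(bytes));       // [B
        System.out.println(internalName(new char[0])); // [C
        // On Java 11 this message has the same shape as the WARN line in the log;
        // older JDKs use a shorter wording such as "[B cannot be cast to [C".
        System.out.println(tryCastToCharArray(bytes));
    }
}
```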


Port 6123 is open on the JobManager, but I haven't created a dedicated
flink user.

- Is this necessary? If so, is it possible to define another user for
communication purposes?

I followed the documentation to set up SSL-based communication
(https://ci.apache.org/projects/flink/flink-docs-release-1.8/ops/security-ssl.html#example-ssl-setup-standalone-and-kubernetes)
and created a keystore as described:

keytool -genkeypair -alias swissbib.internal -keystore internal.keystore -dname "CN=flink.internal" -storepass verysecret -keypass verysecret -keyalg RSA -keysize 4096

and deployed the flink-conf.yaml to the whole cluster:

(part of flink-conf.yaml)
security.ssl.internal.enabled: true
security.ssl.internal.keystore: /swissbib_index/apps/flink/conf/internal.keystore
security.ssl.internal.truststore: /swissbib_index/apps/flink/conf/internal.keystore
security.ssl.internal.keystore-password: verysecret
security.ssl.internal.truststore-password: verysecret
security.ssl.internal.key-password: verysecret

but this doesn't solve the problem: there is still no connection between
the TaskManagers and the JobManager.
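To rule out the keystore itself, one can check on each host that the file referenced in flink-conf.yaml opens with the configured password and contains the expected alias. The minimal Java sketch below uses the standard java.security.KeyStore API; the class name KeystoreCheck is hypothetical, and it assumes the file is PKCS12, which is the default format keytool writes on Java 9 and later (a keystore created with Java 8's keytool would default to JKS, so adjust the type accordingly).

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.security.KeyStore;
import java.util.Collections;
import java.util.List;

public class KeystoreCheck {

    // Loads a keystore of the given type (e.g. "PKCS12" or "JKS") and returns its aliases.
    // Throws if the file is missing, the type does not match, or the password is wrong.
    static List<String> listAliases(String path, String type, char[] password) throws Exception {
        KeyStore ks = KeyStore.getInstance(type);
        try (InputStream in = new FileInputStream(path)) {
            ks.load(in, password);
        }
        return Collections.list(ks.aliases());
    }

    public static void main(String[] args) throws Exception {
        // Assumed values, taken from the flink-conf.yaml excerpt in this post.
        String path = args.length > 0 ? args[0] : "/swissbib_index/apps/flink/conf/internal.keystore";
        List<String> aliases = listAliases(path, "PKCS12", "verysecret".toCharArray());
        // The keytool command above used the alias "swissbib.internal".
        System.out.println("aliases: " + aliases);
    }
}
```

From the shell, keytool -list -keystore internal.keystore -storepass verysecret performs a similar check.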

- Another question: which ports have to be opened in the firewall for a
standalone cluster?
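Two notes on the questions above, offered as hedged context rather than an authoritative answer. First, in an address such as akka.ssl.tcp://flink@sb-ust1:6123, the part before the @ is the name of the Akka actor system Flink starts, not an operating-system account, so the address by itself does not imply that a Unix user called flink must exist. Second, whether a port is open through the firewall can be checked from a TaskManager host with a plain TCP connect. The minimal Java sketch below assumes the JobManager RPC port 6123 seen in the logs; the class name PortCheck is hypothetical. It does not cover other ports a standalone cluster may use (for example rest.port, which defaults to 8081, or blob.server.port, taskmanager.rpc.port and taskmanager.data.port, which by default are chosen at random unless set in flink-conf.yaml); check the configuration reference for your Flink version before opening the firewall.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {

    // Attempts a plain TCP connection and reports whether it succeeded within timeoutMillis.
    // This only shows the port is reachable; it does not perform the SSL handshake Akka uses.
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) { // includes timeouts, refusals and unknown hosts
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed defaults: JobManager host from the logs and port 6123.
        String host = args.length > 0 ? args[0] : "sb-ust1";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 6123;
        System.out.println(host + ":" + port + " reachable: " + isReachable(host, port, 3000));
    }
}
```

After compiling with javac, it could be run on each TaskManager host as java PortCheck sb-ust1 6123. A successful connect only rules out the firewall; it says nothing about whether the SSL handshake between the nodes succeeds.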

Thanks for any hints!

Günter


Re: configuration of standalone cluster

Chesnay Schepler
Which Java version are you using?

On 01/05/2019 21:31, Günter Hipler wrote:


Re: configuration of standalone cluster

Günter Hipler-2
swissbib@sb-ust1:~$ java -version
openjdk version "11.0.2" 2019-01-15
OpenJDK Runtime Environment (build 11.0.2+9-Ubuntu-3ubuntu118.04.3)
OpenJDK 64-Bit Server VM (build 11.0.2+9-Ubuntu-3ubuntu118.04.3, mixed
mode, sharing)
swissbib@sb-ust1:~$

Would Java 8 be more appropriate?

Günter


On 02.05.19 13:48, Chesnay Schepler wrote:

> Which java version are you using?

Re: configuration of standalone cluster

Abhishek Jain
In reply to this post by Chesnay Schepler
Java version: "1.8.0_112"
Java(TM) SE Runtime Environment (build 1.8.0_112-b15)
Java HotSpot(TM) 64-Bit Server VM (build 25.112-b15, mixed mode)


On Thu, 2 May 2019 at 17:18, Chesnay Schepler <[hidden email]> wrote:
Which java version are you using?

--
Warm Regards,
Abhishek Jain

Re: configuration of standalone cluster

Chesnay Schepler
In reply to this post by Günter Hipler-2
Flink still only works with Java 8 at the moment. It will be a while
until we properly support Java 11.
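Since the JVM version turned out to matter here, a quick check on every node can catch a mismatch early. The minimal Java sketch below (class and method names are hypothetical) reads the standard java.specification.version system property, which is "1.8" on Java 8 and "11" on Java 11, and flags anything other than 8. It encodes only the Java 8 requirement stated in this thread for the Flink version in use at the time.

```java
public class JavaVersionCheck {

    // Extracts the major version from a java.specification.version string:
    // "1.8" -> 8 (pre-Java 9 scheme), "11" -> 11 (Java 9+ scheme).
    static int majorVersion(String spec) {
        String s = spec.startsWith("1.") ? spec.substring(2) : spec;
        int dot = s.indexOf('.');
        return Integer.parseInt(dot >= 0 ? s.substring(0, dot) : s);
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        int major = majorVersion(spec);
        if (major == 8) {
            System.out.println("Java 8 detected (" + spec + ")");
        } else {
            System.out.println("WARNING: Java " + major + " detected; this thread reports that Flink needs Java 8");
        }
    }
}
```

If several JDKs are installed, Flink's startup scripts can be pointed at the Java 8 installation via the JAVA_HOME environment variable or the env.java.home option in flink-conf.yaml.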

On 02/05/2019 13:58, Günter Hipler wrote:


Re: Re: configuration of standalone cluster

Günter Hipler-2
In reply to this post by Günter Hipler-2
Thanks a lot for the hint; switching to Java 8 seems to solve the problem:

openjdk version "1.8.0_191"
OpenJDK Runtime Environment (build 1.8.0_191-8u191-b12-2ubuntu0.18.04.1-b12)
OpenJDK 64-Bit Server VM (build 25.191-b12, mixed mode)

2019-05-02 15:17:44,109 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor            - Resolved ResourceManager address, beginning registration
2019-05-02 15:17:44,109 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor            - Registration at ResourceManager attempt 1 (timeout=100ms)
2019-05-02 15:17:44,183 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor            - Successful registration at resource manager akka.ssl.tcp://flink@sb-ust1:6123/user/resourcemanager under registration id 2068ab84444ebbc0d4868e1605dfde4f.

Günter



----Original Message----
From: [hidden email]
Date: 02/05/2019 - 14:20 (CEST)
To: [hidden email], [hidden email]
Cc: [hidden email]
Subject: Re: configuration of standalone cluster

Flink still only works with Java 8 at the moment. It will be a while
until we properly support Java 11.
