Savepoints - jobmanager.rpc.address

ant burton
Hi, 

When taking a savepoint on AWS EMR, I get the following error:

[hadoop@ip-10-12-169-172 ~]$ flink savepoint e14a6402b6f1e547c4adf40f43861c27
Retrieving JobManager.

------------------------------------------------------------
 The program finished with the following exception:

org.apache.flink.configuration.IllegalConfigurationException: Couldn't retrieve client for cluster
        at org.apache.flink.client.CliFrontend.retrieveClient(CliFrontend.java:925)
        at org.apache.flink.client.CliFrontend.getJobManagerGateway(CliFrontend.java:939)
        at org.apache.flink.client.CliFrontend.triggerSavepoint(CliFrontend.java:714)
        at org.apache.flink.client.CliFrontend.savepoint(CliFrontend.java:704)
        at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1096)
        at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1133)
        at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1130)
        at org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
        at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40)
        at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1130)
Caused by: java.lang.RuntimeException: Couldn't retrieve standalone cluster
        at org.apache.flink.client.deployment.StandaloneClusterDescriptor.retrieve(StandaloneClusterDescriptor.java:48)
        at org.apache.flink.client.cli.DefaultCLI.retrieveCluster(DefaultCLI.java:74)
        at org.apache.flink.client.cli.DefaultCLI.retrieveCluster(DefaultCLI.java:38)
        at org.apache.flink.client.CliFrontend.retrieveClient(CliFrontend.java:920)
        ... 12 more
Caused by: org.apache.flink.util.ConfigurationException: Config parameter 'Key: 'jobmanager.rpc.address' , default: null (deprecated keys: [])' is missing (hostname/address of JobManager to connect to).
        at org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.getJobManagerAddress(HighAvailabilityServicesUtils.java:119)
        at org.apache.flink.runtime.highavailability.HighAvailabilityServicesUtils.createHighAvailabilityServices(HighAvailabilityServicesUtils.java:76)
        at org.apache.flink.client.program.ClusterClient.<init>(ClusterClient.java:131)
        at org.apache.flink.client.program.StandaloneClusterClient.<init>(StandaloneClusterClient.java:42)
        at org.apache.flink.client.deployment.StandaloneClusterDescriptor.retrieve(StandaloneClusterDescriptor.java:46)
        ... 15 more

My configuration.json is:

[
    {
        "Classification": "flink-conf",
        "Properties": {
            "taskmanager.numberOfTaskSlots":"1",
            "state.backend": "filesystem",
            "state.checkpoints.dir": "s3://flink/checkpoints/",
            "state.backend.fs.checkpointdir": "s3://flink/checkpoints/"
        }
    }
]


Adding the following to configuration.json does not resolve the issue (for the address I tried localhost, 0.0.0.0 and 127.0.0.1):

jobmanager.rpc.address: localhost
jobmanager.rpc.port: 6123


Thanks,




Re: Savepoints - jobmanager.rpc.address

Tzu-Li (Gordon) Tai
Hi!

Since you’re running on AWS EMR, I’m assuming you’re deploying your Flink job / cluster on YARN?

If so, make sure to also specify the YARN application id when triggering the savepoint:
flink savepoint -yid <the YARN application id> <JobID>
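
The stack trace goes through DefaultCLI and StandaloneClusterDescriptor, which suggests that without -yid the client tries to reach a standalone cluster and therefore asks for jobmanager.rpc.address. As a minimal sketch (not an official tool), the helper below only assembles the command above after checking that the two ids are in the right order. The function name build_savepoint_cmd and the application id used here are made up for illustration; the real application id can be looked up with `yarn application -list`.

```shell
# Build (but do not run) the savepoint command for a Flink job on YARN.
# Usage: build_savepoint_cmd <yarn-application-id> <flink-job-id>
# build_savepoint_cmd is a hypothetical helper, not part of Flink.
build_savepoint_cmd() {
    # YARN application ids look like application_<cluster timestamp>_<sequence>.
    if ! printf '%s\n' "$1" | grep -Eq '^application_[0-9]+_[0-9]+$'; then
        echo "not a YARN application id: $1" >&2
        return 1
    fi
    # Flink job ids are 32 hexadecimal characters.
    if ! printf '%s\n' "$2" | grep -Eq '^[0-9a-fA-F]{32}$'; then
        echo "not a Flink job id: $2" >&2
        return 1
    fi
    echo "flink savepoint -yid $1 $2"
}

# The application id below is an assumed placeholder; the job id is the one
# from the original post. List real application ids with: yarn application -list
build_savepoint_cmd application_1506951234567_0001 e14a6402b6f1e547c4adf40f43861c27
```

The printed line can then be copied and run on the EMR master node once the ids have been checked.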

Cheers,
Gordon

On 2 October 2017 at 9:39:09 PM, ant burton ([hidden email]) wrote:

> When taking a savepoint on AWS EMR I get the following error
> [...]