Re: Jira issue Flink-11127

Posted by Boris Lublinsky on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Jira-issue-Flink-11127-tp26180p26280.html

Konstantin, it still does not quite work.
The IP is still in place, but…

Here is the JobManager log:
metrics.reporters: prom
metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
metrics.reporter.prom.port: 9249
Starting Job Manager
config file: 
jobmanager.rest.address: crabby-kudu-fdp-flink-jobmanager-service
jobmanager.rpc.port: 6123
jobmanager.heap.size: 1024m
taskmanager.heap.size: 1024m
taskmanager.numberOfTaskSlots: 1
parallelism.default: 1
rest.port: 8081
metrics.reporters: prom
metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
metrics.reporter.prom.port: 9249
blob.server.port: 6124
query.server.port: 6125
Starting standalonesession as a console application on host crabby-kudu-fdp-flink-jobmanager-85c8d799db-46rj2.
2019-02-21 21:00:37,803 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         - --------------------------------------------------------------------------------
2019-02-21 21:00:37,804 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  Starting StandaloneSessionClusterEntrypoint (Version: 1.7.1, Rev:89eafb4, Date:14.12.2018 @ 15:48:34 GMT)
2019-02-21 21:00:37,804 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  OS current user: ?
2019-02-21 21:00:37,805 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  Current Hadoop/Kerberos user: <no hadoop dependency found>
2019-02-21 21:00:37,805 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  JVM: OpenJDK 64-Bit Server VM - Oracle Corporation - 1.8/25.181-b13
2019-02-21 21:00:37,805 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  Maximum heap size: 981 MiBytes
2019-02-21 21:00:37,805 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  JAVA_HOME: /docker-java-home/jre
2019-02-21 21:00:37,805 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  No Hadoop Dependency available
2019-02-21 21:00:37,805 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  JVM Options:
2019-02-21 21:00:37,805 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -     -Xms1024m
2019-02-21 21:00:37,805 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -     -Xmx1024m
2019-02-21 21:00:37,805 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -     -Dlog4j.configuration=file:/opt/flink/conf/log4j-console.properties
2019-02-21 21:00:37,806 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -     -Dlogback.configurationFile=file:/opt/flink/conf/logback-console.xml
2019-02-21 21:00:37,806 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  Program Arguments:
2019-02-21 21:00:37,806 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -     --configDir
2019-02-21 21:00:37,806 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -     /opt/flink/conf
2019-02-21 21:00:37,806 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -     --executionMode
2019-02-21 21:00:37,806 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -     cluster
2019-02-21 21:00:37,806 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         -  Classpath: /opt/flink/lib/flink-metrics-prometheus-1.7.1.jar:/opt/flink/lib/flink-python_2.11-1.7.1.jar:/opt/flink/lib/flink-queryable-state-runtime_2.11-1.7.1.jar:/opt/flink/lib/flink-table_2.11-1.7.1.jar:/opt/flink/lib/log4j-1.2.17.jar:/opt/flink/lib/slf4j-log4j12-1.7.15.jar:/opt/flink/lib/flink-dist_2.11-1.7.1.jar:::
2019-02-21 21:00:37,806 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         - --------------------------------------------------------------------------------
2019-02-21 21:00:37,808 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         - Registered UNIX signal handlers for [TERM, HUP, INT]
2019-02-21 21:00:37,822 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: jobmanager.rest.address, crabby-kudu-fdp-flink-jobmanager-service
2019-02-21 21:00:37,822 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: jobmanager.rpc.port, 6123
2019-02-21 21:00:37,823 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: jobmanager.heap.size, 1024m
2019-02-21 21:00:37,823 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: taskmanager.heap.size, 1024m
2019-02-21 21:00:37,823 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: taskmanager.numberOfTaskSlots, 1
2019-02-21 21:00:37,823 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: parallelism.default, 1
2019-02-21 21:00:37,824 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: rest.port, 8081
2019-02-21 21:00:37,824 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: metrics.reporters, prom
2019-02-21 21:00:37,825 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: metrics.reporter.prom.class, org.apache.flink.metrics.prometheus.PrometheusReporter
2019-02-21 21:00:37,825 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: metrics.reporter.prom.port, 9249
2019-02-21 21:00:37,825 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: blob.server.port, 6124
2019-02-21 21:00:37,825 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: query.server.port, 6125
2019-02-21 21:00:38,010 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         - Starting StandaloneSessionClusterEntrypoint.
2019-02-21 21:00:38,011 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         - Install default filesystem.
2019-02-21 21:00:38,016 INFO  org.apache.flink.core.fs.FileSystem                           - Hadoop is not in the classpath/dependencies. The extended set of supported File Systems via Hadoop is not available.
2019-02-21 21:00:38,023 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         - Install security context.
2019-02-21 21:00:38,031 INFO  org.apache.flink.runtime.security.modules.HadoopModuleFactory  - Cannot create Hadoop Security Module because Hadoop cannot be found in the Classpath.
2019-02-21 21:00:38,043 INFO  org.apache.flink.runtime.security.SecurityUtils               - Cannot install HadoopSecurityContext because Hadoop cannot be found in the Classpath.
2019-02-21 21:00:38,044 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         - Initializing cluster services.
2019-02-21 21:00:38,513 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils         - Trying to start actor system at 127.0.0.1:6123
2019-02-21 21:00:39,304 INFO  akka.event.slf4j.Slf4jLogger                                  - Slf4jLogger started
2019-02-21 21:00:39,411 INFO  akka.remote.Remoting                                          - Starting remoting
2019-02-21 21:00:39,570 INFO  akka.remote.Remoting                                          - Remoting started; listening on addresses :[akka.tcp://flink@127.0.0.1:6123]
2019-02-21 21:00:39,602 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils         - Actor system started at akka.tcp://flink@127.0.0.1:6123
2019-02-21 21:00:39,617 WARN  org.apache.flink.configuration.Configuration                  - Config uses deprecated configuration key 'jobmanager.rpc.address' instead of proper key 'rest.address'
2019-02-21 21:00:39,626 INFO  org.apache.flink.runtime.blob.BlobServer                      - Created BLOB server storage directory /tmp/blobStore-12db5847-9543-43ad-a7fa-19de8e907ed6
2019-02-21 21:00:39,629 INFO  org.apache.flink.runtime.blob.BlobServer                      - Started BLOB server at 0.0.0.0:6124 - max concurrent requests: 50 - max backlog: 1000
2019-02-21 21:00:39,649 INFO  org.apache.flink.runtime.metrics.MetricRegistryImpl           - Configuring prom with {port=9249, class=org.apache.flink.metrics.prometheus.PrometheusReporter}.
2019-02-21 21:00:39,658 INFO  org.apache.flink.metrics.prometheus.PrometheusReporter        - Started PrometheusReporter HTTP server on port 9249.
2019-02-21 21:00:39,658 INFO  org.apache.flink.runtime.metrics.MetricRegistryImpl           - Reporting metrics for reporter prom of type org.apache.flink.metrics.prometheus.PrometheusReporter.
2019-02-21 21:00:39,659 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         - Trying to start actor system at 127.0.0.1:0
2019-02-21 21:00:39,714 INFO  akka.event.slf4j.Slf4jLogger                                  - Slf4jLogger started
2019-02-21 21:00:39,720 INFO  akka.remote.Remoting                                          - Starting remoting
2019-02-21 21:00:39,727 INFO  akka.remote.Remoting                                          - Remoting started; listening on addresses :[akka.tcp://flink-metrics@127.0.0.1:34006]
2019-02-21 21:00:39,728 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint         - Actor system started at akka.tcp://flink-metrics@127.0.0.1:34006
2019-02-21 21:00:39,797 INFO  org.apache.flink.runtime.dispatcher.FileArchivedExecutionGraphStore  - Initializing FileArchivedExecutionGraphStore: Storage directory /tmp/executionGraphStore-757ae8c1-c839-4666-9d27-697c34214187, expiration time 3600000, maximum cache size 52428800 bytes.
2019-02-21 21:00:39,821 INFO  org.apache.flink.runtime.blob.TransientBlobCache              - Created BLOB cache storage directory /tmp/blobStore-71959baf-25bb-4182-864a-5f4873ea9988
2019-02-21 21:00:39,838 WARN  org.apache.flink.configuration.Configuration                  - Config uses deprecated configuration key 'jobmanager.rpc.address' instead of proper key 'rest.address'
2019-02-21 21:00:39,839 WARN  org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint    - Upload directory /tmp/flink-web-8dfc9112-0fc2-439f-aac5-2bbe5a003835/flink-web-upload does not exist, or has been deleted externally. Previously uploaded files are no longer available.
2019-02-21 21:00:39,840 INFO  org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint    - Created directory /tmp/flink-web-8dfc9112-0fc2-439f-aac5-2bbe5a003835/flink-web-upload for file uploads.
2019-02-21 21:00:39,896 INFO  org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint    - Starting rest endpoint.
2019-02-21 21:00:40,611 WARN  org.apache.flink.runtime.webmonitor.WebMonitorUtils           - Log file environment variable 'log.file' is not set.
2019-02-21 21:00:40,611 WARN  org.apache.flink.runtime.webmonitor.WebMonitorUtils           - JobManager log files are unavailable in the web dashboard. Log file location not found in environment variable 'log.file' or configuration key 'Key: 'web.log.path' , default: null (deprecated keys: [jobmanager.web.log.path])'.
2019-02-21 21:00:41,098 ERROR akka.remote.EndpointWriter                                    - dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123/]] arriving at [akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123] inbound addresses are [akka.tcp://flink@127.0.0.1:6123]
2019-02-21 21:00:41,301 INFO  org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint    - Rest endpoint listening at 127.0.0.1:8081
2019-02-21 21:00:41,301 INFO  org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint    - http://127.0.0.1:8081  was granted leadership with leaderSessionID=00000000-0000-0000-0000-000000000000
2019-02-21 21:00:41,301 INFO  org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint    - Web frontend listening at http://127.0.0.1:8081 .
2019-02-21 21:00:41,598 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcService              - Starting RPC endpoint for org.apache.flink.runtime.resourcemanager.StandaloneResourceManager at akka://flink/user/resourcemanager .
2019-02-21 21:00:41,616 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcService              - Starting RPC endpoint for org.apache.flink.runtime.dispatcher.StandaloneDispatcher at akka://flink/user/dispatcher .
2019-02-21 21:00:41,711 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager  - ResourceManager akka.tcp://flink@127.0.0.1:6123/user/resourcemanager was granted leadership with fencing token 00000000000000000000000000000000
2019-02-21 21:00:41,712 INFO  org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager  - Starting the SlotManager.
2019-02-21 21:00:41,807 INFO  org.apache.flink.runtime.dispatcher.StandaloneDispatcher      - Dispatcher akka.tcp://flink@127.0.0.1:6123/user/dispatcher was granted leadership with fencing token 00000000-0000-0000-0000-000000000000
2019-02-21 21:00:41,898 INFO  org.apache.flink.runtime.dispatcher.StandaloneDispatcher      - Recovering all persisted jobs.
2019-02-21 21:00:44,420 ERROR akka.remote.EndpointWriter                                    - dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123/]] arriving at [akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123] inbound addresses are [akka.tcp://flink@127.0.0.1:6123]
2019-02-21 21:01:00,434 ERROR akka.remote.EndpointWriter                                    - dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123/]] arriving at [akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123] inbound addresses are [akka.tcp://flink@127.0.0.1:6123]
2019-02-21 21:01:04,353 ERROR akka.remote.EndpointWriter                                    - dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123/]] arriving at [akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123] inbound addresses are [akka.tcp://flink@127.0.0.1:6123]
2019-02-21 21:01:20,474 ERROR akka.remote.EndpointWriter                                    - dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123/]] arriving at [akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123] inbound addresses are [akka.tcp://flink@127.0.0.1:6123]
2019-02-21 21:01:24,393 ERROR akka.remote.EndpointWriter                                    - dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123/]] arriving at [akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123] inbound addresses are [akka.tcp://flink@127.0.0.1:6123]
2019-02-21 21:01:40,514 ERROR akka.remote.EndpointWriter                                    - dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123/]] arriving at [akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123] inbound addresses are [akka.tcp://flink@127.0.0.1:6123]
2019-02-21 21:01:44,433 ERROR akka.remote.EndpointWriter                                    - dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123/]] arriving at [akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123] inbound addresses are [akka.tcp://flink@127.0.0.1:6123]
2019-02-21 21:02:00,554 ERROR akka.remote.EndpointWriter                                    - dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123/]] arriving at [akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123] inbound addresses are [akka.tcp://flink@127.0.0.1:6123]
2019-02-21 21:02:04,473 ERROR akka.remote.EndpointWriter                                    - dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123/]] arriving at [akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123] inbound addresses are [akka.tcp://flink@127.0.0.1:6123]
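
The repeated ERROR lines above each name two addresses: the one the message was sent to and the one the JobManager actor system is listening on. A minimal sketch, assuming Python 3 and using one of those lines with the link markup removed, pulls the two hosts out so they can be compared. The function name `hosts` is made up for illustration.

```python
import re

# One ERROR line copied from the JobManager log above (link markup removed).
LINE = (
    "2019-02-21 21:00:41,098 ERROR akka.remote.EndpointWriter - dropping message "
    "[class akka.actor.ActorSelectionMessage] for non-local recipient "
    "[Actor[akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123/]] "
    "arriving at [akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123] "
    "inbound addresses are [akka.tcp://flink@127.0.0.1:6123]"
)


def hosts(line):
    """Return (host the message was addressed to, host the actor system listens on)."""
    arriving = re.search(r"arriving at \[akka\.tcp://flink@([^:\]]+):(\d+)\]", line)
    inbound = re.search(r"inbound addresses are \[akka\.tcp://flink@([^:\]]+):(\d+)\]", line)
    if not (arriving and inbound):
        return None  # not a "dropping message" line
    return arriving.group(1), inbound.group(1)


addressed_to, bound_to = hosts(LINE)
print("addressed to:", addressed_to)
print("listening on:", bound_to)
print("match:", addressed_to == bound_to)
```

For this line the two hosts differ: the message is addressed to the service name, while the actor system reports 127.0.0.1 as its only inbound address.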

And here is the TaskManager log:

metrics.reporters: prom
metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
metrics.reporter.prom.port: 9249
Starting Task Manager
taskmanager.host : 10.131.2.148
config file: 
jobmanager.rpc.address: crabby-kudu-fdp-flink-jobmanager-service
jobmanager.rpc.port: 6123
jobmanager.heap.size: 1024m
taskmanager.heap.size: 1024m
taskmanager.numberOfTaskSlots: 16
parallelism.default: 1
rest.port: 8081
metrics.reporters: prom
metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
metrics.reporter.prom.port: 9249
taskmanager.host : 10.131.2.148
blob.server.port: 6124
query.server.port: 6125
Starting taskexecutor as a console application on host crabby-kudu-fdp-flink-taskmanager-9f548f744-xlfqg.
2019-02-21 21:00:38,013 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       - --------------------------------------------------------------------------------
2019-02-21 21:00:38,014 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  Starting TaskManager (Version: 1.7.1, Rev:89eafb4, Date:14.12.2018 @ 15:48:34 GMT)
2019-02-21 21:00:38,014 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  OS current user: ?
2019-02-21 21:00:38,014 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  Current Hadoop/Kerberos user: <no hadoop dependency found>
2019-02-21 21:00:38,015 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  JVM: OpenJDK 64-Bit Server VM - Oracle Corporation - 1.8/25.181-b13
2019-02-21 21:00:38,015 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  Maximum heap size: 922 MiBytes
2019-02-21 21:00:38,015 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  JAVA_HOME: /docker-java-home/jre
2019-02-21 21:00:38,015 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  No Hadoop Dependency available
2019-02-21 21:00:38,015 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  JVM Options:
2019-02-21 21:00:38,015 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -     -XX:+UseG1GC
2019-02-21 21:00:38,015 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -     -Xms922M
2019-02-21 21:00:38,015 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -     -Xmx922M
2019-02-21 21:00:38,015 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -     -XX:MaxDirectMemorySize=8388607T
2019-02-21 21:00:38,016 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -     -Dlog4j.configuration=file:/opt/flink/conf/log4j-console.properties
2019-02-21 21:00:38,016 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -     -Dlogback.configurationFile=file:/opt/flink/conf/logback-console.xml
2019-02-21 21:00:38,016 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  Program Arguments:
2019-02-21 21:00:38,016 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -     --configDir
2019-02-21 21:00:38,016 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -     /opt/flink/conf
2019-02-21 21:00:38,016 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       -  Classpath: /opt/flink/lib/flink-metrics-prometheus-1.7.1.jar:/opt/flink/lib/flink-python_2.11-1.7.1.jar:/opt/flink/lib/flink-queryable-state-runtime_2.11-1.7.1.jar:/opt/flink/lib/flink-table_2.11-1.7.1.jar:/opt/flink/lib/log4j-1.2.17.jar:/opt/flink/lib/slf4j-log4j12-1.7.15.jar:/opt/flink/lib/flink-dist_2.11-1.7.1.jar:::
2019-02-21 21:00:38,016 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       - --------------------------------------------------------------------------------
2019-02-21 21:00:38,018 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       - Registered UNIX signal handlers for [TERM, HUP, INT]
2019-02-21 21:00:38,021 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       - Maximum number of open file descriptors is 1048576.
2019-02-21 21:00:38,032 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: jobmanager.rpc.address, crabby-kudu-fdp-flink-jobmanager-service
2019-02-21 21:00:38,032 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: jobmanager.rpc.port, 6123
2019-02-21 21:00:38,032 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: jobmanager.heap.size, 1024m
2019-02-21 21:00:38,032 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: taskmanager.heap.size, 1024m
2019-02-21 21:00:38,033 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: taskmanager.numberOfTaskSlots, 16
2019-02-21 21:00:38,033 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: parallelism.default, 1
2019-02-21 21:00:38,033 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: rest.port, 8081
2019-02-21 21:00:38,034 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: metrics.reporters, prom
2019-02-21 21:00:38,034 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: metrics.reporter.prom.class, org.apache.flink.metrics.prometheus.PrometheusReporter
2019-02-21 21:00:38,035 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: metrics.reporter.prom.port, 9249
2019-02-21 21:00:38,035 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: taskmanager.host, 10.131.2.148
2019-02-21 21:00:38,035 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: blob.server.port, 6124
2019-02-21 21:00:38,035 INFO  org.apache.flink.configuration.GlobalConfiguration            - Loading configuration property: query.server.port, 6125
2019-02-21 21:00:38,041 INFO  org.apache.flink.core.fs.FileSystem                           - Hadoop is not in the classpath/dependencies. The extended set of supported File Systems via Hadoop is not available.
2019-02-21 21:00:38,060 INFO  org.apache.flink.runtime.security.modules.HadoopModuleFactory  - Cannot create Hadoop Security Module because Hadoop cannot be found in the Classpath.
2019-02-21 21:00:38,082 INFO  org.apache.flink.runtime.security.SecurityUtils               - Cannot install HadoopSecurityContext because Hadoop cannot be found in the Classpath.
2019-02-21 21:00:43,278 WARN  org.apache.flink.configuration.Configuration                  - Config uses deprecated configuration key 'jobmanager.rpc.address' instead of proper key 'rest.address'
2019-02-21 21:00:43,281 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       - Using configured hostname/address for TaskManager: 10.131.2.148.
2019-02-21 21:00:43,283 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils         - Trying to start actor system at 10.131.2.148:0
2019-02-21 21:00:43,686 INFO  akka.event.slf4j.Slf4jLogger                                  - Slf4jLogger started
2019-02-21 21:00:43,736 INFO  akka.remote.Remoting                                          - Starting remoting
2019-02-21 21:00:43,850 INFO  akka.remote.Remoting                                          - Remoting started; listening on addresses :[akka.tcp://flink@10.131.2.148:38454]
2019-02-21 21:00:43,857 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils         - Actor system started at akka.tcp://flink@10.131.2.148:38454
2019-02-21 21:00:43,864 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       - Trying to start actor system at 10.131.2.148:0
2019-02-21 21:00:43,881 INFO  akka.event.slf4j.Slf4jLogger                                  - Slf4jLogger started
2019-02-21 21:00:43,888 INFO  akka.remote.Remoting                                          - Starting remoting
2019-02-21 21:00:43,897 INFO  akka.remote.Remoting                                          - Remoting started; listening on addresses :[akka.tcp://flink-metrics@10.131.2.148:34162]
2019-02-21 21:00:43,898 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       - Actor system started at akka.tcp://flink-metrics@10.131.2.148:34162
2019-02-21 21:00:43,916 INFO  org.apache.flink.runtime.metrics.MetricRegistryImpl           - Configuring prom with {port=9249, class=org.apache.flink.metrics.prometheus.PrometheusReporter}.
2019-02-21 21:00:43,925 INFO  org.apache.flink.metrics.prometheus.PrometheusReporter        - Started PrometheusReporter HTTP server on port 9249.
2019-02-21 21:00:43,926 INFO  org.apache.flink.runtime.metrics.MetricRegistryImpl           - Reporting metrics for reporter prom of type org.apache.flink.metrics.prometheus.PrometheusReporter.
2019-02-21 21:00:43,932 INFO  org.apache.flink.runtime.blob.PermanentBlobCache              - Created BLOB cache storage directory /tmp/blobStore-da779bfd-52ab-4e50-ae69-37cc363f0880
2019-02-21 21:00:43,934 INFO  org.apache.flink.runtime.blob.TransientBlobCache              - Created BLOB cache storage directory /tmp/blobStore-9f8aacaf-dede-45c6-9dba-34969b4adcba
2019-02-21 21:00:43,935 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner       - Starting TaskManager with ResourceID: 24acb543dbb8a7dd0b3f4f92bce93a8f
2019-02-21 21:00:43,939 INFO  org.apache.flink.runtime.io.network.netty.NettyConfig         - NettyConfig [server address: /10.131.2.148, server port: 0, ssl enabled: false, memory segment size (bytes): 32768, transport type: NIO, number of server threads: 16 (manual), number of client threads: 16 (manual), server connect backlog: 0 (use Netty's default), client connect timeout (sec): 120, send/receive buffer size (bytes): 0 (use Netty's default)]
2019-02-21 21:00:43,978 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerServices     - Temporary file directory '/tmp': total 79 GB, usable 19 GB (24.05% usable)
2019-02-21 21:00:44,050 INFO  org.apache.flink.runtime.io.network.buffer.NetworkBufferPool  - Allocated 102 MB for network buffer pool (number of memory segments: 3278, bytes per segment: 32768).
2019-02-21 21:00:44,105 INFO  org.apache.flink.runtime.io.network.NetworkEnvironment        - Starting the network environment and its components.
2019-02-21 21:00:44,141 INFO  org.apache.flink.runtime.io.network.netty.NettyClient         - Successful initialization (took 34 ms).
2019-02-21 21:00:44,187 INFO  org.apache.flink.runtime.io.network.netty.NettyServer         - Successful initialization (took 46 ms). Listening on SocketAddress /10.131.2.148:46191.
2019-02-21 21:00:44,194 INFO  org.apache.flink.queryablestate.server.KvStateServerImpl      - Started Queryable State Server @ /10.131.2.148:9067.
2019-02-21 21:00:44,206 INFO  org.apache.flink.queryablestate.client.proxy.KvStateClientProxyImpl  - Started Queryable State Proxy Server @ /10.131.2.148:9069.
2019-02-21 21:00:44,207 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerServices     - Limiting managed memory to 0.7 of the currently free heap space (639 MB), memory will be allocated lazily.
2019-02-21 21:00:44,210 INFO  org.apache.flink.runtime.io.disk.iomanager.IOManager          - I/O manager uses directory /tmp/flink-io-d1a33d1b-838f-4082-86b7-1ade59bdda8a for spill files.
2019-02-21 21:00:44,280 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerConfiguration  - Messages have a max timeout of 10000 ms
2019-02-21 21:00:44,291 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcService              - Starting RPC endpoint for org.apache.flink.runtime.taskexecutor.TaskExecutor at akka://flink/user/taskmanager_0 .
2019-02-21 21:00:44,305 INFO  org.apache.flink.runtime.taskexecutor.JobLeaderService        - Start job leader service.
2019-02-21 21:00:44,305 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor            - Connecting to ResourceManager akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123/user/resourcemanager(00000000000000000000000000000000).
2019-02-21 21:00:44,306 INFO  org.apache.flink.runtime.filecache.FileCache                  - User file cache uses directory /tmp/flink-dist-cache-807b9b28-6656-4bf9-b5ee-4ce41f3b4513
2019-02-21 21:00:54,330 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor            - Could not resolve ResourceManager address akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123/user/resourcemanager, retrying in 10000 ms: Ask timed out on [ActorSelection[Anchor(akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123/), Path(/user/resourcemanager)]] after [10000 ms]. Sender[null] sent message of type "akka.actor.Identify"..
2019-02-21 21:01:14,370 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor            - Could not resolve ResourceManager address akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123/user/resourcemanager, retrying in 10000 ms: Ask timed out on [ActorSelection[Anchor(akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123/), Path(/user/resourcemanager)]] after [10000 ms]. Sender[null] sent message of type "akka.actor.Identify"..
2019-02-21 21:01:34,409 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor            - Could not resolve ResourceManager address akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123/user/resourcemanager, retrying in 10000 ms: Ask timed out on [ActorSelection[Anchor(akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123/), Path(/user/resourcemanager)]] after [10000 ms]. Sender[null] sent message of type "akka.actor.Identify"..
2019-02-21 21:01:54,449 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor            - Could not resolve ResourceManager address akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123/user/resourcemanager, retrying in 10000 ms: Ask timed out on [ActorSelection[Anchor(akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123/), Path(/user/resourcemanager)]] after [10000 ms]. Sender[null] sent message of type "akka.actor.Identify"..
2019-02-21 21:02:14,490 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor            - Could not resolve ResourceManager address akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123/user/resourcemanager, retrying in 10000 ms: Ask timed out on [ActorSelection[Anchor(akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123/), Path(/user/resourcemanager)]] after [10000 ms]. Sender[null] sent message of type "akka.actor.Identify"..
2019-02-21 21:02:34,529 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor            - Could not resolve ResourceManager address akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123/user/resourcemanager, retrying in 10000 ms: Ask timed out on [ActorSelection[Anchor(akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123/), Path(/user/resourcemanager)]] after [10000 ms]. Sender[null] sent message of type "akka.actor.Identify"..
2019-02-21 21:02:54,569 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor            - Could not resolve ResourceManager address akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123/user/resourcemanager, retrying in 10000 ms: Ask timed out on [ActorSelection[Anchor(akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123/), Path(/user/resourcemanager)]] after [10000 ms]. Sender[null] sent message of type "akka.actor.Identify"..
2019-02-21 21:03:14,610 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor            - Could not resolve ResourceManager address akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123/user/resourcemanager, retrying in 10000 ms: Ask timed out on [ActorSelection[Anchor(akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123/), Path(/user/resourcemanager)]] after [10000 ms]. Sender[null] sent message of type "akka.actor.Identify"..
2019-02-21 21:03:34,649 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor            - Could not resolve ResourceManager address akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123/user/resourcemanager, retrying in 10000 ms: Ask timed out on [ActorSelection[Anchor(akka.tcp://flink@crabby-kudu-fdp-flink-jobmanager-service:6123/), Path(/user/resourcemanager)]] after [10000 ms]. Sender[null] sent message of type "akka.actor.Identify"..

Something is still not connected

Boris Lublinsky
FDP Architect
[hidden email]
https://www.lightbend.com/

On Feb 21, 2019, at 2:05 AM, Konstantin Knauf <[hidden email]> wrote:

Hi Boris,

the exact command depends on the docker-entrypoint.sh script and the image you are using. For the example contained in the Flink repository it is "task-manager", I think. The important thing is to pass "taskmanager.host" to the Taskmanager process. You can verify by checking the Taskmanager logs. These should contain lines like below:

2019-02-21 08:03:00,004 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner      [] -  Program Arguments:
2019-02-21 08:03:00,008 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner      [] -     -Dtaskmanager.host=10.12.10.173

In the Jobmanager logs you should see that the Taskmanager is registered under the IP above in a line similar to:

2019-02-21 08:03:26,874 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Registering TaskManager with ResourceID a0513ba2c472d2d1efc07626da9c1bda (akka.tcp://flink@10.12.10.173:46531/user/taskmanager_0) at ResourceManager

A service per Taskmanager is not required. The purpose of the config parameter is that the Jobmanager addresses the taskmanagers by IP instead of hostname.
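
The check described above can also be scripted. The following is only a minimal sketch in Python: the two regular expressions are modelled on the sample log lines quoted in this message, and the function names (configured_host, registered_hosts) are made up for illustration. Other Flink versions or logging layouts may format these lines differently.

```python
import re

# Patterns modelled on the two sample log lines quoted above (an assumption:
# other Flink versions or log layouts may differ).
HOST_ARG = re.compile(r"-Dtaskmanager\.host=(\S+)")
REGISTERED = re.compile(
    r"Registering TaskManager with ResourceID \S+ "
    r"\(akka\.tcp://flink@([^:/\s]+):\d+/user/taskmanager_\d+\)"
)


def configured_host(taskmanager_log):
    """Return the taskmanager.host value passed as a program argument, or None."""
    match = HOST_ARG.search(taskmanager_log)
    return match.group(1) if match else None


def registered_hosts(jobmanager_log):
    """Return the hosts under which TaskManagers registered at the ResourceManager."""
    return REGISTERED.findall(jobmanager_log)


# Sample lines copied from the logs quoted above.
tm_log = (
    "2019-02-21 08:03:00,008 INFO  org.apache.flink.runtime.taskexecutor."
    "TaskManagerRunner      [] -     -Dtaskmanager.host=10.12.10.173"
)
jm_log = (
    "2019-02-21 08:03:26,874 INFO  org.apache.flink.runtime.resourcemanager."
    "StandaloneResourceManager [] - Registering TaskManager with ResourceID "
    "a0513ba2c472d2d1efc07626da9c1bda "
    "(akka.tcp://flink@10.12.10.173:46531/user/taskmanager_0) at ResourceManager"
)

host = configured_host(tm_log)
# True when the Jobmanager registered the Taskmanager under the configured IP.
print(host, host in registered_hosts(jm_log))
```

If the second value printed is False, the Jobmanager is most likely still addressing the Taskmanager by hostname rather than by the IP passed in "taskmanager.host".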

Hope this helps!

Cheers,

Konstantin



On Wed, Feb 20, 2019 at 4:37 PM Boris Lublinsky <[hidden email]> wrote:
Also, the suggested workaround does not quite work.
2019-02-20 15:27:43,928 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://flink-metrics@flink-taskmanager-1:6170] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink-metrics@flink-taskmanager-1:6170]] Caused by: [flink-taskmanager-1: No address associated with hostname]
2019-02-20 15:27:48,750 ERROR org.apache.flink.runtime.rest.handler.legacy.files.StaticFileServerHandler - Caught exception

I think the problem is that it's trying to connect to flink-taskmanager-1

Using busybox to experiment with nslookup, I can see
/ # nslookup flink-taskmanager-1.flink-taskmanager
Server:    10.0.11.151
Address 1: 10.0.11.151 ip-10-0-11-151.us-west-2.compute.internal

Name:      flink-taskmanager-1.flink-taskmanager
Address 1: 10.131.2.136 flink-taskmanager-1.flink-taskmanager.flink.svc.cluster.local
/ # nslookup flink-taskmanager-1
Server:    10.0.11.151
Address 1: 10.0.11.151 ip-10-0-11-151.us-west-2.compute.internal

nslookup: can't resolve 'flink-taskmanager-1'
/ # nslookup flink-taskmanager-0.flink-taskmanager
Server:    10.0.11.151
Address 1: 10.0.11.151 ip-10-0-11-151.us-west-2.compute.internal

Name:      flink-taskmanager-0.flink-taskmanager
Address 1: 10.131.0.111 flink-taskmanager-0.flink-taskmanager.flink.svc.cluster.local
/ # nslookup flink-taskmanager-0
Server:    10.0.11.151
Address 1: 10.0.11.151 ip-10-0-11-151.us-west-2.compute.internal

nslookup: can't resolve 'flink-taskmanager-0'
/ # 

So the name should be suffixed with the service name. How do I force it? I suspect I am missing a config parameter.
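
For illustration, the naming pattern visible in the nslookup output above can be written down as a small Python helper. This is a sketch under assumptions: the pods appear to sit behind a service named flink-taskmanager in a namespace named flink, the cluster domain is cluster.local as shown in the resolved names, and the function name pod_dns_names is hypothetical.

```python
def pod_dns_names(pod, service, namespace, cluster_domain="cluster.local"):
    """Build the service-qualified and fully qualified DNS names for a pod.

    Mirrors the pattern seen in the nslookup output above:
    <pod>.<service> and <pod>.<service>.<namespace>.svc.<cluster_domain>.
    """
    qualified = f"{pod}.{service}"
    fqdn = f"{qualified}.{namespace}.svc.{cluster_domain}"
    return qualified, fqdn


# Values taken from the nslookup session above.
qualified, fqdn = pod_dns_names("flink-taskmanager-1", "flink-taskmanager", "flink")
print(qualified)
print(fqdn)
```

The bare pod name (flink-taskmanager-1) is not one of these forms, which matches the failed lookups above.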

 
Boris Lublinsky
FDP Architect
[hidden email]
https://www.lightbend.com/

On Feb 19, 2019, at 4:33 AM, Konstantin Knauf <[hidden email]> wrote:

Hi Boris,

the solution is actually simpler than it sounds from the ticket. The only thing you need to do is to set the "taskmanager.host" to the Pod's IP address in the Flink configuration. The easiest way to do this is to pass this config dynamically via a command-line parameter. 

The Deployment spec could look something like this:
containers:
- name: taskmanager
  [...]
  args:
  - "taskmanager.sh"
  - "start-foreground"
  - "-Dtaskmanager.host=$(K8S_POD_IP)"
  [...]
  env:
  - name: K8S_POD_IP
    valueFrom:
      fieldRef:
        fieldPath: status.podIP

Hope this helps and let me know if this works. 

Best, 

Konstantin

On Sun, Feb 17, 2019 at 9:51 PM Boris Lublinsky <[hidden email]> wrote:
Apparently there is a workaround for it.
Is it possible to provide the complete Helm chart for it?
Bits and pieces are in the ticket, but it would be nice to see the full chart.

Boris Lublinsky
FDP Architect
[hidden email]
https://www.lightbend.com/



--
Konstantin Knauf | Solutions Architect
+49 160 91394525


Follow us @VervericaData
--
Join Flink Forward - The Apache Flink Conference
Stream Processing | Event Driven | Real Time
--
Data Artisans GmbH | Invalidenstrasse 115, 10115 Berlin, Germany
--
Data Artisans GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Dr. Kostas Tzoumas, Dr. Stephan Ewen   


