Hi Team, I'm exploring flink for one of my use case, I'm facing some issues while running a flink job in cluster mode. Below are the steps I followed to setup and run job in cluster mode : 1. Setup flink on google cloud dataproc using https://github.com/GoogleCloudDataproc/initialization-actions/tree/master/flink 2. After setting up the cluster I could see the flink session started and could see the UI for the same. 3 Submitted job from dataproc master node using below command sudo HADOOP_CONF_DIR=/etc/hadoop/conf /usr/lib/flink/bin/flink run -m yarn-cluster -yid application_1592311654771_0001 -class com.sm.flink.FlinkDriver /usr/lib/flink/lib/flink-1.0.10-sm-SNAPSHOT.jar hdfs://cluster-flink-poc-m:8020/user/flink/rocksdb/ After running the job I see the job started successfully but created a mini local cluster and ran in local mode. I don't see any jobs submitted to JobManger and I also see 0 task managers on UI. Can someone please help me understand here?, do let me know what input is required to investigate the same. |
Are you by any chance creating a local
environment via
(Stream)ExecutionEnvironment#createLocalEnvironment?
On 17/06/2020 17:05, Sourabh Mehta
wrote:
|
No, I am not. On Wed, 17 Jun 2020 at 10:48 PM, Chesnay Schepler <[hidden email]> wrote:
|
Hi Sourabh, do you have access to the cluster logs? They could be helpful for debugging the problem. Which version of Flink are you using? Cheers, Till On Wed, Jun 17, 2020 at 7:39 PM Sourabh Mehta <[hidden email]> wrote:
|
Hi ,
application is using 1.10.0 but cluster is setup on 1.9.0. Yes I do have access. please find below starting logs from cluster 2020-06-17 11:28:18,989 INFO org.apache.shaded.flink.table.module.ModuleManager - Got FunctionDefinition equals from module core 2020-06-17 11:28:20,538 INFO org.apache.shaded.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, localhost 2020-06-17 11:28:20,538 INFO org.apache.shaded.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123 2020-06-17 11:28:20,538 INFO org.apache.shaded.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.size, 1024m 2020-06-17 11:28:20,538 INFO org.apache.shaded.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.heap.size, 1024m 2020-06-17 11:28:20,538 INFO org.apache.shaded.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 1 2020-06-17 11:28:20,538 INFO org.apache.shaded.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 1 2020-06-17 11:28:20,539 INFO org.apache.shaded.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.execution.failover-strategy, region 2020-06-17 11:28:20,539 INFO org.apache.shaded.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, cluster-flink-poc-m 2020-06-17 11:28:20,539 INFO org.apache.shaded.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.mb, 12288 2020-06-17 11:28:20,539 INFO org.apache.shaded.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.heap.mb, 12288 2020-06-17 11:28:20,540 INFO org.apache.shaded.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 4 2020-06-17 11:28:20,540 INFO org.apache.shaded.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 28 2020-06-17 11:28:20,540 INFO org.apache.shaded.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.network.numberOfBuffers, 2048 2020-06-17 11:28:20,540 INFO org.apache.shaded.flink.configuration.GlobalConfiguration - Loading configuration property: fs.hdfs.hadoopconf, /etc/hadoop/conf 2020-06-17 11:28:20,550 INFO org.apache.shaded.flink.runtime.taskexecutor.TaskExecutorResourceUtils - The configuration option Key: 'taskmanager.cpu.cores' , default: null (fallback keys: []) required for local execution is not set, setting it to its default value 1.7976931348623157E308 2020-06-17 11:28:20,552 INFO org.apache.shaded.flink.runtime.taskexecutor.TaskExecutorResourceUtils - The configuration option Key: 'taskmanager.memory.task.heap.size' , default: null (fallback keys: []) required for local execution is not set, setting it to its default value 9223372036854775807 bytes 2020-06-17 11:28:20,552 INFO org.apache.shaded.flink.runtime.taskexecutor.TaskExecutorResourceUtils - The configuration option Key: 'taskmanager.memory.task.off-heap.size' , default: 0 bytes (fallback keys: []) required for local execution is not set, setting it to its default value 9223372036854775807 bytes 2020-06-17 11:28:20,552 INFO org.apache.shaded.flink.runtime.taskexecutor.TaskExecutorResourceUtils - The configuration option Key: 'taskmanager.memory.network.min' , default: 64 mb (fallback keys: [{key=taskmanager.network.memory.min, isDeprecated=true}]) required for local execution is not set, setting it to its default value 64 mb 2020-06-17 11:28:20,553 INFO org.apache.shaded.flink.runtime.taskexecutor.TaskExecutorResourceUtils - The configuration option Key: 'taskmanager.memory.network.max' , default: 1 gb (fallback keys: [{key=taskmanager.network.memory.max, isDeprecated=true}]) required for local execution is not set, setting it to its default value 64 mb 2020-06-17 11:28:20,553 INFO org.apache.shaded.flink.runtime.taskexecutor.TaskExecutorResourceUtils - The configuration option Key: 'taskmanager.memory.managed.size' , default: null (fallback keys: [{key=taskmanager.memory.size, isDeprecated=true}]) required for local execution is not set, setting it to its default value 128 mb 2020-06-17 11:28:20,558 INFO org.apache.shaded.flink.runtime.minicluster.MiniCluster - Starting Flink Mini Cluster 2020-06-17 11:28:20,561 INFO org.apache.shaded.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, localhost 2020-06-17 11:28:20,561 INFO org.apache.shaded.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.port, 6123 2020-06-17 11:28:20,561 INFO org.apache.shaded.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.size, 1024m 2020-06-17 11:28:20,561 INFO org.apache.shaded.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.heap.size, 1024m 2020-06-17 11:28:20,561 INFO org.apache.shaded.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 1 2020-06-17 11:28:20,561 INFO org.apache.shaded.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 1 2020-06-17 11:28:20,561 INFO org.apache.shaded.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.execution.failover-strategy, region 2020-06-17 11:28:20,562 INFO org.apache.shaded.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.rpc.address, cluster-flink-poc-m 2020-06-17 11:28:20,562 INFO org.apache.shaded.flink.configuration.GlobalConfiguration - Loading configuration property: jobmanager.heap.mb, 12288 2020-06-17 11:28:20,562 INFO org.apache.shaded.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.heap.mb, 12288 2020-06-17 11:28:20,562 INFO org.apache.shaded.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.numberOfTaskSlots, 4 2020-06-17 11:28:20,562 INFO org.apache.shaded.flink.configuration.GlobalConfiguration - Loading configuration property: parallelism.default, 28 2020-06-17 11:28:20,563 INFO org.apache.shaded.flink.configuration.GlobalConfiguration - Loading configuration property: taskmanager.network.numberOfBuffers, 2048 2020-06-17 11:28:20,563 INFO org.apache.shaded.flink.configuration.GlobalConfiguration - Loading configuration property: fs.hdfs.hadoopconf, /etc/hadoop/conf 2020-06-17 11:28:20,563 INFO org.apache.shaded.flink.runtime.minicluster.MiniCluster - Starting Metrics Registry 2020-06-17 11:28:20,610 INFO org.apache.shaded.flink.runtime.metrics.MetricRegistryImpl - No metrics reporter configured, no metrics will be exposed/reported. 2020-06-17 11:28:20,610 INFO org.apache.shaded.flink.runtime.minicluster.MiniCluster - Starting RPC Service(s) 2020-06-17 11:28:20,976 INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started 2020-06-17 11:28:21,070 INFO org.apache.shaded.flink.runtime.rpc.akka.AkkaRpcServiceUtils - Trying to start actor system at :0 2020-06-17 11:28:21,115 INFO akka.event.slf4j.Slf4jLogger - Slf4jLogger started 2020-06-17 11:28:21,131 INFO akka.remote.Remoting - Starting remoting 2020-06-17 11:28:21,279 INFO akka.remote.Remoting - Remoting started; listening on addresses :[akka.tcp://flink-metrics@<<IP:PORT>>] 2020-06-17 11:28:21,283 INFO org.apache.shaded.flink.runtime.rpc.akka.AkkaRpcServiceUtils - Actor system started at akka.tcp://flink-metrics@<<IP:PORT>> Note : I have removed a few IP addresses from the log. On Thu, Jun 18, 2020 at 12:08 AM Till Rohrmann <[hidden email]> wrote:
|
Is your user-jar packaging and
relocating Flink classes? If so, then your job actually operate
against the classes provided by the cluster, which, well, just
wouldn't work.
On 18/06/2020 09:34, Sourabh Mehta
wrote:
|
Free forum by Nabble | Edit this page |