Hi Team,
Running a flink job on Yarn, I am trying to make connections to couchbase DB in one of my map functions in Flink Streaming job. But my task manager containers keep failing and keep assigning new containers and not giving me an opportunity to get any useful logs. val cluster = Cluster.connect("host", "user", "pwd") val bucket = cluster.bucket("bucket") val collection = bucket.defaultCollection Only thing I see is yarn exception: java.lang.Exception: Container released on a *lost* node at org.apache.flink.yarn.YarnResourceManager.lambda$onContainersCompleted$0(YarnResourceManager.java:343) at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRunAsync(AkkaRpcActor.java:397) at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:190) at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:74) at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:152) at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123) at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170) at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171) at akka.actor.Actor$class.aroundReceive(Actor.scala:517) at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592) at akka.actor.ActorCell.invoke(ActorCell.scala:561) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258) at akka.dispatch.Mailbox.run(Mailbox.scala:225) at akka.dispatch.Mailbox.exec(Mailbox.scala:235) at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) Could you please provide any insight on how to get logs. And why a simple connection will not work. Note: it works in my local system yarn. Regards, Vijay |
Actually got this message in rolledover container logs: [org.slf4j.impl.Log4jLoggerFactory] Any suggestions on how to fix it ? On Mon, Aug 24, 2020 at 12:53 PM Vijayendra Yadav <[hidden email]> wrote:
|
Another one - Exception in thread "FileCache shutdown hook" Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "FileCache shutdown hook" Regards, Vijay On Mon, Aug 24, 2020 at 1:04 PM Vijayendra Yadav <[hidden email]> wrote:
|
I think at least you have two different exceptions. > java.lang.Exception: Container released on a *lost* node This usually means a Yarn nodemanager is down. So all the containers running on this node will be released and rescheduled to a new one. If you want to figure out the root cause, you need to check the Yarn nodemanager logs. > java.lang.OutOfMemoryError: Metaspace Could you check the value of flink configuration "taskmanager.memory.jvm-metaspace.size"? If it is too small, increasing it will help. Usually, 256m is enough for most cases. Best, Yang Vijayendra Yadav <[hidden email]> 于2020年8月25日周二 上午4:51写道:
|
Thanks Yang that helped.
Sent from my iPhone On Aug 24, 2020, at 8:44 PM, Yang Wang <[hidden email]> wrote:
|
Free forum by Nabble | Edit this page |