Hi there!
We are doing some POCs submitting jobs remotely to Flink. We tried with Flink CLI and now we´re testing the Rest API.
So the point is that when we try to execute a set of requests in an async way (using CompletableFutures) only a couple of them run successfully. For the rest we get the exception copied at the end of the email.
Do you know the reason for this?
Thanks in advance!!
Regards,
org.apache.flink.client.program.ProgramInvocationException: The main method caused an error.
at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:545)
at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:419)
at org.apache.flink.client.program.OptimizerPlanEnvironment.getOptimizedPlan(OptimizerPlanEnvironment.java:80)
at org.apache.flink.client.program.ClusterClient.getOptimizedPlan(ClusterClient.java:318)
at org.apache.flink.runtime.webmonitor.handlers.JarActionHandler.getJobGraphAndClassLoader(JarActionHandler.java:72)
at org.apache.flink.runtime.webmonitor.handlers.JarRunHandler.handleJsonRequest(JarRunHandler.java:61)
at org.apache.flink.runtime.webmonitor.handlers.AbstractJsonRequestHandler.handleRequest(AbstractJsonRequestHandler.java:41)
at org.apache.flink.runtime.webmonitor.RuntimeMonitorHandler.respondAsLeader(RuntimeMonitorHandler.java:109)
at org.apache.flink.runtime.webmonitor.RuntimeMonitorHandlerBase.channelRead0(RuntimeMonitorHandlerBase.java:97)
at org.apache.flink.runtime.webmonitor.RuntimeMonitorHandlerBase.channelRead0(RuntimeMonitorHandlerBase.java:44)
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
at io.netty.handler.codec.http.router.Handler.routed(Handler.java:62)
at io.netty.handler.codec.http.router.DualAbstractHandler.channelRead0(DualAbstractHandler.java:57)
at io.netty.handler.codec.http.router.DualAbstractHandler.channelRead0(DualAbstractHandler.java:20)
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
at org.apache.flink.runtime.webmonitor.HttpRequestHandler.channelRead0(HttpRequestHandler.java:159)
at org.apache.flink.runtime.webmonitor.HttpRequestHandler.channelRead0(HttpRequestHandler.java:65)
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242)
at io.netty.channel.CombinedChannelDuplexHandler.channelRead(CombinedChannelDuplexHandler.java:147)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
at java.lang.Thread.run(Thread.java:748)
\nCaused by: java.util.ConcurrentModificationException
at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
at java.util.ArrayList$Itr.next(ArrayList.java:851)
at org.apache.flink.api.java.operators.OperatorTranslation.translateToPlan(OperatorTranslation.java:49)
at org.apache.flink.api.java.ExecutionEnvironment.createProgramPlan(ExecutionEnvironment.java:1065)
at org.apache.flink.api.java.ExecutionEnvironment.createProgramPlan(ExecutionEnvironment.java:1032)
at org.apache.flink.client.program.OptimizerPlanEnvironment.execute(OptimizerPlanEnvironment.java:47)
at com.piksel.sequoia.media.common.launcher.Launcher.main(Launcher.java:56)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:528)
... 39 more\n”}
This message is private and confidential. If you have received this message in error, please notify the sender or [hidden email] and remove it from your system. Piksel Inc is a company registered in the United States, 2100 Powers Ferry Road SE, Suite 400, Atlanta, GA 30339 |
When you submit a program via the REST API, the main method executes inside the JobManager process. Unfortunately a static variable is used to establish the execution environment that the program obtains from `ExecutionEnvironment.getExecutionEnvironment()`. From the stack trace it appears that two main methods are executing simultaneously and one is corrupting the other. On Mon, Aug 7, 2017 at 8:21 AM, Francisco Gonzalez Barea <[hidden email]> wrote:
|
Aha ok… Thanks for your answer Eron.
Regards
|
I quickly talked to Till about this. The new JobManager, once FLIP-6 is implemented, will have a new REST endpoint that allows submitting a JobGraph directly. With this, we no longer have to execute the user main() method in the WebRuntimeMonitor (which is a component that the current JobManager process loads to serve the web frontend and the REST interface).
This should solve the problem, but unfortunately it doesn't solve your current problem. Best, Aljoscha
|
Hello,
Going back on this thread, quick question: Will this be supported in next Flink version? If not, when is it expected to be included?
Regards
|
Hi,
Unfortunately, the FLIP-6 efforts are taking longer than expected and we won't have those changes to the REST API in the 1.4 release (which should happen in about a month). We are planning to very quickly release 1.5 after that, with the changes to the REST API. The only work-around I can think of so far is to use one cluster, i.e. one JobManager per job that you submit. This is actually what we started recommending a while ago and is something that we will recommend more aggressively in the Future because having only one Job per JobManager makes managing and debugging easier. Best, Aljoscha
|
Hi guys,
After investigating a bit more about this topic, we found a solution adding a small change in the Flink-1.3.2 source code. We found that the issue occurred when different threads tried to build the Tuple2<JobGraph, ClassLoader> object at the same time (due to they use the static ExecutionEnvironmnet variable mentioned before in this thread). So, we just used a semaphore to lock threads at that point. We have done some perf suites in our side, and this seems to be working fine (you can take a look at the code at the attached file, and if you decide to include it in the flink source code we can create a pull request). JarRunHandler.java <http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/t1051/JarRunHandler.java> However, after getting this, we have noticed we don´t get too much performance improvement in Flink. Besides knowing that the semaphore we´ve added would add some latency, we have also realised that some task managers are not using all their capacity at all. We have increased the CPU and Memory capacity for the Flink instance and the same, we don´t realise too much improvement. Do you have any clue or hint to make Flink use better the resources? Thanks in advance -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/ |
Sorry guys,
in the previous message, when I talked about the task managers performance, I meant *Jobmanager* performance Francisco Gonzalez wrote > Hi guys, > > After investigating a bit more about this topic, we found a solution > adding > a small change in the Flink-1.3.2 source code. > > We found that the issue occurred when different threads tried to build the > Tuple2<JobGraph, ClassLoader> object at the same time (due to they > use the > static ExecutionEnvironmnet variable mentioned before in this thread). So, > we just used a semaphore to lock threads at that point. We have done some > perf suites in our side, and this seems to be working fine (you can take a > look at the code at the attached file, and if you decide to include it in > the flink source code we can create a pull request). JarRunHandler.java > <http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/t1051/JarRunHandler.java> > > However, after getting this, we have noticed we don´t get too much > performance improvement in Flink. Besides knowing that the semaphore we´ve > added would add some latency, we have also realised that some task > managers > are not using all their capacity at all. We have increased the CPU and > Memory capacity for the Flink instance and the same, we don´t realise too > much improvement. > > Do you have any clue or hint to make Flink use better the resources? > > Thanks in advance > > > > -- > Sent from: > http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/ -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/ |
Free forum by Nabble | Edit this page |