Hello,
We are currently running jobs on Flink 1.4.2. Our use case is as follows:
- our service gets a request from a customer
- we submit a job to Flink using YarnClusterClient
Sometimes we have up to 6 jobs running at the same time. From time to time we get the error below:
The program didn't contain a Flink job.
From the logs we can see that the job's main method returns a correct status, but for some reason Flink later throws that exception anyway. Do you know what could be the cause and how to prevent it from happening?
Sent from the Apache Flink User Mailing List archive at Nabble.com.
Did you forget to call executionEnvironment.execute() after you define your Flink job?
-- Rong
On Mon, Jul 2, 2018 at 1:42 AM eSKa <[hidden email]> wrote:
Hello, We are currently running jobs on Flink 1.4.2. Our usecase is as follows:
No.
execute was called and all calculations succeeded: the job appeared on the dashboard with status FINISHED. After execute, our own logs also reported that everything succeeded.
--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
Hmm. That's strange.
Can you explain a little more about how your YARN cluster is set up and how you configure the submission context?
Also, did you try submitting the jobs in detached mode? Does this happen from time to time for one specific job graph, or does the same job consistently throw the exception?
-- Rong
On Mon, Jul 2, 2018 at 7:57 AM eSKa <[hidden email]> wrote:
No.
We are running the same job all the time, and the error happens only from time to time.
Here is the job submission code:
private JobSubmissionResult submitProgramToCluster(PackagedProgram packagedProgram) throws JobSubmitterException,
And here is our util for retrieving the ClusterClient:
public class ClusterClientUtil {
What YARN settings do you need?
Are you executing these jobs concurrently?
The ClusterClient was not written to be used concurrently in the same JVM, as it partially relies on and mutates static fields.
On 03.07.2018 09:50, eSKa wrote:
We are running same job all the time. And that error is happening from time to time.
Yes - we are submitting jobs one by one.
How can we change that to work for our needs?
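One possible direction, sketched under assumptions: if the client is not safe for concurrent use within one JVM, the submitting service could funnel every submission through a single JVM-wide lock so that only one runs at a time. `SerializedSubmitter` and `submit` are hypothetical names, not Flink API, and the `Callable` stands in for whatever code actually calls the cluster client. Whether this removes the error in practice has not been confirmed in this thread.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical helper (not part of Flink): runs submissions one at a time. */
final class SerializedSubmitter {
    // One lock for the whole JVM, because the state being protected is static.
    private static final ReentrantLock SUBMIT_LOCK = new ReentrantLock();

    private SerializedSubmitter() {}

    /** Runs the given submission while holding the lock and returns its result. */
    static <T> T submit(Callable<T> submission) throws Exception {
        SUBMIT_LOCK.lock();
        try {
            return submission.call();
        } finally {
            // Always release, even if the submission throws.
            SUBMIT_LOCK.unlock();
        }
    }
}
```

The trade-off is throughput: with up to 6 requests arriving together, the later ones wait until the earlier submissions return.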
In reply to this post by Rong Rong
Hi,
Let me summarize:
1) Sometimes you get the error message "org.apache.flink.client.program.ProgramMissingJobException: The program didn't contain a Flink job." when submitting a program through the YarnClusterClient.
2) The logs and the dashboard state that the job ran successfully.
3) The job performed all computations correctly.
So the issue is that there is an invalid error message that suggests that a job failed when in fact it ran successfully. Is that correct?
Thanks, Fabian
2018-07-02 17:14 GMT+02:00 Rong Rong <[hidden email]>:
In reply to this post by eSKa
Hi,
@chesnay I read the code of `ClusterClient` and did not find any `static` field. So why can't it be used in the same JVM? (We also use `ClusterClient` this way, so we really care about this feature.)
On Tue, Jul 3, 2018 at 4:00 PM, eSKa <[hidden email]> wrote:
Yes - we are submitting jobs one by one.
Dive into this call and you will see that it mutates static fields in the ExecutionEnvironment.
https://github.com/apache/flink/blob/master/flink-clients/src/main/java/org/apache/flink/client/program/ClusterClient.java#L422
On 03.07.2018 10:07, Chuanlei Ni wrote:
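To illustrate the general hazard (this is a toy model, not Flink's code): a class can have no static fields of its own and still be unsafe to use concurrently if a method it calls writes to static state elsewhere. All names below (`SharedEnvironment`, `DemoClient`, `StaticStateDemo`) are hypothetical, and the interleaving is simulated on a single thread so the outcome is deterministic. Whether this exact pattern is what produces the exception discussed here is an assumption.

```java
/** Stand-in for a class that keeps a process-wide "current job" in a static field. */
final class SharedEnvironment {
    private static String currentJob; // shared by every caller in the JVM

    private SharedEnvironment() {}

    static void install(String jobName) {
        currentJob = jobName;
    }

    /** Returns whatever job is currently installed and resets the field. */
    static String takeAndClear() {
        String seen = currentJob;
        currentJob = null;
        return seen;
    }
}

/** A client with no static fields of its own that still depends on shared static state. */
final class DemoClient {
    void begin(String jobName) {
        SharedEnvironment.install(jobName);
    }

    /** Returns the job this client finds at the end, or null if none is left. */
    String finish() {
        return SharedEnvironment.takeAndClear();
    }
}

final class StaticStateDemo {
    public static void main(String[] args) {
        DemoClient a = new DemoClient();
        DemoClient b = new DemoClient();
        a.begin("job-A");
        b.begin("job-B"); // overwrites the value installed by a
        System.out.println("a sees: " + a.finish()); // prints "a sees: job-B"
        System.out.println("b sees: " + b.finish()); // prints "b sees: null"
    }
}
```

In this model the second client ends up with no job at all even though its own work was fine, which is the kind of symptom a misleading "no job" error could come from.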
In reply to this post by Fabian Hueske-2
Yes - it seems that the main method returns success, but for some reason that exception is still thrown. For now we have applied a workaround: we catch the exception and skip it (later on, our statusUpdater reads the statuses from the Flink dashboard).
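The workaround described above might look roughly like the sketch below. `MissingJobException` is a local stand-in for Flink's `ProgramMissingJobException` so that the example compiles without Flink on the classpath; `TolerantSubmitter` and `submitTolerantly` are hypothetical names, and the `Callable` stands in for the real submission call.

```java
import java.util.Optional;
import java.util.concurrent.Callable;

/** Local stand-in for org.apache.flink.client.program.ProgramMissingJobException. */
final class MissingJobException extends Exception {
    MissingJobException(String message) {
        super(message);
    }
}

/** Hypothetical helper: tolerates only the "missing job" error. */
final class TolerantSubmitter {
    private TolerantSubmitter() {}

    /**
     * Runs the submission. If it fails with the "missing job" error, returns
     * Optional.empty() so the caller can confirm the real job status elsewhere.
     * Every other exception still propagates.
     */
    static <T> Optional<T> submitTolerantly(Callable<T> submission) throws Exception {
        try {
            return Optional.ofNullable(submission.call());
        } catch (MissingJobException e) {
            // Assumption: this error may be spurious, so it is logged and skipped.
            System.err.println("Ignoring possibly spurious error: " + e.getMessage());
            return Optional.empty();
        }
    }
}
```

Skipping the exception is only safe when the real outcome is verified afterwards, as is done here by reading the job status separately; otherwise a genuinely missing job would go unnoticed.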
I am really interested in making `ClusterClient` usable as multiple instances within one JVM, because we need to submit jobs from a long-running process. I have created a JIRA issue for this problem.
On Tue, Jul 3, 2018 at 4:20 PM, eSKa <[hidden email]> wrote:
Yes - it seems that main method returns success but for some reason we have