Simple batch job hangs if run twice

Simple batch job hangs if run twice

Yassine MARZOUGUI
Hi all,

When I run the following batch job inside the IDE for the first time, it outputs results and switches to FINISHED, but when I run it again it gets stuck in the RUNNING state. The CSV file is 160 MB. What could be the reason for this behaviour?

import org.apache.flink.api.java.ExecutionEnvironment;

public class BatchJob {

    public static void main(String[] args) throws Exception {
        final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        env.readCsvFile("dump.csv")
                .ignoreFirstLine()          // skip the CSV header
                .fieldDelimiter(";")
                .includeFields("111000")    // keep only the first three of the six columns
                .types(String.class, String.class, String.class)
                .first(100)                 // take the first 100 records
                .print();                   // print() triggers execution of the batch job
    }
}

Best,
Yassine

Re: Simple batch job hangs if run twice

Aljoscha Krettek
Hi,
when is the "first time"? It seems you have tried this repeatedly, so what differentiates the "first time" from the other times? Are you closing your IDE in between, or do you mean running the job a second time within the same program?

Cheers,
Aljoscha


Re: Simple batch job hangs if run twice

Yassine MARZOUGUI

Hi Aljoscha,

Thanks for your response. By "the first time" I mean the first time I hit Run from the IDE (I am using NetBeans on Windows) after building the program. If I then stop it and run it again (without rebuilding), it is stuck in the state RUNNING. Sometimes I have to rebuild it, or close the IDE, to be able to get an output. The behaviour is random; maybe it's related to the IDE or the OS and not necessarily to Flink itself.



Re: Simple batch job hangs if run twice

Aljoscha Krettek
Hmm, this sounds like it could be IDE/Windows specific; unfortunately I don't have access to a Windows machine. I'll loop in Chesnay, who is using Windows.

Chesnay, do you maybe have an idea what could be the problem? Have you ever encountered this?


Re: Simple batch job hangs if run twice

Chesnay Schepler
No, I can't recall this ever happening to me.

I would enable logging and try again, and also check through the web interface whether the second job is actually running.

If you tell me your NetBeans version, I can try to reproduce it.

Also, which version of Flink are you using?
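
A possible way to check the job through the web interface from inside the IDE is sketched below (the class name BatchJobWithWebUI is hypothetical). It assumes that the Flink 1.1.x local environment starts the web frontend when ConfigConstants.LOCAL_START_WEBSERVER is set on the configuration passed to createLocalEnvironment; that flag and the default port 8081 are assumptions here, not something confirmed in this thread.

import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.configuration.ConfigConstants;
import org.apache.flink.configuration.Configuration;

public class BatchJobWithWebUI {

    public static void main(String[] args) throws Exception {
        // Assumption: the local environment honors this flag and starts the
        // web frontend (typically on http://localhost:8081) for local runs.
        Configuration conf = new Configuration();
        conf.setBoolean(ConfigConstants.LOCAL_START_WEBSERVER, true);

        final ExecutionEnvironment env = ExecutionEnvironment.createLocalEnvironment(conf);

        env.readCsvFile("dump.csv")
                .ignoreFirstLine()
                .fieldDelimiter(";")
                .includeFields("111000")
                .types(String.class, String.class, String.class)
                .first(100)
                .print();
    }
}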


Re: Simple batch job hangs if run twice

Yassine MARZOUGUI
Hi Chesnay,

I am running Flink 1.1.2, and using NetBeans 8.1.
I made a screencast reproducing the problem here: http://recordit.co/P53OnFokN4.

Best,
Yassine



Re: Simple batch job hangs if run twice

rmetzger0
Can you try running with DEBUG logging level?
Then you should see if input splits are assigned. 
Also, you could try to use a debugger to see what's going on.
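
A minimal sketch for forcing DEBUG output from inside the program (the class name DebugLoggingExample is hypothetical), assuming a log4j 1.x binding such as slf4j-log4j12 is on the IDE classpath, as in the standard Flink 1.1 quickstart setup; alternatively, setting the root logger to DEBUG in a log4j.properties under src/main/resources achieves the same.

import org.apache.log4j.Level;
import org.apache.log4j.Logger;

public class DebugLoggingExample {

    public static void main(String[] args) throws Exception {
        // Raise the log4j root logger to DEBUG before building the job, so that
        // input split assignment and task state transitions appear in the IDE console.
        Logger.getRootLogger().setLevel(Level.DEBUG);

        // ... build and run the batch job as in BatchJob.main above ...
    }
}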


Re: Simple batch job hangs if run twice

Yassine MARZOUGUI
The input splits are correctly assigned. I noticed that whenever the job is stuck, it is because the task Combine (GroupReduce at first(DataSet.java:573)) keeps RUNNING and never switches to FINISHED.
I tried to debug the program at the first(100), but I couldn't do much. I attached the full DEBUG output.

Attachment: output.log (245K)

Re: Simple batch job hangs if run twice

Fabian Hueske-2
Hi Yassine, can you share a stacktrace of the job when it got stuck?

Thanks, Fabian


Re: Simple batch job hangs if run twice

Yassine MARZOUGUI
Hi Fabian,

Is it different from the output I already sent (see the attached file)? If yes, how can I obtain the stacktrace of the job programmatically? Thanks.

Best,
Yassine
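
One possible way to grab a thread dump programmatically (a sketch only; the ThreadDumper class and the 30-second delay are hypothetical, not something used later in this thread) is Thread.getAllStackTraces(), called from a watchdog thread started before the job; when the run hangs, the dump shows where each task thread is blocked.

import java.util.Map;

public class ThreadDumper {

    // Prints the stack of every live thread in this JVM, similar to a jstack dump.
    public static void dumpAllStacks() {
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            System.out.println("\"" + entry.getKey().getName() + "\"");
            for (StackTraceElement frame : entry.getValue()) {
                System.out.println("    at " + frame);
            }
            System.out.println();
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical usage: dump all stacks 30 seconds after submitting the job,
        // by which time a stuck run should already be blocked.
        Thread watchdog = new Thread(new Runnable() {
            @Override
            public void run() {
                try {
                    Thread.sleep(30000);
                } catch (InterruptedException ignored) {
                    return;
                }
                dumpAllStacks();
            }
        });
        watchdog.setDaemon(true);
        watchdog.start();

        // ... build and run the batch job as in BatchJob.main above ...
    }
}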


Re: Simple batch job hangs if run twice

Yassine MARZOUGUI
Hi Fabian,

I'm not sure if this answers your question; here is the stack I got when debugging the combine and datasource operators while the job was stuck:

"DataSource (at main(BatchTest.java:28) (org.apache.flink.api.java.io.TupleCsvInputFormat)) (1/8)"
at java.lang.Object.wait(Object.java)
at org.apache.flink.runtime.io.network.buffer.LocalBufferPool.requestBuffer(LocalBufferPool.java:163)
at org.apache.flink.runtime.io.network.buffer.LocalBufferPool.requestBufferBlocking(LocalBufferPool.java:133)
at org.apache.flink.runtime.io.network.api.writer.RecordWriter.emit(RecordWriter.java:93)
at org.apache.flink.runtime.operators.shipping.OutputCollector.collect(OutputCollector.java:65)
at org.apache.flink.runtime.operators.util.metrics.CountingCollector.collect(CountingCollector.java:35)
at org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:163)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:584)
at java.lang.Thread.run(Thread.java:745)

"Combine (GroupReduce at first(DataSet.java:573)) (1/8)"
at java.lang.Object.wait(Object.java)
at org.apache.flink.runtime.io.network.buffer.LocalBufferPool.requestBuffer(LocalBufferPool.java:163)
at org.apache.flink.runtime.io.network.buffer.LocalBufferPool.requestBufferBlocking(LocalBufferPool.java:133)
at org.apache.flink.runtime.io.network.api.writer.RecordWriter.emit(RecordWriter.java:93)
at org.apache.flink.runtime.operators.shipping.OutputCollector.collect(OutputCollector.java:65)
at org.apache.flink.api.java.functions.FirstReducer.reduce(FirstReducer.java:41)
at org.apache.flink.api.java.functions.FirstReducer.combine(FirstReducer.java:52)
at org.apache.flink.runtime.operators.AllGroupReduceDriver.run(AllGroupReduceDriver.java:152)
at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:486)
at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:351)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:584)
at java.lang.Thread.run(Thread.java:745)

Best,
Yassine



Re: Simple batch job hangs if run twice

Fabian Hueske-2
Yes, log files and stacktraces are different things.
A stacktrace shows the call hierarchy of all threads in a JVM at the moment it is taken, so you can see which method is currently executing (and from where it was called). In case of a deadlock, you see where the program is waiting.

The stack you sent is only a part of the complete stacktrace. Most IDEs have a feature to take a stacktrace while they are executing a program.


Re: Simple batch job hangs if run twice

Yassine MARZOUGUI
I found out how to dump the stacktrace (using jps & jstack). Please find attached the stacktrace I got when the job got stuck.

Thanks,
Yassine


Attachment: stacktrace.txt (52K)