Hi all,
my job seems to be stuck, and nothing is logged even in debug mode. The only strange thing is this message:

Received message SendHeartbeat at akka://flink/user/taskmanager_1 from Actor[akka://flink/deadLetters].

Could it be a symptom of a problem?

Best,
Flavio
|
That is usually nothing to worry about. It just means that the message was sent without specifying a sender, in which case Akka uses the `/deadLetters` actor as the sender.

What kind of job is it?

Cheers,
Till

On Fri, Jul 17, 2015 at 6:30 PM, Flavio Pompermaier <[hidden email]> wrote:
|
The job is quite simple: it reads 10 Parquet directories, extracts some information from the Thrift objects and generates Tuple3 records, then applies a project() and a distinct() so that an external service is called only for some of the extracted ids (the external service translates the local id into a global one).

Any idea what could cause this strange behaviour?

On 17 Jul 2015 19:04, "Till Rohrmann" <[hidden email]> wrote:
|
Without the logs it is hard to say.

On Sat, Jul 18, 2015 at 11:41 AM, Flavio Pompermaier <[hidden email]> wrote:
|
Can you share the logs, so we can check for suspicious error messages?

On Mon, Jul 20, 2015 at 9:34 AM, Till Rohrmann <[hidden email]> wrote:
|
@Stephan: This is with high probability a deadlock in the spillable partitions. I'm looking into it (https://issues.apache.org/jira/browse/FLINK-2384).

@Flavio: Can you run your job in forced pipelined mode for the time being and check whether it works?

env.getConfig().setExecutionMode(ExecutionMode.PIPELINED_FORCED);

– Ufuk

On Sat, Jul 18, 2015 at 11:41 AM, Flavio Pompermaier <[hidden email]> wrote:
> Then there are two cogroup() to substitute the obtained global ids into the original tuple5.
> Then the job outputs the mapping table into a csv and the translated tuple5 into a file.
> The strange thing is that the job finishes in about 1 hour with the default memory settings (in Eclipse), while if I set -Xmx12gb it hangs.
> I have 16GB of RAM (maybe not all 12 available, but I have 15 GB of swap on an SSD disk).
|
Setting env.getConfig().setExecutionMode(ExecutionMode.PIPELINED_FORCED) makes it work :)
On Tue, Jul 21, 2015 at 4:20 PM, Ufuk Celebi <[hidden email]> wrote:
|
OK, but I need to get back to you soon to test the fix as well. :D I hope that's OK. ;)
On 21 Jul 2015, at 17:45, Flavio Pompermaier <[hidden email]> wrote:
|
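For anyone finding this thread later, the workaround discussed above could look roughly like the following in a Flink batch program. This is only a sketch: the class name ForcePipelinedExample is made up for illustration, the job body is left out, and it assumes the Flink Java API is on the classpath.

```java
import org.apache.flink.api.common.ExecutionMode;
import org.apache.flink.api.java.ExecutionEnvironment;

// Hypothetical class name, used only for this illustration.
public class ForcePipelinedExample {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Ask the runtime to exchange data between operators in a pipelined
        // fashion everywhere, as suggested in this thread as a workaround
        // for the suspected deadlock tracked in FLINK-2384.
        env.getConfig().setExecutionMode(ExecutionMode.PIPELINED_FORCED);

        // Define the sources, transformations and sinks of the job here,
        // then call env.execute() to run it.
    }
}
```

The setting has to be applied before the job is executed, since it is part of the execution configuration that is shipped with the program.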