Job hangs

Job hangs

Timur Fayruzov
Hello,

Now I'm at the stage where my job seems to hang completely. The source code is attached (it won't compile, but I think it gives a very good idea of what happens). Unfortunately, I can't provide the datasets. Most of them are about 100-500MM records, which I'm trying to process on an EMR cluster with 40 tasks and 6GB of memory for each.

It was working for smaller input sizes. Any ideas on what I can do differently are appreciated.

Thanks,
Timur 

FaithResolution.scala (12K)

Re: Job hangs

Till Rohrmann

Could you share the logs with us, Timur? That would be very helpful.

Cheers,
Till

Re: Job hangs

Ufuk Celebi
Hey Timur,

is it possible to connect to the VMs and get stack traces of the Flink
processes as well?

We can first have a look at the logs, but the stack traces will be
helpful if we can't figure out what the issue is.

– Ufuk

Re: Job hangs

Timur Fayruzov

I will do it by tomorrow. The logs don't show anything unusual. Are there any logs besides what's in flink/log and the YARN container logs?

Re: Job hangs

Ufuk Celebi
No.

If you run on YARN, the YARN logs are the relevant ones for the
JobManager and TaskManagers. The log of the client that submitted the
job should be found in /log.

– Ufuk

Re: Job hangs

rmetzger0
Hi Timur,

thank you for sharing the source code of your job. That is helpful!
It's a large pipeline with 7 joins and 2 co-groups. Maybe your job is much more I/O-heavy with the larger input data because all the joins start spilling?
Our monitoring, in particular for batch jobs, is really not very advanced yet. If we had some monitoring showing the spill status, we would perhaps see that the job is still running.
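
If the spilling joins do turn out to be the bottleneck, one thing worth experimenting with is a join hint for joins where one input is known to be much smaller than the other. A minimal sketch in the Scala DataSet API follows; the input types and data are made up and only stand in for the real datasets, this is not taken from the attached job:

import org.apache.flink.api.common.operators.base.JoinOperatorBase.JoinHint
import org.apache.flink.api.scala._

object JoinHintSketch {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment

    // Hypothetical inputs standing in for the real datasets.
    val large: DataSet[(String, Int)] = env.fromElements(("a", 1), ("b", 2))
    val small: DataSet[(String, String)] = env.fromElements(("a", "x"))

    // Broadcasting the small side keeps the large side from being
    // repartitioned (and potentially spilled) for this join.
    val joined = large
      .join(small, JoinHint.BROADCAST_HASH_SECOND)
      .where(0)
      .equalTo(0)

    joined.print()
  }
}

Whether a broadcast hint is safe depends on the actual input sizes, so treat this as a sketch of the mechanism rather than a recommendation for this particular job.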

How long did you wait until you declared the job hanging?

Regards,
Robert


Re: Job hangs

Timur Fayruzov

Hello Robert,

I observed progress for 2 hours (meaning the numbers on the dashboard kept changing), and then I waited for 2 more hours. I'm sure it had to spill at some point, but I figured 2 hours was enough time.

Thanks,
Timur

Re: Job hangs

Ufuk Celebi
Can you please also provide the execution plan via

env.getExecutionPlan()
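
For reference, a minimal sketch of how the plan can be dumped from a Scala DataSet program; the pipeline below is a placeholder, not the actual job:

import org.apache.flink.api.java.io.DiscardingOutputFormat
import org.apache.flink.api.scala._

object ExecutionPlanSketch {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment

    // Placeholder pipeline; the real job would build its joins/co-groups here.
    val data = env.fromElements(("a", 1), ("b", 2))
    data
      .map(t => (t._1, t._2 + 1))
      .output(new DiscardingOutputFormat[(String, Int)])

    // Prints the plan as a JSON string; at least one sink must be defined first.
    println(env.getExecutionPlan())
  }
}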



Re: Job hangs

Timur Fayruzov
Robert, Ufuk: the logs, the execution plan and a screenshot of the console are in this archive: https://www.dropbox.com/s/68gyl6f3rdzn7o1/debug-stuck.tar.gz?dl=0

Note that when I looked at the backpressure view, I saw back pressure 'high' on the following paths:

Input->code_line:123,124->map->join
Input->code_line:134,135->map->join
Input->code_line:121->map->join

Unfortunately, I was not able to take thread dumps or heap dumps (neither kill -3, jstack, nor jmap worked; some Amazon AMI problem, I assume).

Hope that helps.

Please let me know if I can assist you in any way. Otherwise, I probably won't be actively looking at this problem.

Thanks,
Timur


Re: Job hangs

Vasiliki Kalavri
Hi Timur,

I've previously seen large batch jobs hang because of join deadlocks. We should have fixed those problems, but we might have missed some corner case. Did you check whether there was any CPU activity when the job hangs? Can you try running htop on the TaskManager machines and see if they're idle?

Cheers,
-Vasia.

Re: Job hangs

Fabian Hueske-2
Hi Timur,

I had a look at the plan you shared.
I could not find any flow that branches and merges again, a pattern which is prone to causing deadlocks.

However, I noticed that the plan performs a lot of partitioning steps.
You might want to have a look at forwarded field annotations, which can help reduce the partitioning and sorting steps [1].
This might help with complex jobs such as yours.
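
For what it's worth, a minimal sketch of such an annotation in the Scala DataSet API; the input and functions here are made up, and the field expressions would have to match the actual job:

import org.apache.flink.api.scala._

object ForwardedFieldsSketch {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment

    // Hypothetical input standing in for one of the real datasets.
    val input: DataSet[(String, Int)] = env.fromElements(("a", 1), ("b", 2))

    // The map leaves the first tuple field (the grouping/join key) untouched,
    // and withForwardedFields tells the optimizer so. An existing partitioning
    // or sort order on _1 can then be reused instead of being re-established.
    val mapped = input
      .map(t => (t._1, t._2 * 2))
      .withForwardedFields("_1")

    mapped
      .groupBy(0)
      .sum(1)
      .print()
  }
}

Note that the annotation must only be declared for fields the function really forwards unchanged; declaring it for a modified field would give the optimizer wrong information.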

Best, Fabian
