Hi all,
My batch job ends with the following plan (figure attached): I have a total of 32 task slots, and I have set the parallelism of the last two operators before the sink to 31. The sink parallelism is 1. The last operator before the sink is a MapOperator, so it doesn't need to buffer all elements before emitting output. Looking at the dashboard, I see that one task slot is available, but the sink subtask stays in the state "CREATED". Is there an explanation for this behaviour? Thank you.

Best,
Yassine
Hi Yassine,

You don't necessarily need to set the parallelism of the last two operators to 31; the sink with parallelism 1 will still fit into the slots. By default, a task slot can hold an entire "slice", that is, one parallel instance of every operator in the job.

The sink stays in state CREATED at the beginning because it hasn't received any data yet. In batch mode, an operator switches to RUNNING once it has received its first record. In your case, there are some operations (reduce, join) before the sink that first need to consume all their input and process it. If you wait long enough, you should see the sink become active.

Regards,
Robert

On Wed, Nov 23, 2016 at 2:03 PM, Yassine MARZOUGUI <[hidden email]> wrote:
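To make the slot-sharing point above concrete, here is a minimal plain-Java sketch (it does not use the Flink API). It assumes default slot sharing, where one slot can hold one parallel subtask of each operator, so the number of slots a job needs equals its highest operator parallelism. The class name, method name, and operator labels ("join", "map", "sink") are hypothetical and chosen only for illustration; the thread does not say which operator is second to last.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SlotEstimate {

    // Assumption: default slot sharing, all operators in one sharing group.
    // Each slot then holds at most one subtask per operator, so the job
    // needs as many slots as the largest parallelism in the plan.
    static int requiredSlots(Map<String, Integer> parallelismByOperator) {
        int max = 0;
        for (int p : parallelismByOperator.values()) {
            max = Math.max(max, p);
        }
        return max;
    }

    public static void main(String[] args) {
        // Hypothetical labels for the tail of the plan described in the thread.
        Map<String, Integer> plan = new LinkedHashMap<>();
        plan.put("join", 31);
        plan.put("map", 31);
        plan.put("sink", 1);
        System.out.println(requiredSlots(plan)); // prints 31

        // Raising the two operators to 32 still fits into 32 slots,
        // because the single sink subtask shares a slot with the others.
        plan.put("join", 32);
        plan.put("map", 32);
        System.out.println(requiredSlots(plan)); // prints 32
    }
}
```

Under that assumption, reserving a spare slot for the sink is unnecessary: the sink subtask is placed in a slot that already holds subtasks of the upstream operators.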
Hi Robert,

I see, so the join needs to consume all its data and process it first. Thanks a lot for the clarification!

Best,

2016-11-23 17:35 GMT+01:00 Robert Metzger <[hidden email]>: