http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/PartitionNotFoundException-after-deployment-tp19942p19953.html
> On 4 May 2018, at 14:52, Ufuk Celebi <[hidden email]> wrote:
>
> Hey Gyula!
>
> I'm including Piotr and Nico (cc'd), who have worked on the network
> stack in recent releases.
>
> Registering the network structures including the intermediate results
> actually happens **before** any state is restored. I'm not sure why
> this reproducibly happens when you restore state. @Nico, Piotr: any
> ideas here?
>
> In general I think what happens here is the following:
> - a task requests the result of a local upstream producer, but that
> one has not registered its intermediate result yet
> - this should result in a retry of the request with some backoff
> (controlled via the config params you mention
> taskmanager.network.request-backoff.max,
> taskmanager.network.request-backoff.initial)
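The retry behaviour described in the list above can be modelled with a small, hypothetical Java sketch (not Flink's own code). It assumes that the first retry waits `request-backoff.initial`, that each further retry doubles the wait up to `request-backoff.max`, and that the consumer gives up with a PartitionNotFoundException once it has already waited the maximum. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of the partition-request retry schedule; not Flink's code.
 * Assumptions: the first retry waits request-backoff.initial, each further retry
 * doubles the wait up to request-backoff.max, and after a wait equal to the max
 * the consumer gives up (PartitionNotFoundException).
 */
public class BackoffSchedule {

    /** Returns the wait (ms) before each retriggered partition request. */
    static List<Long> delays(long initialMs, long maxMs) {
        if (initialMs <= 0 || maxMs < initialMs) {
            throw new IllegalArgumentException("need 0 < initial <= max");
        }
        List<Long> result = new ArrayList<>();
        long current = initialMs;                    // first retry uses the initial backoff
        result.add(current);
        while (current < maxMs) {
            current = Math.min(current * 2, maxMs);  // double, capped at the max
            result.add(current);
        }
        return result;                               // after the max-sized wait: give up
    }

    /** Total time a consumer keeps retrying before the request finally fails. */
    static long totalWaitMs(long initialMs, long maxMs) {
        long total = 0;
        for (long d : delays(initialMs, maxMs)) {
            total += d;
        }
        return total;
    }

    public static void main(String[] args) {
        // Example values only; read the real ones from your flink-conf.yaml.
        long initial = 100;
        long max = 10_000;
        System.out.println(delays(initial, max));
        System.out.println(totalWaitMs(initial, max) + " ms total");
    }
}
```

With the example values 100 ms and 10000 ms, this model prints `[100, 200, 400, 800, 1600, 3200, 6400, 10000]` and `22700 ms total`. Raising the max to 60000 ms stretches the window to 162300 ms. Under these assumptions, the max setting largely determines how long a consumer waits for a slow local producer.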
>
> As a first step I would set logging to DEBUG and check the TM logs for
> messages like "Retriggering partition request {}:{}."
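When DEBUG logging is enabled, the check suggested above can be partly automated. The following is a minimal Java sketch that counts the "Retriggering partition request {}:{}." messages per partition in a TaskManager log. It assumes that the first placeholder is rendered as the partition ID and the second as the subpartition index. The class name and the sample log lines are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: counts DEBUG "Retriggering partition request {}:{}."
 * messages per partition. Assumption: first placeholder = partition ID,
 * second placeholder = subpartition index.
 */
public class RetriggerLogScan {

    private static final Pattern RETRIGGER =
            Pattern.compile("Retriggering partition request (\\S+):(\\d+)\\.");

    /** Maps "partition:subpartition" to the number of retrigger messages seen. */
    static Map<String, Integer> countRetriggers(List<String> logLines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : logLines) {
            Matcher m = RETRIGGER.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1) + ":" + m.group(2), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java RetriggerLogScan.java taskmanager.log
        // Without an argument, fall back to made-up sample lines.
        List<String> lines = args.length > 0
                ? Files.readAllLines(Paths.get(args[0]))
                : Arrays.asList(
                        "DEBUG SingleInputGate - Retriggering partition request p1@a1:0.",
                        "DEBUG SingleInputGate - Retriggering partition request p1@a1:0.",
                        "INFO  Task - Some unrelated message");
        countRetriggers(lines).forEach((key, n) ->
                System.out.println(key + " retriggered " + n + " time(s)"));
    }
}
```

If the partition from the failing request never shows up in this count, the consumer may not have gone through the retry path at all. That is worth checking before tuning the backoff settings further.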
>
> You can also check the SingleInputGate code which has the logic for
> retriggering requests.
>
> – Ufuk
>
>
> On Fri, May 4, 2018 at 10:27 AM, Gyula Fóra <[hidden email]> wrote:
>> Hi Ufuk,
>>
>> Do you have any quick idea what could cause these problems in Flink 1.4.2?
>> It seems like one operator takes too long to deploy, and downstream tasks fail
>> with a partition-not-found error. This only seems to happen when the job is
>> restored from state, and that operator does in fact have both keyed and
>> operator state.
>>
>> Deploying the same job from empty state works well. We tried increasing
>> taskmanager.network.request-backoff.max, but that didn't help.
>>
>> It would be great if you have some pointers on where to look further; I haven't
>> seen this happen before.
>>
>> Thank you!
>> Gyula
>>
>> The error:
>> org.apache.flink.runtime.io.network.partition.PartitionNotFoundException: Partition 4c5e9cd5dd410331103f51127996068a@b35ef4ffe25e3d17c5d6051ebe2860cd not found.
>>     at org.apache.flink.runtime.io.network.partition.ResultPartitionManager.createSubpartitionView(ResultPartitionManager.java:77)
>>     at org.apache.flink.runtime.io.network.partition.consumer.LocalInputChannel.requestSubpartition(LocalInputChannel.java:115)
>>     at org.apache.flink.runtime.io.network.partition.consumer.LocalInputChannel$1.run(LocalInputChannel.java:159)
>>     at java.util.TimerThread.mainLoop(Timer.java:555)
>>     at java.util.TimerThread.run(Timer.java:505)
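The partition ID in the exception above has the form `<left>@<right>`. Assume it follows the layout of Flink's `ResultPartitionID`, where the text before `@` identifies the intermediate result partition and the text after it identifies the producer's execution attempt. In that case, the right-hand half is what to search for in the JobManager log to see when that producer task was deployed. The helper below is a hypothetical sketch of that split, not part of Flink.

```java
/**
 * Hypothetical helper: splits a partition ID printed as
 * "<intermediateResultPartitionId>@<producerExecutionAttemptId>"
 * (assumed to match the layout of Flink's ResultPartitionID).
 */
public class PartitionIdParts {

    static String[] split(String partitionId) {
        int at = partitionId.indexOf('@');
        // Expect exactly one '@' with non-empty text on both sides.
        if (at <= 0 || at == partitionId.length() - 1 || at != partitionId.lastIndexOf('@')) {
            throw new IllegalArgumentException("not <partition>@<attempt>: " + partitionId);
        }
        return new String[] {partitionId.substring(0, at), partitionId.substring(at + 1)};
    }

    public static void main(String[] args) {
        // The ID from the PartitionNotFoundException quoted above.
        String[] parts = split("4c5e9cd5dd410331103f51127996068a@b35ef4ffe25e3d17c5d6051ebe2860cd");
        System.out.println("result partition: " + parts[0]);
        System.out.println("producer attempt: " + parts[1]);
    }
}
```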
>
>
>
> --
> Data Artisans GmbH | Stresemannstr. 121a | 10963 Berlin
>
>
> [hidden email]
> +49-30-43208879
>
> Registered at Amtsgericht Charlottenburg - HRB 158244 B
> Managing Directors: Dr. Kostas Tzoumas, Dr. Stephan Ewen