instantiation, memory allocation and so on. It may cost a lot of time.
> Hi Yuta,
>
> when the execute() method is called, a so-called JobGraph is
> constructed from all operators that have been added before by calling
> map(), keyBy() and so on.
> The JobGraph is then submitted to the JobManager which is the master
> process in Flink. Based on the JobGraph, the master deploys tasks to the
> worker processes (TaskManagers).
> These are the tasks that do the actual processing and they are
> subsequently started as I explained before, i.e., the source task starts
> consuming from Kafka before subsequent tasks have been started.
>
> So, there is quite a lot happening when you call execute() including
> network communication and task deployment.
>
> Hope this helps,
> Fabian
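
The lazy behaviour described above can be illustrated with a minimal sketch in plain Java. This is not Flink's API: `LazyPipelineSketch`, its `map()` and its `execute()` are hypothetical stand-ins, and the sketch leaves out the JobGraph submission, network communication and task deployment that Fabian mentions. It only shows that declaring operators records a plan, and that nothing is processed until `execute()` is called.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for an execution environment; NOT Flink's API.
class LazyPipelineSketch {
    // Operators are only recorded here, in the order they were declared.
    private final List<Function<Integer, Integer>> operators = new ArrayList<>();
    private final List<String> log = new ArrayList<>();

    // Declaring an operator adds it to the plan but processes nothing.
    LazyPipelineSketch map(String name, Function<Integer, Integer> fn) {
        operators.add(fn);
        log.add("declared " + name);
        return this;
    }

    // Only execute() walks the recorded plan and processes the input.
    List<Integer> execute(List<Integer> input) {
        log.add("execute called with " + operators.size() + " operators");
        List<Integer> out = new ArrayList<>();
        for (Integer value : input) {
            Integer current = value;
            for (Function<Integer, Integer> op : operators) {
                current = op.apply(current);
            }
            out.add(current);
        }
        return out;
    }

    List<String> log() {
        return log;
    }

    public static void main(String[] args) {
        LazyPipelineSketch env = new LazyPipelineSketch()
                .map("double", x -> x * 2)
                .map("increment", x -> x + 1);
        // At this point only the plan exists; no record has been processed.
        System.out.println(env.log());
        System.out.println(env.execute(Arrays.asList(1, 2, 3)));
    }
}
```

In a real Flink job the work triggered by `execute()` is far heavier than this loop, which is consistent with the first records seeing a higher latency than later ones.
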
>
> 2017-09-15 4:25 GMT+02:00 Yuta Morisawa <[hidden email]>:
>
> Hi, Fabian
>
> > If I understand you correctly, the problem is only for the first events
> > that are processed.
> Yes. More precisely, the first 300 Kafka messages.
>
> > AFAIK, Flink lazily instantiates its operators which means that a source
> > task starts to consume records from Kafka before the subsequent tasks
> > have been started.
> That's a great hint. It describes the situation well.
> But the documentation says "The operations are actually
> executed when the execution is explicitly triggered by an execute()
> call on the execution environment."
> What does that mean?
> AFAIK, typical Flink programs invoke execute() in main().
> Does every operator start at that time? I think maybe not.
>
> - Flink documentation:
> https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/api_concepts.html#lazy-evaluation
>
> > Not sure if or what can be done about this behavior.
> > I'll loop in Till who knows more about the lifecycle of tasks.
> Thank you very much for your kindness.
>
> Regards, Yuta
>
> On 2017/09/14 19:32, Fabian Hueske wrote:
>
> Hi,
>
> If I understand you correctly, the problem is only for the first
> events that are processed.
>
> AFAIK, Flink lazily instantiates its operators which means that
> a source task starts to consume records from Kafka before the
> subsequent tasks have been started.
> That's why the latency of the first records is higher.
>
> Not sure if or what can be done about this behavior.
> I'll loop in Till who knows more about the lifecycle of tasks.
>
> Best, Fabian
>
>
> 2017-09-12 11:02 GMT+02:00 Yuta Morisawa <[hidden email]>:
>
> Hi,
>
> I am worried about the latency of the Streaming API.
> My application reads data from Kafka connectors, processes it,
> and then pushes the results to Kafka producers.
> The problem is that the app suffers a long delay when the
> first data arrives in the cluster.
> It takes about 1000 ms to process the data (I measure the time with
> the Kafka timestamp). On the other hand, it works well 2-3 seconds
> after the first data arrives (the delay is about 200 ms).
>
> The application is so delay-sensitive that I want to solve
> this problem.
> For now, I think this is a JVM issue, but I have no idea how to
> investigate it.
> Is there any way to avoid this delay?
>
>
>
> Thank you for your attention
> Yuta
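
The measurement described in this message can be sketched as follows, assuming each record carries an epoch-millisecond timestamp and that the producer and consumer clocks are synchronized. `LatencyProbe` and its methods are hypothetical helpers, not part of Flink or Kafka, and the sample values in `main` are made up purely to echo the rough magnitudes reported above. Comparing the mean over the first records with the mean over the rest makes the warm-up effect visible.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for comparing warm-up latency with steady-state latency.
class LatencyProbe {
    // Latency of one record: time it was processed minus its record timestamp.
    static long latencyMillis(long recordTimestampMs, long processedAtMs) {
        return processedAtMs - recordTimestampMs;
    }

    // Mean of latencies in the index range [from, to); 0.0 for an empty range.
    static double mean(List<Long> latencies, int from, int to) {
        if (to <= from) {
            return 0.0;
        }
        long sum = 0;
        for (int i = from; i < to; i++) {
            sum += latencies.get(i);
        }
        return (double) sum / (to - from);
    }

    public static void main(String[] args) {
        // Made-up sample latencies in milliseconds, for illustration only.
        List<Long> latencies = Arrays.asList(1000L, 950L, 200L, 210L, 190L);
        int warmup = 2; // assumed number of records treated as warm-up
        System.out.println("warm-up mean: " + mean(latencies, 0, warmup));
        System.out.println("steady mean:  " + mean(latencies, warmup, latencies.size()));
    }
}
```

In a real job, `processedAtMs` would typically come from `System.currentTimeMillis()` at the point where the record is handled.
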
>
>
>