Re: period batch job lead to OutOfMemoryError: Metaspace problem

Re: period batch job lead to OutOfMemoryError: Metaspace problem

太平洋
I have configured it to 512M, but the problem still exists. Now the memory size is still 256M.
The TM and JM logs are attached.

Looking forward to your reply.
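To confirm which limit the TaskManager JVM actually received, one option is to read the HotSpot "Metaspace" memory pool through the standard java.lang.management API. Flink passes taskmanager.memory.jvm-metaspace.size to the TaskManager JVM as -XX:MaxMetaspaceSize when the process starts, so if this still reports 256 MiB after the change, a likely explanation (an assumption, since the thread does not say how the value was changed) is that the TaskManager pods were not restarted with the updated flink-conf.yaml. The class MetaspaceCheck below is a hypothetical helper, not a Flink API; running `jcmd <pid> VM.flags` inside the TaskManager pod should show the same flag without any code.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

/** Hypothetical helper: reports the metaspace limit and usage of the current JVM. */
final class MetaspaceCheck {
    private MetaspaceCheck() {}

    /** Returns the HotSpot "Metaspace" pool, or null if this JVM exposes none. */
    static MemoryPoolMXBean metaspacePool() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if ("Metaspace".equals(pool.getName())) {
                return pool;
            }
        }
        return null;
    }

    /** Max metaspace in bytes, or -1 if the JVM reports no limit or the pool is missing. */
    static long maxBytes() {
        MemoryPoolMXBean pool = metaspacePool();
        return pool == null ? -1L : pool.getUsage().getMax();
    }

    /** Human-readable summary, e.g. for a log line. */
    static String describe() {
        MemoryPoolMXBean pool = metaspacePool();
        if (pool == null) {
            return "no Metaspace pool found (not a HotSpot JVM?)";
        }
        MemoryUsage usage = pool.getUsage();
        String max = usage.getMax() < 0 ? "unlimited" : (usage.getMax() >> 20) + " MiB";
        return "metaspace used=" + (usage.getUsed() >> 20) + " MiB, max=" + max;
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Calling MetaspaceCheck.describe() from, say, a RichFunction's open() and logging the result would show the effective value on each TaskManager.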

------------------ Original Message ------------------
From: "Yangze Guo" <[hidden email]>;
Sent: Tuesday, April 6, 2021, 6:35 PM
To: "太平洋"<[hidden email]>;
Cc: "user"<[hidden email]>;"guowei.mgw"<[hidden email]>;
Subject: Re: period batch job lead to OutOfMemoryError: Metaspace problem

> I have tried this method, but the problem still exists.
How much memory did you configure for it?

> Are 21 instances of "org.apache.flink.util.ChildFirstClassLoader" normal?
Not quite sure about it. AFAIK, each job has its own classloader.
Multiple tasks of the same job in the same TM share that
classloader. The classloader is removed once no task of the job is
running on the TM anymore, and a classloader without references is
eventually cleaned up by the GC. Could you share the JM and TM logs for further analysis?
I'll also loop in @Guowei Ma on this thread.


Best,
Yangze Guo
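The lifecycle described above amounts to reference counting per job on each TaskManager. The sketch below is a simplified, hypothetical model of that idea: JobClassLoaderRegistry, acquire and release are illustrative names, not Flink's implementation, and a plain URLClassLoader stands in for ChildFirstClassLoader. Note that dropping the registry's reference only makes the loader eligible for GC. If a thread, a static cache, or a registered JDBC driver still references it, its classes stay in metaspace.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, simplified model of per-job class loader bookkeeping on one
 * TaskManager. Not Flink's code; URLClassLoader stands in for ChildFirstClassLoader.
 */
final class JobClassLoaderRegistry {
    private static final class Entry {
        final URLClassLoader loader;
        int runningTasks;

        Entry(URLClassLoader loader) {
            this.loader = loader;
        }
    }

    private final Map<String, Entry> byJob = new HashMap<>();
    private final ClassLoader parent;

    JobClassLoaderRegistry(ClassLoader parent) {
        this.parent = parent;
    }

    /** A task of jobId starts: reuse the job's loader, creating it for the first task. */
    synchronized ClassLoader acquire(String jobId, URL[] userJars) {
        Entry entry = byJob.computeIfAbsent(jobId, id -> new Entry(new URLClassLoader(userJars, parent)));
        entry.runningTasks++;
        return entry.loader;
    }

    /** A task of jobId ends: the last one drops the registry's reference to the loader. */
    synchronized void release(String jobId) {
        Entry entry = byJob.get(jobId);
        if (entry == null) {
            return;
        }
        if (--entry.runningTasks == 0) {
            byJob.remove(jobId);
            try {
                // Closes the jar files. The classes stay in metaspace until the
                // loader becomes unreachable and the GC unloads it.
                entry.loader.close();
            } catch (IOException e) {
                // Best effort in this sketch.
            }
        }
    }

    synchronized int liveLoaders() {
        return byJob.size();
    }
}
```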

On Tue, Apr 6, 2021 at 6:05 PM 太平洋 <[hidden email]> wrote:

>
> I have tried this method, but the problem still exists.
> According to the heap dump analysis, are 21 instances of "org.apache.flink.util.ChildFirstClassLoader" normal?
>
>
> ------------------ Original Message ------------------
> From: "Yangze Guo" <[hidden email]>;
> Sent: Tuesday, April 6, 2021, 4:32 PM
> To: "太平洋"<[hidden email]>;
> Cc: "user"<[hidden email]>;
> Subject: Re: period batch job lead to OutOfMemoryError: Metaspace problem
>
> I think you can try increasing the JVM metaspace size for
> TaskManagers through taskmanager.memory.jvm-metaspace.size. [1]
>
> [1] https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/memory/mem_trouble/#outofmemoryerror-metaspace
>
> Best,
> Yangze Guo
>
>
> On Tue, Apr 6, 2021 at 4:22 PM 太平洋 <[hidden email]> wrote:
> >
> > batch job:
> > read data from s3 by sql,then by some operators and write data to clickhouse and kafka.
> > after some times, task-manager quit with OutOfMemoryError: Metaspace.
> >
> > env:
> > flink version:1.12.2
> > task-manager slot count: 5
> > deployment: standalone kubernetes session 模式
> > dependencies:
> >
> >     <dependency>
> >       <groupId>org.apache.flink</groupId>
> >       <artifactId>flink-connector-kafka_2.11</artifactId>
> >       <version>${flink.version}</version>
> >     </dependency>
> >     <dependency>
> >       <groupId>com.google.code.gson</groupId>
> >       <artifactId>gson</artifactId>
> >       <version>2.8.5</version>
> >     </dependency>
> >     <dependency>
> >       <groupId>org.apache.flink</groupId>
> >       <artifactId>flink-connector-jdbc_2.11</artifactId>
> >       <version>${flink.version}</version>
> >     </dependency>
> >     <dependency>
> >       <groupId>ru.yandex.clickhouse</groupId>
> >       <artifactId>clickhouse-jdbc</artifactId>
> >       <version>0.3.0</version>
> >     </dependency>
> >     <dependency>
> >       <groupId>org.apache.flink</groupId>
> >       <artifactId>flink-parquet_2.11</artifactId>
> >       <version>${flink.version}</version>
> >     </dependency>
> >     <dependency>
> >       <groupId>org.apache.flink</groupId>
> >       <artifactId>flink-json</artifactId>
> >       <version>${flink.version}</version>
> >     </dependency>
> >
> > heap dump1:
> >
> >   Problem Suspect 1
> >
> > 21 instances of "org.apache.flink.util.ChildFirstClassLoader", loaded by "sun.misc.Launcher$AppClassLoader @ 0x73b2d42e0" occupy 29,656,880 (41.16%) bytes.
> >
> > Biggest instances:
> >
> > org.apache.flink.util.ChildFirstClassLoader @ 0x73ca2a1e8 - 1,474,760 (2.05%) bytes.
> > org.apache.flink.util.ChildFirstClassLoader @ 0x73d2af820 - 1,474,168 (2.05%) bytes.
> > org.apache.flink.util.ChildFirstClassLoader @ 0x73cdcaa10 - 1,474,160 (2.05%) bytes.
> > org.apache.flink.util.ChildFirstClassLoader @ 0x73cf6aab0 - 1,474,160 (2.05%) bytes.
> > org.apache.flink.util.ChildFirstClassLoader @ 0x73d1111d8 - 1,474,160 (2.05%) bytes.
> > org.apache.flink.util.ChildFirstClassLoader @ 0x73d2bb108 - 1,474,128 (2.05%) bytes.
> > org.apache.flink.util.ChildFirstClassLoader @ 0x73de202e0 - 1,474,120 (2.05%) bytes.
> > org.apache.flink.util.ChildFirstClassLoader @ 0x73dadc778 - 1,474,112 (2.05%) bytes.
> > org.apache.flink.util.ChildFirstClassLoader @ 0x73d5f70e8 - 1,474,064 (2.05%) bytes.
> > org.apache.flink.util.ChildFirstClassLoader @ 0x73d93aa38 - 1,474,064 (2.05%) bytes.
> > org.apache.flink.util.ChildFirstClassLoader @ 0x73e179638 - 1,474,064 (2.05%) bytes.
> > org.apache.flink.util.ChildFirstClassLoader @ 0x73dc80418 - 1,474,056 (2.05%) bytes.
> > org.apache.flink.util.ChildFirstClassLoader @ 0x73dfcda60 - 1,474,056 (2.05%) bytes.
> > org.apache.flink.util.ChildFirstClassLoader @ 0x73e4bcd38 - 1,474,056 (2.05%) bytes.
> > org.apache.flink.util.ChildFirstClassLoader @ 0x73d6006e8 - 1,474,032 (2.05%) bytes.
> > org.apache.flink.util.ChildFirstClassLoader @ 0x73c7d2ad8 - 1,461,944 (2.03%) bytes.
> > org.apache.flink.util.ChildFirstClassLoader @ 0x73ca1bb98 - 1,460,752 (2.03%) bytes.
> > org.apache.flink.util.ChildFirstClassLoader @ 0x73bf203f0 - 1,460,744 (2.03%) bytes.
> > org.apache.flink.util.ChildFirstClassLoader @ 0x73e3284a8 - 1,445,232 (2.01%) bytes.
> > org.apache.flink.util.ChildFirstClassLoader @ 0x73e65de00 - 1,445,232 (2.01%) bytes.
> >
> > Keywords
> > org.apache.flink.util.ChildFirstClassLoader
> > sun.misc.Launcher$AppClassLoader @ 0x73b2d42e0
> >
> >   Problem Suspect 2
> >
> > 34,407 instances of "org.apache.flink.core.memory.HybridMemorySegment", loaded by "sun.misc.Launcher$AppClassLoader @ 0x73b2d42e0" occupy 7,707,168 (10.70%) bytes.
> >
> > Keywords
> > org.apache.flink.core.memory.HybridMemorySegment
> > sun.misc.Launcher$AppClassLoader @ 0x73b2d42e0
> >
> > heap dump2:
> >
> >   Problem Suspect 1
> >
> > 21 instances of "org.apache.flink.util.ChildFirstClassLoader", loaded by "sun.misc.Launcher$AppClassLoader @ 0x73b2d42e0" occupy 26,061,408 (30.68%) bytes.
> >
> > Biggest instances:
> >
> > org.apache.flink.util.ChildFirstClassLoader @ 0x73e9e9930 - 1,474,224 (1.74%) bytes.
> > org.apache.flink.util.ChildFirstClassLoader @ 0x73edce0b8 - 1,474,224 (1.74%) bytes.
> > org.apache.flink.util.ChildFirstClassLoader @ 0x73f1ad7d0 - 1,474,168 (1.74%) bytes.
> > org.apache.flink.util.ChildFirstClassLoader @ 0x73f3e5118 - 1,474,168 (1.74%) bytes.
> > org.apache.flink.util.ChildFirstClassLoader @ 0x73f5d3fe0 - 1,474,168 (1.74%) bytes.
> > org.apache.flink.util.ChildFirstClassLoader @ 0x73ebd8d28 - 1,474,160 (1.74%) bytes.
> > org.apache.flink.util.ChildFirstClassLoader @ 0x73efc00c0 - 1,474,160 (1.74%) bytes.
> > org.apache.flink.util.ChildFirstClassLoader @ 0x73e2251a8 - 1,474,136 (1.74%) bytes.
> > org.apache.flink.util.ChildFirstClassLoader @ 0x73cc24af0 - 1,474,064 (1.74%) bytes.
> > org.apache.flink.util.ChildFirstClassLoader @ 0x73cdca3e0 - 1,474,064 (1.74%) bytes.
> > org.apache.flink.util.ChildFirstClassLoader @ 0x73cf6f860 - 1,474,064 (1.74%) bytes.
> > org.apache.flink.util.ChildFirstClassLoader @ 0x73d114768 - 1,474,064 (1.74%) bytes.
> > org.apache.flink.util.ChildFirstClassLoader @ 0x73ca6f878 - 1,474,056 (1.74%) bytes.
> > org.apache.flink.util.ChildFirstClassLoader @ 0x73d2b7640 - 1,474,056 (1.74%) bytes.
> > org.apache.flink.util.ChildFirstClassLoader @ 0x73d2c1d80 - 1,474,040 (1.74%) bytes.
> > org.apache.flink.util.ChildFirstClassLoader @ 0x73c7e2868 - 1,469,720 (1.73%) bytes.
> > org.apache.flink.util.ChildFirstClassLoader @ 0x73bf34a98 - 1,460,808 (1.72%) bytes.
> >
> > Keywords
> > org.apache.flink.util.ChildFirstClassLoader
> > sun.misc.Launcher$AppClassLoader @ 0x73b2d42e0
> >
> >   Problem Suspect 2
> >
> > 4 instances of "org.apache.flink.streaming.runtime.tasks.OneInputStreamTask", loaded by "sun.misc.Launcher$AppClassLoader @ 0x73b2d42e0" occupy 11,644,200 (13.71%) bytes.
> >
> > Biggest instances:
> >
> > org.apache.flink.streaming.runtime.tasks.OneInputStreamTask @ 0x73e2d0cb0 - 4,364,536 (5.14%) bytes.
> > org.apache.flink.streaming.runtime.tasks.OneInputStreamTask @ 0x73d62fb88 - 3,643,576 (4.29%) bytes.
> > org.apache.flink.streaming.runtime.tasks.OneInputStreamTask @ 0x73dae0270 - 3,635,952 (4.28%) bytes.
> >
> > Keywords
> > sun.misc.Launcher$AppClassLoader @ 0x73b2d42e0
> > org.apache.flink.streaming.runtime.tasks.OneInputStreamTask

tm.log (2M)
jm.log (7M)

Re: period batch job lead to OutOfMemoryError: Metaspace problem

Yangze Guo
I went through the JM & TM logs but could not find any valuable clue.
The exception is actually thrown by kafka-producer-network-thread.
Maybe @Qingsheng could also take a look?


Best,
Yangze Guo
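A plausible way a thread like kafka-producer-network-thread keeps a job's class loader alive (an assumption, not confirmed in this thread): a new java.lang.Thread inherits the context class loader of the thread that creates it, and a Kafka client created from task code runs on a task thread whose context class loader is the job's ChildFirstClassLoader. If such a thread outlives the job, that reference pins the loader and every class it loaded. The JDK-only sketch below demonstrates the inheritance; the class and thread names are hypothetical.

```java
import java.util.concurrent.CountDownLatch;

/** Hypothetical demo: a new Thread inherits its creator's context class loader. */
final class InheritedContextLoaderDemo {
    private InheritedContextLoaderDemo() {}

    /**
     * Creates a thread while jobLoader is the caller's context class loader (as on a
     * task thread running user code) and returns the loader the new thread ended up with.
     */
    static ClassLoader contextLoaderOfSpawnedThread(ClassLoader jobLoader) {
        Thread current = Thread.currentThread();
        ClassLoader previous = current.getContextClassLoader();
        CountDownLatch stop = new CountDownLatch(1);
        Thread worker;
        current.setContextClassLoader(jobLoader);
        try {
            // Stands in for a client library starting its background I/O thread.
            worker = new Thread(() -> awaitQuietly(stop), "lingering-library-thread");
        } finally {
            current.setContextClassLoader(previous);
        }
        worker.setDaemon(true);
        worker.start();
        ClassLoader inherited = worker.getContextClassLoader();
        // The demo lets the thread finish; a leaked thread would keep running
        // and keep jobLoader reachable.
        stop.countDown();
        return inherited;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```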

On Thu, Apr 8, 2021 at 10:39 AM 太平洋 <[hidden email]> wrote:

>
> I have configured it to 512M, but the problem still exists. Now the memory size is still 256M.
> The TM and JM logs are attached.
>
> Looking forward to your reply.

Re: period batch job lead to OutOfMemoryError: Metaspace problem

Arvid Heise-4
Hi,

ChildFirstClassLoaders are created (more or less) per application jar, and seeing so many looks like a classloader leak to me. I'd expect a new ChildFirstClassLoader to pop up with each new job submission.

Can you check who is referencing the ChildFirstClassLoaders transitively? Usually, it's some thread that is lingering around because a third-party library leaks threads, etc.

OneInputStreamTask is legit and just indicates that you have a job running with 4 slots on that TM. It should not hold any dedicated metaspace memory.
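Besides following a ChildFirstClassLoader's path to GC roots in the heap dump analyzer, lingering threads can also be spotted at runtime. The sketch below (ClassLoaderPinFinder and its methods are hypothetical helpers, not a Flink API) lists live threads whose context class loader is a given loader or one of its children. It only covers the thread case and will miss references held through static fields or caches. Inside a job it could, for example, be called with getRuntimeContext().getUserCodeClassLoader() from a RichFunction's close(); the calling task thread itself will naturally appear in the result.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: finds live threads that keep a class loader reachable. */
final class ClassLoaderPinFinder {
    private ClassLoaderPinFinder() {}

    /** True if loader is target itself or has target somewhere in its parent chain. */
    static boolean isSameOrChildOf(ClassLoader loader, ClassLoader target) {
        for (ClassLoader l = loader; l != null; l = l.getParent()) {
            if (l == target) {
                return true;
            }
        }
        return false;
    }

    /** Names of live threads whose context class loader is target or a child of it. */
    static List<String> threadsPinning(ClassLoader target) {
        List<String> names = new ArrayList<>();
        // getAllStackTraces() returns a snapshot keyed by all live threads in this JVM.
        for (Thread thread : Thread.getAllStackTraces().keySet()) {
            if (isSameOrChildOf(thread.getContextClassLoader(), target)) {
                names.add(thread.getName());
            }
        }
        return names;
    }
}
```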

On Thu, Apr 8, 2021 at 4:52 AM Yangze Guo <[hidden email]> wrote:
> I went through the JM & TM logs but could not find any valuable clue.
> The exception is actually thrown by kafka-producer-network-thread.
> Maybe @Qingsheng could also take a look?

Re: period batch job lead to OutOfMemoryError: Metaspace problem

太平洋
My application looks like this. Is there a problem with this structure?

public class StreamingJob {
    public static void main(String[] args) throws Exception {
        int i = 0;
        // Each iteration builds a new pipeline and submits it as a new job via env.execute().
        while (i < 100) {
            try {
                StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
                env.setRuntimeMode(RuntimeExecutionMode.BATCH);
                env.setParallelism(Parallelism);

                EnvironmentSettings bsSettings = EnvironmentSettings.newInstance().useBlinkPlanner()
                        .inStreamingMode().build();
                StreamTableEnvironment bsTableEnv = StreamTableEnvironment.create(env, bsSettings);

                bsTableEnv.executeSql("CREATE TEMPORARY TABLE xxxx");
                Table t = bsTableEnv.sqlQuery(query);

                DataStream<DataPoint> points = bsTableEnv.toAppendStream(t, DataPoint.class);

                DataStream<StatisPoint> weightPoints = points.map();

                DataStream<PredictPoint> predictPoints = weightPoints.keyBy()
                        .reduce().map();

                // side output
                final OutputTag<PredictPoint> outPutPredict = new OutputTag<PredictPoint>("predict") {
                };

                SingleOutputStreamOperator<PredictPoint> mainDataStream = predictPoints
                        .process();

                DataStream<PredictPoint> exStream = mainDataStream.getSideOutput(outPutPredict);

                // write data to ClickHouse
                String insertIntoCKSql = "xxx";
                mainDataStream.addSink(JdbcSink.sink(insertIntoCKSql, new CkSinkBuilder(),
                        new JdbcExecutionOptions.Builder().withBatchSize(CkBatchSize).build(),
                        new JdbcConnectionOptions.JdbcConnectionOptionsBuilder().withDriverName(CkDriverName)
                                .withUrl(CkUrl).withUsername(CkUser).withPassword(CkPassword).build()));

                // write data to Kafka
                FlinkKafkaProducer<String> producer = new FlinkKafkaProducer<>();
                exStream.map().addSink(producer);

                env.execute("Prediction Program");
            } catch (Exception e) {
                e.printStackTrace();
            }
            i++;
            Thread.sleep(window * 1000);
        }
    }
}



------------------ Original Message ------------------
From: "Arvid Heise" <[hidden email]>;
Sent: Thursday, April 8, 2021, 2:33 PM
To: "Yangze Guo"<[hidden email]>;
Cc: "太平洋"<[hidden email]>;"user"<[hidden email]>;"guowei.mgw"<[hidden email]>;"renqschn"<[hidden email]>;
Subject: Re: period batch job lead to OutOfMemoryError: Metaspace problem

Hi,

ChildFirstClassLoaders are created (more or less) per application jar, and seeing so many looks like a classloader leak to me. I'd expect a new ChildFirstClassLoader to pop up with each new job submission.

Can you check who is referencing the ChildFirstClassLoader transitively? Usually, it's some thread that is lingering around because some third party library is leaking threads etc.

OneInputStreamTask is legit and just indicates that you have a job running with 4 slots on that TM. It should not hold any dedicated metaspace memory.

On Thu, Apr 8, 2021 at 4:52 AM Yangze Guo <[hidden email]> wrote:
I went through the JM & TM logs but could not find any valuable clue.
The exception is actually thrown by kafka-producer-network-thread.
Maybe @Qingsheng could also take a look?


Best,
Yangze Guo

> > On Tue, Apr 6, 2021 at 4:22 PM 太平洋 <[hidden email]> wrote:
> > >
> > > Batch job:
> > > It reads data from S3 with SQL, processes it with some operators, and writes the results to ClickHouse and Kafka.
> > > After some time, the TaskManager exits with OutOfMemoryError: Metaspace.
> > >
> > > Environment:
> > > Flink version: 1.12.2
> > > TaskManager slot count: 5
> > > Deployment: standalone Kubernetes, session mode
> > > Dependencies:
> > >
> > >     <dependency>
> > >       <groupId>org.apache.flink</groupId>
> > >       <artifactId>flink-connector-kafka_2.11</artifactId>
> > >       <version>${flink.version}</version>
> > >     </dependency>
> > >     <dependency>
> > >       <groupId>com.google.code.gson</groupId>
> > >       <artifactId>gson</artifactId>
> > >       <version>2.8.5</version>
> > >     </dependency>
> > >     <dependency>
> > >       <groupId>org.apache.flink</groupId>
> > >       <artifactId>flink-connector-jdbc_2.11</artifactId>
> > >       <version>${flink.version}</version>
> > >     </dependency>
> > >     <dependency>
> > >       <groupId>ru.yandex.clickhouse</groupId>
> > >       <artifactId>clickhouse-jdbc</artifactId>
> > >       <version>0.3.0</version>
> > >     </dependency>
> > >     <dependency>
> > >       <groupId>org.apache.flink</groupId>
> > >       <artifactId>flink-parquet_2.11</artifactId>
> > >       <version>${flink.version}</version>
> > >     </dependency>
> > >     <dependency>
> > >       <groupId>org.apache.flink</groupId>
> > >       <artifactId>flink-json</artifactId>
> > >       <version>${flink.version}</version>
> > >     </dependency>
> > >
> > > heap dump1:
> > >
> > > Leak Suspects
> > >
> > >   Problem Suspect 1
> > >
> > > 21 instances of "org.apache.flink.util.ChildFirstClassLoader", loaded by "sun.misc.Launcher$AppClassLoader @ 0x73b2d42e0" occupy 29,656,880 (41.16%) bytes.
> > >
> > > Biggest instances:
> > >
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73ca2a1e8 - 1,474,760 (2.05%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73d2af820 - 1,474,168 (2.05%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73cdcaa10 - 1,474,160 (2.05%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73cf6aab0 - 1,474,160 (2.05%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73d1111d8 - 1,474,160 (2.05%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73d2bb108 - 1,474,128 (2.05%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73de202e0 - 1,474,120 (2.05%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73dadc778 - 1,474,112 (2.05%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73d5f70e8 - 1,474,064 (2.05%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73d93aa38 - 1,474,064 (2.05%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73e179638 - 1,474,064 (2.05%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73dc80418 - 1,474,056 (2.05%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73dfcda60 - 1,474,056 (2.05%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73e4bcd38 - 1,474,056 (2.05%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73d6006e8 - 1,474,032 (2.05%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73c7d2ad8 - 1,461,944 (2.03%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73ca1bb98 - 1,460,752 (2.03%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73bf203f0 - 1,460,744 (2.03%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73e3284a8 - 1,445,232 (2.01%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73e65de00 - 1,445,232 (2.01%) bytes.
> > >
> > > Keywords
> > > org.apache.flink.util.ChildFirstClassLoader
> > > sun.misc.Launcher$AppClassLoader @ 0x73b2d42e0
> > >
> > >   Problem Suspect 2
> > >
> > > 34,407 instances of "org.apache.flink.core.memory.HybridMemorySegment", loaded by "sun.misc.Launcher$AppClassLoader @ 0x73b2d42e0" occupy 7,707,168 (10.70%) bytes.
> > >
> > > Keywords
> > > org.apache.flink.core.memory.HybridMemorySegment
> > > sun.misc.Launcher$AppClassLoader @ 0x73b2d42e0
> > >
> > > heap dump2:
> > >
> > > Leak Suspects
> > >
> > >   Problem Suspect 1
> > >
> > > 21 instances of "org.apache.flink.util.ChildFirstClassLoader", loaded by "sun.misc.Launcher$AppClassLoader @ 0x73b2d42e0" occupy 26,061,408 (30.68%) bytes.
> > >
> > > Biggest instances:
> > >
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73e9e9930 - 1,474,224 (1.74%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73edce0b8 - 1,474,224 (1.74%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73f1ad7d0 - 1,474,168 (1.74%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73f3e5118 - 1,474,168 (1.74%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73f5d3fe0 - 1,474,168 (1.74%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73ebd8d28 - 1,474,160 (1.74%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73efc00c0 - 1,474,160 (1.74%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73e2251a8 - 1,474,136 (1.74%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73cc24af0 - 1,474,064 (1.74%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73cdca3e0 - 1,474,064 (1.74%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73cf6f860 - 1,474,064 (1.74%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73d114768 - 1,474,064 (1.74%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73ca6f878 - 1,474,056 (1.74%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73d2b7640 - 1,474,056 (1.74%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73d2c1d80 - 1,474,040 (1.74%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73c7e2868 - 1,469,720 (1.73%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73bf34a98 - 1,460,808 (1.72%) bytes.
> > >
> > > Keywords
> > > org.apache.flink.util.ChildFirstClassLoader
> > > sun.misc.Launcher$AppClassLoader @ 0x73b2d42e0
> > >
> > >   Problem Suspect 2
> > >
> > > 4 instances of "org.apache.flink.streaming.runtime.tasks.OneInputStreamTask", loaded by "sun.misc.Launcher$AppClassLoader @ 0x73b2d42e0" occupy 11,644,200 (13.71%) bytes.
> > >
> > > Biggest instances:
> > >
> > > org.apache.flink.streaming.runtime.tasks.OneInputStreamTask @ 0x73e2d0cb0 - 4,364,536 (5.14%) bytes.
> > > org.apache.flink.streaming.runtime.tasks.OneInputStreamTask @ 0x73d62fb88 - 3,643,576 (4.29%) bytes.
> > > org.apache.flink.streaming.runtime.tasks.OneInputStreamTask @ 0x73dae0270 - 3,635,952 (4.28%) bytes.
> > >
> > > Keywords
> > > sun.misc.Launcher$AppClassLoader @ 0x73b2d42e0
> > > org.apache.flink.streaming.runtime.tasks.OneInputStreamTask

Re: period batch job lead to OutOfMemoryError: Metaspace problem

Yangze Guo
IIUC, your program will eventually create 100 ChildFirstClassLoaders in
a TM, one per submitted job. But each of them should be garbage-collected
once its job has finished. So, as Arvid said, you'd better check who is
referencing those ChildFirstClassLoaders.
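The point that an unreferenced loader gets collected, while a single stray strong reference keeps it (and the metaspace of every class it defined) alive, can be shown with a WeakReference. A minimal sketch, where GLOBAL_REGISTRY is a hypothetical stand-in for any long-lived static structure outside the job's loader; System.gc() is only a hint, so the helper retries a few times.

```java
import java.lang.ref.WeakReference;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

public class LoaderCollectability {

    // Hypothetical long-lived static registry, standing in for any strong reference from outside the job.
    static final List<Object> GLOBAL_REGISTRY = new ArrayList<>();

    /** Creates a fresh loader, optionally leaks it into the registry, and returns only a weak handle. */
    public static WeakReference<ClassLoader> newJobLoader(boolean leak) {
        ClassLoader loader = new URLClassLoader(new URL[0], LoaderCollectability.class.getClassLoader());
        if (leak) {
            GLOBAL_REGISTRY.add(loader);
        }
        return new WeakReference<>(loader);
    }

    /** Requests GC a few times; returns true once the weakly referenced loader has been cleared. */
    public static boolean collectedAfterGc(WeakReference<?> ref) {
        for (int attempt = 0; attempt < 20 && ref.get() != null; attempt++) {
            System.gc();
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return ref.get() == null;
    }

    public static void main(String[] args) {
        WeakReference<ClassLoader> released = newJobLoader(false);
        WeakReference<ClassLoader> leaked = newJobLoader(true);
        System.out.println("released loader collected: " + collectedAfterGc(released));
        System.out.println("leaked loader collected:   " + collectedAfterGc(leaked));
    }
}
```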


Best,
Yangze Guo


Re: Re: period batch job lead to OutOfMemoryError: Metaspace problem

Maciek Próchniak
In reply to this post by 太平洋

Hi,

Did you put the ClickHouse JDBC driver on Flink's main classpath (in the lib folder) rather than in the user jar? This is described here: https://ci.apache.org/projects/flink/flink-docs-release-1.12/ops/debugging/debugging_classloading.html#unloading-of-dynamically-loaded-classes-in-user-code

When we encountered Metaspace leaks recently, in quite a few cases the problem turned out to be a JDBC driver loaded by the user classloader: it was registered with DriverManager, which kept it (and thereby the whole classloader) alive.
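If moving the driver to lib/ is not an option, another approach is to deregister the drivers loaded by the job's classloader when the job shuts down. The sketch below uses only java.sql.DriverManager; the helper and DummyDriver are hypothetical. DriverManager only exposes and deregisters drivers visible to the caller's own classloader, so in Flink such code would presumably have to run from code loaded by the job's classloader; where to hook it in is an assumption left open here.

```java
import java.sql.Connection;
import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.DriverPropertyInfo;
import java.sql.SQLException;
import java.sql.SQLFeatureNotSupportedException;
import java.util.Collections;
import java.util.Properties;
import java.util.logging.Logger;

public class JdbcDriverCleanup {

    /** Deregisters every JDBC driver whose class was defined by {@code loader}; returns how many were removed. */
    public static int deregisterDriversLoadedBy(ClassLoader loader) throws SQLException {
        int removed = 0;
        // Copy the enumeration first so deregistering does not interfere with iteration.
        for (Driver driver : Collections.list(DriverManager.getDrivers())) {
            if (driver.getClass().getClassLoader() == loader) {
                DriverManager.deregisterDriver(driver);
                removed++;
            }
        }
        return removed;
    }

    /** Hypothetical no-op driver used only to demonstrate registration and removal. */
    public static class DummyDriver implements Driver {
        @Override public Connection connect(String url, Properties info) { return null; }
        @Override public boolean acceptsURL(String url) { return false; }
        @Override public DriverPropertyInfo[] getPropertyInfo(String url, Properties info) { return new DriverPropertyInfo[0]; }
        @Override public int getMajorVersion() { return 0; }
        @Override public int getMinorVersion() { return 0; }
        @Override public boolean jdbcCompliant() { return false; }
        @Override public Logger getParentLogger() throws SQLFeatureNotSupportedException {
            throw new SQLFeatureNotSupportedException();
        }
    }

    public static void main(String[] args) throws SQLException {
        DriverManager.registerDriver(new DummyDriver());
        int removed = deregisterDriversLoadedBy(DummyDriver.class.getClassLoader());
        System.out.println("deregistered drivers: " + removed);
    }
}
```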


maciek


On 08.04.2021 11:42, 太平洋 wrote:
My application program looks like this. Does this structure has some problem?

public class StreamingJob {
public static void main(String[] args) throws Exception {
int i = 0;
while (i < 100) {
try {
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setRuntimeMode(RuntimeExecutionMode.BATCH);
env.setParallelism(Parallelism);

EnvironmentSettings bsSettings = EnvironmentSettings.newInstance().useBlinkPlanner()
.inStreamingMode().build();
StreamTableEnvironment bsTableEnv = StreamTableEnvironment.create(env, bsSettings);

bsTableEnv.executeSql("CREATE TEMPORARY TABLE xxxx");
Table t = bsTableEnv.sqlQuery(query);

DataStream<DataPoint> points = bsTableEnv.toAppendStream(t, DataPoint.class);

DataStream<StatisPoint> weightPoints = points.map();

DataStream<PredictPoint> predictPoints = weightPoints.keyBy()
.reduce().map();

// side output
final OutputTag<PredictPoint> outPutPredict = new OutputTag<PredictPoint>("predict") {
};

SingleOutputStreamOperator<PredictPoint> mainDataStream = predictPoints
.process();

DataStream<PredictPoint> exStream = mainDataStream.getSideOutput(outPutPredict);

                                        //write data to clickhouse
String insertIntoCKSql = "xxx";
mainDataStream.addSink(JdbcSink.sink(insertIntoCKSql, new CkSinkBuilder(),
new JdbcExecutionOptions.Builder().withBatchSize(CkBatchSize).build(),
new JdbcConnectionOptions.JdbcConnectionOptionsBuilder().withDriverName(CkDriverName)
.withUrl(CkUrl).withUsername(CkUser).withPassword(CkPassword).build()));

// write data to kafka
FlinkKafkaProducer<String> producer = new FlinkKafkaProducer<>(); 
exStream.map().addSink(producer);

env.execute("Prediction Program");
} catch (Exception e) {
e.printStackTrace();
}
i++;
Thread.sleep(window * 1000);
}
}
}



------------------ 原始邮件 ------------------
发件人: "Arvid Heise" [hidden email];
发送时间: 2021年4月8日(星期四) 下午2:33
收件人: "Yangze Guo"[hidden email];
抄送: "太平洋"[hidden email];"user"[hidden email];"guowei.mgw"[hidden email];"renqschn"[hidden email];
主题: Re: period batch job lead to OutOfMemoryError: Metaspace problem

Hi,

ChildFirstClassLoader are created (more or less) by application jar and seeing so many looks like a classloader leak to me. I'd expect you to see a new ChildFirstClassLoader popping up with each new job submission.

Can you check who is referencing the ChildFirstClassLoader transitively? Usually, it's some thread that is lingering around because some third party library is leaking threads etc.

OneInputStreamTask is legit and just indicates that you have a job running with 4 slots on that TM. It should not hold any dedicated metaspace memory.

On Thu, Apr 8, 2021 at 4:52 AM Yangze Guo <[hidden email]> wrote:
I went through the JM & TM logs but could not find any valuable clue.
The exception is actually thrown by kafka-producer-network-thread.
Maybe @Qingsheng could also take a look?


Best,
Yangze Guo

On Thu, Apr 8, 2021 at 10:39 AM 太平洋 <[hidden email]> wrote:
>
> I have configured to 512M, but problem still exist. Now the memory size is still 256M.
> Attachments are TM and JM logs.
>
> Look forward to your reply.
>
> ------------------ 原始邮件 ------------------
> 发件人: "Yangze Guo" <[hidden email]>;
> 发送时间: 2021年4月6日(星期二) 晚上6:35
> 收件人: "太平洋"<[hidden email]>;
> 抄送: "user"<[hidden email]>;"guowei.mgw"<[hidden email]>;
> 主题: Re: period batch job lead to OutOfMemoryError: Metaspace problem
>
> > I have tried this method, but the problem still exist.
> How much memory do you configure for it?
>
> > is 21 instances of "org.apache.flink.util.ChildFirstClassLoader" normal
> Not quite sure about it. AFAIK, each job will have a classloader.
> Multiple tasks of the same job in the same TM will share the same
> classloader. The classloader will be removed if there is no more task
> running on the TM. Classloader without reference will be finally
> cleanup by GC. Could you share JM and TM logs for further analysis?
> I'll also involve @Guowei Ma in this thread.
>
>
> Best,
> Yangze Guo
>
> On Tue, Apr 6, 2021 at 6:05 PM 太平洋 <[hidden email]> wrote:
> >
> > I have tried this method, but the problem still exist.
> > by heap dump analysis, is 21 instances of "org.apache.flink.util.ChildFirstClassLoader" normal?
> >
> >
> > ------------------ 原始邮件 ------------------
> > 发件人: "Yangze Guo" <[hidden email]>;
> > 发送时间: 2021年4月6日(星期二) 下午4:32
> > 收件人: "太平洋"<[hidden email]>;
> > 抄送: "user"<[hidden email]>;
> > 主题: Re: period batch job lead to OutOfMemoryError: Metaspace problem
> >
> > I think you can try to increase the JVM metaspace option for
> > TaskManagers through taskmanager.memory.jvm-metaspace.size. [1]
> >
> > [1] https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/memory/mem_trouble/#outofmemoryerror-metaspace
> >
> > Best,
> > Yangze Guo
> >
> > Best,
> > Yangze Guo
> >
> >
> > On Tue, Apr 6, 2021 at 4:22 PM 太平洋 <[hidden email]> wrote:
> > >
> > > batch job:
> > > read data from s3 by sql,then by some operators and write data to clickhouse and kafka.
> > > after some times, task-manager quit with OutOfMemoryError: Metaspace.
> > >
> > > env:
> > > flink version:1.12.2
> > > task-manager slot count: 5
> > > deployment: standalone kubernetes session 模式
> > > dependencies:
> > >
> > >     <dependency>
> > >
> > >       <groupId>org.apache.flink</groupId>
> > >
> > >       <artifactId>flink-connector-kafka_2.11</artifactId>
> > >
> > >       <version>${flink.version}</version>
> > >
> > >     </dependency>
> > >
> > >     <dependency>
> > >
> > >       <groupId>com.google.code.gson</groupId>
> > >
> > >       <artifactId>gson</artifactId>
> > >
> > >       <version>2.8.5</version>
> > >
> > >     </dependency>
> > >
> > >     <dependency>
> > >
> > >       <groupId>org.apache.flink</groupId>
> > >
> > >       <artifactId>flink-connector-jdbc_2.11</artifactId>
> > >
> > >       <version>${flink.version}</version>
> > >
> > >     </dependency>
> > >
> > >     <dependency>
> > >
> > >       <groupId>ru.yandex.clickhouse</groupId>
> > >
> > >       <artifactId>clickhouse-jdbc</artifactId>
> > >
> > >       <version>0.3.0</version>
> > >
> > >     </dependency>
> > >
> > >     <dependency>
> > >
> > >       <groupId>org.apache.flink</groupId>
> > >
> > >         <artifactId>flink-parquet_2.11</artifactId>
> > >
> > >         <version>${flink.version}</version>
> > >
> > >     </dependency>
> > >
> > >     <dependency>
> > >
> > >          <groupId>org.apache.flink</groupId>
> > >
> > >          <artifactId>flink-json</artifactId>
> > >
> > >          <version>${flink.version}</version>
> > >
> > >     </dependency>
> > >
> > >
> > > heap dump1:
> > >
> > > Leak Suspects
> > >
> > > System Overview
> > >
> > >  Leaks
> > >
> > >  Overview
> > >
> > >
> > >   Problem Suspect 1
> > >
> > > 21 instances of "org.apache.flink.util.ChildFirstClassLoader", loaded by "sun.misc.Launcher$AppClassLoader @ 0x73b2d42e0" occupy 29,656,880 (41.16%) bytes.
> > >
> > > Biggest instances:
> > >
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73ca2a1e8 - 1,474,760 (2.05%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73d2af820 - 1,474,168 (2.05%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73cdcaa10 - 1,474,160 (2.05%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73cf6aab0 - 1,474,160 (2.05%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73d1111d8 - 1,474,160 (2.05%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73d2bb108 - 1,474,128 (2.05%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73de202e0 - 1,474,120 (2.05%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73dadc778 - 1,474,112 (2.05%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73d5f70e8 - 1,474,064 (2.05%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73d93aa38 - 1,474,064 (2.05%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73e179638 - 1,474,064 (2.05%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73dc80418 - 1,474,056 (2.05%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73dfcda60 - 1,474,056 (2.05%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73e4bcd38 - 1,474,056 (2.05%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73d6006e8 - 1,474,032 (2.05%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73c7d2ad8 - 1,461,944 (2.03%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73ca1bb98 - 1,460,752 (2.03%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73bf203f0 - 1,460,744 (2.03%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73e3284a8 - 1,445,232 (2.01%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73e65de00 - 1,445,232 (2.01%) bytes.
> > >
> > >
> > >
> > > Keywords
> > > org.apache.flink.util.ChildFirstClassLoader
> > > sun.misc.Launcher$AppClassLoader @ 0x73b2d42e0
> > > Details »
> > >
> > >   Problem Suspect 2
> > >
> > > 34,407 instances of "org.apache.flink.core.memory.HybridMemorySegment", loaded by "sun.misc.Launcher$AppClassLoader @ 0x73b2d42e0" occupy 7,707,168 (10.70%) bytes.
> > >
> > > Keywords
> > > org.apache.flink.core.memory.HybridMemorySegment
> > > sun.misc.Launcher$AppClassLoader @ 0x73b2d42e0
> > >
> > > heap dump2:
> > >
> > > Leak Suspects
> > >
> > >   Problem Suspect 1
> > >
> > > 21 instances of "org.apache.flink.util.ChildFirstClassLoader", loaded by "sun.misc.Launcher$AppClassLoader @ 0x73b2d42e0" occupy 26,061,408 (30.68%) bytes.
> > >
> > > Biggest instances:
> > >
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73e9e9930 - 1,474,224 (1.74%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73edce0b8 - 1,474,224 (1.74%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73f1ad7d0 - 1,474,168 (1.74%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73f3e5118 - 1,474,168 (1.74%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73f5d3fe0 - 1,474,168 (1.74%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73ebd8d28 - 1,474,160 (1.74%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73efc00c0 - 1,474,160 (1.74%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73e2251a8 - 1,474,136 (1.74%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73cc24af0 - 1,474,064 (1.74%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73cdca3e0 - 1,474,064 (1.74%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73cf6f860 - 1,474,064 (1.74%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73d114768 - 1,474,064 (1.74%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73ca6f878 - 1,474,056 (1.74%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73d2b7640 - 1,474,056 (1.74%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73d2c1d80 - 1,474,040 (1.74%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73c7e2868 - 1,469,720 (1.73%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73bf34a98 - 1,460,808 (1.72%) bytes.
> > >
> > >
> > >
> > > Keywords
> > > org.apache.flink.util.ChildFirstClassLoader
> > > sun.misc.Launcher$AppClassLoader @ 0x73b2d42e0
> > >
> > >   Problem Suspect 2
> > >
> > > 4 instances of "org.apache.flink.streaming.runtime.tasks.OneInputStreamTask", loaded by "sun.misc.Launcher$AppClassLoader @ 0x73b2d42e0" occupy 11,644,200 (13.71%) bytes.
> > >
> > > Biggest instances:
> > >
> > > org.apache.flink.streaming.runtime.tasks.OneInputStreamTask @ 0x73e2d0cb0 - 4,364,536 (5.14%) bytes.
> > > org.apache.flink.streaming.runtime.tasks.OneInputStreamTask @ 0x73d62fb88 - 3,643,576 (4.29%) bytes.
> > > org.apache.flink.streaming.runtime.tasks.OneInputStreamTask @ 0x73dae0270 - 3,635,952 (4.28%) bytes.
> > >
> > >
> > >
> > > Keywords
> > > sun.misc.Launcher$AppClassLoader @ 0x73b2d42e0
> > > org.apache.flink.streaming.runtime.tasks.OneInputStreamTask

Re: Re: period batch job lead to OutOfMemoryError: Metaspace problem

太平洋
I have tried adding 'classloader.parent-first-patterns.additional: "ru.yandex.clickhouse"' to flink-config, but the problem still exists.
Is there a lightweight way to put the ClickHouse JDBC driver in the Flink lib/ folder?

------------------ Original Message ------------------
From: "Maciek Próchniak" <[hidden email]>;
Sent: Friday, April 9, 2021, 3:24 AM
To: "太平洋"<[hidden email]>;"Arvid Heise"<[hidden email]>;"Yangze Guo"<[hidden email]>;
Cc: "user"<[hidden email]>;"guowei.mgw"<[hidden email]>;"renqschn"<[hidden email]>;
Subject: Re: Re: period batch job lead to OutOfMemoryError: Metaspace problem

Hi,

Did you put the ClickHouse JDBC driver on Flink's main classpath (in the lib folder) rather than in the user jar, as described here: https://ci.apache.org/projects/flink/flink-docs-release-1.12/ops/debugging/debugging_classloading.html#unloading-of-dynamically-loaded-classes-in-user-code ?

When we ran into Metaspace leaks recently, in quite a few cases the problem turned out to be a JDBC driver in the user classloader: the driver was registered with DriverManager, and that registration caused a classloader leak.


maciek
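If the driver has to stay in the user jar, one way to break the reference described above is to deregister the drivers that the job's classloader defined before the job is done with them. The following is a minimal sketch under that assumption; the class JdbcDriverCleanup and the point at which you would call it are hypothetical, and moving the driver to lib/ as suggested above makes it unnecessary.

```java
import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper (not part of Flink or any JDBC driver): deregisters the JDBC
// drivers that a given classloader defined, so that java.sql.DriverManager, which
// lives in a parent classloader, stops holding a reference into that classloader.
final class JdbcDriverCleanup {

    private JdbcDriverCleanup() {}

    /** Deregisters every driver whose class was defined by {@code loader}; returns them. */
    static List<Driver> deregisterDriversLoadedBy(ClassLoader loader) {
        List<Driver> removed = new ArrayList<>();
        // DriverManager.getDrivers() only lists drivers visible to the calling class's
        // classloader, so inside a job this helper should itself live in the user jar.
        for (Driver driver : Collections.list(DriverManager.getDrivers())) {
            if (driver.getClass().getClassLoader() != loader) {
                continue; // defined by another classloader, e.g. loaded from Flink's lib/
            }
            try {
                DriverManager.deregisterDriver(driver);
                removed.add(driver);
            } catch (SQLException e) {
                // Report and continue so one failure does not keep the others registered.
                System.err.println("Could not deregister " + driver + ": " + e);
            }
        }
        return removed;
    }
}
```

A deregistered driver is no longer found by later DriverManager.getConnection() calls, so this only makes sense once no task of the job needs a new connection. Because tasks of the same job on one TaskManager share the classloader, choosing that moment (for example a close() method, passing getRuntimeContext().getUserCodeClassLoader()) is an assumption you would have to verify for your job.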


On 08.04.2021 11:42, 太平洋 wrote:
My application looks like this. Is there a problem with this structure?

public class StreamingJob {
    public static void main(String[] args) throws Exception {
        int i = 0;
        while (i < 100) {
            try {
                StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
                env.setRuntimeMode(RuntimeExecutionMode.BATCH);
                env.setParallelism(Parallelism);

                EnvironmentSettings bsSettings = EnvironmentSettings.newInstance().useBlinkPlanner()
                        .inStreamingMode().build();
                StreamTableEnvironment bsTableEnv = StreamTableEnvironment.create(env, bsSettings);

                bsTableEnv.executeSql("CREATE TEMPORARY TABLE xxxx");
                Table t = bsTableEnv.sqlQuery(query);

                DataStream<DataPoint> points = bsTableEnv.toAppendStream(t, DataPoint.class);

                DataStream<StatisPoint> weightPoints = points.map();

                DataStream<PredictPoint> predictPoints = weightPoints.keyBy()
                        .reduce().map();

                // side output (anonymous subclass so the generic type is captured)
                final OutputTag<PredictPoint> outPutPredict = new OutputTag<PredictPoint>("predict") {
                };

                SingleOutputStreamOperator<PredictPoint> mainDataStream = predictPoints
                        .process();

                DataStream<PredictPoint> exStream = mainDataStream.getSideOutput(outPutPredict);

                // write data to clickhouse
                String insertIntoCKSql = "xxx";
                mainDataStream.addSink(JdbcSink.sink(insertIntoCKSql, new CkSinkBuilder(),
                        new JdbcExecutionOptions.Builder().withBatchSize(CkBatchSize).build(),
                        new JdbcConnectionOptions.JdbcConnectionOptionsBuilder().withDriverName(CkDriverName)
                                .withUrl(CkUrl).withUsername(CkUser).withPassword(CkPassword).build()));

                // write data to kafka
                FlinkKafkaProducer<String> producer = new FlinkKafkaProducer<>();
                exStream.map().addSink(producer);

                // Each iteration submits a separate job; every job gets its own
                // user-code classloader on the TaskManagers.
                env.execute("Prediction Program");
            } catch (Exception e) {
                e.printStackTrace();
            }
            i++;
            Thread.sleep(window * 1000);
        }
    }
}



------------------ Original Message ------------------
From: "Arvid Heise" [hidden email];
Sent: Thursday, April 8, 2021, 2:33 PM
To: "Yangze Guo"[hidden email];
Cc: "太平洋"[hidden email];"user"[hidden email];"guowei.mgw"[hidden email];"renqschn"[hidden email];
Subject: Re: period batch job lead to OutOfMemoryError: Metaspace problem

Hi,

ChildFirstClassLoaders are created (more or less) per application jar, and seeing so many of them looks like a classloader leak to me. I'd expect you to see a new ChildFirstClassLoader popping up with each new job submission.

Can you check what is referencing the ChildFirstClassLoaders transitively? Usually it is some thread lingering around because a third-party library leaks threads or similar.

OneInputStreamTask is legit and just indicates that you have a job running with 4 slots on that TM. It should not hold any dedicated metaspace memory.
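To look for the lingering threads mentioned above from inside the JVM, a diagnostic along these lines could list live threads that still point at a given classloader, either as their context classloader or through a Thread subclass that the classloader defined. This is a hypothetical sketch, not a Flink API; the class name LingeringThreads is made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical diagnostic, not a Flink API: finds live threads that keep a
// classloader reachable, either because it is their context classloader or
// because their Thread subclass was defined by it.
final class LingeringThreads {

    private LingeringThreads() {}

    static List<Thread> referencing(ClassLoader loader) {
        Objects.requireNonNull(loader, "loader"); // null would match every bootstrap class
        List<Thread> result = new ArrayList<>();
        // getAllStackTraces() returns a snapshot of all live threads in this JVM.
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            boolean viaContext = t.getContextClassLoader() == loader;
            boolean viaClass = t.getClass().getClassLoader() == loader;
            if (viaContext || viaClass) {
                result.add(t);
            }
        }
        return result;
    }

    static String describe(Thread t) {
        return t.getName() + " [daemon=" + t.isDaemon() + ", state=" + t.getState() + "]";
    }
}
```

Since the exception in this job is reported to come from kafka-producer-network-thread, threads with that name would be the first candidates to check after a job has finished.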

On Thu, Apr 8, 2021 at 4:52 AM Yangze Guo <[hidden email]> wrote:
I went through the JM & TM logs but could not find any useful clues.
The exception is actually thrown by kafka-producer-network-thread.
Maybe @Qingsheng could also take a look?


Best,
Yangze Guo

On Thu, Apr 8, 2021 at 10:39 AM 太平洋 <[hidden email]> wrote:
>
> I have configured it to 512M, but the problem still exists, and the memory size is still 256M.
> Attached are the TM and JM logs.
>
> Looking forward to your reply.
>
> ------------------ Original Message ------------------
> From: "Yangze Guo" <[hidden email]>;
> Sent: Tuesday, April 6, 2021, 6:35 PM
> To: "太平洋"<[hidden email]>;
> Cc: "user"<[hidden email]>;"guowei.mgw"<[hidden email]>;
> Subject: Re: period batch job lead to OutOfMemoryError: Metaspace problem
>
> > I have tried this method, but the problem still exists.
> How much memory did you configure for it?
>
> > Are 21 instances of "org.apache.flink.util.ChildFirstClassLoader" normal?
> I'm not quite sure. AFAIK, each job has its own classloader.
> Multiple tasks of the same job in the same TM share the same
> classloader. The classloader is released once no task of that job
> is running on the TM any more. A classloader without references is
> eventually cleaned up by GC. Could you share the JM and TM logs for further analysis?
> I'll also involve @Guowei Ma in this thread.
>
>
> Best,
> Yangze Guo
>
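The lifecycle Yangze describes (one classloader per job, shared by that job's tasks on a TaskManager, released when its last task there finishes, and finally collected by GC) can be modeled as a reference count per job. This is a simplified, hypothetical sketch for illustration, not Flink's implementation; the class and method names are made up.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical model of the per-job classloader lifecycle; not Flink code.
final class JobClassLoaders {

    private static final class Lease {
        final URLClassLoader loader;
        int runningTasks;

        Lease(URLClassLoader loader) {
            this.loader = loader;
        }
    }

    private final Map<String, Lease> leases = new HashMap<>();
    private final URL[] userJars;

    JobClassLoaders(URL[] userJars) {
        this.userJars = userJars.clone();
    }

    /** A task of {@code jobId} starts: reuse the job's loader or create it. */
    synchronized ClassLoader acquire(String jobId) {
        Lease lease = leases.computeIfAbsent(jobId,
                id -> new Lease(new URLClassLoader(userJars, JobClassLoaders.class.getClassLoader())));
        lease.runningTasks++;
        return lease.loader;
    }

    /** A task of {@code jobId} finishes: drop the loader after the job's last task. */
    synchronized void release(String jobId) {
        Lease lease = leases.get(jobId);
        if (lease == null) {
            throw new IllegalStateException("No running tasks for job " + jobId);
        }
        if (--lease.runningTasks > 0) {
            return;
        }
        leases.remove(jobId);
        try {
            // close() stops further loading and closes the jar files it opened.
            // Already-loaded classes, and the Metaspace they use, are freed only once
            // GC finds no remaining reference to the loader (e.g. from a lingering
            // thread or a static registry such as DriverManager).
            lease.loader.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    synchronized int liveLoaders() {
        return leases.size();
    }
}
```

Under this model, dropping the loader on time is not enough: its Metaspace is only reclaimed once nothing else still references it.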
> On Tue, Apr 6, 2021 at 6:05 PM 太平洋 <[hidden email]> wrote:
> >
> > I have tried this method, but the problem still exists.
> > According to my heap dump analysis, there are 21 instances of "org.apache.flink.util.ChildFirstClassLoader". Is that normal?
> >
> >
> > ------------------ Original Message ------------------
> > From: "Yangze Guo" <[hidden email]>;
> > Sent: Tuesday, April 6, 2021, 4:32 PM
> > To: "太平洋"<[hidden email]>;
> > Cc: "user"<[hidden email]>;
> > Subject: Re: period batch job lead to OutOfMemoryError: Metaspace problem
> >
> > I think you can try to increase the JVM metaspace option for
> > TaskManagers through taskmanager.memory.jvm-metaspace.size. [1]
> >
> > [1] https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/memory/mem_trouble/#outofmemoryerror-metaspace
> >
> > Best,
> > Yangze Guo
> >
> >
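As far as I know, Flink passes taskmanager.memory.jvm-metaspace.size to the TaskManager JVM as -XX:MaxMetaspaceSize, so reading the JVM's own Metaspace pool from inside a TaskManager is one way to confirm whether a new value (such as the 512M mentioned above) actually took effect. The following is a hypothetical probe using the standard java.lang.management API; the pool name "Metaspace" is what HotSpot reports and may differ on other JVMs.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical probe: reports the Metaspace usage and limit of the JVM it runs in.
final class MetaspaceProbe {

    private MetaspaceProbe() {}

    /** Usage of the pool named "Metaspace" (HotSpot's name), or null if there is none. */
    static MemoryUsage metaspaceUsage() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if ("Metaspace".equals(pool.getName())) {
                return pool.getUsage();
            }
        }
        return null;
    }

    /** MemoryUsage.getMax() is -1 when the maximum is undefined, typically when no MaxMetaspaceSize is set. */
    static String formatMiB(long bytes) {
        return bytes < 0 ? "undefined" : (bytes / (1024L * 1024L)) + " MiB";
    }

    static String report() {
        MemoryUsage usage = metaspaceUsage();
        if (usage == null) {
            return "no Metaspace pool found";
        }
        return "Metaspace used=" + formatMiB(usage.getUsed())
                + ", committed=" + formatMiB(usage.getCommitted())
                + ", max=" + formatMiB(usage.getMax());
    }
}
```

Logging MetaspaceProbe.report() from an operator's open() method (an assumption about where to call it) would show the limit the TaskManager JVM is actually running with; a max that stays at the old value after a configuration change would suggest the setting never reached the TaskManagers.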
> > On Tue, Apr 6, 2021 at 4:22 PM 太平洋 <[hidden email]> wrote:
> > >
> > > Batch job:
> > > It reads data from S3 via SQL, processes it with some operators, and writes the results to ClickHouse and Kafka.
> > > After a while, the TaskManager exits with OutOfMemoryError: Metaspace.
> > >
> > > Environment:
> > > Flink version: 1.12.2
> > > TaskManager slot count: 5
> > > Deployment: standalone Kubernetes, session mode
> > > Dependencies:
> > >
> > >     <dependency>
> > >       <groupId>org.apache.flink</groupId>
> > >       <artifactId>flink-connector-kafka_2.11</artifactId>
> > >       <version>${flink.version}</version>
> > >     </dependency>
> > >     <dependency>
> > >       <groupId>com.google.code.gson</groupId>
> > >       <artifactId>gson</artifactId>
> > >       <version>2.8.5</version>
> > >     </dependency>
> > >     <dependency>
> > >       <groupId>org.apache.flink</groupId>
> > >       <artifactId>flink-connector-jdbc_2.11</artifactId>
> > >       <version>${flink.version}</version>
> > >     </dependency>
> > >     <dependency>
> > >       <groupId>ru.yandex.clickhouse</groupId>
> > >       <artifactId>clickhouse-jdbc</artifactId>
> > >       <version>0.3.0</version>
> > >     </dependency>
> > >     <dependency>
> > >       <groupId>org.apache.flink</groupId>
> > >       <artifactId>flink-parquet_2.11</artifactId>
> > >       <version>${flink.version}</version>
> > >     </dependency>
> > >     <dependency>
> > >       <groupId>org.apache.flink</groupId>
> > >       <artifactId>flink-json</artifactId>
> > >       <version>${flink.version}</version>
> > >     </dependency>
> > >
> > >
> > > heap dump1:
> > >
> > > Leak Suspects
> > >
> > >   Problem Suspect 1
> > >
> > > 21 instances of "org.apache.flink.util.ChildFirstClassLoader", loaded by "sun.misc.Launcher$AppClassLoader @ 0x73b2d42e0" occupy 29,656,880 (41.16%) bytes.
> > >
> > > Biggest instances:
> > >
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73ca2a1e8 - 1,474,760 (2.05%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73d2af820 - 1,474,168 (2.05%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73cdcaa10 - 1,474,160 (2.05%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73cf6aab0 - 1,474,160 (2.05%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73d1111d8 - 1,474,160 (2.05%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73d2bb108 - 1,474,128 (2.05%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73de202e0 - 1,474,120 (2.05%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73dadc778 - 1,474,112 (2.05%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73d5f70e8 - 1,474,064 (2.05%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73d93aa38 - 1,474,064 (2.05%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73e179638 - 1,474,064 (2.05%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73dc80418 - 1,474,056 (2.05%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73dfcda60 - 1,474,056 (2.05%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73e4bcd38 - 1,474,056 (2.05%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73d6006e8 - 1,474,032 (2.05%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73c7d2ad8 - 1,461,944 (2.03%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73ca1bb98 - 1,460,752 (2.03%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73bf203f0 - 1,460,744 (2.03%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73e3284a8 - 1,445,232 (2.01%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73e65de00 - 1,445,232 (2.01%) bytes.
> > >
> > >
> > >
> > > Keywords
> > > org.apache.flink.util.ChildFirstClassLoader
> > > sun.misc.Launcher$AppClassLoader @ 0x73b2d42e0
> > >
> > >   Problem Suspect 2
> > >
> > > 34,407 instances of "org.apache.flink.core.memory.HybridMemorySegment", loaded by "sun.misc.Launcher$AppClassLoader @ 0x73b2d42e0" occupy 7,707,168 (10.70%) bytes.
> > >
> > > Keywords
> > > org.apache.flink.core.memory.HybridMemorySegment
> > > sun.misc.Launcher$AppClassLoader @ 0x73b2d42e0
> > >
> > > heap dump2:
> > >
> > > Leak Suspects
> > >
> > >   Problem Suspect 1
> > >
> > > 21 instances of "org.apache.flink.util.ChildFirstClassLoader", loaded by "sun.misc.Launcher$AppClassLoader @ 0x73b2d42e0" occupy 26,061,408 (30.68%) bytes.
> > >
> > > Biggest instances:
> > >
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73e9e9930 - 1,474,224 (1.74%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73edce0b8 - 1,474,224 (1.74%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73f1ad7d0 - 1,474,168 (1.74%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73f3e5118 - 1,474,168 (1.74%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73f5d3fe0 - 1,474,168 (1.74%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73ebd8d28 - 1,474,160 (1.74%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73efc00c0 - 1,474,160 (1.74%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73e2251a8 - 1,474,136 (1.74%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73cc24af0 - 1,474,064 (1.74%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73cdca3e0 - 1,474,064 (1.74%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73cf6f860 - 1,474,064 (1.74%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73d114768 - 1,474,064 (1.74%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73ca6f878 - 1,474,056 (1.74%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73d2b7640 - 1,474,056 (1.74%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73d2c1d80 - 1,474,040 (1.74%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73c7e2868 - 1,469,720 (1.73%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73bf34a98 - 1,460,808 (1.72%) bytes.
> > >
> > >
> > >
> > > Keywords
> > > org.apache.flink.util.ChildFirstClassLoader
> > > sun.misc.Launcher$AppClassLoader @ 0x73b2d42e0
> > >
> > >   Problem Suspect 2
> > >
> > > 4 instances of "org.apache.flink.streaming.runtime.tasks.OneInputStreamTask", loaded by "sun.misc.Launcher$AppClassLoader @ 0x73b2d42e0" occupy 11,644,200 (13.71%) bytes.
> > >
> > > Biggest instances:
> > >
> > > org.apache.flink.streaming.runtime.tasks.OneInputStreamTask @ 0x73e2d0cb0 - 4,364,536 (5.14%) bytes.
> > > org.apache.flink.streaming.runtime.tasks.OneInputStreamTask @ 0x73d62fb88 - 3,643,576 (4.29%) bytes.
> > > org.apache.flink.streaming.runtime.tasks.OneInputStreamTask @ 0x73dae0270 - 3,635,952 (4.28%) bytes.
> > >
> > >
> > >
> > > Keywords
> > > sun.misc.Launcher$AppClassLoader @ 0x73b2d42e0
> > > org.apache.flink.streaming.runtime.tasks.OneInputStreamTask

Re: Re: period batch job lead to OutOfMemoryError: Metaspace problem

Arvid Heise-4
Hi,

What do you mean by a lightweight way? Just to clarify: you copy the jar into the lib folder once and restart the cluster once (and put it into lib/ for future clusters). I'm not sure how it could be more lightweight than that.

You can still bundle it into your jar if you prefer it. It just tends to be big but if it's easier for you to not touch the cluster, then just put everything into your jar.
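Whichever option you choose, it helps to verify at runtime where the driver class actually comes from. The following hypothetical helper reports the classloader that defined a class and the location its class file was loaded from; the name ClassOrigin is made up.

```java
import java.net.URL;
import java.security.CodeSource;

// Hypothetical helper: reports which classloader defined a class and where the
// class file came from, e.g. Flink's lib/ folder versus the user jar.
final class ClassOrigin {

    private ClassOrigin() {}

    static String describe(Class<?> c) {
        ClassLoader loader = c.getClassLoader();
        if (loader == null) {
            return c.getName() + " <- bootstrap classloader";
        }
        CodeSource source = c.getProtectionDomain().getCodeSource();
        URL location = source == null ? null : source.getLocation();
        return c.getName() + " <- " + loader.getClass().getName() + " @ " + location;
    }

    static String describe(String className, ClassLoader loader) throws ClassNotFoundException {
        // initialize=false: load the class without running its static initializer,
        // which for a typical JDBC driver would register it with DriverManager.
        return describe(Class.forName(className, false, loader));
    }
}
```

For example, logging ClassOrigin.describe("ru.yandex.clickhouse.ClickHouseDriver", getRuntimeContext().getUserCodeClassLoader()) from a rich function's open() should name either the application classloader (sun.misc.Launcher$AppClassLoader in the heap dumps above) for a driver in lib/, or org.apache.flink.util.ChildFirstClassLoader for a driver bundled in the user jar. The driver class name is an assumption based on clickhouse-jdbc 0.3.0.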


Re: Re: period batch job lead to OutOfMemoryError: Metaspace problem

Maciek Próchniak

Hi Arvid,

Regarding "You can still bundle it into your jar if you prefer it.": is that really the case for JDBC drivers? I think that if the driver is not on Flink's main classpath (that is, in the lib folder), there is no way the class can be loaded by the main classloader, regardless of the parent-first/child-first classloader setting. Am I right?

Those settings only help if the driver is both on the Flink classpath and in the user jar. I also noticed that the documentation is slightly misleading here, as it suggests otherwise. Or am I missing something?


thanks,

maciek
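A simplified, hypothetical model of the delegation order discussed above, assuming parent-first patterns are matched as class-name prefixes (as the ru.yandex.clickhouse value suggests) and that a parent-first lookup falls back to the user jar when Flink's classpath does not have the class. It is a sketch for reasoning about the cases, not Flink's ChildFirstClassLoader code.

```java
import java.util.List;
import java.util.Set;

// Simplified, hypothetical model of how a child-first user classloader picks the
// defining loader of a class; a sketch for reasoning, not Flink's source code.
final class LoaderChoice {

    enum Origin { PARENT, CHILD, NOT_FOUND }

    private LoaderChoice() {}

    /**
     * @param parentFirstPatterns class-name prefixes that are resolved parent-first
     * @param parentClasses       classes on Flink's own classpath (lib/)
     * @param userJarClasses      classes bundled in the user jar
     */
    static Origin resolve(String className, List<String> parentFirstPatterns,
                          Set<String> parentClasses, Set<String> userJarClasses) {
        boolean parentFirst = parentFirstPatterns.stream().anyMatch(className::startsWith);
        if (parentFirst) {
            // Parent-first: ask Flink's classpath, fall back to the user jar.
            if (parentClasses.contains(className)) {
                return Origin.PARENT;
            }
            return userJarClasses.contains(className) ? Origin.CHILD : Origin.NOT_FOUND;
        }
        // Child-first: the user jar wins; the parent is only a fallback.
        if (userJarClasses.contains(className)) {
            return Origin.CHILD;
        }
        return parentClasses.contains(className) ? Origin.PARENT : Origin.NOT_FOUND;
    }
}
```

Under these assumptions, a pattern alone leaves a driver that exists only in the user jar in the child classloader, which is consistent with the parent-first setting not helping in this thread; only a copy in lib/ lets the parent define it.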


On 09.04.2021 11:25, Arvid Heise wrote:
Hi,

What do you mean by light-weight way? Just to clarify: you copy the jar once in the lib folder and restart the cluster once (and put it into the lib/ for future clusters). Not sure how it would be more light-weight.

You can still bundle it into your jar if you prefer it. It just tends to be big but if it's easier for you to not touch the cluster, then just put everything into your jar.

On Fri, Apr 9, 2021 at 4:08 AM 太平洋 <[hidden email]> wrote:
I have tried  to add 'classloader.parent-first-patterns.additional: "ru.yandex.clickhouse" ' to flink-config, but problem still exist.
Is there lightweight way to put clickhouse JDBC driver on Flink lib/ folder?
 

------------------ 原始邮件 ------------------
发件人: "Maciek Próchniak" <[hidden email]>;
发送时间: 2021年4月9日(星期五) 凌晨3:24
收件人: "太平洋"<[hidden email]>;"Arvid Heise"<[hidden email]>;"Yangze Guo"<[hidden email]>;
抄送: "user"<[hidden email]>;"guowei.mgw"<[hidden email]>;"renqschn"<[hidden email]>;
主题: Re: 回复: period batch job lead to OutOfMemoryError: Metaspace problem

Hi,

Did you put the ClickHouse JDBC driver on Flink's main classpath (in the lib folder) rather than in the user jar, as described here: https://ci.apache.org/projects/flink/flink-docs-release-1.12/ops/debugging/debugging_classloading.html#unloading-of-dynamically-loaded-classes-in-user-code

When we ran into Metaspace leaks recently, in quite a few cases the culprit turned out to be the JDBC driver in the user classloader: it was registered with DriverManager, and that caused a classloader leak.


maciek
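A minimal sketch of the cleanup side of this leak, assuming the driver stays in the user jar: java.sql.DriverManager keeps a JVM-wide static list of registered drivers, so a driver class defined by a per-job classloader keeps that loader reachable after the job ends. The hypothetical helper below (JdbcDriverCleanup is a made-up name) deregisters every driver whose class was loaded by a given classloader. Two caveats: DriverManager only lets code that can see a driver's class enumerate and deregister it, so the helper must run from code loaded by the same user-code classloader; and it should only run once no other task of that job on the TaskManager still needs the driver. Putting the driver in lib/, as suggested above, avoids the problem altogether.

```java
import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper: deregister JDBC drivers defined by one (user-code) classloader. */
final class JdbcDriverCleanup {

    private JdbcDriverCleanup() {}

    /**
     * Deregisters every driver whose class was loaded by {@code loader} and returns them.
     * DriverManager only exposes drivers visible to the calling class, so this must be
     * called from code loaded by a classloader that can see those drivers.
     */
    static List<Driver> deregisterDriversLoadedBy(ClassLoader loader) {
        List<Driver> removed = new ArrayList<>();
        // Copy the enumeration first so deregistering does not disturb the iteration.
        for (Driver driver : Collections.list(DriverManager.getDrivers())) {
            if (driver.getClass().getClassLoader() != loader) {
                continue; // leave drivers from lib/ or from other loaders alone
            }
            try {
                DriverManager.deregisterDriver(driver);
                removed.add(driver);
            } catch (SQLException | SecurityException e) {
                // Best effort: report and keep going with the remaining drivers.
                System.err.println("Could not deregister " + driver + ": " + e);
            }
        }
        return removed;
    }
}
```

In a job, this would typically be invoked as JdbcDriverCleanup.deregisterDriversLoadedBy(getClass().getClassLoader()) from code that runs after the job's work on that TaskManager is finished.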


On 08.04.2021 11:42, 太平洋 wrote:
My application looks like this. Is there a problem with this structure?

public class StreamingJob {
    public static void main(String[] args) throws Exception {
        // Operator arguments (map/keyBy/reduce/process functions, Kafka producer settings)
        // and constants such as Parallelism, query and window were elided in the original post.
        int i = 0;
        while (i < 100) {
            try {
                // A fresh environment per iteration; env.execute() below submits a new job each time.
                StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
                env.setRuntimeMode(RuntimeExecutionMode.BATCH);
                env.setParallelism(Parallelism);

                EnvironmentSettings bsSettings = EnvironmentSettings.newInstance().useBlinkPlanner()
                        .inStreamingMode().build();
                StreamTableEnvironment bsTableEnv = StreamTableEnvironment.create(env, bsSettings);

                // Read the source data via SQL.
                bsTableEnv.executeSql("CREATE TEMPORARY TABLE xxxx");
                Table t = bsTableEnv.sqlQuery(query);

                DataStream<DataPoint> points = bsTableEnv.toAppendStream(t, DataPoint.class);

                DataStream<StatisPoint> weightPoints = points.map();

                DataStream<PredictPoint> predictPoints = weightPoints.keyBy()
                        .reduce().map();

                // side output
                final OutputTag<PredictPoint> outPutPredict = new OutputTag<PredictPoint>("predict") {
                };

                SingleOutputStreamOperator<PredictPoint> mainDataStream = predictPoints
                        .process();

                DataStream<PredictPoint> exStream = mainDataStream.getSideOutput(outPutPredict);

                // write data to ClickHouse
                String insertIntoCKSql = "xxx";
                mainDataStream.addSink(JdbcSink.sink(insertIntoCKSql, new CkSinkBuilder(),
                        new JdbcExecutionOptions.Builder().withBatchSize(CkBatchSize).build(),
                        new JdbcConnectionOptions.JdbcConnectionOptionsBuilder().withDriverName(CkDriverName)
                                .withUrl(CkUrl).withUsername(CkUser).withPassword(CkPassword).build()));

                // write data to Kafka
                FlinkKafkaProducer<String> producer = new FlinkKafkaProducer<>();
                exStream.map().addSink(producer);

                env.execute("Prediction Program");
            } catch (Exception e) {
                e.printStackTrace();
            }
            i++;
            Thread.sleep(window * 1000);
        }
    }
}
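The loop above calls env.execute() once per iteration, so every iteration is a separate job submission to the session cluster, and, per the replies in this thread, each job gets its own user-code classloader on the TaskManager. The toy model below is a hypothetical illustration, not Flink code (ClassLoaderLeaseModel and its methods are made-up names): tasks of one job share a loader, the loader is released when the job's last task finishes, and a JVM-wide static registry, standing in for something like DriverManager, can keep every released loader reachable.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical toy model (not Flink code) of per-job user-code classloaders on one TaskManager. */
final class ClassLoaderLeaseModel {

    /** Stand-in for one job's user-code classloader. */
    static final class Loader {
        final String jobId;
        Loader(String jobId) { this.jobId = jobId; }
    }

    private final Map<String, Loader> liveLoaders = new HashMap<>();
    private final Map<String, Integer> runningTasks = new HashMap<>();
    /** Stand-in for a JVM-wide static registry such as java.sql.DriverManager. */
    private final List<Loader> globalRegistry = new ArrayList<>();

    /** A task of jobId starts; all tasks of the same job share one loader. */
    Loader startTask(String jobId, boolean registersGlobally) {
        Loader loader = liveLoaders.computeIfAbsent(jobId, Loader::new);
        int running = runningTasks.merge(jobId, 1, Integer::sum);
        if (running == 1 && registersGlobally) {
            // e.g. a JDBC driver registering itself from a static initializer, once per loader
            globalRegistry.add(loader);
        }
        return loader;
    }

    /** A task of jobId finishes; the loader is released after the job's last task. */
    void finishTask(String jobId) {
        Integer running = runningTasks.get(jobId);
        if (running == null) {
            return; // unknown job: nothing to release
        }
        if (running == 1) {
            runningTasks.remove(jobId);
            liveLoaders.remove(jobId); // the framework drops its reference here
        } else {
            runningTasks.put(jobId, running - 1);
        }
    }

    /** Loaders already released by the framework that the global registry still keeps reachable. */
    long leakedLoaders() {
        return globalRegistry.stream().filter(l -> !liveLoaders.containsValue(l)).count();
    }
}
```

For example, running 21 jobs with 4 tasks each, where every job's driver registers itself globally, leaves 21 released but still reachable loaders in this model; with registersGlobally set to false, the count is 0.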



------------------ Original Message ------------------
From: "Arvid Heise" [hidden email];
Sent: Thursday, April 8, 2021, 2:33 PM
To: "Yangze Guo"[hidden email];
Cc: "太平洋"[hidden email];"user"[hidden email];"guowei.mgw"[hidden email];"renqschn"[hidden email];
Subject: Re: period batch job lead to OutOfMemoryError: Metaspace problem

Hi,

ChildFirstClassLoaders are created (more or less) per application jar, and seeing so many of them looks like a classloader leak to me. I'd expect a new ChildFirstClassLoader to appear with each new job submission.

Can you check what is transitively referencing the ChildFirstClassLoaders? Usually it's some lingering thread, because a third-party library leaks threads, etc.

OneInputStreamTask is legit and just indicates that you have a job running with 4 slots on that TM. It should not hold any dedicated metaspace memory.
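A heap dump's "Path to GC Roots" view in Eclipse MAT is the authoritative way to answer Arvid's question. As a rough in-process complement, the hypothetical helper below (LingeringThreads is a made-up name) lists live threads that pin a given classloader, either because it is their context classloader or because their Thread subclass was defined by it. It cannot see other reference paths such as static fields or registries like DriverManager, so an empty result does not rule out a leak.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/** Hypothetical diagnostic helper: which live threads keep a given classloader reachable? */
final class LingeringThreads {

    private LingeringThreads() {}

    static List<Thread> threadsPinning(ClassLoader loader) {
        // A null loader would mean the bootstrap loader and match every plain java.lang.Thread.
        Objects.requireNonNull(loader, "loader");
        List<Thread> pinning = new ArrayList<>();
        // Snapshot of all live threads in this JVM.
        for (Thread thread : Thread.getAllStackTraces().keySet()) {
            boolean contextLoader = thread.getContextClassLoader() == loader;
            boolean definedByLoader = thread.getClass().getClassLoader() == loader;
            if (thread.isAlive() && (contextLoader || definedByLoader)) {
                pinning.add(thread);
            }
        }
        return pinning;
    }

    /** Prints name, state and top stack frame of each pinning thread, e.g. into a TaskManager log. */
    static void report(ClassLoader loader) {
        for (Thread thread : threadsPinning(loader)) {
            StackTraceElement[] stack = thread.getStackTrace();
            String top = stack.length > 0 ? stack[0].toString() : "<no frames>";
            System.out.println(thread.getName() + " [" + thread.getState() + "] at " + top);
        }
    }
}
```

Thread names in the output (for example, the kafka-producer-network-thread mentioned below) usually point to the library that started the thread.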

On Thu, Apr 8, 2021 at 4:52 AM Yangze Guo <[hidden email]> wrote:
I went through the JM & TM logs but could not find any useful clue.
The exception is actually thrown by kafka-producer-network-thread.
Maybe @Qingsheng could also take a look?


Best,
Yangze Guo
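Regarding the report quoted below that a 512M setting still shows up as 256M: as far as I know, Flink turns taskmanager.memory.jvm-metaspace.size into a -XX:MaxMetaspaceSize flag when it launches the TaskManager JVM, and 256m is the default in Flink 1.12, so a value that never reaches the TaskManager's configuration would be consistent with this symptom. The hypothetical helper below (MetaspaceCheck is a made-up name), which could be called from inside a running task and logged, reports the JVM's actual metaspace limit and the flags it was started with. It relies on HotSpot naming the memory pool "Metaspace"; other JVMs may differ.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalLong;

/** Hypothetical diagnostic helper: which metaspace limit is this JVM actually running with? */
final class MetaspaceCheck {

    private MetaspaceCheck() {}

    /** Max size of the HotSpot "Metaspace" pool; empty if the pool is missing or has no limit. */
    static OptionalLong maxMetaspaceBytes() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if ("Metaspace".equals(pool.getName())) {
                long max = pool.getUsage().getMax(); // -1 means "undefined", i.e. no explicit limit
                return max < 0 ? OptionalLong.empty() : OptionalLong.of(max);
            }
        }
        return OptionalLong.empty();
    }

    /** JVM start-up flags that set the metaspace limit, e.g. a -XX:MaxMetaspaceSize=<size> flag. */
    static List<String> metaspaceFlags() {
        List<String> flags = new ArrayList<>();
        for (String arg : ManagementFactory.getRuntimeMXBean().getInputArguments()) {
            if (arg.startsWith("-XX:MaxMetaspaceSize")) {
                flags.add(arg);
            }
        }
        return flags;
    }
}
```

If the reported limit is still about 256 MB after the change, the TaskManager process most likely was not started with the updated configuration.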

On Thu, Apr 8, 2021 at 10:39 AM 太平洋 <[hidden email]> wrote:
>
> I have configured it to 512M, but the problem still exists. The memory size is still 256M.
> Attachments are TM and JM logs.
>
> Look forward to your reply.
>
> ------------------ Original Message ------------------
> From: "Yangze Guo" <[hidden email]>;
> Sent: Tuesday, April 6, 2021, 6:35 PM
> To: "太平洋"<[hidden email]>;
> Cc: "user"<[hidden email]>;"guowei.mgw"<[hidden email]>;
> Subject: Re: period batch job lead to OutOfMemoryError: Metaspace problem
>
> > I have tried this method, but the problem still exists.
> How much memory did you configure for it?
>
> > are 21 instances of "org.apache.flink.util.ChildFirstClassLoader" normal
> I'm not quite sure about it. AFAIK, each job has its own classloader.
> Multiple tasks of the same job in the same TM share the same
> classloader. The classloader is removed once none of the job's tasks are
> running on the TM anymore, and a classloader without references is eventually
> cleaned up by GC. Could you share the JM and TM logs for further analysis?
> I'll also loop in @Guowei Ma on this thread.
>
>
> Best,
> Yangze Guo
>
> On Tue, Apr 6, 2021 at 6:05 PM 太平洋 <[hidden email]> wrote:
> >
> > I have tried this method, but the problem still exists.
> > According to the heap dump analysis, are 21 instances of "org.apache.flink.util.ChildFirstClassLoader" normal?
> >
> >
> > ------------------ Original Message ------------------
> > From: "Yangze Guo" <[hidden email]>;
> > Sent: Tuesday, April 6, 2021, 4:32 PM
> > To: "太平洋"<[hidden email]>;
> > Cc: "user"<[hidden email]>;
> > Subject: Re: period batch job lead to OutOfMemoryError: Metaspace problem
> >
> > I think you can try increasing the JVM metaspace size for
> > TaskManagers via taskmanager.memory.jvm-metaspace.size. [1]
> >
> > [1] https://ci.apache.org/projects/flink/flink-docs-master/docs/deployment/memory/mem_trouble/#outofmemoryerror-metaspace
> >
> > Best,
> > Yangze Guo
> >
> >
> > On Tue, Apr 6, 2021 at 4:22 PM 太平洋 <[hidden email]> wrote:
> > >
> > > Batch job:
> > > It reads data from S3 via SQL, passes it through several operators, and writes the results to ClickHouse and Kafka.
> > > After some time, the TaskManager exits with OutOfMemoryError: Metaspace.
> > >
> > > Environment:
> > > Flink version: 1.12.2
> > > TaskManager slot count: 5
> > > Deployment: standalone Kubernetes session mode
> > > dependencies:
> > >
> > >     <dependency>
> > >       <groupId>org.apache.flink</groupId>
> > >       <artifactId>flink-connector-kafka_2.11</artifactId>
> > >       <version>${flink.version}</version>
> > >     </dependency>
> > >     <dependency>
> > >       <groupId>com.google.code.gson</groupId>
> > >       <artifactId>gson</artifactId>
> > >       <version>2.8.5</version>
> > >     </dependency>
> > >     <dependency>
> > >       <groupId>org.apache.flink</groupId>
> > >       <artifactId>flink-connector-jdbc_2.11</artifactId>
> > >       <version>${flink.version}</version>
> > >     </dependency>
> > >     <dependency>
> > >       <groupId>ru.yandex.clickhouse</groupId>
> > >       <artifactId>clickhouse-jdbc</artifactId>
> > >       <version>0.3.0</version>
> > >     </dependency>
> > >     <dependency>
> > >       <groupId>org.apache.flink</groupId>
> > >       <artifactId>flink-parquet_2.11</artifactId>
> > >       <version>${flink.version}</version>
> > >     </dependency>
> > >     <dependency>
> > >       <groupId>org.apache.flink</groupId>
> > >       <artifactId>flink-json</artifactId>
> > >       <version>${flink.version}</version>
> > >     </dependency>
> > >
> > >
> > > heap dump1:
> > >
> > > Leak Suspects
> > >
> > >   Problem Suspect 1
> > >
> > > 21 instances of "org.apache.flink.util.ChildFirstClassLoader", loaded by "sun.misc.Launcher$AppClassLoader @ 0x73b2d42e0" occupy 29,656,880 (41.16%) bytes.
> > >
> > > Biggest instances:
> > >
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73ca2a1e8 - 1,474,760 (2.05%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73d2af820 - 1,474,168 (2.05%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73cdcaa10 - 1,474,160 (2.05%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73cf6aab0 - 1,474,160 (2.05%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73d1111d8 - 1,474,160 (2.05%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73d2bb108 - 1,474,128 (2.05%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73de202e0 - 1,474,120 (2.05%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73dadc778 - 1,474,112 (2.05%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73d5f70e8 - 1,474,064 (2.05%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73d93aa38 - 1,474,064 (2.05%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73e179638 - 1,474,064 (2.05%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73dc80418 - 1,474,056 (2.05%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73dfcda60 - 1,474,056 (2.05%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73e4bcd38 - 1,474,056 (2.05%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73d6006e8 - 1,474,032 (2.05%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73c7d2ad8 - 1,461,944 (2.03%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73ca1bb98 - 1,460,752 (2.03%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73bf203f0 - 1,460,744 (2.03%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73e3284a8 - 1,445,232 (2.01%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73e65de00 - 1,445,232 (2.01%) bytes.
> > >
> > >
> > >
> > > Keywords
> > > org.apache.flink.util.ChildFirstClassLoader
> > > sun.misc.Launcher$AppClassLoader @ 0x73b2d42e0
> > >
> > >   Problem Suspect 2
> > >
> > > 34,407 instances of "org.apache.flink.core.memory.HybridMemorySegment", loaded by "sun.misc.Launcher$AppClassLoader @ 0x73b2d42e0" occupy 7,707,168 (10.70%) bytes.
> > >
> > > Keywords
> > > org.apache.flink.core.memory.HybridMemorySegment
> > > sun.misc.Launcher$AppClassLoader @ 0x73b2d42e0
> > >
> > >
> > >
> > >
> > > heap dump2:
> > >
> > > Leak Suspects
> > >
> > >   Problem Suspect 1
> > >
> > > 21 instances of "org.apache.flink.util.ChildFirstClassLoader", loaded by "sun.misc.Launcher$AppClassLoader @ 0x73b2d42e0" occupy 26,061,408 (30.68%) bytes.
> > >
> > > Biggest instances:
> > >
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73e9e9930 - 1,474,224 (1.74%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73edce0b8 - 1,474,224 (1.74%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73f1ad7d0 - 1,474,168 (1.74%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73f3e5118 - 1,474,168 (1.74%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73f5d3fe0 - 1,474,168 (1.74%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73ebd8d28 - 1,474,160 (1.74%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73efc00c0 - 1,474,160 (1.74%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73e2251a8 - 1,474,136 (1.74%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73cc24af0 - 1,474,064 (1.74%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73cdca3e0 - 1,474,064 (1.74%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73cf6f860 - 1,474,064 (1.74%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73d114768 - 1,474,064 (1.74%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73ca6f878 - 1,474,056 (1.74%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73d2b7640 - 1,474,056 (1.74%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73d2c1d80 - 1,474,040 (1.74%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73c7e2868 - 1,469,720 (1.73%) bytes.
> > > org.apache.flink.util.ChildFirstClassLoader @ 0x73bf34a98 - 1,460,808 (1.72%) bytes.
> > >
> > >
> > >
> > > Keywords
> > > org.apache.flink.util.ChildFirstClassLoader
> > > sun.misc.Launcher$AppClassLoader @ 0x73b2d42e0
> > >
> > >   Problem Suspect 2
> > >
> > > 4 instances of "org.apache.flink.streaming.runtime.tasks.OneInputStreamTask", loaded by "sun.misc.Launcher$AppClassLoader @ 0x73b2d42e0" occupy 11,644,200 (13.71%) bytes.
> > >
> > > Biggest instances:
> > >
> > > org.apache.flink.streaming.runtime.tasks.OneInputStreamTask @ 0x73e2d0cb0 - 4,364,536 (5.14%) bytes.
> > > org.apache.flink.streaming.runtime.tasks.OneInputStreamTask @ 0x73d62fb88 - 3,643,576 (4.29%) bytes.
> > > org.apache.flink.streaming.runtime.tasks.OneInputStreamTask @ 0x73dae0270 - 3,635,952 (4.28%) bytes.
> > >
> > >
> > >
> > > Keywords
> > > sun.misc.Launcher$AppClassLoader @ 0x73b2d42e0
> > > org.apache.flink.streaming.runtime.tasks.OneInputStreamTask

Re: Re: period batch job lead to OutOfMemoryError: Metaspace problem

Arvid Heise-4
AFAIK the main issue is that JDBC drivers leak because they usually assume there is only one classloader. If you are aware of that, you can bundle the driver in your jar. However, you are right: it doesn't help the OP, so it was probably not a good suggestion.

On Fri, Apr 9, 2021 at 11:45 AM Maciek Próchniak <[hidden email]> wrote:

Hi Arvid,

"You can still bundle it into your jar if you prefer it." Is that really the case with JDBC drivers? I think that if the driver is not on Flink's main classpath (that is, in the lib folder), there is no way the class can be loaded by the main classloader, regardless of the parent-first/child-first classloader setting. Is that right?

Those settings help when the driver is both on the Flink classpath and in the user jar. I now notice that the documentation is slightly misleading and suggests otherwise, doesn't it?

