Flink on Kubernetes


祁明良
Hi All,

We are running Flink (version 1.5.2) on Kubernetes with the RocksDB state backend.
Each time the job is cancelled and restarted, the container gets OOMKilled.
In our case we assign only 15% of the container memory to the JVM and leave the rest to RocksDB.
It looks to us as if the memory used by RocksDB is not released after the job is cancelled. Can anyone give some suggestions?
Our current temporary fix is to restart the TaskManager pod after every job cancellation, but that has to be done manually.
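
For context, a minimal Java sketch of the kind of setup described above; the checkpoint URI and the tiny pipeline are placeholders, not details from the actual job:

import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RocksDbBackendSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // RocksDB keeps its block cache, memtables and indexes in native memory,
        // outside the JVM heap -- which is why only ~15% of the container memory
        // goes to the JVM and the rest is left for RocksDB.
        env.setStateBackend(new RocksDBStateBackend("hdfs:///flink/checkpoints", true));

        // Placeholder pipeline so the sketch runs end to end.
        env.socketTextStream("localhost", 9999).print();

        env.execute("rocksdb-backend-sketch");
    }
}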

Regards,
Mingliang

Re: Flink on Kubernetes

Lasse Nedergaard
Hi.

We have seen the same on Flink 1.4.2/1.6 running on YARN and Mesos.
If you correlate non-heap memory with job restarts, you will see non-heap usage grow on every restart until you get an OOM.

I'll let you know if/when I find out how to handle the problem.
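
To make that correlation concrete, here is a small stand-alone Java probe (plain JVM APIs, not Flink code) that prints the off-heap numbers a TaskManager JVM can report about itself; Flink's Status.JVM.Memory.NonHeap metrics should track the first of them. Note that RocksDB's native allocations show up in none of these figures, only in the container's RSS.

import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

public class NonHeapProbe {
    public static void main(String[] args) {
        // Non-heap memory as the JVM itself sees it (metaspace, code cache, ...).
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        System.out.println("JVM non-heap used: " + memory.getNonHeapMemoryUsage().getUsed());

        // Direct and mapped buffer pools also live outside the heap and are worth
        // watching across restarts.
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            System.out.println(pool.getName() + " pool used: " + pool.getMemoryUsed());
        }

        // RocksDB's block cache and memtables are allocated by native code and do
        // not appear in any of these numbers; only the container's RSS captures them.
    }
}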

Best regards
Lasse Nedergaard


Re: Flink on Kubernetes

祁明良
Hi Lasse,

Is there a JIRA ticket I can follow?

Best,
Mingliang

Re: Flink on Kubernetes

Lasse Nedergaard
Please try using the FsStateBackend as a test, to see whether the problem disappears.
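
A minimal Java sketch of that test, assuming the backend is set in the job code (the checkpoint URI is a placeholder): the FsStateBackend keeps working state on the JVM heap, so if the OOMKilled pattern disappears with it, RocksDB's native memory is the likely culprit.

import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class FsBackendTest {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Working state now lives on the JVM heap; checkpoints still go to the given URI.
        env.setStateBackend(new FsStateBackend("hdfs:///flink/checkpoints"));

        // ... the rest of the job stays unchanged ...
        env.socketTextStream("localhost", 9999).print();

        env.execute("fs-backend-test");
    }
}

If the job does not set a backend in code, the same switch can be made cluster-wide in flink-conf.yaml by setting state.backend: filesystem together with state.checkpoints.dir.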

Best regards
Lasse Nedergaard

