FsStateBackend: too many HDFS RPC API calls. What are FilesCreated and FilesDeleted for?


陈Darling

Hi,

We use FsStateBackend as our state backend.

The following figure shows the frequency of HDFS API calls.

I don't understand what FilesCreated and FilesDeleted are for. Are all of these calls necessary?

Is it possible to eliminate the unnecessary ones?

[Figure: PastedGraphic-1.png, frequency of HDFS API calls]

Darling
Andrew D.Lin





Re: FsStateBackend: too many HDFS RPC API calls. What are FilesCreated and FilesDeleted for?

Congxian Qiu
Hi Andrew,

These API calls come from creating and deleting checkpoint files. There is an ongoing issue [1] that aims to reduce their number.


Re: FsStateBackend: too many HDFS RPC API calls. What are FilesCreated and FilesDeleted for?

Yun Tang
Hi Andrew,

Have you checked the answer I gave you on the Chinese user mailing list [1]?
  • FilesCreated mainly comes from creating checkpoints, while FilesDeleted mainly comes from subsuming old checkpoints. Please check the checkpoint interval of your jobs to avoid checkpointing too frequently.
  • Check the size of your state handles in HDFS and increase state.backend.fs.memory-threshold to a reasonable value.
  • FLINK-11696 only takes effect when your job starts or fails over.
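
To get a feel for why the first two suggestions matter, here is a rough back-of-the-envelope sketch in Python. It rests on a simplified model that is an assumption, not Flink's exact accounting: each state handle larger than the threshold is written as its own file, smaller handles are stored inline in the checkpoint metadata file, and every checkpoint writes one metadata file. The function name, handle sizes, and thresholds are hypothetical.

```python
def files_created_per_hour(interval_s, handle_sizes_bytes, threshold_bytes):
    """Rough estimate of checkpoint files written to HDFS per hour.

    Simplified model (an assumption, not Flink's exact accounting):
    handles larger than the threshold become separate files, smaller
    ones are inlined, and each checkpoint adds one metadata file.
    """
    checkpoints_per_hour = 3600 // interval_s
    separate_files = sum(1 for size in handle_sizes_bytes if size > threshold_bytes)
    files_per_checkpoint = 1 + separate_files  # +1 for the metadata file
    return checkpoints_per_hour * files_per_checkpoint


# Hypothetical job: 200 state handles, most of them only a few KB.
sizes = [2_000] * 150 + [8_000] * 30 + [500_000] * 20

before = files_created_per_hour(60, sizes, 1024)        # 1 min interval, 1 KB threshold
after = files_created_per_hour(300, sizes, 10 * 1024)   # 5 min interval, 10 KB threshold
print(before, after)  # 12060 252
```

Under these assumed numbers, most of the reduction comes from small handles no longer producing their own files. The real effect depends on the actual size distribution of the state handles, which is why it is worth checking them in HDFS first.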


Best
Yun Tang




Fwd: FsStateBackend: too many HDFS RPC API calls. What are FilesCreated and FilesDeleted for?

陈Darling
In reply to this post by Congxian Qiu
Hi,

As I understand it, the CreateFile and FilesCreated APIs are different: FilesCreated looks more like a check API, but I cannot find where it is called in the source code. I don't understand when the FilesCreated API is called, or what for.

Is FilesCreated an HDFS-internal confirmation API?

FLINK-11696 aims to reduce CreateFile API calls by reducing mkdir calls. Will it also reduce FilesCreated calls?

Am I misunderstanding something here?


Darling 
Andrew D.Lin




Fwd: FsStateBackend: too many HDFS RPC API calls. What are FilesCreated and FilesDeleted for?

陈Darling
In reply to this post by Congxian Qiu
Hi Yun Tang,

Your suggestions are very important to us.
Following them, we have advised our users to increase the checkpoint interval (1 to 5 minutes) and to set state.backend.fs.memory-threshold=10k.

However, we only have one HDFS cluster, and we are trying to reduce HDFS API calls further. I don't know whether there is any room for further optimization.

Thank you very much for your patience and help.


Darling 
Andrew D.Lin




Re: FsStateBackend: too many HDFS RPC API calls. What are FilesCreated and FilesDeleted for?

Yun Tang
Hi Andrew,

FilesCreated = CreateFileOps + FsDirMkdirOp. Please refer to [1] and [2] for the meaning of these metrics.
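
The relation above can be illustrated with a toy counter in Python. This is only a sketch of the formula as stated in this post, not HDFS code; the class and method names are hypothetical.

```python
class ToyNameNodeMetrics:
    """Toy counters mirroring the relation in this post; not HDFS code."""

    def __init__(self):
        self.create_file_ops = 0  # incremented once per created file
        self.mkdir_ops = 0        # incremented once per created directory

    def create_file(self):
        self.create_file_ops += 1

    def mkdirs(self, new_dirs=1):
        self.mkdir_ops += new_dirs

    @property
    def files_created(self):
        # FilesCreated = CreateFileOps + FsDirMkdirOp, as stated above
        return self.create_file_ops + self.mkdir_ops


# Hypothetical checkpoint: one new directory plus three files.
m = ToyNameNodeMetrics()
m.mkdirs()
for _ in range(3):
    m.create_file()
print(m.files_created)  # 4
```

Read this way, FilesCreated is a derived metric rather than a separate RPC, so under this relation anything that reduces file creations or mkdir calls lowers it as well.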



Best
Yun Tang




Fwd: FsStateBackend: too many HDFS RPC API calls. What are FilesCreated and FilesDeleted for?

陈Darling

Yes, that's the point: FilesCreated = CreateFileOps + FsDirMkdirOp.

All I can say is thanks.

Darling 
Andrew D.Lin


