Hi everyone
We have a Flink job that writes files to different directories on HDFS. It opens many files because of its high parallelism. I also found that with the RocksDB state backend, even more files are open during checkpointing. We use YARN to schedule the Flink job, but YARN always schedules the TaskManagers onto the same machine and I cannot control it. So that DataNode comes under very high pressure and keeps throwing a "bad link" error. We have already increased the HDFS xceivers limit to 16384. Any idea how to solve this problem? Either reduce the number of open files, or control the YARN scheduling so that the TaskManagers are placed on different machines.
Thank you very much!

Regards,
Shengnan
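For readers hitting the same error: a sketch of the xceivers setting mentioned above, in hdfs-site.xml. The current property name is dfs.datanode.max.transfer.threads (the older, deprecated name was dfs.datanode.max.xcievers); the value 16384 is taken from the message above.

```xml
<property>
  <!-- Upper bound on concurrent data-transfer threads ("xceivers") per DataNode. -->
  <name>dfs.datanode.max.transfer.threads</name>
  <value>16384</value>
</property>
```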
|
Hi,

If there really are that many files that need to be uploaded to HDFS, then currently we do not have a way to limit the number of open files. There is an issue [1] that aims to fix this problem, and a PR for it; maybe you can try the attached PR to see whether it solves your problem.

On Fri, Apr 24, 2020 at 11:30 PM, ysnakie <[hidden email]> wrote:
|
Hi,

Yes, for your use case, if your state size is not large, you can try the FsStateBackend.

Best,
Congxian

On Mon, Apr 27, 2020 at 3:42 PM, ysnakie <[hidden email]> wrote:
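A minimal sketch of switching to the FsStateBackend via flink-conf.yaml, assuming checkpoints go to HDFS; the hdfs:/// path below is a placeholder to adjust for your cluster:

```yaml
# Keep working state on the TaskManager heap; write checkpoint data as files.
state.backend: filesystem
# Placeholder path -- point this at a directory on your HDFS.
state.checkpoints.dir: hdfs:///flink/checkpoints
```

The same choice can also be made per job in code via StreamExecutionEnvironment#setStateBackend.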
|
With the FsStateBackend you could also try increasing the value of state.backend.fs.memory-threshold [1]. Only those state chunks that are larger than this value are stored in separate files; smaller chunks go into the checkpoint metadata file. The default is 1 KB; increasing it should reduce filesystem stress for jobs with a lot of small state.

On Wed, May 6, 2020 at 12:36 PM Congxian Qiu <[hidden email]> wrote:
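A sketch of raising that threshold in flink-conf.yaml. The 100 KB figure is only illustrative; note that everything below the threshold is inlined into the checkpoint metadata file, so that file grows accordingly.

```yaml
# State chunks smaller than this (in bytes) are inlined into the
# checkpoint metadata file instead of becoming separate HDFS files.
# Default is 1024 bytes; 102400 = 100 KB (illustrative value).
state.backend.fs.memory-threshold: 102400
```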
|