Hi all,
I am using the Flink DataSet API for a batch job that reads some logs, then groups and sorts them. Our cluster has almost 2000 servers. We are used to running traditional MR jobs; when I tried Flink for an experimental job, I ran into the error below and could not continue. Can anyone help with it? Our MR jobs sometimes hit similar connection errors, but they retry several times and then succeed. In Flink, it seems the whole job fails when a single task fails.

java.io.IOException: Cannot get library with hash 858478de9791c1a5fbbb138c02ec18
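For context, the job is roughly of the following shape. This is only a simplified sketch for illustration; the input/output paths, the tab delimiter, and the key/timestamp field positions are placeholders, not our real code:

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.operators.Order;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

public class LogGroupAndSort {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Parse each log line into (key, timestamp); path and field layout are placeholders.
        DataSet<Tuple2<String, Long>> events = env
            .readTextFile("hdfs:///logs/input")
            .map(new MapFunction<String, Tuple2<String, Long>>() {
                @Override
                public Tuple2<String, Long> map(String line) {
                    String[] fields = line.split("\t");
                    return Tuple2.of(fields[0], Long.parseLong(fields[1]));
                }
            });

        // Group by the key field and sort each group by timestamp.
        events.groupBy(0)
              .sortGroup(1, Order.ASCENDING)
              .first(100)
              .writeAsCsv("hdfs:///logs/output");

        env.execute("log group-and-sort");
    }
}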
Best regards,
Sili Liu
Hi!

The Blob server runs on the JobManager and is used to distribute JAR files. The best way to handle this at scale is one of the following:

Option (1): Use the 1.2-SNAPSHOT version to run Flink on YARN. It will add the JAR files to the job's YARN resources, so no BLOBs need to be fetched.

Option (2): Manually add your JAR files to the lib folder.

If you cannot do that, you can try to configure the BLOB server to handle more connections, in particular by increasing the backlog and the number of concurrent connections. See the configuration documentation for the relevant options.
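For example, something along these lines in flink-conf.yaml should raise the limits (the values below are only illustrative starting points, not recommendations):

    blob.fetch.num-concurrent: 200
    blob.fetch.backlog: 5000
    blob.fetch.retries: 10

Here blob.fetch.num-concurrent controls how many concurrent BLOB fetches the JobManager serves, blob.fetch.backlog the socket backlog on the BLOB server, and blob.fetch.retries how often a TaskManager retries a failed fetch.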
In the longer run, we are thinking about creating a version of the BLOB server that distributes files via a DFS (for example, HDFS).

Greetings,
Stephan

On Mon, Nov 7, 2016 at 10:02 AM, Si-li Liu <[hidden email]> wrote: