I am trying to run Flink on Kubernetes and push checkpoints to Google Cloud Storage. Below is the Dockerfile:

`FROM flink:1.6.2-hadoop28-scala_2.11-alpine

RUN wget -O lib/gcs-connector-latest-hadoop2.jar https://storage.googleapis.com/hadoop-lib/gcs/gcs-connector-latest-hadoop2.jar

RUN wget -O lib/gcs-connector-latest-hadoop2.jar https://storage.googleapis.com/hadoop-lib/gcs/gcs-connector-latest-hadoop2.jar && \
    wget http://ftp.fau.de/apache/flink/flink-1.6.2/flink-1.6.2-bin-hadoop28-scala_2.11.tgz && \
    tar xf flink-1.6.2-bin-hadoop28-scala_2.11.tgz && \
    mv flink-1.6.2/lib/flink-shaded-hadoop2* lib/ && \
    rm -r flink-1.6.2*`

The checkpoints take around 2-3 seconds on average and up to around 25 seconds, even though the state size is only around 100 KB.

The jobs are also getting restarted with the error `AsynchronousException{java.lang.Exception: Could not materialize checkpoint 1640 for operator groupBy`, and they sometimes lose their connections to the task managers.

Currently, I have set the heap size to 4096 MB.

--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
Please provide the full Exception stack trace and the configuration of your job (parallelism, number of stateful operators).

Have you tried using the gcs-connector in isolation? This may not be an issue with Flink.

On 28.11.2018 07:01, prakhar_mathur wrote:
> I am trying to run flink on kubernetes, and trying to push checkpoints to
> Google Cloud Storage. Below is the docker file
>
> `FROM flink:1.6.2-hadoop28-scala_2.11-alpine
>
> RUN wget -O lib/gcs-connector-latest-hadoop2.jar
> https://storage.googleapis.com/hadoop-lib/gcs/gcs-connector-latest-hadoop2.jar
>
> RUN wget -O lib/gcs-connector-latest-hadoop2.jar
> https://storage.googleapis.com/hadoop-lib/gcs/gcs-connector-latest-hadoop2.jar
> && \
> wget
> http://ftp.fau.de/apache/flink/flink-1.6.2/flink-1.6.2-bin-hadoop28-scala_2.11.tgz
> && \
> tar xf flink-1.6.2-bin-hadoop28-scala_2.11.tgz && \
> mv flink-1.6.2/lib/flink-shaded-hadoop2* lib/ && \
> rm -r flink-1.6.2*`
>
> But the checkpoints are taking around 2-3 seconds on average and around 25
> seconds at max, even the state size is around 100 KB.
>
> Even the jobs are getting restarted with the error
> `AsynchronousException{java.lang.Exception: Could not materialize checkpoint
> 1640 for operator groupBy` and sometimes losing connections with task
> managers.
>
> Currently, I have given the heap size of 4096 MB.
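
The suggestion above to try the storage path in isolation could be approached by timing small writes outside of Flink. What follows is only a rough sketch, not something from this thread: `write_local`, `time_writes` and `summarize` are hypothetical helper names, and the default writer targets the local disk. To probe GCS, one would pass a writer that uploads the payload with whatever client is available in the container; that part is left out here because it depends on the setup.

```python
import os
import statistics
import tempfile
import time
from typing import Callable, Dict, List


def write_local(path: str, payload: bytes) -> None:
    """Baseline writer: write the payload to a local file and sync it to disk."""
    with open(path, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())


def time_writes(writer: Callable[[str, bytes], None], directory: str,
                payload_size: int = 100 * 1024, rounds: int = 20) -> List[float]:
    """Return the duration in seconds of each write of `payload_size` bytes.

    The default payload size mirrors the ~100 KB state mentioned in the question.
    """
    payload = os.urandom(payload_size)
    durations = []
    for i in range(rounds):
        target = os.path.join(directory, f"probe-{i}.bin")
        start = time.perf_counter()
        writer(target, payload)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: List[float]) -> Dict[str, float]:
    """Report the mean and the worst-case write duration."""
    return {"mean_s": statistics.mean(durations), "max_s": max(durations)}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(summarize(time_writes(write_local, tmp)))
```

If a writer pointed at the bucket shows the same multi-second spikes for 100 KB payloads, the latency is likely in the storage path rather than in Flink's checkpointing itself; if it does not, the stack trace and job configuration requested above become the more useful lead.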
