Checkpointing Large State to S3

Gregory Fee
Hello Everyone!

I am running some streaming Flink jobs using SQL and the Table API. I enabled incremental checkpointing to S3 via the RocksDBStateBackend. Even when given an hour to complete, the checkpoints all fail by timing out. Does anyone have tips on how to configure the RocksDBStateBackend for the best performance on S3, or on how to get checkpoints with large amounts of state to succeed?

Thanks!

--
Gregory Fee
Engineer
Lyft
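For readers reproducing this, here is a minimal sketch of the kind of setup the question describes, assuming the Flink 1.5-era DataStream API and the `RocksDBStateBackend` class from the `flink-statebackend-rocksdb` module. In that era a `StreamTableEnvironment` was created from a `StreamExecutionEnvironment`, so SQL and Table API jobs picked up these checkpoint settings from it. The bucket path, the 10-minute checkpoint interval, and the 5-minute minimum pause are hypothetical values chosen for illustration; only the one-hour timeout and incremental RocksDB checkpoints to S3 come from the original post.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class IncrementalS3Checkpointing {

    // Hypothetical checkpoint location; not taken from the original post.
    static final String CHECKPOINT_URI = "s3://my-bucket/flink/checkpoints";

    static StreamExecutionEnvironment configure() {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Trigger a checkpoint every 10 minutes (assumed interval).
        env.enableCheckpointing(10 * 60 * 1000L, CheckpointingMode.EXACTLY_ONCE);

        // Give each checkpoint up to one hour before it is declared failed,
        // matching the timeout mentioned in the question.
        env.getCheckpointConfig().setCheckpointTimeout(60 * 60 * 1000L);

        // Enforce a gap between checkpoints so a slow upload is not
        // immediately followed by the next checkpoint (assumed value).
        env.getCheckpointConfig().setMinPauseBetweenCheckpoints(5 * 60 * 1000L);

        try {
            // The second argument (true) enables incremental checkpointing,
            // so only new RocksDB SST files are uploaded on each checkpoint.
            env.setStateBackend(new RocksDBStateBackend(CHECKPOINT_URI, true));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return env;
    }

    public static void main(String[] args) {
        StreamExecutionEnvironment env = configure();
        System.out.println("checkpoint timeout ms: "
                + env.getCheckpointConfig().getCheckpointTimeout());
    }
}
```

Raising the timeout only gives slow uploads more time; it does not show where that time goes, which is why the TaskManager log and the Flink version are the first things to look at.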

Re: Checkpointing Large State to S3

gerryzhou

Hi Gregory,

Could you share the TaskManager's log with us? It would help us diagnose the problem. Also, which version of Flink are you using?

Best, Sihua
On 06/7/2018 06:42, Gregory Fee wrote: