Just curious, has anybody had success running on Amazon EMR with RocksDB and checkpointing in S3?

That's the configuration I am trying to set up, but my system is running more slowly than expected.
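For context, here is a minimal sketch of the kind of setup being discussed, assuming a Flink 1.12-era job with the RocksDB state backend and an S3 checkpoint directory. The bucket name, checkpoint interval, and placeholder pipeline are illustrative, not taken from this thread.

```java
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RocksDbS3CheckpointJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // RocksDB state backend with incremental checkpoints enabled ("true"),
        // so only changed SST files are uploaded to S3 instead of the full state.
        env.setStateBackend(new RocksDBStateBackend("s3://my-bucket/flink-checkpoints", true));

        // Trigger a checkpoint every 60 seconds (placeholder interval).
        env.enableCheckpointing(60_000);

        // Placeholder pipeline so the sketch is self-contained.
        env.fromElements(1, 2, 3)
           .map(x -> x * 2)
           .print();

        env.execute("rocksdb-s3-checkpoint-example");
    }
}
```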
Hi,

Yes, it's working. You would need to analyse what's running more slowly than expected. Checkpointing times? (Async duration? Sync duration? Start delay/back pressure?) Throughput? Recovery/startup? Are you being rate limited by Amazon?

Piotrek

On Thu, Jan 28, 2021 at 03:46 Marco Villalobos <[hidden email]> wrote:
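The checkpoint statistics Piotr mentions (end-to-end duration, sync/async duration, alignment/start delay) are visible in the Flink web UI's checkpoints tab and over the REST API. Below is a small sketch of pulling them programmatically; the JobManager address and job id are placeholders you would substitute yourself.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class CheckpointStatsProbe {
    public static void main(String[] args) throws Exception {
        String jobManager = "http://localhost:8081";   // JobManager REST endpoint (placeholder)
        String jobId = "<your-job-id>";                // from the web UI or GET /jobs (placeholder)

        // GET /jobs/:jobid/checkpoints returns counts, durations, and size statistics.
        URL url = new URL(jobManager + "/jobs/" + jobId + "/checkpoints");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");

        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Raw JSON; look at fields such as end_to_end_duration and state_size.
                System.out.println(line);
            }
        }
    }
}
```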
Thank you. I need to look into whether I am being rate-limited by Amazon. I assumed that a rate-limiting error would have bubbled up as an error in the logs; I will find a way to ensure that error is logged or captured somehow.

How would backpressure come into play during checkpointing? I would expect Amazon to have enough resources. When I turn my sink (the next operator) into a print, it fails during checkpointing as well.

I will explore what you mentioned though. Thank you.

On Mon, Feb 1, 2021 at 6:53 AM Piotr Nowojski <[hidden email]> wrote:
Hi Marco,

> Is this assumption correct?

Yes. More or less, each operator first creates a copy of its state locally and then uploads that whole file to S3 at once. Please first take a look at which part of checkpointing is taking so long.

Re backpressure: keep in mind that checkpoint barriers need to travel through the job graph. If your job is very heavily backpressured with low record throughput, checkpoints might be timing out because the checkpoint barriers do not manage to propagate through the job graph quickly enough. For example, take a look at my response earlier today. [1]

Best,
Piotrek

On Mon, Feb 1, 2021 at 17:16 Marco Villalobos <[hidden email]> wrote:
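As a hedged side note (not something Piotr suggests explicitly in this thread), two settings that are often brought up when barriers are held back by backpressure are a longer checkpoint timeout and unaligned checkpoints, both available around Flink 1.11+. A minimal sketch:

```java
import org.apache.flink.streaming.api.environment.CheckpointConfig;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class BackpressureTolerantCheckpoints {
    public static void configure(StreamExecutionEnvironment env) {
        env.enableCheckpointing(60_000);
        CheckpointConfig cfg = env.getCheckpointConfig();

        // Give slow checkpoints more time before they are declared failed.
        cfg.setCheckpointTimeout(10 * 60 * 1000);

        // Unaligned checkpoints let barriers overtake buffered records, so a
        // backpressured job can still complete checkpoints, at the cost of
        // larger checkpoints because in-flight data is persisted as well.
        cfg.enableUnalignedCheckpoints();
    }
}
```

These only ease the symptom; the underlying backpressure and any S3 rate limiting would still need to be diagnosed as described above.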