Understand Broadcast State in Node Failure Case

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Understand Broadcast State in Node Failure Case

Chengzhi Zhao
Hey folks,

We are trying to use the broadcast state as "Shared Rule" state to filter test data in our stream pipeline, the broadcast will be connected with other streams in the pipeline.
I noticed on broadcast_state[1] important consideration page, it is mentioned No RocksDB state backend and state would be kept in in-memory at runtime. 

I am trying to figure out how it works, for example, 
1. If a node goes down, will broadcast state lost the entire state for that node and then sync from other nodes?
2. In case of the entire job fail or savepoint been triggered, how broadcast state get its state back or additional bootstrapping logic needs to be added ourselves?

Thanks for your help! 

Best regards,
Chengzhi

Reply | Threaded
Open this post in threaded view
|

Re: Understand Broadcast State in Node Failure Case

Fabian Hueske-2
Hi Chengzhi,

Broadcast State is checkpointed like any other state and will be restored in all failure cases (including the ones you mentioned).
We added the warning to inform users that Broadcast state will also be stored in the JVM memory, even if the RocksDB StateBackend was configured (which stores state on disk).
This warning is only about the size of the state, not about the consistency guarantees.

Best, Fabian

Am Mo., 22. Okt. 2018 um 19:26 Uhr schrieb Chengzhi Zhao <[hidden email]>:
Hey folks,

We are trying to use the broadcast state as "Shared Rule" state to filter test data in our stream pipeline, the broadcast will be connected with other streams in the pipeline.
I noticed on broadcast_state[1] important consideration page, it is mentioned No RocksDB state backend and state would be kept in in-memory at runtime. 

I am trying to figure out how it works, for example, 
1. If a node goes down, will broadcast state lost the entire state for that node and then sync from other nodes?
2. In case of the entire job fail or savepoint been triggered, how broadcast state get its state back or additional bootstrapping logic needs to be added ourselves?

Thanks for your help! 

Best regards,
Chengzhi

Reply | Threaded
Open this post in threaded view
|

Re: Understand Broadcast State in Node Failure Case

Chengzhi Zhao
Thanks Fabian for the clarification! 

Best regards,
Chengzhi



On Mon, Oct 22, 2018 at 5:19 PM Fabian Hueske <[hidden email]> wrote:
Hi Chengzhi,

Broadcast State is checkpointed like any other state and will be restored in all failure cases (including the ones you mentioned).
We added the warning to inform users that Broadcast state will also be stored in the JVM memory, even if the RocksDB StateBackend was configured (which stores state on disk).
This warning is only about the size of the state, not about the consistency guarantees.

Best, Fabian

Am Mo., 22. Okt. 2018 um 19:26 Uhr schrieb Chengzhi Zhao <[hidden email]>:
Hey folks,

We are trying to use the broadcast state as "Shared Rule" state to filter test data in our stream pipeline, the broadcast will be connected with other streams in the pipeline.
I noticed on broadcast_state[1] important consideration page, it is mentioned No RocksDB state backend and state would be kept in in-memory at runtime. 

I am trying to figure out how it works, for example, 
1. If a node goes down, will broadcast state lost the entire state for that node and then sync from other nodes?
2. In case of the entire job fail or savepoint been triggered, how broadcast state get its state back or additional bootstrapping logic needs to be added ourselves?

Thanks for your help! 

Best regards,
Chengzhi