Apache Flink - Question about thread safety for stateful collections (MapState)

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

Apache Flink - Question about thread safety for stateful collections (MapState)

M Singh
Hi:

I am working on a project and need to save MapState in a process function and register a timer to check for updates.  I wanted to find out if it is safe to access and modify the state in the processElement function as well as the time onTimer methods.  

The example https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/stream/process_function.html use ValueState and does not use synchronization. Is it because processElement and onTimer are executed in the same thread and so are thread safe ?

Also, I could not find any thread safety documentation about MapState.

Thanks.

Mans 


Reply | Threaded
Open this post in threaded view
|

Re: Apache Flink - Question about thread safety for stateful collections (MapState)

Kien Truong

Hi Mans,

They're not executed in the same thread, but the methods that called them are synchronized[1] and therefore thread-safe.

Best regards,

Kien

[1] https://github.com/apache/flink/blob/1cd3ba3f2af454bc33f2c880163c01dddd4d1738/flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/io/StreamInputProcessor.java#L204


On 11/12/2017 4:30 AM, M Singh wrote:
Hi:

I am working on a project and need to save MapState in a process function and register a timer to check for updates.  I wanted to find out if it is safe to access and modify the state in the processElement function as well as the time onTimer methods.  

The example https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/stream/process_function.html use ValueState and does not use synchronization. Is it because processElement and onTimer are executed in the same thread and so are thread safe ?

Also, I could not find any thread safety documentation about MapState.

Thanks.

Mans 


Reply | Threaded
Open this post in threaded view
|

Re: Apache Flink - Question about thread safety for stateful collections (MapState)

Jörn Franke
Be careful though with racing conditions .

On 12. Nov 2017, at 02:47, Kien Truong <[hidden email]> wrote:

Hi Mans,

They're not executed in the same thread, but the methods that called them are synchronized[1] and therefore thread-safe.

Best regards,

Kien

[1] https://github.com/apache/flink/blob/1cd3ba3f2af454bc33f2c880163c01dddd4d1738/flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/io/StreamInputProcessor.java#L204


On 11/12/2017 4:30 AM, M Singh wrote:
Hi:

I am working on a project and need to save MapState in a process function and register a timer to check for updates.  I wanted to find out if it is safe to access and modify the state in the processElement function as well as the time onTimer methods.  

The example https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/stream/process_function.html use ValueState and does not use synchronization. Is it because processElement and onTimer are executed in the same thread and so are thread safe ?

Also, I could not find any thread safety documentation about MapState.

Thanks.

Mans 


Reply | Threaded
Open this post in threaded view
|

Re: Apache Flink - Question about thread safety for stateful collections (MapState)

M Singh
Thanks Kien/Jorn:  

I see the code for processElement being called with a lock but did not see that the timer based invocation is synchronized by the same lock.  Does that mean that I should use synchonization in my code and how will that impact the performance ?  Please let me know if you have any other recommendation.

Thanks again.


On Sunday, November 12, 2017 1:16 AM, Jörn Franke <[hidden email]> wrote:


Be careful though with racing conditions .

On 12. Nov 2017, at 02:47, Kien Truong <[hidden email]> wrote:

Hi Mans,
They're not executed in the same thread, but the methods that called them are synchronized[1] and therefore thread-safe.
Best regards,
Kien

On 11/12/2017 4:30 AM, M Singh wrote:
Hi:

I am working on a project and need to save MapState in a process function and register a timer to check for updates.  I wanted to find out if it is safe to access and modify the state in the processElement function as well as the time onTimer methods.  

The example https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/stream/process_function.html use ValueState and does not use synchronization. Is it because processElement and onTimer are executed in the same thread and so are thread safe ?

Also, I could not find any thread safety documentation about MapState.

Thanks.

Mans 




Reply | Threaded
Open this post in threaded view
|

Re: Apache Flink - Question about thread safety for stateful collections (MapState)

Kien Truong

Hello M Singh,

If you check the comment at the beginning of the file, it said

Forwarding elements, watermarks, or status status elements must be protected by synchronizing
 * on the given lock object. This ensures that we don't call methods on a
 * {@link OneInputStreamOperator} concurrently with the timer callback or other things.

That means Flink guarantees to never call the timer callback and the process element function concurrently.

You don't have to do any manual synchronization by yourself.

Best regards,

Kien



    
On 11/12/2017 10:03 PM, M Singh wrote:
Thanks Kien/Jorn:  

I see the code for processElement being called with a lock but did not see that the timer based invocation is synchronized by the same lock.  Does that mean that I should use synchonization in my code and how will that impact the performance ?  Please let me know if you have any other recommendation.

Thanks again.


On Sunday, November 12, 2017 1:16 AM, Jörn Franke [hidden email] wrote:


Be careful though with racing conditions .

On 12. Nov 2017, at 02:47, Kien Truong <[hidden email]> wrote:

Hi Mans,
They're not executed in the same thread, but the methods that called them are synchronized[1] and therefore thread-safe.
Best regards,
Kien

On 11/12/2017 4:30 AM, M Singh wrote:
Hi:

I am working on a project and need to save MapState in a process function and register a timer to check for updates.  I wanted to find out if it is safe to access and modify the state in the processElement function as well as the time onTimer methods.  

The example https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/stream/process_function.html use ValueState and does not use synchronization. Is it because processElement and onTimer are executed in the same thread and so are thread safe ?

Also, I could not find any thread safety documentation about MapState.

Thanks.

Mans