Window state with rocksdb backend

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Window state with rocksdb backend

祁明良

Hi all,


This is mingliang, I got a problem with rocksdb backend.


I'm currently using a 15min SessionWindow which also fires every 10s, there's no pre-aggregation, so the input of WindowFunction would be the whole Iterator of input object.

For window operator, I assume this collection is also a state that maintained by Flink.

Then, in each 10s fire, the window function will take the objects out from iterator and do some update, and in next fire, I assume I would get the updated value of that object.

With File system backend it was successful but eats a lot of memory and finally I got GC overhead limit, then I switch to rocksdb backend and the problem is the object in the next fire round is not updated by the previous fire round.

Do I have to do some additional staff with rocksdb backend in this case?


Thanks in advance

Mingliang



本邮件及其附件含有小红书公司的保密信息,仅限于发送给以上收件人或群组。禁止任何其他人以任何形式使用(包括但不限于全部或部分地泄露、复制、或散发)本邮件中的信息。如果您错收了本邮件,请您立即电话或邮件通知发件人并删除本邮件!
This communication may contain privileged or other confidential information of Red. If you have received it in error, please advise the sender by reply e-mail and immediately delete the message and any attachments without copying or disclosing the contents. Thank you.

Reply | Threaded
Open this post in threaded view
|

Re: Window state with rocksdb backend

Aljoscha Krettek
Hi,

the objects you get in the WindowFunction are not supposed to be mutated. Any changes to them are not guaranteed to be synced back to the state backend.

Why are you modifying in the objects? Maybe there's another way of achieving what you want to do.

Best,
Aljoscha

On 9. Aug 2018, at 10:36, 祁明良 <[hidden email]> wrote:

Hi all,

This is mingliang, I got a problem with rocksdb backend.

I'm currently using a 15min SessionWindow which also fires every 10s, there's no pre-aggregation, so the input of WindowFunction would be the whole Iterator of input object.
For window operator, I assume this collection is also a state that maintained by Flink.
Then, in each 10s fire, the window function will take the objects out from iterator and do some update, and in next fire, I assume I would get the updated value of that object.
With File system backend it was successful but eats a lot of memory and finally I got GC overhead limit, then I switch to rocksdb backend and the problem is the object in the next fire round is not updated by the previous fire round.
Do I have to do some additional staff with rocksdb backend in this case?

Thanks in advance
Mingliang


本邮件及其附件含有小红书公司的保密信息,仅限于发送给以上收件人或群组。禁止任何其他人以任何形式使用(包括但不限于全部或部分地泄露、复制、或散发)本邮件中的信息。如果您错收了本邮件,请您立即电话或邮件通知发件人并删除本邮件! 
This communication may contain privileged or other confidential information of Red. If you have received it in error, please advise the sender by reply e-mail and immediately delete the message and any attachments without copying or disclosing the contents. Thank you.

Reply | Threaded
Open this post in threaded view
|

Re: Window state with rocksdb backend

Stefan Richter
In reply to this post by 祁明良
Hi,

it is not quiet clear to me what your window function is doing, so sharing some (pseudo) code would be helpful. Is it ever calling a update-function for the state you are trying to modify? From the information I have it seems not the be the case and that is a wrong use of the API which required you to call the update method. This wrong use somewhat seems to work for heap-based backends, because you are manipulating the objects directly (for efficiency reasons, otherwise we always had to make deep defensive copies), but this will not work for RocksDB because you always just work on a de-serialized copy of the ground-truth, and that is why updates are explicit.

Best,
Stefan

Am 09.08.2018 um 10:36 schrieb 祁明良 <[hidden email]>:

Hi all,

This is mingliang, I got a problem with rocksdb backend.

I'm currently using a 15min SessionWindow which also fires every 10s, there's no pre-aggregation, so the input of WindowFunction would be the whole Iterator of input object.
For window operator, I assume this collection is also a state that maintained by Flink.
Then, in each 10s fire, the window function will take the objects out from iterator and do some update, and in next fire, I assume I would get the updated value of that object.
With File system backend it was successful but eats a lot of memory and finally I got GC overhead limit, then I switch to rocksdb backend and the problem is the object in the next fire round is not updated by the previous fire round.
Do I have to do some additional staff with rocksdb backend in this case?

Thanks in advance
Mingliang


本邮件及其附件含有小红书公司的保密信息,仅限于发送给以上收件人或群组。禁止任何其他人以任何形式使用(包括但不限于全部或部分地泄露、复制、或散发)本邮件中的信息。如果您错收了本邮件,请您立即电话或邮件通知发件人并删除本邮件! 
This communication may contain privileged or other confidential information of Red. If you have received it in error, please advise the sender by reply e-mail and immediately delete the message and any attachments without copying or disclosing the contents. Thank you.