The order of Retract Record

classic Classic list List threaded Threaded
11 messages Options
Reply | Threaded
Open this post in threaded view
|

The order of Retract Record

lec ssmi
Hi:
  When encountering Retract,  there was a following sql :
    select count(1) count, word group by word

  Suppose the current aggregation result is :
     'hello'->3
When there is record to come again, the count of 'hello' will be changed to 4.
The following two records will be generated in the stream:

[-] 'hello'-> 3     //UPDATE_BEFORE
[+] 'hello'->4    //UPDATE_AFTER

Can these two records guarantee the order?Must the old result be deleted first and  then the new result inserted?


  


Reply | Threaded
Open this post in threaded view
|

Re: The order of Retract Record

Benchao Li
Yes. The retract message will be generated first, then the new result.

lec ssmi <[hidden email]> 于2020年5月18日周一 下午3:00写道:
Hi:
  When encountering Retract,  there was a following sql :
    select count(1) count, word group by word

  Suppose the current aggregation result is :
     'hello'->3
When there is record to come again, the count of 'hello' will be changed to 4.
The following two records will be generated in the stream:

[-] 'hello'-> 3     //UPDATE_BEFORE
[+] 'hello'->4    //UPDATE_AFTER

Can these two records guarantee the order?Must the old result be deleted first and  then the new result inserted?


  




--
Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel:+86-15650713730
Email: [hidden email]; [hidden email]
Reply | Threaded
Open this post in threaded view
|

Re: The order of Retract Record

lec ssmi
Thanks,   but when I customized the sink, I found that some UPDATE operations were actually INSERT first, then DELETE.
image.png
The first one was generated when I inserted, the second and third  were generated by my update. But the third  should happen before the second.
However, the result is not like this. 


Benchao Li <[hidden email]> 于2020年5月18日周一 下午3:08写道:
Yes. The retract message will be generated first, then the new result.

lec ssmi <[hidden email]> 于2020年5月18日周一 下午3:00写道:
Hi:
  When encountering Retract,  there was a following sql :
    select count(1) count, word group by word

  Suppose the current aggregation result is :
     'hello'->3
When there is record to come again, the count of 'hello' will be changed to 4.
The following two records will be generated in the stream:

[-] 'hello'-> 3     //UPDATE_BEFORE
[+] 'hello'->4    //UPDATE_AFTER

Can these two records guarantee the order?Must the old result be deleted first and  then the new result inserted?


  




--
Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel:+86-15650713730
Email: [hidden email]; [hidden email]
Reply | Threaded
Open this post in threaded view
|

Re: The order of Retract Record

Benchao Li
Does this behavior happen in the same subtask of sink?

lec ssmi <[hidden email]> 于2020年5月18日周一 下午3:15写道:
Thanks,   but when I customized the sink, I found that some UPDATE operations were actually INSERT first, then DELETE.
image.png
The first one was generated when I inserted, the second and third  were generated by my update. But the third  should happen before the second.
However, the result is not like this. 


Benchao Li <[hidden email]> 于2020年5月18日周一 下午3:08写道:
Yes. The retract message will be generated first, then the new result.

lec ssmi <[hidden email]> 于2020年5月18日周一 下午3:00写道:
Hi:
  When encountering Retract,  there was a following sql :
    select count(1) count, word group by word

  Suppose the current aggregation result is :
     'hello'->3
When there is record to come again, the count of 'hello' will be changed to 4.
The following two records will be generated in the stream:

[-] 'hello'-> 3     //UPDATE_BEFORE
[+] 'hello'->4    //UPDATE_AFTER

Can these two records guarantee the order?Must the old result be deleted first and  then the new result inserted?


  




--
Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel:+86-15650713730
Email: [hidden email]; [hidden email]


--
Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel:+86-15650713730
Email: [hidden email]; [hidden email]
Reply | Threaded
Open this post in threaded view
|

Re: The order of Retract Record

lec ssmi
Yes, as you can see,  all the console result are printed by task-1.

Benchao Li <[hidden email]> 于2020年5月18日周一 下午3:25写道:
Does this behavior happen in the same subtask of sink?

lec ssmi <[hidden email]> 于2020年5月18日周一 下午3:15写道:
Thanks,   but when I customized the sink, I found that some UPDATE operations were actually INSERT first, then DELETE.
image.png
The first one was generated when I inserted, the second and third  were generated by my update. But the third  should happen before the second.
However, the result is not like this. 


Benchao Li <[hidden email]> 于2020年5月18日周一 下午3:08写道:
Yes. The retract message will be generated first, then the new result.

lec ssmi <[hidden email]> 于2020年5月18日周一 下午3:00写道:
Hi:
  When encountering Retract,  there was a following sql :
    select count(1) count, word group by word

  Suppose the current aggregation result is :
     'hello'->3
When there is record to come again, the count of 'hello' will be changed to 4.
The following two records will be generated in the stream:

[-] 'hello'-> 3     //UPDATE_BEFORE
[+] 'hello'->4    //UPDATE_AFTER

Can these two records guarantee the order?Must the old result be deleted first and  then the new result inserted?


  




--
Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel:+86-15650713730
Email: [hidden email]; [hidden email]


--
Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel:+86-15650713730
Email: [hidden email]; [hidden email]
Reply | Threaded
Open this post in threaded view
|

Re: The order of Retract Record

Benchao Li
You can see the AggFunction's logic here[1].
It's weird that you received those records out of order. Maybe I was missing something here, but in my understanding, it should not happen.


lec ssmi <[hidden email]> 于2020年5月18日周一 下午3:28写道:
Yes, as you can see,  all the console result are printed by task-1.

Benchao Li <[hidden email]> 于2020年5月18日周一 下午3:25写道:
Does this behavior happen in the same subtask of sink?

lec ssmi <[hidden email]> 于2020年5月18日周一 下午3:15写道:
Thanks,   but when I customized the sink, I found that some UPDATE operations were actually INSERT first, then DELETE.
image.png
The first one was generated when I inserted, the second and third  were generated by my update. But the third  should happen before the second.
However, the result is not like this. 


Benchao Li <[hidden email]> 于2020年5月18日周一 下午3:08写道:
Yes. The retract message will be generated first, then the new result.

lec ssmi <[hidden email]> 于2020年5月18日周一 下午3:00写道:
Hi:
  When encountering Retract,  there was a following sql :
    select count(1) count, word group by word

  Suppose the current aggregation result is :
     'hello'->3
When there is record to come again, the count of 'hello' will be changed to 4.
The following two records will be generated in the stream:

[-] 'hello'-> 3     //UPDATE_BEFORE
[+] 'hello'->4    //UPDATE_AFTER

Can these two records guarantee the order?Must the old result be deleted first and  then the new result inserted?


  




--
Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel:+86-15650713730
Email: [hidden email]; [hidden email]


--
Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel:+86-15650713730
Email: [hidden email]; [hidden email]


--
Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel:+86-15650713730
Email: [hidden email]; [hidden email]
Reply | Threaded
Open this post in threaded view
|

Re: The order of Retract Record

Arvid Heise-3
Which Flink version are you using? Could you share the whole code?

After seeing Benchao's link, I could only imagine that this an old bug or something in object reusage is not as it should be. But tbh broken object reusage would let the first record look like the second record and not swap them, so I'm also at a loss here.


On Mon, May 18, 2020 at 9:37 AM Benchao Li <[hidden email]> wrote:
You can see the AggFunction's logic here[1].
It's weird that you received those records out of order. Maybe I was missing something here, but in my understanding, it should not happen.


lec ssmi <[hidden email]> 于2020年5月18日周一 下午3:28写道:
Yes, as you can see,  all the console result are printed by task-1.

Benchao Li <[hidden email]> 于2020年5月18日周一 下午3:25写道:
Does this behavior happen in the same subtask of sink?

lec ssmi <[hidden email]> 于2020年5月18日周一 下午3:15写道:
Thanks,   but when I customized the sink, I found that some UPDATE operations were actually INSERT first, then DELETE.
image.png
The first one was generated when I inserted, the second and third  were generated by my update. But the third  should happen before the second.
However, the result is not like this. 


Benchao Li <[hidden email]> 于2020年5月18日周一 下午3:08写道:
Yes. The retract message will be generated first, then the new result.

lec ssmi <[hidden email]> 于2020年5月18日周一 下午3:00写道:
Hi:
  When encountering Retract,  there was a following sql :
    select count(1) count, word group by word

  Suppose the current aggregation result is :
     'hello'->3
When there is record to come again, the count of 'hello' will be changed to 4.
The following two records will be generated in the stream:

[-] 'hello'-> 3     //UPDATE_BEFORE
[+] 'hello'->4    //UPDATE_AFTER

Can these two records guarantee the order?Must the old result be deleted first and  then the new result inserted?


  




--
Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel:+86-15650713730
Email: [hidden email]; [hidden email]


--
Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel:+86-15650713730
Email: [hidden email]; [hidden email]


--
Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel:+86-15650713730
Email: [hidden email]; [hidden email]


--

Arvid Heise | Senior Java Developer


Follow us @VervericaData

--

Join Flink Forward - The Apache Flink Conference

Stream Processing | Event Driven | Real Time

--

Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--

Ververica GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji (Toni) Cheng   
Reply | Threaded
Open this post in threaded view
|

Re: The order of Retract Record

lec ssmi
Blink branch , not the master.
But according to the source code, the blink version should not have this problem.
I feel this is a bug. Will this disorder affect Dynamic Table?

Arvid Heise <[hidden email]> 于2020年5月18日周一 下午11:33写道:
Which Flink version are you using? Could you share the whole code?

After seeing Benchao's link, I could only imagine that this an old bug or something in object reusage is not as it should be. But tbh broken object reusage would let the first record look like the second record and not swap them, so I'm also at a loss here.


On Mon, May 18, 2020 at 9:37 AM Benchao Li <[hidden email]> wrote:
You can see the AggFunction's logic here[1].
It's weird that you received those records out of order. Maybe I was missing something here, but in my understanding, it should not happen.


lec ssmi <[hidden email]> 于2020年5月18日周一 下午3:28写道:
Yes, as you can see,  all the console result are printed by task-1.

Benchao Li <[hidden email]> 于2020年5月18日周一 下午3:25写道:
Does this behavior happen in the same subtask of sink?

lec ssmi <[hidden email]> 于2020年5月18日周一 下午3:15写道:
Thanks,   but when I customized the sink, I found that some UPDATE operations were actually INSERT first, then DELETE.
image.png
The first one was generated when I inserted, the second and third  were generated by my update. But the third  should happen before the second.
However, the result is not like this. 


Benchao Li <[hidden email]> 于2020年5月18日周一 下午3:08写道:
Yes. The retract message will be generated first, then the new result.

lec ssmi <[hidden email]> 于2020年5月18日周一 下午3:00写道:
Hi:
  When encountering Retract,  there was a following sql :
    select count(1) count, word group by word

  Suppose the current aggregation result is :
     'hello'->3
When there is record to come again, the count of 'hello' will be changed to 4.
The following two records will be generated in the stream:

[-] 'hello'-> 3     //UPDATE_BEFORE
[+] 'hello'->4    //UPDATE_AFTER

Can these two records guarantee the order?Must the old result be deleted first and  then the new result inserted?


  




--
Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel:+86-15650713730
Email: [hidden email]; [hidden email]


--
Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel:+86-15650713730
Email: [hidden email]; [hidden email]


--
Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel:+86-15650713730
Email: [hidden email]; [hidden email]


--

Arvid Heise | Senior Java Developer


Follow us @VervericaData

--

Join Flink Forward - The Apache Flink Conference

Stream Processing | Event Driven | Real Time

--

Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--

Ververica GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji (Toni) Cheng   
Reply | Threaded
Open this post in threaded view
|

Re: The order of Retract Record

Benchao Li
I'm not familiar with blink branch, and IMO it's just a preview edition, which will not be further developed by the community.
Most of the functionality has been merged into blink planner, could you try out the blink planner if possible?

lec ssmi <[hidden email]> 于2020年5月19日周二 上午9:37写道:
Blink branch , not the master.
But according to the source code, the blink version should not have this problem.
I feel this is a bug. Will this disorder affect Dynamic Table?

Arvid Heise <[hidden email]> 于2020年5月18日周一 下午11:33写道:
Which Flink version are you using? Could you share the whole code?

After seeing Benchao's link, I could only imagine that this an old bug or something in object reusage is not as it should be. But tbh broken object reusage would let the first record look like the second record and not swap them, so I'm also at a loss here.


On Mon, May 18, 2020 at 9:37 AM Benchao Li <[hidden email]> wrote:
You can see the AggFunction's logic here[1].
It's weird that you received those records out of order. Maybe I was missing something here, but in my understanding, it should not happen.


lec ssmi <[hidden email]> 于2020年5月18日周一 下午3:28写道:
Yes, as you can see,  all the console result are printed by task-1.

Benchao Li <[hidden email]> 于2020年5月18日周一 下午3:25写道:
Does this behavior happen in the same subtask of sink?

lec ssmi <[hidden email]> 于2020年5月18日周一 下午3:15写道:
Thanks,   but when I customized the sink, I found that some UPDATE operations were actually INSERT first, then DELETE.
image.png
The first one was generated when I inserted, the second and third  were generated by my update. But the third  should happen before the second.
However, the result is not like this. 


Benchao Li <[hidden email]> 于2020年5月18日周一 下午3:08写道:
Yes. The retract message will be generated first, then the new result.

lec ssmi <[hidden email]> 于2020年5月18日周一 下午3:00写道:
Hi:
  When encountering Retract,  there was a following sql :
    select count(1) count, word group by word

  Suppose the current aggregation result is :
     'hello'->3
When there is record to come again, the count of 'hello' will be changed to 4.
The following two records will be generated in the stream:

[-] 'hello'-> 3     //UPDATE_BEFORE
[+] 'hello'->4    //UPDATE_AFTER

Can these two records guarantee the order?Must the old result be deleted first and  then the new result inserted?


  




--
Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel:+86-15650713730
Email: [hidden email]; [hidden email]


--
Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel:+86-15650713730
Email: [hidden email]; [hidden email]


--
Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel:+86-15650713730
Email: [hidden email]; [hidden email]


--

Arvid Heise | Senior Java Developer


Follow us @VervericaData

--

Join Flink Forward - The Apache Flink Conference

Stream Processing | Event Driven | Real Time

--

Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--

Ververica GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji (Toni) Cheng   


--
Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel:+86-15650713730
Email: [hidden email]; [hidden email]
Reply | Threaded
Open this post in threaded view
|

Re: The order of Retract Record

lec ssmi
 I  have  enabled the mini batch  aggregation,  could it be the reason for this?  

Benchao Li <[hidden email]> 于2020年5月19日周二 下午12:49写道:
I'm not familiar with blink branch, and IMO it's just a preview edition, which will not be further developed by the community.
Most of the functionality has been merged into blink planner, could you try out the blink planner if possible?

lec ssmi <[hidden email]> 于2020年5月19日周二 上午9:37写道:
Blink branch , not the master.
But according to the source code, the blink version should not have this problem.
I feel this is a bug. Will this disorder affect Dynamic Table?

Arvid Heise <[hidden email]> 于2020年5月18日周一 下午11:33写道:
Which Flink version are you using? Could you share the whole code?

After seeing Benchao's link, I could only imagine that this an old bug or something in object reusage is not as it should be. But tbh broken object reusage would let the first record look like the second record and not swap them, so I'm also at a loss here.


On Mon, May 18, 2020 at 9:37 AM Benchao Li <[hidden email]> wrote:
You can see the AggFunction's logic here[1].
It's weird that you received those records out of order. Maybe I was missing something here, but in my understanding, it should not happen.


lec ssmi <[hidden email]> 于2020年5月18日周一 下午3:28写道:
Yes, as you can see,  all the console result are printed by task-1.

Benchao Li <[hidden email]> 于2020年5月18日周一 下午3:25写道:
Does this behavior happen in the same subtask of sink?

lec ssmi <[hidden email]> 于2020年5月18日周一 下午3:15写道:
Thanks,   but when I customized the sink, I found that some UPDATE operations were actually INSERT first, then DELETE.
image.png
The first one was generated when I inserted, the second and third  were generated by my update. But the third  should happen before the second.
However, the result is not like this. 


Benchao Li <[hidden email]> 于2020年5月18日周一 下午3:08写道:
Yes. The retract message will be generated first, then the new result.

lec ssmi <[hidden email]> 于2020年5月18日周一 下午3:00写道:
Hi:
  When encountering Retract,  there was a following sql :
    select count(1) count, word group by word

  Suppose the current aggregation result is :
     'hello'->3
When there is record to come again, the count of 'hello' will be changed to 4.
The following two records will be generated in the stream:

[-] 'hello'-> 3     //UPDATE_BEFORE
[+] 'hello'->4    //UPDATE_AFTER

Can these two records guarantee the order?Must the old result be deleted first and  then the new result inserted?


  




--
Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel:+86-15650713730
Email: [hidden email]; [hidden email]


--
Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel:+86-15650713730
Email: [hidden email]; [hidden email]


--
Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel:+86-15650713730
Email: [hidden email]; [hidden email]


--

Arvid Heise | Senior Java Developer


Follow us @VervericaData

--

Join Flink Forward - The Apache Flink Conference

Stream Processing | Event Driven | Real Time

--

Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--

Ververica GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji (Toni) Cheng   


--
Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel:+86-15650713730
Email: [hidden email]; [hidden email]
Reply | Threaded
Open this post in threaded view
|

Re: The order of Retract Record

Benchao Li
I don't think mini batch aggregate will produce this result. 
In the (MiniBatch)GroupAggFunction's implementation, we always ensures that we send the retract message before the new accumulate message.
If you could reproduce this in the community's edition, it'll be very helpful for us to help.

lec ssmi <[hidden email]> 于2020年5月20日周三 上午10:23写道:
 I  have  enabled the mini batch  aggregation,  could it be the reason for this?  

Benchao Li <[hidden email]> 于2020年5月19日周二 下午12:49写道:
I'm not familiar with blink branch, and IMO it's just a preview edition, which will not be further developed by the community.
Most of the functionality has been merged into blink planner, could you try out the blink planner if possible?

lec ssmi <[hidden email]> 于2020年5月19日周二 上午9:37写道:
Blink branch , not the master.
But according to the source code, the blink version should not have this problem.
I feel this is a bug. Will this disorder affect Dynamic Table?

Arvid Heise <[hidden email]> 于2020年5月18日周一 下午11:33写道:
Which Flink version are you using? Could you share the whole code?

After seeing Benchao's link, I could only imagine that this an old bug or something in object reusage is not as it should be. But tbh broken object reusage would let the first record look like the second record and not swap them, so I'm also at a loss here.


On Mon, May 18, 2020 at 9:37 AM Benchao Li <[hidden email]> wrote:
You can see the AggFunction's logic here[1].
It's weird that you received those records out of order. Maybe I was missing something here, but in my understanding, it should not happen.


lec ssmi <[hidden email]> 于2020年5月18日周一 下午3:28写道:
Yes, as you can see,  all the console result are printed by task-1.

Benchao Li <[hidden email]> 于2020年5月18日周一 下午3:25写道:
Does this behavior happen in the same subtask of sink?

lec ssmi <[hidden email]> 于2020年5月18日周一 下午3:15写道:
Thanks,   but when I customized the sink, I found that some UPDATE operations were actually INSERT first, then DELETE.
image.png
The first one was generated when I inserted, the second and third  were generated by my update. But the third  should happen before the second.
However, the result is not like this. 


Benchao Li <[hidden email]> 于2020年5月18日周一 下午3:08写道:
Yes. The retract message will be generated first, then the new result.

lec ssmi <[hidden email]> 于2020年5月18日周一 下午3:00写道:
Hi:
  When encountering Retract,  there was a following sql :
    select count(1) count, word group by word

  Suppose the current aggregation result is :
     'hello'->3
When there is record to come again, the count of 'hello' will be changed to 4.
The following two records will be generated in the stream:

[-] 'hello'-> 3     //UPDATE_BEFORE
[+] 'hello'->4    //UPDATE_AFTER

Can these two records guarantee the order?Must the old result be deleted first and  then the new result inserted?


  




--
Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel:+86-15650713730
Email: [hidden email]; [hidden email]


--
Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel:+86-15650713730
Email: [hidden email]; [hidden email]


--
Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel:+86-15650713730
Email: [hidden email]; [hidden email]


--

Arvid Heise | Senior Java Developer


Follow us @VervericaData

--

Join Flink Forward - The Apache Flink Conference

Stream Processing | Event Driven | Real Time

--

Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--

Ververica GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji (Toni) Cheng   


--
Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel:+86-15650713730
Email: [hidden email]; [hidden email]


--
Benchao Li
School of Electronics Engineering and Computer Science, Peking University
Tel:+86-15650713730
Email: [hidden email]; [hidden email]