Hi:
I have a question about retract messages. Given the following SQL:
select count(1) count, word group by word
Suppose the current aggregation result is:
'hello'->3
When another 'hello' record arrives, the count of 'hello' changes to 4, and the following two records are generated in the stream:
[-] 'hello'->3 //UPDATE_BEFORE
[+] 'hello'->4 //UPDATE_AFTER
Is the order of these two records guaranteed? Must the old result be deleted first and the new result inserted afterwards? |
Yes. The retract message is generated first, then the new result.
lec ssmi <[hidden email]> wrote on Mon, May 18, 2020 at 3:00 PM:
Benchao Li School of Electronics Engineering and Computer Science, Peking University Tel:+86-15650713730 Email: [hidden email]; [hidden email] |
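To make the expected ordering concrete, here is a minimal sketch of a running count per word that emits a retraction before each updated result. It is a plain Python model, not Flink code; the function name `count_by_word` and the `(op, word, count)` record shape are assumptions made for illustration.

```python
from collections import defaultdict

def count_by_word(words):
    """Yield (op, word, count) changelog records for a running count per word."""
    counts = defaultdict(int)
    for word in words:
        old = counts[word]
        if old > 0:
            # Retract the previous result first (UPDATE_BEFORE) ...
            yield ("-", word, old)
        counts[word] = old + 1
        # ... then emit the new result (UPDATE_AFTER, or a plain insert
        # when the key is seen for the first time).
        yield ("+", word, old + 1)

changelog = list(count_by_word(["hello"] * 4))
for op, word, count in changelog[-2:]:
    print(f"[{op}] {word!r} -> {count}")
# [-] 'hello' -> 3
# [+] 'hello' -> 4
```

The fourth 'hello' produces exactly the two records from the question, in retract-then-accumulate order.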
Thanks, but when I customized the sink, I found that some UPDATE operations actually arrived as INSERT first, then DELETE. Of the three records, the first was generated by my insert, and the second and third were generated by my update. The third should have arrived before the second, but that is not what I observed.
Benchao Li <[hidden email]> wrote on Mon, May 18, 2020 at 3:08 PM:
|
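The reason the order matters to a sink can be sketched as follows. This assumes a simple sink that keeps one row per key and deletes by key on a retraction; `apply_changelog` and the record tuples are hypothetical and only model the situation described above.

```python
def apply_changelog(records):
    """Apply (op, key, value) records to a dict, one row per key."""
    table = {}
    for op, key, value in records:
        if op == "+":
            table[key] = value
        elif op == "-":
            # Delete by key, ignoring the value carried by the retraction.
            table.pop(key, None)
    return table

# Insert, then an update delivered as retract followed by accumulate.
in_order = [("+", "hello", 3), ("-", "hello", 3), ("+", "hello", 4)]
# Same records, but with the update's two messages swapped.
swapped = [("+", "hello", 3), ("+", "hello", 4), ("-", "hello", 3)]

print(apply_changelog(in_order))  # {'hello': 4}
print(apply_changelog(swapped))   # {}
```

With the swapped order, the late retraction removes the freshly written row, so a sink of this kind would lose the key entirely.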
Does this behavior happen within the same subtask of the sink?
lec ssmi <[hidden email]> wrote on Mon, May 18, 2020 at 3:15 PM:
|
Yes, as you can see, all the console results are printed by task-1.
Benchao Li <[hidden email]> wrote on Mon, May 18, 2020 at 3:25 PM:
|
You can see the AggFunction's logic here [1]. It's weird that you received those records out of order. Maybe I am missing something, but in my understanding this should not happen.
lec ssmi <[hidden email]> wrote on Mon, May 18, 2020 at 3:28 PM:
|
Which Flink version are you using? Could you share the whole code? After seeing Benchao's link, I can only imagine that this is an old bug or that something in object reuse is not working as it should. But to be honest, broken object reuse would make the first record look like the second record rather than swap them, so I'm also at a loss here.
On Mon, May 18, 2020 at 9:37 AM Benchao Li <[hidden email]> wrote:
-- Arvid Heise | Senior Java Developer Follow us @VervericaData -- Join Flink Forward - The Apache Flink Conference Stream Processing | Event Driven | Real Time -- Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany -- Ververica GmbH Registered at Amtsgericht Charlottenburg: HRB 158244 B Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji (Toni) Cheng |
I am using the Blink branch, not master. But according to the source code, the Blink version should not have this problem either, so I feel this is a bug. Will this disorder affect Dynamic Tables?
Arvid Heise <[hidden email]> wrote on Mon, May 18, 2020 at 11:33 PM:
|
I'm not familiar with the blink branch, and IMO it's just a preview edition that will not be developed further by the community. Most of its functionality has been merged into the blink planner. Could you try out the blink planner if possible?
lec ssmi <[hidden email]> wrote on Tue, May 19, 2020 at 9:37 AM:
|
I have enabled mini-batch aggregation. Could that be the reason for this?
Benchao Li <[hidden email]> wrote on Tue, May 19, 2020 at 12:49 PM:
|
I don't think mini-batch aggregation would produce this result. In the (MiniBatch)GroupAggFunction's implementation, we always ensure that the retract message is sent before the new accumulate message. If you could reproduce this in the community's edition, it would make it much easier for us to help.
lec ssmi <[hidden email]> wrote on Wed, May 20, 2020 at 10:23 AM:
|
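The point about mini-batch can be illustrated with a simplified model: inputs are buffered, folded per key, and on each flush the old result is retracted before the new one is emitted. This is only a sketch of that idea, not the actual MiniBatchGroupAggFunction; `mini_batch_count` and `batch_size` are names invented for the example.

```python
def mini_batch_count(words, batch_size):
    """Count words in mini-batches; return (op, word, count) changelog records."""
    counts = {}
    out = []
    for start in range(0, len(words), batch_size):
        batch = words[start:start + batch_size]
        # Fold the buffered inputs into one delta per key.
        delta = {}
        for w in batch:
            delta[w] = delta.get(w, 0) + 1
        # On flush, emit at most one retract and one accumulate per key.
        for w, d in delta.items():
            old = counts.get(w)
            if old is not None:
                out.append(("-", w, old))  # retract the previous result first
            counts[w] = (old or 0) + d
            out.append(("+", w, counts[w]))  # then send the new result
    return out

result = mini_batch_count(["hello", "hello", "hello", "hello", "world"], batch_size=2)
print(result)
# [('+', 'hello', 2), ('-', 'hello', 2), ('+', 'hello', 4), ('+', 'world', 1)]
```

Batching reduces how many records are emitted per key, but in this model it does not change the retract-before-accumulate order within a key.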