Hi All,
Like the following code,If I use retract stream, I think Flink is able to know which item is modified( if praise has 10000 items now, when one item comes to the stream, only very small amount of data is write to sink) var praiseAggr = tableEnv.sqlQuery(s"SELECT article_id,hll(uid) as PU FROM praise group by article_id” )
tableEnv.sqlUpdate(s"insert into sinkTableName SELECT * from finalTable") But if I use the following sql, by adding a dynamic timestamp field: var praiseAggr = tableEnv.sqlQuery(s"SELECT article_id,hll(uid) as PU,LOCALTIMESTAMP as update_timestamp FROM praise group by article_id” ) Is the whole table flush to the sink? Or only the incremental value will flush to the sink? Why? Thanks, Henry |
Hi Henry, Both sql output incrementally. However there are some problems if you use retract sink. You have to pay attention to the timestamp field since each time the value is different. For example, if the value is updated from 1 to 2,
The retract row is different from the previous row because of the time field. Of course, this problem should be fixed later. Best, Hequn On Mon, Aug 20, 2018 at 6:43 PM, 徐涛 <[hidden email]> wrote:
|
Hi Hequn,
However is it semantically correct? because the sql result is not equal to the bounded table.
|
Hi Henry, If you upsert by key 'article_id', the result is correct, i.e, the result is (a, 2, 2018-08-20 20:18:10.486). What do you think? Best, Hequn On Tue, Aug 21, 2018 at 9:44 AM, 徐涛 <[hidden email]> wrote:
|
Hi Hequn,
Maybe I do not express clearly. I mean if only the update_timestamp of the increment data is updated, it is not enough. Because from the sql, it express the idea “all the time in the table is the same”, but actually each item in the table may be different. It is a bit weird. Best, Henry
|
Hi Henry, You are right that, in MySQL, SYSDATE returns the time at which it executes while LOCALTIMESTAMP returns a constant time that indicates the time at which the statement began to execute. But other database system seems don't have this constraint(correct me if I'm wrong). Sometimes we don't have to follow MySQL. Best, Hequn On Tue, Aug 21, 2018 at 10:21 AM, 徐涛 <[hidden email]> wrote:
|
Hi Hequn,
Another question, for some case, I think update the timestamp of the retract row is reasonable, for example, some user does not want to the hard delete, but the soft delete, so I write code when the retract row comes I only do the soft delete, but I want the update_timestamp different so the ETL program can know that this line has changed. For example, if the value is updated from 1 to 2,
|
Hi, You are right. We can make use of it to do soft delete. But there will be problems in other cases. For example, retract messages by the whole row. I opened a jira[1] about this problem. Thanks for bring up this discussion. Best, Hequn On Tue, Aug 21, 2018 at 12:34 PM, 徐涛 <[hidden email]> wrote:
|
Free forum by Nabble | Edit this page |