Hi:
I need to join multiple stream tables using time interval joins. The problem is that the time attribute disappears
after the join, and pure SQL cannot declare the time attribute field again. So, to make this work, I need to write the result of the join to Kafka, then consume it and join it with another stream table in a separate Flink job. This seems troublesome. Any good ideas? |
Hi lec, AFAIK, the time attribute is preserved after a time interval join. Could you share your DDL and SQL queries with us? lec ssmi <[hidden email]> wrote on Thu, Apr 30, 2020 at 5:48 PM:
Benchao Li School of Electronics Engineering and Computer Science, Peking University Tel:+86-15650713730 Email: [hidden email]; [hidden email] |
Thanks for your reply. But as far as I know, if the time attribute is retained and the time attribute fields of both streams are selected in the join result, which one is the final time attribute? Benchao Li <[hidden email]> wrote on Thu, Apr 30, 2020 at 8:25 PM:
|
Hi, If the interval join emits the time attributes of both its inputs, you can use either of them as a time attribute in a following operator, because the join ensures that the watermark is aligned with both of them. Best, Fabian On Mon, May 4, 2020 at 00:48, lec ssmi <[hidden email]> wrote:
|
I mean doing this with pure SQL statements. Is that possible? Fabian Hueske <[hidden email]> wrote on Mon, May 4, 2020 at 4:04 PM:
|
Sure, you can write a SQL query with multiple interval joins that preserve event-time attributes and watermarks. There's no need to feed data back to Kafka just to inject it again to assign new watermarks. On Tue, May 5, 2020 at 01:45, lec ssmi <[hidden email]> wrote:
|
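A minimal sketch of what such a query could look like, chaining two interval joins in one job. The table names (Orders, Payments, Shipments), their columns, the watermark delays, and the datagen connector are all hypothetical and only serve to make the example self-contained; it assumes a recent Flink release where the `WATERMARK FOR` clause and the datagen connector are available.

```sql
-- Hypothetical source tables. Each one declares its event-time attribute
-- and watermark once, in the DDL. Random datagen keys will rarely match;
-- the point here is the query shape, not the data.
CREATE TABLE Orders (
  orderId   BIGINT,
  orderTime TIMESTAMP(3),
  WATERMARK FOR orderTime AS orderTime - INTERVAL '5' SECOND
) WITH ('connector' = 'datagen');

CREATE TABLE Payments (
  orderId BIGINT,
  payTime TIMESTAMP(3),
  WATERMARK FOR payTime AS payTime - INTERVAL '5' SECOND
) WITH ('connector' = 'datagen');

CREATE TABLE Shipments (
  orderId  BIGINT,
  shipTime TIMESTAMP(3),
  WATERMARK FOR shipTime AS shipTime - INTERVAL '5' SECOND
) WITH ('connector' = 'datagen');

-- First interval join. orderTime is selected unchanged, so it should
-- remain an event-time attribute in the view.
CREATE VIEW PaidOrders AS
SELECT o.orderId, o.orderTime
FROM Orders o, Payments p
WHERE o.orderId = p.orderId
  AND p.payTime BETWEEN o.orderTime AND o.orderTime + INTERVAL '1' HOUR;

-- Second interval join, using the forwarded orderTime in the time bound.
-- shipTime is cast to a plain TIMESTAMP so that the result carries
-- only one event-time attribute.
SELECT
  po.orderId,
  po.orderTime,
  CAST(s.shipTime AS TIMESTAMP(3)) AS shipTime
FROM PaidOrders po, Shipments s
WHERE po.orderId = s.orderId
  AND s.shipTime BETWEEN po.orderTime AND po.orderTime + INTERVAL '1' DAY;
```

Running `DESCRIBE PaidOrders;` in the SQL CLI is a quick way to check whether `orderTime` is still marked as `*ROWTIME*` after the first join.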
But I have not found any syntax to specify the time attribute field and watermark again in pure SQL. Fabian Hueske <[hidden email]> wrote on Tue, May 5, 2020 at 15:47:
|
Hi lec, You don't need to specify the time attribute again with something like `TUMBLE_ROWTIME`. You just select the time attribute field from one of the inputs, and it will automatically be a time attribute. lec ssmi <[hidden email]> wrote on Tue, May 5, 2020 at 4:42 PM:
Benchao Li School of Electronics Engineering and Computer Science, Peking University Tel:+86-15650713730 Email: [hidden email]; [hidden email] |
As you said, if I select the time attribute fields from both inputs, which will be the final one? Benchao Li <[hidden email]> wrote on Tue, May 5, 2020 at 17:26:
|
You cannot select more than one time attribute; the planner will throw an exception if you do that. lec ssmi <[hidden email]> wrote on Tue, May 5, 2020 at 8:34 PM:
Benchao Li School of Electronics Engineering and Computer Science, Peking University Tel:+86-15650713730 Email: [hidden email]; [hidden email] |
Even if the time attribute field is retained, will the related watermark be retained? If not, and there is no SQL syntax to declare the watermark again, that is equivalent to not being able to do multiple joins in one job. Benchao Li <[hidden email]> wrote on Tue, May 5, 2020 at 9:23 PM:
|
Yes. The watermark will be propagated correctly; it is the minimum of the two inputs' watermarks. lec ssmi <[hidden email]> wrote on Wed, May 6, 2020 at 9:46 AM:
Benchao Li School of Electronics Engineering and Computer Science, Peking University Tel:+86-15650713730 Email: [hidden email]; [hidden email] |
You can in fact forward both time attributes because Flink makes sure that the watermark is automatically adjusted to the "slower" of both input streams. You can run the following queries in the SQL CLI client (here using an example taken from a Flink SQL training [1]):

Flink SQL> CREATE VIEW ridesWithFare AS
> SELECT
>   *
> FROM
>   Rides r,
>   Fares f
> WHERE
>   r.rideId = f.rideId AND
>   NOT r.isStart AND
>   f.payTime BETWEEN r.rideTime - INTERVAL '5' MINUTE AND r.rideTime;
[INFO] View has been created.

Flink SQL> DESCRIBE ridesWithFare;
root
 |-- rideId: BIGINT
 |-- taxiId: BIGINT
 |-- isStart: BOOLEAN
 |-- lon: FLOAT
 |-- lat: FLOAT
 |-- rideTime: TIMESTAMP(3) *ROWTIME*
 |-- psgCnt: INT
 |-- rideId0: BIGINT
 |-- payTime: TIMESTAMP(3) *ROWTIME*
 |-- payMethod: STRING
 |-- tip: FLOAT
 |-- toll: FLOAT
 |-- fare: FLOAT

As you see, both rideTime and payTime are of type TIMESTAMP(3) *ROWTIME*. Hence, both can be used as time attributes later on. However, typically you'll just select one of them, e.g., when defining a grouping window.

Cheers, Fabian

On Wed, May 6, 2020 at 03:52, Benchao Li <[hidden email]> wrote:
|
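As a sketch of the "grouping window" case mentioned above, the query below aggregates the `ridesWithFare` view in hourly tumbling windows keyed on `rideTime`. The window size and the output column names (windowStart, rideCnt, totalFare) are illustrative assumptions, not part of the original training example.

```sql
-- Hourly tumbling window over the joined view.
-- rideTime is used as the event-time attribute; this works only
-- because it is still marked *ROWTIME* after the interval join.
SELECT
  TUMBLE_START(rideTime, INTERVAL '1' HOUR) AS windowStart,
  COUNT(*)  AS rideCnt,
  SUM(fare) AS totalFare
FROM ridesWithFare
GROUP BY TUMBLE(rideTime, INTERVAL '1' HOUR);
```

Swapping `rideTime` for `payTime` in both `TUMBLE` calls would window on the other forwarded attribute instead.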