Wrong results for join post tumble grouping

classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view
|

Wrong results for join post tumble grouping

Satyam Shekhar

Hello,


I have a table T0 with the following schema -

root
      |-- amount: BIGINT}}
      |-- timestamp: TIMESTAMP(3)


The table T0 has two rows -

amounttimestamp
00
186400000


The following query with tumble grouping returns the wrong result -

WITH CTE AS
  (SELECT SUM(amount) AS _output,
          TUMBLE_END(`timestamp`, INTERVAL '1' SECOND) AS _dim0
       FROM T0 GROUP BY TUMBLE(`timestamp`, INTERVAL '1' SECOND))
SELECT V0._output as V0_output, V1._output AS V1_output, 
       V0._dim0 as V0_time, V1._dim0 as V1_time
   FROM CTE as V0 INNER JOIN CTE V1 ON V0._dim0 = V1._dim0

 

The returned result is -

V0_outputV1_outputV0_timeV1_time
118640100086401000


The expected result is -

V0_outputV1_outputV0_timeV1_time
0010001000
118640100086401000


Running subquery for `CTE` returns the correct result -

SELECT SUM(amount) AS _output,
               TUMBLE_END(`timestamp`, INTERVAL '1' SECOND) AS _dim0
    FROM T0 GROUP BY TUMBLE(`timestamp`, INTERVAL '1' SECOND)


Result (this is correct) -

_output_dim0
01000
186401000

Also, the following query without tumble grouping returns the correct result -
WITH CTE AS
  (SELECT amount AS _output,
 `timestamp` AS _dim0 FROM T0)
SELECT V0._output as V0_output, V1._output AS V1_output, 
       V0._dim0 as V0_time, V1._dim0 as V1_time
   FROM CTE as V0 INNER JOIN CTE V1 ON V0._dim0 = V1._dim0

Result -
V0_outputV1_outputV0_timeV1_time
0000
118640000086400000

I have filed a JIRA for the issue - https://issues.apache.org/jira/browse/FLINK-19926#. Would love to get some eyes on it. 

Regards,
Satyam