DataStream.connect semantics for Flink SQL / Table API

Yuval Itzchakov
Hi,

I want to create an abstraction over N source tables (streams) and unify them into one table. I know UNION and UNION ALL exist, but I'm looking for DataStream.connect-like semantics with regard to watermark control. I don't want to take the minimum watermark over all N streams: I know for sure there won't be any time-based calculations over the result table, and I don't want data to be delayed, because the tables composing the stream vary widely in how often events arrive (one stream may receive events once an hour, another only once a day).

I don't want to turn the Table into a DataStream, since I want to leverage predicate pushdown for the definition of the result table. Does anything like this currently exist?
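For reference, the UNION ALL baseline mentioned above could be generated for N tables along these lines. This is only a sketch: the helper `union_all_view_sql`, the table names, and the column names are hypothetical, and it only builds the SQL text for a `CREATE TEMPORARY VIEW ... AS ... UNION ALL ...` statement without talking to Flink. Whether filters on the view are actually pushed into each source depends on the planner and the connector.

```python
from typing import Sequence


def union_all_view_sql(view_name: str, tables: Sequence[str], columns: Sequence[str]) -> str:
    """Build a CREATE TEMPORARY VIEW statement that UNION ALLs the given tables.

    All tables are assumed to expose the same columns with compatible types.
    """
    if not tables:
        raise ValueError("need at least one source table")
    # Quote identifiers with backticks, as Flink SQL does.
    cols = ", ".join(f"`{c}`" for c in columns)
    selects = [f"SELECT {cols} FROM `{t}`" for t in tables]
    body = "\nUNION ALL\n".join(selects)
    return f"CREATE TEMPORARY VIEW `{view_name}` AS\n{body}"


# Hypothetical table and column names, for illustration only.
sql = union_all_view_sql("all_events", ["events_hourly", "events_daily"], ["id", "payload"])
print(sql)
```

The resulting string could then be passed to something like `TableEnvironment.executeSql`, after which queries can filter on `all_events` without leaving the Table API.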

--
Best Regards,
Yuval Itzchakov.
Re: DataStream.connect semantics for Flink SQL / Table API

Aljoscha Krettek
Hi,

I think that if you don't run any operations that are sensitive to
event time, a plain UNION/UNION ALL should work, because then nothing
is buffered by event time that could delay your output.
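
To make that reasoning concrete, here is a toy model in plain Python. It is not Flink code, and `ToyUnion` is a made-up name. It only assumes the usual behavior: a union forwards each record as soon as it arrives, while the combined watermark is the minimum over the inputs. A slow input therefore holds back the watermark, but not the records themselves.

```python
import math
from typing import Dict, List, Tuple


class ToyUnion:
    """Toy model of a multi-input union. NOT Flink code, illustration only."""

    def __init__(self, inputs: List[str]) -> None:
        # Each input starts out with no watermark seen yet.
        self.input_watermarks: Dict[str, float] = {name: -math.inf for name in inputs}
        self.emitted: List[Tuple[str, object]] = []

    def on_record(self, source: str, record: object) -> None:
        # Records are forwarded immediately; no watermark is consulted.
        self.emitted.append((source, record))

    def on_watermark(self, source: str, timestamp: float) -> None:
        # A per-input watermark only ever moves forward.
        self.input_watermarks[source] = max(self.input_watermarks[source], timestamp)

    @property
    def output_watermark(self) -> float:
        # The combined watermark is the minimum across all inputs.
        return min(self.input_watermarks.values())


union = ToyUnion(["hourly", "daily"])
union.on_watermark("hourly", 3600)
union.on_record("hourly", "event-a")
union.on_record("hourly", "event-b")

# Both records are already in union.emitted, even though the "daily"
# input has not advanced its watermark and output_watermark is still -inf.
print(union.emitted, union.output_watermark)
```

Only an operator downstream that waits on the watermark (a window, an interval join, and so on) would see a delay from the slow input.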

Have you tried this and have you seen an actual delay in your output?

Best,
Aljoscha

On 12.11.20 12:57, Yuval Itzchakov wrote:

> Hi,
>
> I want to create an abstraction over N source tables (streams), and unify
> them into 1 table. I know UNION and UNION ALL exist, but I'm looking for
> DataStream.connect like semantics in regards to watermark control. I don't
> want to take the minimum watermark over all N streams, as I know for sure
> there won't be any time based calculations over the result table and I
> don't want data to be delayed as the different tables composing the stream
> highly vary in times of events flowing into the stream (one stream can
> receive events once an hour, the other only once a day).
>
> I don't want to turn the Table into a DataStream, since I want to leverage
> predicate pushdown for the definition of the result table. Does anything
> like this currently exist?
>

Re: DataStream.connect semantics for Flink SQL / Table API

Yuval Itzchakov
Hi Aljoscha,

You're right, I had a misunderstanding about how unions without window operations work.

Thanks!


On Thu, Nov 12, 2020, 18:37 Aljoscha Krettek <[hidden email]> wrote:
> Hi,
>
> I think if you don't do any operations that are sensitive to event-time
> then just using a UNION/UNION ALL should work because then there won't
> be any buffering by event time which could delay your output.
>
> Have you tried this and have you seen an actual delay in your output?
>
> Best,
> Aljoscha