Unexpected unnamed sink in SQL job

Paul Lam
Hi, 

I noticed that a simple SQL statement like 'insert into hive_parquet_table select … from some_kafka_table' generates an additional operator called 'Sink: Unnamed' with parallelism 1. Is this by design? And what does this operator do?

Thanks in advance!

Env: 
- Flink 1.11.0 with Blink planner 
- Hive 1.1.0

Best,
Paul Lam

Re: Unexpected unnamed sink in SQL job

Jingsong Li
Hi Paul,

It is a meaningless sink.

For the sake of flexibility, the `StreamingFileCommitter` is implemented as a `StreamOperator` rather than a `SinkFunction`.
But `StreamTableSink` requires a `SinkFunction`, so we give it a `DiscardingSink` that does nothing. This sink should be chained to the upstream operator.
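
A rough sketch of this pattern, under stated assumptions: every type below (`ToyStream`, `ToySinkFunction`, `ToyDiscardingSink`, `ToySinkHandle`) is a hypothetical stand-in written for this illustration, not a Flink class, and the sink name `hive_parquet_table` is only an example value. The sketch assumes the sink node carries a default name until someone renames it, which is how a label like 'Sink: Unnamed' could end up in the plan.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Toy stand-ins for illustration only; none of these types are Flink classes. */
final class UnnamedSinkSketch {

    /** Stand-in for a sink function: consumes records, emits nothing. */
    interface ToySinkFunction<T> {
        void invoke(T value);
    }

    /** Stand-in for a discarding sink: drops every record it receives. */
    static final class ToyDiscardingSink<T> implements ToySinkFunction<T> {
        @Override
        public void invoke(T value) {
            // Intentionally empty: the real work happened in the operator before it.
        }
    }

    /** Handle returned by addSink; lets the caller rename the sink node afterwards. */
    static final class ToySinkHandle {
        private final List<String> plan;
        private final int index;

        ToySinkHandle(List<String> plan, int index) {
            this.plan = plan;
            this.index = index;
        }

        ToySinkHandle name(String name) {
            plan.set(index, "Sink: " + name);
            return this;
        }
    }

    /** Stand-in for a stream that only records the names of the operators added to it. */
    static final class ToyStream<T> {
        private final List<String> plan;

        ToyStream(List<String> plan) {
            this.plan = plan;
        }

        /** Adds a general operator; logic such as committing files would live here. */
        <R> ToyStream<R> transform(String operatorName, Function<T, R> operator) {
            plan.add(operatorName);
            return new ToyStream<>(plan);
        }

        /** Terminates the stream; the sink node is called "Unnamed" until renamed. */
        ToySinkHandle addSink(ToySinkFunction<T> sink) {
            plan.add("Sink: Unnamed");
            return new ToySinkHandle(plan, plan.size() - 1);
        }
    }

    /** Builds a plan whose work sits in an operator and is closed by a discarding sink. */
    static List<String> buildPlan(String sinkName) {
        List<String> plan = new ArrayList<>();
        ToySinkHandle handle = new ToyStream<String>(plan)
                .transform("StreamingFileCommitter", partition -> partition)
                .addSink(new ToyDiscardingSink<>());
        if (sinkName != null) {
            handle.name(sinkName); // optional: give the sink node a descriptive name
        }
        return plan;
    }

    public static void main(String[] args) {
        System.out.println(buildPlan(null));                 // [StreamingFileCommitter, Sink: Unnamed]
        System.out.println(buildPlan("hive_parquet_table")); // [StreamingFileCommitter, Sink: hive_parquet_table]
    }
}
```

The point of the toy is only the shape: the operator carries the logic, and the sink exists to satisfy an interface that demands one.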

Best,
Jingsong

In reply to Paul Lam's message of Tue, Aug 4, 2020 at 5:03 PM.

--
Best, Jingsong Lee

Re: Unexpected unnamed sink in SQL job

Paul Lam
Hi Jingsong,

Thanks for your input. Now I understand the design.

I think in my case the StreamingFileCommitter is not chained because its upstream operator does not have parallelism 1.
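
A minimal sketch of why differing parallelism prevents chaining. This is a simplified model, not Flink's implementation: `ChainingSketch`, `Op`, `canChain`, and `chains` are hypothetical names, the upstream parallelism of 4 is a made-up value, and equal parallelism is treated as the only condition (the other conditions Flink checks are assumed to hold).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Simplified model of operator chaining for illustration; not Flink's implementation. */
final class ChainingSketch {

    /** An operator reduced to its name and parallelism. */
    static final class Op {
        final String name;
        final int parallelism;

        Op(String name, int parallelism) {
            this.name = name;
            this.parallelism = parallelism;
        }
    }

    /**
     * One necessary condition for chaining: equal parallelism.
     * Assumption: all other chaining conditions are already satisfied.
     */
    static boolean canChain(Op upstream, Op downstream) {
        return upstream.parallelism == downstream.parallelism;
    }

    /** Splits a linear pipeline into chains; each chain would show up as one node. */
    static List<List<String>> chains(List<Op> pipeline) {
        List<List<String>> result = new ArrayList<>();
        Op previous = null;
        for (Op op : pipeline) {
            // Start a new chain at the first operator or whenever chaining is not possible.
            if (previous == null || !canChain(previous, op)) {
                result.add(new ArrayList<>());
            }
            result.get(result.size() - 1).add(op.name);
            previous = op;
        }
        return result;
    }

    public static void main(String[] args) {
        // Upstream parallelism 4 is a made-up value; the sink has parallelism 1.
        System.out.println(chains(Arrays.asList(
                new Op("upstream", 4), new Op("Sink: Unnamed", 1)))); // [[upstream], [Sink: Unnamed]]
        // With matching parallelism the two operators collapse into one chain.
        System.out.println(chains(Arrays.asList(
                new Op("upstream", 1), new Op("Sink: Unnamed", 1)))); // [[upstream, Sink: Unnamed]]
    }
}
```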

BTW, it'd be better if it had a more meaningful operator name.

Best,
Paul Lam

In reply to Jingsong Li's message of Aug 4, 2020 at 17:11.

Re: Unexpected unnamed sink in SQL job

godfrey he
I think we should assign a meaningful name to the sink Transformation, like other Transformations, in StreamExecLegacySink/BatchExecLegacySink.

In reply to Paul Lam's message of Tue, Aug 4, 2020 at 5:25 PM.

Re: Unexpected unnamed sink in SQL job

Jark Wu-3
If a "Sink: Unnamed" operator shows up in a pure SQL job, I think we should improve this and give it a meaningful operator name.

In reply to godfrey he's message of Tue, 4 Aug 2020 at 21:39.