Why don't operations on KeyedStream return KeyedStream?

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Why don't operations on KeyedStream return KeyedStream?

Elias Levy
Operators on a KeyedStream don't return a new KeyedStream.  Is there a reason for this?  You need to perform `keyBy` again to get a KeyedStream.  Presumably if you key by the same value there won't be any shuffled data, but the key may no longer be available within the stream record.
Reply | Threaded
Open this post in threaded view
|

Re: Why don't operations on KeyedStream return KeyedStream?

vino yang
Hi Elias,

Can you express this matter more clearly? 
The reason the KeyedStream object exists is that it needs to provide some different transform methods than the DataStream object. 
These transform methods are limited to keyBy. 
Why do you need to execute keyBy twice to get a KeyedStream object? 
You can define one and reuse it, can't you?

Thanks, vino.

Elias Levy <[hidden email]> 于2018年8月29日周三 上午1:52写道:
Operators on a KeyedStream don't return a new KeyedStream.  Is there a reason for this?  You need to perform `keyBy` again to get a KeyedStream.  Presumably if you key by the same value there won't be any shuffled data, but the key may no longer be available within the stream record.
Reply | Threaded
Open this post in threaded view
|

Re: Why don't operations on KeyedStream return KeyedStream?

Fabian Hueske-2
Hi Elias,

Your assumption is correct. An operation on a KeyedStream results in a regular DataStream because the operation might change the data type or the key field.
Hence, it is not guaranteed that the same keys can be extracted from the output of the keyed operation.

However, there is a way to convert a DataStream into a KeyedStream with DataStreamUtils.reinterpretAsKeyedStream(DataStream, KeySelector).
This method returns a DataStream as KeyedStream without performing any checks. You must ensure that the data is correctly partitioned. Note, that it must be exactly the same partitioning that would be obtained by keyBy(), i.e., it is not sufficient that the keys are partitioned in some way.

Best,
Fabian

Am Mi., 29. Aug. 2018 um 04:10 Uhr schrieb vino yang <[hidden email]>:
Hi Elias,

Can you express this matter more clearly? 
The reason the KeyedStream object exists is that it needs to provide some different transform methods than the DataStream object. 
These transform methods are limited to keyBy. 
Why do you need to execute keyBy twice to get a KeyedStream object? 
You can define one and reuse it, can't you?

Thanks, vino.

Elias Levy <[hidden email]> 于2018年8月29日周三 上午1:52写道:
Operators on a KeyedStream don't return a new KeyedStream.  Is there a reason for this?  You need to perform `keyBy` again to get a KeyedStream.  Presumably if you key by the same value there won't be any shuffled data, but the key may no longer be available within the stream record.