Flink DataStream and KeyBy

Saiph Kappa
Hi,

The line «stream.keyBy(0)» only works if stream is of type DataStream[Tuple], and this Tuple is a Flink tuple, not a Scala tuple (why not use the Scala Tuple?). Yet keyBy can currently be called on anything (at least in Scala), such as DataStream[String] and DataStream[Array[String]].

Can anyone confirm this?

Thanks.

Re: Flink DataStream and KeyBy

Tzu-Li Tai
Hi Saiph,

In Flink, the key for keyBy() can be provided in different ways: https://ci.apache.org/projects/flink/flink-docs-master/apis/programming_guide.html#specifying-keys (the doc is for DataSet API, but specifying keys is basically the same for DataStream and DataSet).

As described in the documentation, calls like keyBy(0) are meant for Tuples, so it only works for DataStream[Tuple]. Other key definitions, such as keyBy(new KeySelector() {...}), can take a DataStream of any data type. Flink checks at runtime, not at compile time, whether the type of the data in the DataStream conflicts with the way the key is defined.
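
To make the key-selector variant concrete, here is a minimal sketch in the Scala API. It assumes the Flink streaming Scala dependency is on the classpath; the object name, the sample strings, and the choice of the first character as key are made up for illustration only.

```scala
import org.apache.flink.streaming.api.scala._

object KeySelectorSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // Hypothetical input: plain strings, not tuples.
    val words: DataStream[String] =
      env.fromElements("apple", "avocado", "banana")

    // Key by a function of the element (here its first character).
    // The key is computed explicitly, so the element type does not
    // have to be a tuple.
    val byInitial = words.keyBy((w: String) => w.charAt(0))

    // Concatenate all words that share the same key.
    byInitial
      .reduce((a, b) => a + "," + b)
      .print()

    env.execute("key-selector-sketch")
  }
}
```

Calling words.keyBy(0) on the same DataStream[String] compiles as well, but the position-based key does not match the element type, which is the kind of conflict that is only reported when the program runs.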

Hope this helps!

Cheers,
Gordon


Re: Flink DataStream and KeyBy

Aljoscha Krettek
Hi,
Using .keyBy(0) on a Scala DataStream[Tuple2], where Tuple2 is a Scala tuple, should work. See, for example, the SocketTextStreamWordCount example in Flink.
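
A minimal sketch along the lines of that word count example, with a Scala tuple keyed by position. It assumes a Flink version from around the time of this thread, where position-based keyBy is available in the Scala API. The object name and the in-memory input (standing in for the socket source) are hypothetical.

```scala
import org.apache.flink.streaming.api.scala._

object TupleKeySketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // Hypothetical in-memory input instead of a socket text stream.
    val lines: DataStream[String] = env.fromElements("to be or not to be")

    val counts: DataStream[(String, Int)] = lines
      .flatMap(_.toLowerCase.split("\\W+").filter(_.nonEmpty))
      .map(word => (word, 1)) // a Scala Tuple2[String, Int]
      .keyBy(0)               // key by tuple field 0 (the word)
      .sum(1)                 // running sum of tuple field 1 (the count)

    counts.print()
    env.execute("tuple-key-sketch")
  }
}
```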

Cheers,
Aljoscha

> On 13 Jan 2016, at 18:25, Tzu-Li (Gordon) Tai wrote:
>
> Hi Saiph,
>
> In Flink, the key for keyBy() can be provided in different ways:
> https://ci.apache.org/projects/flink/flink-docs-master/apis/programming_guide.html#specifying-keys
> (the doc is for DataSet API, but specifying keys is basically the same for
> DataStream and DataSet).
>
> As described in the documentation, calls like keyBy(0) are meant for Tuples,
> so it only works for DataStream[Tuple]. Other key definition types like
> keyBy(new KeySelector() {...}) can basically take any DataStream of
> arbitrary data type. Flink finds out whether or not there is a conflict
> between the type of the data in the DataStream and the way the key is
> defined at runtime.
>
> Hope this helps!
>
> Cheers,
> Gordon