Hi, I'm looking for the right way to do the following scheme:My first attempt was: dataStream.keyBy(_.key).countWindow(..) But countWindow groups by a key. I however want to group all elements in partition. But apparently countWindowAll doesn't work on partitioned data. So, my last version is: dataStream.keyBy(_.key.hashCode % 4).countWindow(..)Best regards, Dmitry |
Hi Dmitry,
In all cases, the result of the countWindow will be also grouped by key because of the keyBy() that you are using. If you want to have a non-keyed stream and then split it in count windows, remove the keyBy() and instead of countWindow(), use countWindowAll(). This will have parallelism 1 but then you can repartition your stream so that the downstream operators have higher parallelism. Hope this helps, Kostas
|
Hi Dmitry, the third version is the way to go, IMO.2017-01-23 11:12 GMT+01:00 Kostas Kloudas <[hidden email]>:
|
Free forum by Nabble | Edit this page |