Hello,
I have some questions concerning Join: 1. I would like to make join with different conditions, is there any way to create a Join with conditions different to "equalTo", for example, how would I make a join with > or >= 2. I have a DataSet[Map[String, Any]]. Is it possible to specify KeySelector using a map key? I tried to use below Scala code but it doesn't work Set1.join(Set2).where(_.get("key")).equalTo(_.get("key")) |
Hi, non-equi joins are only supported by building the cross product. This is essentially the nested-loop join strategy, that a conventional database system would chose. However, such joins are prohibitively expensive when applied to large data sets. If you have one small and another large data set, you can do the join by broadcasting the smaller side to a MapFunction (withBroadcastSet() [1]) that has the larger data set as regular input and evaluate the join condition in the MapFunction. The problem with the Any key-selector is, that Flink needs to know the types when the program is optimized because it generates type specific serializers. I think an Any type does not work as join key. Best, Fabian 2015-02-21 10:28 GMT+01:00 Vinh June <[hidden email]>: Hello, |
If your join predicates are like "x < y" or "x <= y" you do not need to build the full Cartesian product in the Map function. Instead you can sort the broadcasted set or even build an index to reduce the number of predicate evaluations. 2015-02-21 11:40 GMT+01:00 Fabian Hueske <[hidden email]>:
|
Free forum by Nabble | Edit this page |