Hey!
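Careful: In SQL, the semantics of a "not-equal" join are quite different from those of a NOT IN statement. A not-equal join emits every pair of elements whose keys differ, whereas NOT IN keeps only those elements that have no match at all in the other set.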
Here is how you do the equivalent of NOT IN:
If the list of elements is small and known up front, create a hash set and give it to a filter function (via a closure or a constructor argument). The filter function can then look up whether each element is contained in the set and drop it if so.
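A minimal sketch of that first variant, in plain Java so it runs without Flink on the classpath. The class name NotInFilter is hypothetical; in a Flink job the class would be expected to implement org.apache.flink.api.common.functions.FilterFunction and be passed to DataSet.filter(...).

```java
import java.io.Serializable;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical filter that keeps only elements NOT contained in a fixed set.
// Assumption: in Flink this would implement FilterFunction<T>, which has the
// same filter(T) shape. It is Serializable so it can be shipped to workers.
class NotInFilter<T> implements Serializable {
    private final Set<T> excluded;

    // The known-up-front elements are handed over through the constructor.
    NotInFilter(Collection<T> excluded) {
        this.excluded = new HashSet<>(excluded);
    }

    // Returns true to keep the element, i.e. when it is not in the excluded set.
    public boolean filter(T value) {
        return !excluded.contains(value);
    }
}
```

With such a class, something like data.filter(new NotInFilter<>(knownValues)) would play the role of NOT IN, with each lookup being a constant-time hash probe.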
If the elements are not known up front (for example, because they are themselves a data set), use a broadcast variable that you attach to a RichFilterFunction. In the filter function's open() method, grab the broadcast variable and turn it into a hash set. The filter logic is then the same as above.
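The open()-then-filter lifecycle can be sketched in plain Java as follows. This is only a stand-in, not Flink code: the class name BroadcastNotInFilter and the open(Collection) signature are hypothetical. In an actual RichFilterFunction, open() takes a Configuration, and the collection would come from getRuntimeContext().getBroadcastVariable("excluded"), where "excluded" is an assumed name.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a Flink RichFilterFunction<T>.
class BroadcastNotInFilter<T> {
    // Built once in open(), then reused for every element.
    private Set<T> excluded;

    // Assumption: in Flink, the values would be fetched inside open() via
    // getRuntimeContext().getBroadcastVariable("excluded") rather than
    // passed in as an argument.
    public void open(Collection<T> broadcastValues) {
        excluded = new HashSet<>(broadcastValues);
    }

    // Same logic as the fixed-set variant: keep elements not in the set.
    public boolean filter(T value) {
        return !excluded.contains(value);
    }
}
```

In a Flink job, the second data set would be attached to the filter operator with withBroadcastSet(otherDataSet, "excluded"), using the same name that open() reads. Converting the broadcast list to a hash set once in open() avoids a linear scan per element.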
Check out the API guides for some examples of how to use broadcast variables.
Stephan
On 11.12.2014 12:17, "Malte Schwarzer" <[hidden email]> wrote:
Hi,
Is there an easy way to do a NOT IN, or something like join().where().notEquals(), on two datasets with Flink?
Cheers
Malte