Hi Flavio,
I agree, distinct() is a bit limited right now. There is no good reason for that, except that nobody has found the time to improve it.
You can use distinct(KeySelector k) to work directly on DataSet<String> but that's not very convenient either:
DataSet<String> strings = env.fromElements("Hello", "Hello", "World", "Hello");
strings.distinct(new KeySelector<String, String>() {
    @Override
    public String getKey(String value) throws Exception {
        return value;
    }
}).print();
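In case it helps to see what a key-based distinct does conceptually, here is a minimal plain-Java sketch. It does not use Flink at all. The class name DistinctByKeySketch and the method distinctBy are made up for illustration, and keeping the first element per key in input order is an assumption of this sketch, not something Flink promises.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Plain-Java illustration only: this is not Flink code, and
// DistinctByKeySketch / distinctBy are hypothetical names.
class DistinctByKeySketch {

    // Keeps the first element seen for each key extracted by keySelector.
    static <T, K> List<T> distinctBy(List<T> input, Function<T, K> keySelector) {
        Set<K> seenKeys = new HashSet<>();
        List<T> result = new ArrayList<>();
        for (T value : input) {
            // Set.add returns false when the key was already present.
            if (seenKeys.add(keySelector.apply(value))) {
                result.add(value);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> strings = Arrays.asList("Hello", "Hello", "World", "Hello");
        // Identity key, analogous to getKey() returning the value itself.
        System.out.println(distinctBy(strings, s -> s)); // prints [Hello, World]
    }
}
```

With the identity function as the key, this is just ordinary de-duplication, which is why having to write out a KeySelector for a DataSet<String> feels like unnecessary boilerplate.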
Making distinct more generic should not take long.
I'll open a JIRA issue and might eventually fix it myself if nobody else picks it up.
Cheers, Fabian