(DEPRECATED) Apache Flink User Mailing List archive.

Why don't Tuple types implement Comparable?

Classic

List

Threaded

4 messages Options

Frank Grimes

Why don't Tuple types implement Comparable?

Hi,

I've recently started to evaluate Flink and have found it odd that its Tuple types, while Serializable, don't implement java.lang.Comparable.

This means that I either need to provide an KeySelector for many operations or subtype the Tuple types and provide my own implementation of compareTo for each.

Is there a technical reason why this was omitted?

For example, the JOOQ/JOOL Tuple types all implement Comparable:

https://github.com/jOOQ/jOOL/blob/master/jOOL-java-8/src/main/java/org/jooq/lambda/tuple/Tuple2.java#L39

As an aside, I tried replacing usage of Flink's Tuple types with the JOOL ones but they caused a StackOverflowError similar to this one:

https://issues.apache.org/jira/browse/FLINK-3922

Thanks,

Frank Grimes

Stephen Connolly

Re: Why don't Tuple types implement Comparable?

On Thu, 21 Feb 2019 at 18:29, Frank Grimes <[hidden email]> wrote:

Hi,

I've recently started to evaluate Flink and have found it odd that its Tuple types, while Serializable, don't implement java.lang.Comparable.
This means that I either need to provide an KeySelector for many operations or subtype the Tuple types and provide my own implementation of compareTo for each.

Is there a technical reason why this was omitted?

A Tuple1<T0> would need to have an implementation of the comparable contract in order to be comparable. The only way I can see of writing that would be if we had:

Tuple1<T0 extends Comparable<? super T0>>

Similarly you'd have

Tuple25<T0 extends Comparable<? super T0>, T1 extends Comparable<? super T1>, T2 extends Comparable<? super T2>, ..., T24 extends Comparable<? super T24>>

Which is a lot less screen friendly than say

Tuple25<T0, T1, T2, ..., T24>

So from a purely code readability PoV you would need to at least replace the Comparable<? super T> with Comparable<T> with the immediate reduction of utility... or perhaps you would remove the generics entirely and go for just Comparable and have the implementation just follow the guarantee... except javac will error out for the erasure.

Now jOOQ just ignores the comparability contract on the generic types, e.g. https://github.com/jOOQ/jOOL/blob/master/jOOL/src/main/java/org/jooq/lambda/tuple/Tuple1.java#L35

The result is that when asked to compare you get a cast that will work or fail: https://github.com/jOOQ/jOOL/blob/master/jOOL/src/main/java/org/jooq/lambda/tuple/Tuples.java#L31

So jOOQ is following the "fail at runtime" mode... you can make tuples of non-comparable objects and only at runtime if you try to compare them will it blow up.

Flink is following the "fail at compile time" mode... if you try to compare a tuple, you need to provide a way to compare them.

Flink does a lot of work - from what I can see - to fail early and fast. For example, if type inference fails on the dataflow plan, it will error out fast. I would hate to have a Tuple3<A,B,Y> where A and B are both Comparable and Y is not... my tests happened to have a small subset of A and B values that never resulted in a comparison of Y values... so the tests didn't fail... but now on the production environment... at runtime... 1 month later in the middle of my vacation I get a call because the topology is stuck and failing on (a,b,y1).compareTo((a,b,y2))

So my analysis would say it is not a technical reason but rather a principal reason of "fail fast"

For example, the JOOQ/JOOL Tuple types all implement Comparable:
https://github.com/jOOQ/jOOL/blob/master/jOOL-java-8/src/main/java/org/jooq/lambda/tuple/Tuple2.java#L39

As an aside, I tried replacing usage of Flink's Tuple types with the JOOL ones but they caused a StackOverflowError similar to this one:
https://issues.apache.org/jira/browse/FLINK-3922

Thanks,

Frank Grimes

Stephen Connolly

Re: Why don't Tuple types implement Comparable?

On Fri, 22 Feb 2019 at 10:16, Stephen Connolly <[hidden email]> wrote:

On Thu, 21 Feb 2019 at 18:29, Frank Grimes <[hidden email]> wrote:
Hi,

I've recently started to evaluate Flink and have found it odd that its Tuple types, while Serializable, don't implement java.lang.Comparable.
This means that I either need to provide an KeySelector for many operations or subtype the Tuple types and provide my own implementation of compareTo for each.

Is there a technical reason why this was omitted?

A Tuple1<T0> would need to have an implementation of the comparable contract in order to be comparable. The only way I can see of writing that would be if we had:

Tuple1<T0 extends Comparable<? super T0>>

Similarly you'd have

Tuple25<T0 extends Comparable<? super T0>, T1 extends Comparable<? super T1>, T2 extends Comparable<? super T2>, ..., T24 extends Comparable<? super T24>>

Which is a lot less screen friendly than say

Tuple25<T0, T1, T2, ..., T24>

So from a purely code readability PoV you would need to at least replace the Comparable<? super T> with Comparable<T> with the immediate reduction of utility... or perhaps you would remove the generics entirely and go for just Comparable and have the implementation just follow the guarantee... except javac will error out for the erasure.

Now jOOQ just ignores the comparability contract on the generic types, e.g. https://github.com/jOOQ/jOOL/blob/master/jOOL/src/main/java/org/jooq/lambda/tuple/Tuple1.java#L35

The result is that when asked to compare you get a cast that will work or fail: https://github.com/jOOQ/jOOL/blob/master/jOOL/src/main/java/org/jooq/lambda/tuple/Tuples.java#L31

So jOOQ is following the "fail at runtime" mode... you can make tuples of non-comparable objects and only at runtime if you try to compare them will it blow up.

Flink is following the "fail at compile time" mode... if you try to compare a tuple, you need to provide a way to compare them.

Flink does a lot of work - from what I can see - to fail early and fast. For example, if type inference fails on the dataflow plan, it will error out fast. I would hate to have a Tuple3<A,B,Y> where A and B are both Comparable and Y is not... my tests happened to have a small subset of A and B values that never resulted in a comparison of Y values... so the tests didn't fail... but now on the production environment... at runtime... 1 month later in the middle of my vacation I get a call because the topology is stuck and failing on (a,b,y1).compareTo((a,b,y2))

So my analysis would say it is not a technical reason but rather a principal reason of "fail fast"

Idea: you could fake comparable for things that are not comparable.

We know the two objects are serializable, so we could just do a byte by byte comparison of their serialized representations if they do not implement Comparable... would be slower, would probably not be the comparison result that people expect... but it would be consistent and not error out.

public int compare(Serializable a,Serializable b) {

byte[] b1;

try (ByteArrayOutputStream bos = new ByteArrayOutputStream(); ObjectOutputStream oos = new ObjectOutputStream(bos)) {

oos.writeObject(a);

b1 = bos.toByteArray();

}

byte[] b2;

try (ByteArrayOutputStream bos = new ByteArrayOutputStream(); ObjectOutputStream oos = new ObjectOutputStream(bos)) {

oos.writeObject(a);

b2 = bos.toByteArray();

}

for (int i = 0; i < Math.min(b1.length,b2.length); i++) {

int r = Byte.compare(b1[i],b2[i]);

if (r != 0) {

return r;

}

return Integer.compare(b1.length,b2.length);

}

is an example of the kind of algorithm. The idea being that all you really want is a consistent comparison because they do not implement Comparable... if somebody had a useful comparison contract for objects of that type then presumably they would already have implemented the comparability contract.

Another approach that could work would be to use Tuple as a builder for ComparableTuple by giving each one a comparable method

public class Tuple2<T0,T1> ... {

...

public ComparableTuple2<T0,T1> comparable(Comparator<? super T0> c0, Comparator<? super T1> c1) {

...

}

...

}

public class ComparableTuple2<T0,T1> extends ComparableTuple {

// notice no bounds on the generic type arguments

...

// naïve storage (reduces GC pressure as the builder is not wasted)

// would need analysis to determine if better to allow the builder allocation

// to be elided by JVM and store the values directly as f0 and f1

private final Tuple2<T0,T1> tuple;

private final Comparator<? super T0> c0;

private final Comparator<? super T1>> c1;

...

}

That way you get ComparableTuple2 can ignore the type bounds on its generic arguments because the instantiation path guarantees a comparison can be made.

ComparableTuple2<User,Page> t = Tuple2.of(a,b).comparable(User::byLastName,Page::byLastAcccessed)

That seems very nice and readable...

For example, the JOOQ/JOOL Tuple types all implement Comparable:
https://github.com/jOOQ/jOOL/blob/master/jOOL-java-8/src/main/java/org/jooq/lambda/tuple/Tuple2.java#L39

As an aside, I tried replacing usage of Flink's Tuple types with the JOOL ones but they caused a StackOverflowError similar to this one:
https://issues.apache.org/jira/browse/FLINK-3922

Thanks,

Frank Grimes

Stephen Connolly

Re: Why don't Tuple types implement Comparable?

On Fri, 22 Feb 2019 at 10:38, Stephen Connolly <[hidden email]> wrote:

On Fri, 22 Feb 2019 at 10:16, Stephen Connolly <[hidden email]> wrote:
On Thu, 21 Feb 2019 at 18:29, Frank Grimes <[hidden email]> wrote:
Hi,

I've recently started to evaluate Flink and have found it odd that its Tuple types, while Serializable, don't implement java.lang.Comparable.
This means that I either need to provide an KeySelector for many operations or subtype the Tuple types and provide my own implementation of compareTo for each.

Is there a technical reason why this was omitted?

A Tuple1<T0> would need to have an implementation of the comparable contract in order to be comparable. The only way I can see of writing that would be if we had:

Tuple1<T0 extends Comparable<? super T0>>

Similarly you'd have

Tuple25<T0 extends Comparable<? super T0>, T1 extends Comparable<? super T1>, T2 extends Comparable<? super T2>, ..., T24 extends Comparable<? super T24>>

Which is a lot less screen friendly than say

Tuple25<T0, T1, T2, ..., T24>

So from a purely code readability PoV you would need to at least replace the Comparable<? super T> with Comparable<T> with the immediate reduction of utility... or perhaps you would remove the generics entirely and go for just Comparable and have the implementation just follow the guarantee... except javac will error out for the erasure.

Now jOOQ just ignores the comparability contract on the generic types, e.g. https://github.com/jOOQ/jOOL/blob/master/jOOL/src/main/java/org/jooq/lambda/tuple/Tuple1.java#L35

The result is that when asked to compare you get a cast that will work or fail: https://github.com/jOOQ/jOOL/blob/master/jOOL/src/main/java/org/jooq/lambda/tuple/Tuples.java#L31

So jOOQ is following the "fail at runtime" mode... you can make tuples of non-comparable objects and only at runtime if you try to compare them will it blow up.

Flink is following the "fail at compile time" mode... if you try to compare a tuple, you need to provide a way to compare them.

Flink does a lot of work - from what I can see - to fail early and fast. For example, if type inference fails on the dataflow plan, it will error out fast. I would hate to have a Tuple3<A,B,Y> where A and B are both Comparable and Y is not... my tests happened to have a small subset of A and B values that never resulted in a comparison of Y values... so the tests didn't fail... but now on the production environment... at runtime... 1 month later in the middle of my vacation I get a call because the topology is stuck and failing on (a,b,y1).compareTo((a,b,y2))

So my analysis would say it is not a technical reason but rather a principal reason of "fail fast"

Idea: you could fake comparable for things that are not comparable.

We know the two objects are serializable, so we could just do a byte by byte comparison of their serialized representations if they do not implement Comparable... would be slower, would probably not be the comparison result that people expect... but it would be consistent and not error out.

public int compare(Serializable a,Serializable b) {
byte[] b1;
try (ByteArrayOutputStream bos = new ByteArrayOutputStream(); ObjectOutputStream oos = new ObjectOutputStream(bos)) {
oos.writeObject(a);
b1 = bos.toByteArray();
}
byte[] b2;
try (ByteArrayOutputStream bos = new ByteArrayOutputStream(); ObjectOutputStream oos = new ObjectOutputStream(bos)) {
oos.writeObject(a);
b2 = bos.toByteArray();
}
for (int i = 0; i < Math.min(b1.length,b2.length); i++) {
int r = Byte.compare(b1[i],b2[i]);
if (r != 0) {
return r;
}
}
return Integer.compare(b1.length,b2.length);
}

Hmmm there is nothing stopping you from having the above as the basis for a utility universal comparison method that you would just pass by method handle... or if you want the jOOQ behaviour just go with

public class TupleComparator implements Comparator<Tuple> {

...

// do the class cast here and let it blow up at runtime

}

is an example of the kind of algorithm. The idea being that all you really want is a consistent comparison because they do not implement Comparable... if somebody had a useful comparison contract for objects of that type then presumably they would already have implemented the comparability contract.

Another approach that could work would be to use Tuple as a builder for ComparableTuple by giving each one a comparable method

public class Tuple2<T0,T1> ... {
...
public ComparableTuple2<T0,T1> comparable(Comparator<? super T0> c0, Comparator<? super T1> c1) {
...
}
...
}

public class ComparableTuple2<T0,T1> extends ComparableTuple {
// notice no bounds on the generic type arguments
...
// naïve storage (reduces GC pressure as the builder is not wasted)
// would need analysis to determine if better to allow the builder allocation
// to be elided by JVM and store the values directly as f0 and f1
private final Tuple2<T0,T1> tuple;
private final Comparator<? super T0> c0;
private final Comparator<? super T1>> c1;

...
}

That way you get ComparableTuple2 can ignore the type bounds on its generic arguments because the instantiation path guarantees a comparison can be made.

ComparableTuple2<User,Page> t = Tuple2.of(a,b).comparable(User::byLastName,Page::byLastAcccessed)

That seems very nice and readable...

Thinking more, this could have negative implications on the savepoint data sets and savepoint restore on different dataflows

For example, the JOOQ/JOOL Tuple types all implement Comparable:
https://github.com/jOOQ/jOOL/blob/master/jOOL-java-8/src/main/java/org/jooq/lambda/tuple/Tuple2.java#L39

As an aside, I tried replacing usage of Flink's Tuple types with the JOOL ones but they caused a StackOverflowError similar to this one:
https://issues.apache.org/jira/browse/FLINK-3922

Thanks,

Frank Grimes