POJO coGroup on null value


Flavio Pompermaier
Hi to all,

I'd like to join two datasets of POJOs, for example:

Person:
 - name
 - birthPlaceId

Place:
 - id
 - name

I'd like to do people.coGroup(places).where("birthPlaceId").equalTo("id").with(...)

However, not all people have a birthPlaceId value in my use case, so I get a NullPointerException.
Am I using the wrong operator for this?
This is the stack trace:

java.lang.RuntimeException: A NullPointerException occured while accessing a key field in a POJO. Most likely, the value grouped/joined on is null. Field name: birthPlaceId
at org.apache.flink.api.java.typeutils.runtime.PojoComparator.hash(PojoComparator.java:217)
at org.apache.flink.runtime.operators.shipping.OutputEmitter.hashPartitionDefault(OutputEmitter.java:175)
at org.apache.flink.runtime.operators.shipping.OutputEmitter.selectChannels(OutputEmitter.java:132)
at org.apache.flink.runtime.operators.shipping.OutputEmitter.selectChannels(OutputEmitter.java:28)
at org.apache.flink.runtime.io.network.api.writer.RecordWriter.emit(RecordWriter.java:78)
at org.apache.flink.runtime.operators.shipping.OutputCollector.collect(OutputCollector.java:65)

Best,
Flavio
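
The trace above shows the failure happening while a record is being hash-partitioned on its key. A minimal plain-Java sketch of that failure mode follows. It is not Flink's actual code: the class name, the method, and the boxed `Integer` key type are assumptions made for illustration.

```java
// Hypothetical stand-in for hash partitioning; this is not Flink's OutputEmitter.
class HashPartitionSketch {

    // Picks an output channel from the key's hash, as a hash partitioner would.
    // Calling hashCode() on a null key throws a NullPointerException.
    static int selectChannel(Integer key, int numChannels) {
        int hash = key.hashCode();
        return Math.floorMod(hash, numChannels);
    }

    public static void main(String[] args) {
        // A person with a birthPlaceId is routed normally.
        System.out.println("channel: " + selectChannel(42, 4));
        try {
            // A person without a birthPlaceId has no key to hash.
            selectChannel(null, 4);
        } catch (NullPointerException e) {
            System.out.println("null key cannot be hashed");
        }
    }
}
```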

Re: POJO coGroup on null value

Stephan Ewen
Hi Flavio!

Keys cannot be null in Flink; that is a contract deep in the system.

Filter out the null-valued elements, or, if you want them in the result, I would try to use a special value for "null". That should do it.

BTW: In SQL, joining on null usually filters out elements, as key operations on null are undefined.

Greetings,
Stephan
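
Both suggestions can be sketched in plain Java over a list. In a Flink job the same logic would go into a `filter` or `map` applied to `people` before the `coGroup`. The `Person` class, the `Integer` key type, and the sentinel `-1` are assumptions for illustration; the thread only names the fields, and the sentinel must be a value that no real `Place.id` uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical POJO; the field types are assumed.
class Person {
    public String name;
    public Integer birthPlaceId; // may be null

    public Person() {}

    public Person(String name, Integer birthPlaceId) {
        this.name = name;
        this.birthPlaceId = birthPlaceId;
    }
}

class NullKeySketch {

    // Assumed sentinel for "no birth place"; must not collide with a real place id.
    static final Integer UNKNOWN_PLACE = -1;

    // Option 1: drop every person whose key is null.
    static List<Person> dropNullKeys(List<Person> people) {
        List<Person> out = new ArrayList<>();
        for (Person p : people) {
            if (p.birthPlaceId != null) {
                out.add(p);
            }
        }
        return out;
    }

    // Option 2: keep every person and replace a null key with the sentinel.
    static List<Person> replaceNullKeys(List<Person> people) {
        List<Person> out = new ArrayList<>();
        for (Person p : people) {
            Integer key = (p.birthPlaceId != null) ? p.birthPlaceId : UNKNOWN_PLACE;
            out.add(new Person(p.name, key));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Person> people = Arrays.asList(new Person("Ada", 1), new Person("Bob", null));
        System.out.println("after filter: " + dropNullKeys(people).size());
        System.out.println("after sentinel: " + replaceNullKeys(people).size());
    }
}
```

With the sentinel approach, people without a birth place end up in one group under the sentinel key. As long as no place carries that id, the coGroup function sees them with an empty places side and can handle them there.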



Re: POJO coGroup on null value

Flavio Pompermaier

OK, thanks for the help, Stephan!


Re: POJO coGroup on null value

Fabian Hueske-2
In fact, you can implement your own composite data types (like Tuple or POJO) that can handle nullable fields as keys, but you need custom serializers and comparators for that. Such types won't be as efficient as types that cannot handle null fields.

Cheers, Fabian
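
To give a feel for what such a type has to do, here is a plain-Java sketch of null-tolerant serialization, hashing, and comparison for an `Integer` key. It does not implement Flink's `TypeSerializer` or `TypeComparator` classes; the class and method names are hypothetical, and the one-byte null marker is just one possible encoding, assumed here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper; a real Flink type would need its own serializer and comparator.
class NullableIntKeySketch {

    // Writes a one-byte presence marker, then the value only if it is present.
    static byte[] serialize(Integer key) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeBoolean(key != null);
            if (key != null) {
                out.writeInt(key);
            }
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the marker first and only reads a value if the marker says one follows.
    static Integer deserialize(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            return in.readBoolean() ? Integer.valueOf(in.readInt()) : null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Null gets a fixed hash so that all null keys go to the same partition.
    static int hash(Integer key) {
        return key == null ? 0 : key.hashCode();
    }

    // Nulls sort before all non-null keys; two nulls compare as equal.
    static int compare(Integer a, Integer b) {
        if (a == null || b == null) {
            return (a == null ? 0 : 1) - (b == null ? 0 : 1);
        }
        return a.compareTo(b);
    }
}
```

The marker byte and the extra branch on every record hint at where the efficiency cost comes from. Note also that this sketch treats two null keys as equal, which differs from the SQL behavior mentioned earlier in the thread.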
