cogroup

cogroup

Michele Bertoni
Hi, I have a question about coGroup.

When I coGroup two DataSets, is there a way to compare each element on the left with each element on the right (inside a group) without collecting one side?

Right now I am doing:

left.coGroup(right).where(0, 1, 2).equalTo(0, 1, 2) {
        (leftIterator, rightIterator, out) => {
                val lSet = leftIterator.toSet // toSet buffers the whole left side in memory
                for (r <- rightIterator)
                        for (l <- lSet) {
                                // do something with the pair (l, r)
                        }
        }
}

I would like to avoid the toSet.

Thanks for the help.

Re: cogroup

Matthias J. Sax
Why do you not use a join? CoGroup seems not to be the right operator.

-Matthias

On 06/29/2015 05:40 PM, Michele Bertoni wrote:

> Hi, I have a question about coGroup.
>
> When I coGroup two DataSets, is there a way to compare each element on the left with each element on the right (inside a group) without collecting one side?
>
> Right now I am doing:
>
> left.coGroup(right).where(0, 1, 2).equalTo(0, 1, 2) {
>         (leftIterator, rightIterator, out) => {
>                 val lSet = leftIterator.toSet // toSet buffers the whole left side in memory
>                 for (r <- rightIterator)
>                         for (l <- lSet) {
>                                 // do something with the pair (l, r)
>                         }
>         }
> }
>
> I would like to avoid the toSet.
>
> Thanks for the help.



Re: cogroup

Fabian Hueske-2
If you just want to do the pairwise comparison try join().
Join is an inner join and will give you all pairs of elements with matching keys.
For CoGroup, there is no other way than collecting one side in memory.
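
To illustrate why one side has to be collected: inside the coGroup function both inputs arrive as iterators, and a Scala Iterator can be traversed only once, so the side used in the inner loop must be buffered. Below is a minimal sketch in plain Scala (no Flink dependency). The helper `PairwiseInGroup.pairs` is hypothetical and not part of Flink. It buffers into a Vector rather than a Set, because toSet would also drop duplicate elements, which may or may not be intended:

```scala
// Hypothetical helper, not part of Flink: pairs every left element with
// every right element of one group. Only the left side is buffered.
object PairwiseInGroup {
  def pairs[L, R](left: Iterator[L], right: Iterator[R]): Iterator[(L, R)] = {
    // An Iterator is single-pass, so the side that is re-read for every
    // right element has to be materialized first. Vector keeps duplicates.
    val buffered = left.toVector
    // The right side is streamed: each element is read exactly once.
    for (r <- right; l <- buffered) yield (l, r)
  }
}
```

Inside the coGroup function this could be used as `PairwiseInGroup.pairs(leftIterator, rightIterator).foreach(p => out.collect(p))`, assuming `out` is the Collector for the pair type.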

Best, Fabian

2015-06-29 17:42 GMT+02:00 Matthias J. Sax <[hidden email]>:
Why do you not use a join? CoGroup seems not to be the right operator.

-Matthias


Re: cogroup

Michele Bertoni
Thanks to both of you for answering.
That’s what I expected.

I was using join at first, but sadly I had to move from join to coGroup because I need an outer join.

The alternative to the coGroup is to “complete” the inner join by extracting from the original dataset, by difference, the elements that did not match, but I don’t think that is convenient.
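
The “by difference” idea can be sketched on plain Scala collections. This is only a model of the logic, assuming in-memory Seq inputs of (key, value) pairs and a hypothetical helper `OuterByDifference.leftOuter`. It is not Flink code; on DataSets each step would likely need its own operator (a join plus some way to find the unmatched keys), which is probably what makes it inconvenient:

```scala
// Hypothetical model of "inner join + unmatched rows by difference".
// Works on in-memory sequences only; it is not a Flink program.
object OuterByDifference {
  def leftOuter[K, L, R](left: Seq[(K, L)],
                         right: Seq[(K, R)]): Seq[(K, L, Option[R])] = {
    // Index the right side by key.
    val rightByKey: Map[K, Seq[R]] =
      right.groupBy(_._1).map { case (k, rows) => k -> rows.map(_._2) }

    // Step 1: the inner join, all pairs with matching keys.
    val matched: Seq[(K, L, Option[R])] =
      for {
        (k, l) <- left
        r <- rightByKey.getOrElse(k, Seq.empty)
      } yield (k, l, Some(r))

    // Step 2: the difference, left rows whose key never matched.
    val unmatched: Seq[(K, L, Option[R])] =
      for ((k, l) <- left if !rightByKey.contains(k)) yield (k, l, None)

    matched ++ unmatched
  }
}
```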




On 29 Jun 2015, at 17:58, Fabian Hueske <[hidden email]> wrote:

If you just want to do the pairwise comparison try join().
Join is an inner join and will give you all pairs of elements with matching keys.
For CoGroup, there is no other way than collecting one side in memory.

Best, Fabian


Re: cogroup

Fabian Hueske-2
Yes, if you need outer join semantics you have to go with CoGroup.
Some members of the Flink community are working on true outer joins for Flink, but I don't know what the progress is.
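
For reference, a sketch of how outer join semantics can come out of a coGroup body. Flink calls the coGroup function also for keys that occur on only one input, with the other iterator empty, so the function can emit the unmatched elements itself. The helper `OuterJoinGroup.fullOuter` below is hypothetical, written in plain Scala without Flink, and uses Option to mark the missing side:

```scala
// Hypothetical helper, not part of Flink: full outer join logic for the
// two iterators of a single key group.
object OuterJoinGroup {
  def fullOuter[L, R](left: Iterator[L],
                      right: Iterator[R]): Iterator[(Option[L], Option[R])] = {
    // One side still has to be buffered, because it is re-read for
    // every element of the other side.
    val buffered = left.toVector
    if (buffered.isEmpty)
      right.map(r => (None, Some(r)))             // key only on the right
    else if (!right.hasNext)
      buffered.iterator.map(l => (Some(l), None)) // key only on the left
    else
      for (r <- right; l <- buffered) yield (Some(l), Some(r)) // all pairs
  }
}
```

Dropping the first or the second branch would give left or right outer join behavior instead.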

Best, Fabian

2015-06-29 18:05 GMT+02:00 Michele Bertoni <[hidden email]>:
Thanks to both of you for answering.
That’s what I expected.

I was using join at first, but sadly I had to move from join to coGroup because I need an outer join.

The alternative to the coGroup is to “complete” the inner join by extracting from the original dataset, by difference, the elements that did not match, but I don’t think that is convenient.





Re: cogroup

Michele Bertoni
OK, thanks!
Then for now I will use it until a true outer join is ready.


On 29 Jun 2015, at 18:22, Fabian Hueske <[hidden email]> wrote:

Yes, if you need outer join semantics you have to go with CoGroup.
Some members of the Flink community are working on true outer joins for Flink, but I don't know what the progress is.

Best, Fabian
