Find differences


Lydia Ickler
Hi,

If I have two DataSets A and B of type Tuple3<Integer,Integer,Double>, how would I get the subset of A whose keys (fields (0,1)) do not occur in B?
Is there perhaps an already implemented method for this?

Best regards,
Lydia

Re: Find differences

Lydia Ickler
Never mind! I figured it out with groupBy and reduceGroup.
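
The thread does not include the groupBy/reduceGroup code itself. As a rough sketch of the idea only, the plain-Java block below (no Flink calls) groups the rows of both inputs by fields (0,1) and then, per group, emits the rows of A only when no row of B shares the key. `Row`, `Group`, and `difference` are hypothetical names introduced for illustration; `Row` stands in for Tuple3<Integer,Integer,Double>.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java illustration only; this is not Flink code. */
final class GroupReduceDifference {

    /** Hypothetical stand-in for Tuple3<Integer, Integer, Double>. */
    static final class Row {
        final int f0;
        final int f1;
        final double f2;

        Row(int f0, int f1, double f2) {
            this.f0 = f0;
            this.f1 = f1;
            this.f2 = f2;
        }

        /** Composite key built from fields (0,1). */
        List<Integer> key() {
            return Arrays.asList(f0, f1);
        }
    }

    /** One group: the rows of A for a key, plus a flag saying whether B had that key. */
    private static final class Group {
        final List<Row> fromA = new ArrayList<>();
        boolean seenInB = false;
    }

    static List<Row> difference(List<Row> a, List<Row> b) {
        // "groupBy" step: collect both inputs into one group per (f0, f1) key.
        Map<List<Integer>, Group> groups = new LinkedHashMap<>();
        for (Row row : a) {
            groups.computeIfAbsent(row.key(), k -> new Group()).fromA.add(row);
        }
        for (Row row : b) {
            groups.computeIfAbsent(row.key(), k -> new Group()).seenInB = true;
        }
        // "reduceGroup" step: emit the rows of A only for groups that B never touched.
        List<Row> result = new ArrayList<>();
        for (Group group : groups.values()) {
            if (!group.seenInB) {
                result.addAll(group.fromA);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Row> a = Arrays.asList(new Row(1, 1, 0.5), new Row(1, 2, 1.5), new Row(2, 2, 2.5));
        List<Row> b = Arrays.asList(new Row(1, 2, 9.0));
        for (Row row : difference(a, b)) {
            System.out.println(row.f0 + "," + row.f1 + "," + row.f2);
        }
    }
}
```

With A = {(1,1,0.5), (1,2,1.5), (2,2,2.5)} and B = {(1,2,9.0)}, `difference` returns the rows keyed (1,1) and (2,2); the third field of B plays no role in the comparison.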


> On 07.04.2016 at 11:51, Lydia Ickler <[hidden email]> wrote:
>
> Hi,
>
> If I have two DataSets A and B of type Tuple3<Integer,Integer,Double>, how would I get the subset of A whose keys (fields (0,1)) do not occur in B?
> Is there perhaps an already implemented method for this?
>
> Best regards,
> Lydia
Re: Find differences

stefanobaghino
Perhaps an outer join can do the trick as well, but I don't know which one would perform better.

On Thu, Apr 7, 2016 at 12:05 PM, Lydia Ickler <[hidden email]> wrote:
> Never mind! I figured it out with groupBy and reduceGroup.

--
BR,
Stefano Baghino

Software Engineer @ Radicalbit
Re: Find differences

Fabian Hueske-2
I would go with an outer join, as Stefano suggested.
Outer joins can be executed as hash joins, which will probably be more efficient than a sort-based groupBy/reduceGroup.
Also, outer joins are more intuitive and simpler, IMO.
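
To make the outer-join idea concrete: a left outer join of A with B on fields (0,1) pairs every row of A with its matches in B, or with nothing when there is none, and the rows with no partner are exactly the difference. The following is a plain-Java sketch of that logic executed hash-join style (build a hash set over B's keys, probe it with A). It does not use the Flink API, and `Row` and `difference` are hypothetical names, with `Row` standing in for Tuple3<Integer,Integer,Double>.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Plain-Java illustration only; this is not Flink code. */
final class AntiJoinDifference {

    /** Hypothetical stand-in for Tuple3<Integer, Integer, Double>. */
    static final class Row {
        final int f0;
        final int f1;
        final double f2;

        Row(int f0, int f1, double f2) {
            this.f0 = f0;
            this.f1 = f1;
            this.f2 = f2;
        }

        /** Composite key built from fields (0,1). */
        List<Integer> key() {
            return Arrays.asList(f0, f1);
        }
    }

    static List<Row> difference(List<Row> a, List<Row> b) {
        // Build side: hash every (f0, f1) key that occurs in B.
        Set<List<Integer>> keysOfB = new HashSet<>();
        for (Row row : b) {
            keysOfB.add(row.key());
        }
        // Probe side: a left outer join would pair each row of A with its
        // matches or with null; keeping only the unmatched rows is the difference.
        List<Row> result = new ArrayList<>();
        for (Row row : a) {
            if (!keysOfB.contains(row.key())) {
                result.add(row);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Row> a = Arrays.asList(new Row(1, 1, 0.5), new Row(1, 2, 1.5), new Row(2, 2, 2.5));
        List<Row> b = Arrays.asList(new Row(1, 2, 9.0), new Row(3, 3, 1.0));
        for (Row row : difference(a, b)) {
            System.out.println(row.f0 + "," + row.f1 + "," + row.f2);
        }
    }
}
```

No sorting or grouping of A is needed here, only one pass over each input, which is the intuition behind the efficiency argument above. In Flink itself this would presumably map to `leftOuterJoin` with `where(0, 1).equalTo(0, 1)` and a join function that emits the left record only when the right one is null, but that mapping is untested here.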

2016-04-07 12:35 GMT+02:00 Stefano Baghino <[hidden email]>:
> Perhaps an outer join can do the trick as well, but I don't know which one would perform better.
