Deterministic map?

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Deterministic map?

Juan Fumero
Hi,
  I am doing pure map computation with typical benchmarks like
BlackScholes and NBody.

I am using local configuration with multiple threads. It seems like,
inside the chuck (total size / numThreads) the order is correct. But the
ordering between chunks is not correct, giving an incorrect result at
the end.

What I mean by the order is, the correct result in the correct position
of the array.

Is there any way to guarantee the result?

Many thanks
Juan

Reply | Threaded
Open this post in threaded view
|

Re: Deterministic map?

Chiwan Park-2
Hi, If you use `partitionCustom()` method [1] with custom partitioner, you can guarantee the order of partition.

Regards,
Chiwan Park

[1] https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/api/java/DataSet.html#partitionCustom(org.apache.flink.api.common.functions.Partitioner,%20int)

> On Jul 15, 2015, at 1:10 AM, Juan Fumero <[hidden email]> wrote:
>
> Hi,
>  I am doing pure map computation with typical benchmarks like
> BlackScholes and NBody.
>
> I am using local configuration with multiple threads. It seems like,
> inside the chuck (total size / numThreads) the order is correct. But the
> ordering between chunks is not correct, giving an incorrect result at
> the end.
>
> What I mean by the order is, the correct result in the correct position
> of the array.
>
> Is there any way to guarantee the result?
>
> Many thanks
> Juan
>




Reply | Threaded
Open this post in threaded view
|

Re: Deterministic map?

Juan Fumero
Hi Chiwan,
  great thanks. Is there any example available?

Regards
Juan

On Wed, 2015-07-15 at 01:19 +0900, Chiwan Park wrote:

> Hi, If you use `partitionCustom()` method [1] with custom partitioner, you can guarantee the order of partition.
>
> Regards,
> Chiwan Park
>
> [1] https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/api/java/DataSet.html#partitionCustom(org.apache.flink.api.common.functions.Partitioner,%20int)
>
> > On Jul 15, 2015, at 1:10 AM, Juan Fumero <[hidden email]> wrote:
> >
> > Hi,
> >  I am doing pure map computation with typical benchmarks like
> > BlackScholes and NBody.
> >
> > I am using local configuration with multiple threads. It seems like,
> > inside the chuck (total size / numThreads) the order is correct. But the
> > ordering between chunks is not correct, giving an incorrect result at
> > the end.
> >
> > What I mean by the order is, the correct result in the correct position
> > of the array.
> >
> > Is there any way to guarantee the result?
> >
> > Many thanks
> > Juan
> >
>
>
>
>


Reply | Threaded
Open this post in threaded view
|

Re: Deterministic map?

Chiwan Park-2
Sure, here is a example [1] of using `partitionCustom()` method in Java API. Scala API is
similar to Java API.

You should implement Partitioner<KEY> interface. The interface has a method called
partition with two parameters. The first parameter is key value of each record and
the second parameter is number of partitions. The partition method must return
the number of partition which the record belongs to. The returned value must be
between 0 and numPartitions - 1.

There are 3 types of `partitionCustom()` method. The difference between three methods is
by method of specifying keys. If you want to know more detail of key specifying method
in Flink, please see the documentation [2] in Flink homepage.

Regards,
Chiwan Park

[1] https://gist.github.com/chiwanpark/e71d27cc8edae8bc7298
[2] https://ci.apache.org/projects/flink/flink-docs-master/apis/programming_guide.html#specifying-keys

> On Jul 15, 2015, at 2:40 AM, Juan Fumero <[hidden email]> wrote:
>
> Hi Chiwan,
>  great thanks. Is there any example available?
>
> Regards
> Juan
>
> On Wed, 2015-07-15 at 01:19 +0900, Chiwan Park wrote:
>> Hi, If you use `partitionCustom()` method [1] with custom partitioner, you can guarantee the order of partition.
>>
>> Regards,
>> Chiwan Park
>>
>> [1] https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/api/java/DataSet.html#partitionCustom(org.apache.flink.api.common.functions.Partitioner,%20int)
>>
>>> On Jul 15, 2015, at 1:10 AM, Juan Fumero <[hidden email]> wrote:
>>>
>>> Hi,
>>> I am doing pure map computation with typical benchmarks like
>>> BlackScholes and NBody.
>>>
>>> I am using local configuration with multiple threads. It seems like,
>>> inside the chuck (total size / numThreads) the order is correct. But the
>>> ordering between chunks is not correct, giving an incorrect result at
>>> the end.
>>>
>>> What I mean by the order is, the correct result in the correct position
>>> of the array.
>>>
>>> Is there any way to guarantee the result?
>>>
>>> Many thanks
>>> Juan
>>>
>>
>>
>>
>>
>
>