(DEPRECATED) Apache Flink User Mailing List archive.

DeltaIterations: shrink solution set

Kruse, Sebastian

DeltaIterations: shrink solution set

Hi everyone,

From playing around a bit with delta iterations, I saw that you can update elements in the solution set and add new elements. My question is: is it possible to remove elements from the solution set (apart from marking them as “deleted” somehow)?

My use case at hand for this is the following: In each iteration, I generate candidate solutions that I want to verify within the next iteration. If verification fails, I would like to remove them from the solution set, otherwise retain them.

Thanks,

Sebastian

Alexander Alexandrov

Re: DeltaIterations: shrink solution set

I am not sure whether this is supported at the moment. The only workaround I could think of is indeed to use a boolean flag that indicates whether the element has been deleted or not.
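A minimal sketch of the flag workaround, in plain Java with no Flink dependency. `TombstoneSolutionSet`, `Entry`, and the method names are hypothetical stand-ins chosen for illustration; they only mimic the keyed "update or add" behavior described above, under the assumption that elements are merged by key.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java stand-in for a keyed solution set (hypothetical, no Flink dependency). */
class TombstoneSolutionSet {

    /** One solution-set element: key, payload, and a "deleted" flag. */
    static final class Entry {
        final String key;
        final int value;
        final boolean deleted;

        Entry(String key, int value, boolean deleted) {
            this.key = key;
            this.value = value;
            this.deleted = deleted;
        }
    }

    private final Map<String, Entry> entries = new LinkedHashMap<>();

    /** Merges a delta by key: existing keys are overwritten, new keys are added. */
    void applyDelta(List<Entry> delta) {
        for (Entry e : delta) {
            entries.put(e.key, e);
        }
    }

    /** Number of stored elements, including flagged ones (the set never shrinks). */
    int size() {
        return entries.size();
    }

    /** Keys of elements not flagged as deleted, in insertion order. */
    List<String> liveKeys() {
        List<String> live = new ArrayList<>();
        for (Entry e : entries.values()) {
            if (!e.deleted) {
                live.add(e.key);
            }
        }
        return live;
    }
}
```

The flagged elements still occupy space, so a final filter on the flag would be needed once the iteration has finished.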

An alternative approach is to ditch Flink's native iteration construct and write your intermediate results to Tachyon or HDFS after each iteration using the TypeInfoInput/OutputFormats. You then have full control over how the old and the new solution sets are merged.

BTW can you share some details about that particular algorithm? I was thinking about examples of iterative algorithms with this property...

Regards,
A.

2015-02-10 14:18 GMT+01:00 Kruse, Sebastian <[hidden email]>:

Hi everyone,

From playing around a bit with delta iterations, I saw that you can update elements in the solution set and add new elements. My question is: is it possible to remove elements from the solution set (apart from marking them as “deleted” somehow)?

My use case at hand for this is the following: In each iteration, I generate candidate solutions that I want to verify within the next iteration. If verification fails, I would like to remove them from the solution set, otherwise retain them.

Thanks,

Sebastian

Vasiliki Kalavri

Re: DeltaIterations: shrink solution set

Hi,

It's hard to tell without details about your algorithm, but what you're describing sounds to me like something you can use the workset for.

-V.

On Feb 10, 2015 6:54 PM, "Alexander Alexandrov" <[hidden email]> wrote:

I am not sure whether this is supported at the moment. The only workaround I could think of is indeed to use a boolean flag that indicates whether the element has been deleted or not.

An alternative approach is to ditch Flink's native iteration construct and write your intermediate results to Tachyon or HDFS after each iteration using the TypeInfoInput/OutputFormats. You then have full control over how the old and the new solution sets are merged.

BTW can you share some details about that particular algorithm? I was thinking about examples of iterative algorithms with this property...

Regards,
A.

Alexander Alexandrov

Re: DeltaIterations: shrink solution set

True.

2015-02-10 19:14 GMT+01:00 Vasiliki Kalavri <[hidden email]>:

Hi,

It's hard to tell without details about your algorithm, but what you're describing sounds to me like something you can use the workset for.

-V.

Stephan Ewen

Re: DeltaIterations: shrink solution set

You can also use a bulk iteration and just keep the state yourself. Since the functions live across iterations, it is easily doable to just gather the state in a HashMap yourself. Use map() or mapPartition(), plus a manual partition() call; that should do the trick...
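A minimal sketch of that idea in plain Java, assuming (as stated above) that the same function instance is reused in every superstep. `StatefulCandidateFunction` and `onSuperstep` are hypothetical names, and the caller's repeated invocations stand in for the bulk iteration; no Flink API is involved.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

/** Stand-in for a user function whose instance is reused in every superstep. */
class StatefulCandidateFunction {

    // Lives as long as the function object, so it survives from one superstep to the next.
    private final Map<String, Integer> candidates = new HashMap<>();
    private int superstep = 0;

    /**
     * Called once per superstep with verification verdicts (candidate -> verified?).
     * Verified candidates are retained; failed ones are physically removed.
     */
    void onSuperstep(Map<String, Boolean> verdicts) {
        superstep++;
        for (Map.Entry<String, Boolean> v : verdicts.entrySet()) {
            if (v.getValue()) {
                // Remember the superstep in which the candidate was first verified.
                candidates.putIfAbsent(v.getKey(), superstep);
            } else {
                candidates.remove(v.getKey());
            }
        }
    }

    /** Candidates currently retained, sorted for stable output. */
    SortedSet<String> retained() {
        return new TreeSet<>(candidates.keySet());
    }
}
```

Unlike the flag workaround, the map really shrinks here. In a distributed job each parallel instance would hold only its own partition of the state, which is presumably why the manual partitioning step is mentioned.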

On 10.02.2015 21:44, "Alexander Alexandrov" <[hidden email]> wrote:

True.

Kruse, Sebastian

RE: DeltaIterations: shrink solution set

Thanks for your answers.

I am trying to build an apriori-like algorithm to find key candidates in a relational dataset. I was considering delta iterations, because the algorithm should maintain two datasets: a set of column combinations to be checked (as delta set) and a set of tuples which are still relevant to the next iteration (as work set). So, the general procedure is adapted from the popular frequent item set algorithm.

I am now also thinking that delta iterations are not the right thing for me, also because of other problems (only join and coGroup can be used on the solution set, and I get “Error: Iterative task without a single iterative input.”, whose cause is not obvious to me).

@Alexander: Using an output within a bulk iteration leaves me with the following exception:

Exception in thread "main" org.apache.flink.api.common.InvalidProgramException: A data set that is part of an iteration was used as a sink or action. Did you forget to close the iteration?

Do you have any experience or suggestions on how to make your idea work nevertheless?

@Stephan: Are operators intentionally reused across iterations, i.e., is it an explicit feature or is it likely to change in the future?

Cheers,

Sebastian

From: [hidden email] [mailto:[hidden email]] On Behalf Of Stephan Ewen
Sent: Wednesday, 11 February 2015 10:02
To: [hidden email]
Subject: Re: DeltaIterations: shrink solution set

Stephan Ewen

Re: DeltaIterations: shrink solution set

UDFs intentionally persist across iterations; it is a feature that allows you to keep state. To figure out when an iteration starts and ends, you can use a RichFunction, which gets calls to open() and close() for each iteration.
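A small sketch of that lifecycle in plain Java. `LifecycleFunction` is a hypothetical stand-in and not Flink's RichFunction class; `LifecycleDriver` only emulates a runtime that calls open() and close() once per iteration on the same instance, which is the assumption taken from the description above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a rich function; NOT Flink's RichFunction class. */
abstract class LifecycleFunction<I, O> {
    void open() {}
    void close() {}
    abstract O map(I value);
}

/** Uses open() to detect a new iteration while a field keeps state across iterations. */
class SuperstepCounter extends LifecycleFunction<Integer, String> {
    private int superstep = 0;   // incremented on every open()
    private long runningSum = 0; // state that survives across iterations
    final List<String> log = new ArrayList<>();

    @Override
    void open() {
        superstep++;
        log.add("open " + superstep);
    }

    @Override
    String map(Integer value) {
        runningSum += value;
        return superstep + ":" + runningSum;
    }

    @Override
    void close() {
        log.add("close " + superstep);
    }
}

/** Emulates the runtime: one open()/close() pair per iteration on the same instance. */
class LifecycleDriver {
    static List<String> run(SuperstepCounter f, int[][] inputsPerSuperstep) {
        List<String> out = new ArrayList<>();
        for (int[] inputs : inputsPerSuperstep) {
            f.open();
            for (int v : inputs) {
                out.add(f.map(v));
            }
            f.close();
        }
        return out;
    }
}
```

Because the instance is never recreated between iterations, `runningSum` keeps accumulating, while open() and close() mark the iteration boundaries.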

On Wed, Feb 11, 2015 at 10:40 AM, Kruse, Sebastian <[hidden email]> wrote:

Thanks for your answers.

I am trying to build an apriori-like algorithm to find key candidates in a relational dataset. I was considering delta iterations, because the algorithm should maintain two datasets: a set of column combinations to be checked (as delta set) and a set of tuples which are still relevant to the next iteration (as work set). So, the general procedure is adapted from the popular frequent item set algorithm.

I am now also thinking that delta iterations are not the right thing for me, also because of other problems (only join and coGroup can be used on the solution set, and I get “Error: Iterative task without a single iterative input.”, whose cause is not obvious to me).

@Alexander: Using an output within a bulk iteration leaves me with the following exception:

Exception in thread "main" org.apache.flink.api.common.InvalidProgramException: A data set that is part of an iteration was used as a sink or action. Did you forget to close the iteration?

Do you have any experience or suggestions on how to make your idea work nevertheless?

@Stephan: Are operators intentionally reused across iterations, i.e., is it an explicit feature or is it likely to change in the future?

Cheers,

Sebastian

Kruse, Sebastian

RE: DeltaIterations: shrink solution set

That sounds promising.

Yet, I have the problem that I need the candidates in different operators. While feeding them forward is probably easy, e.g., via broadcasts, feeding the candidates “backwards” to the next iteration seems to be more of a problem.

As I am only building a prototype, I might do this feedback via some global variable, but that is very hacky. Is there an elegant way to do it? Maybe with the distributed cache?

From: [hidden email] [mailto:[hidden email]] On Behalf Of Stephan Ewen
Sent: Wednesday, 11 February 2015 10:44
To: [hidden email]
Subject: Re: DeltaIterations: shrink solution set
