Hi everyone, From playing around a bit with delta iterations, I saw that you can update elements in the solution set and add new elements. My question is: is it possible to remove elements from the solution set (apart from marking them as “deleted” somehow)? My use case at hand is the following: In each iteration, I generate candidate solutions that I want to verify within the next iteration. If verification fails, I would like to remove them from the solution set, otherwise retain them. Thanks, Sebastian
|
I am not sure whether this is supported at the moment. The only workaround I can think of is indeed to use a boolean flag that indicates whether the element has been deleted or not. An alternative approach is to ditch Flink's native iteration construct and write your intermediate results to Tachyon or HDFS after each iteration using the TypeInfoInput/OutputFormats. You then have full control over how the old and the new solution sets should be merged. BTW, can you share some details about that particular algorithm? I was thinking about examples of iterative algorithms with this property... Regards, A. 2015-02-10 14:18 GMT+01:00 Kruse, Sebastian <[hidden email]>:
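As an illustration of the boolean-flag workaround, here is a minimal plain-Java sketch. It deliberately does not use the Flink API: a `HashMap` stands in for the solution set, and `Candidate`, `TombstoneSketch`, `applyDelta` and `liveKeys` are hypothetical names chosen for this example. The assumption is that a delta can only add or overwrite entries by key, so a candidate that fails verification is overwritten with a flagged copy and filtered out when the result is read.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical element type: a key plus a "deleted" flag (a tombstone).
final class Candidate {
    final String key;
    final boolean deleted;

    Candidate(String key, boolean deleted) {
        this.key = key;
        this.deleted = deleted;
    }
}

final class TombstoneSketch {
    // Stand-in for the solution set: entries are added or overwritten by key, never removed.
    private final Map<String, Candidate> solutionSet = new HashMap<>();

    // Merge a delta: each element replaces the entry that has the same key.
    void applyDelta(List<Candidate> delta) {
        for (Candidate c : delta) {
            solutionSet.put(c.key, c);
        }
    }

    // Read the result after the iteration, skipping entries flagged as deleted.
    Set<String> liveKeys() {
        Set<String> live = new TreeSet<>();
        for (Candidate c : solutionSet.values()) {
            if (!c.deleted) {
                live.add(c.key);
            }
        }
        return live;
    }
}
```

The flagged entries still occupy space until the final filter runs, which is the main cost of this approach compared with a real delete.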
|
Hi, It's hard to tell without details about your algorithm, but what you're describing sounds to me like something you can use the workset for. -V. On Feb 10, 2015 6:54 PM, "Alexander Alexandrov" <[hidden email]> wrote:
|
True. 2015-02-10 19:14 GMT+01:00 Vasiliki Kalavri <[hidden email]>:
|
You can also use a bulk iteration and just keep the state yourself. Since the functions live across iterations, it is easy to gather the state in a HashMap yourself. Use map() or mapPartition(), plus a manual partition() call - that should do the trick... On 10.02.2015 at 21:44, "Alexander Alexandrov" <[hidden email]> wrote:
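A rough plain-Java illustration of keeping the state yourself, without the Flink API: one function object is created once and called in every pass, so its `HashMap` field keeps its contents between passes and entries can really be removed. The class `StatefulVerifier`, its methods, and the idea of storing the pass number as the map value are assumptions made for this sketch. In an actual job the same logic would presumably sit inside a map() or mapPartition() function, with the partitioning making sure a given key always reaches the same instance.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Predicate;

final class StatefulVerifier {
    // Candidate -> pass in which it was generated. Survives between passes
    // because the same object is reused for every pass.
    private final Map<String, Integer> state = new HashMap<>();

    // One pass: verify the candidates generated in the previous pass,
    // remove the ones that fail, then store the newly generated candidates.
    List<String> processPass(int pass, List<String> newCandidates, Predicate<String> isValid) {
        List<String> removed = new ArrayList<>();
        Iterator<Map.Entry<String, Integer>> it = state.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Integer> entry = it.next();
            boolean fromPreviousPass = entry.getValue() == pass - 1;
            if (fromPreviousPass && !isValid.test(entry.getKey())) {
                it.remove(); // a real delete, unlike in a solution set
                removed.add(entry.getKey());
            }
        }
        for (String candidate : newCandidates) {
            state.putIfAbsent(candidate, pass);
        }
        return removed;
    }

    // Candidates currently retained, in sorted order.
    Set<String> retained() {
        return new TreeSet<>(state.keySet());
    }
}
```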
|
Thanks for your answers. I am trying to build an apriori-like algorithm to find key candidates in a relational dataset. I was considering delta iterations because the algorithm should maintain two datasets: a set of column combinations to be checked (as delta set) and a set of tuples which are still relevant to the next iteration (as work set). So, the general procedure is adapted from the popular frequent item set algorithm. I am now also thinking that delta iterations are not the right thing for me, partly because of other problems (only join and coGroup can be used on the solution set, and I get “Error: Iterative task without a single iterative input.”, whose cause is not obvious to me). @Alexander: Using an output within a bulk iteration leaves me with the following exception: Exception in thread "main" org.apache.flink.api.common.InvalidProgramException: A data set that is part of an iteration was used as a sink or action. Did you
forget to close the iteration? Do you have any experience/proposals how to incorporate your idea nevertheless? @Stefan: Are operators intentionally reused across iterations, i.e., is it an explicit feature or is it likely to change in the future? Cheers, Sebastian
|
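For readers unfamiliar with the kind of algorithm described in this message, the following single-machine Java sketch shows one possible level-wise (apriori-like) search for minimal key candidates. It is not Sebastian's implementation and uses no Flink API. The class `KeyCandidateSketch`, its methods, and the pruning rule (extend only non-unique combinations, and skip any combination that contains an already found key) are assumptions for illustration. It also omits the pruning of tuples that are no longer relevant, which the message mentions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class KeyCandidateSketch {
    // True if no two rows agree on all of the given column indexes.
    static boolean isUnique(List<String[]> rows, Set<Integer> columns) {
        Set<List<String>> seen = new HashSet<>();
        for (String[] row : rows) {
            List<String> projection = new ArrayList<>();
            for (int c : columns) {
                projection.add(row[c]);
            }
            if (!seen.add(projection)) {
                return false; // duplicate projection: not a key
            }
        }
        return true;
    }

    // Level-wise search: verify the candidates of size k, then build size k+1
    // candidates from the combinations that failed verification.
    static Set<Set<Integer>> minimalKeys(List<String[]> rows, int numColumns) {
        Set<Set<Integer>> keys = new HashSet<>();
        Set<Set<Integer>> candidates = new HashSet<>();
        for (int c = 0; c < numColumns; c++) {
            candidates.add(new TreeSet<>(Collections.singleton(c)));
        }
        while (!candidates.isEmpty()) {
            Set<Set<Integer>> nonUnique = new HashSet<>();
            for (Set<Integer> candidate : candidates) {
                if (isUnique(rows, candidate)) {
                    keys.add(candidate);
                } else {
                    nonUnique.add(candidate);
                }
            }
            Set<Set<Integer>> next = new HashSet<>();
            for (Set<Integer> a : nonUnique) {
                for (Set<Integer> b : nonUnique) {
                    Set<Integer> union = new TreeSet<>(a);
                    union.addAll(b);
                    boolean oneLarger = union.size() == a.size() + 1;
                    // Supersets of a known key cannot be minimal, so skip them.
                    if (oneLarger && keys.stream().noneMatch(union::containsAll)) {
                        next.add(union);
                    }
                }
            }
            candidates = next;
        }
        return keys;
    }
}
```

In terms of the thread, `candidates` plays the role of the set of column combinations to be checked in the next iteration, and a combination that turns out to be unique is simply not extended any further.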
UDFs intentionally persist across iterations; it is a feature that allows you to keep state. To figure out when an iteration starts and ends, you can use a RichFunction, which gets calls to open() and close() for each iteration. On Wed, Feb 11, 2015 at 10:40 AM, Kruse, Sebastian <[hidden email]> wrote:
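The lifecycle described here can be pictured with a small plain-Java simulation. It is only a sketch under assumptions: `IterationLifecycle`, `CountingFunction` and `LifecycleDriver` are hypothetical stand-ins and not Flink classes, and the driver loop merely imitates a runtime that reuses one function object and wraps each iteration in open() and close().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the lifecycle hooks of a rich function.
interface IterationLifecycle {
    void open();

    void close();
}

final class CountingFunction implements IterationLifecycle {
    private int superstep = 0;  // survives across iterations
    private int recordsInStep;  // reset at the start of each iteration
    final List<String> log = new ArrayList<>();

    @Override
    public void open() {
        superstep++;
        recordsInStep = 0;
    }

    String map(String record) {
        recordsInStep++;
        return superstep + ":" + record;
    }

    @Override
    public void close() {
        log.add("superstep " + superstep + " saw " + recordsInStep + " record(s)");
    }
}

final class LifecycleDriver {
    // Imitates the runtime: the same function object is used for every pass.
    static List<String> run(CountingFunction fn, List<List<String>> passes) {
        for (List<String> pass : passes) {
            fn.open();
            for (String record : pass) {
                fn.map(record);
            }
            fn.close();
        }
        return fn.log;
    }
}
```

Because `superstep` is an ordinary field, it keeps counting across passes, while `recordsInStep` is reset in open(); that split between per-iteration and cross-iteration state is the point of the hooks.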
|
That sounds promising.
Yet, I have the problem that I need the candidates in different operators. While feeding them forward is probably easy, e.g., via broadcasts, feeding the candidates
“backwards” to the next iteration seems to be more of a problem. As I am only building a prototype, I might do this feed-back via some global variable, but that is very hacky. Is there some elegant way to do it? Maybe with
the distributed cache?
|