Hi,
I am trying to understand what intermediate caching support Flink offers. For example, when there's an iterative dataset, what's being cached between iterations? Is there some documentation on this?

Thank you,
Saliya

--
Saliya Ekanayake
Ph.D. Candidate | Research Assistant
School of Informatics and Computing | Digital Science Center
Indiana University, Bloomington
Hey Saliya,
the result of each iteration (superstep) that is fed back into the iteration is cached. For the iterate operator, that is the last partial solution; for the delta iterate operator, it is the current solution set (https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/batch/iterations.html).

Internally, this works via custom iteration operator implementations for the head and tail tasks, which are co-located and share a hash table. I don't think these internals are documented, so you would have to look at the code. Most of the relevant implementations are in the "org.apache.flink.runtime.iterative.task" package.

Hope this helps...

Ufuk

On Sun, Jul 17, 2016 at 9:36 PM, Saliya Ekanayake wrote:
> I am trying to understand what's the intermediate caching support in Flink.
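A plain-Java sketch may help make the bulk vs. delta distinction concrete. It does not use Flink at all; the class `IterationCacheSketch` and its methods are hypothetical names for illustration. `bulkIterate` keeps only the previous superstep's result and feeds it back, mirroring the cached partial solution. `deltaConnectedComponents` keeps a `HashMap` as a stand-in for the solution-set hash table: it persists across supersteps and is updated only for the keys in each superstep's delta. Connected components is used because it is a common delta-iteration example. The real head and tail tasks in `org.apache.flink.runtime.iterative.task` are considerably more involved and run as distributed, parallel tasks, so treat this as a model of the idea, not of the implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical plain-Java model of iteration caching; this is NOT Flink's runtime code.
class IterationCacheSketch {

    // Bulk iteration: only the latest partial solution is kept ("cached")
    // and fed back as the input of the next superstep.
    static <T> T bulkIterate(T initial, Function<T, T> step, int maxIterations) {
        T partialSolution = initial;
        for (int i = 0; i < maxIterations; i++) {
            partialSolution = step.apply(partialSolution);
        }
        return partialSolution;
    }

    // Delta iteration (connected components by min-label propagation):
    // the solution set is a hash table that persists across supersteps and is
    // only updated for the keys contained in each superstep's delta.
    static Map<Integer, Integer> deltaConnectedComponents(
            Map<Integer, List<Integer>> adjacency, int maxIterations) {
        // Solution set: vertex id -> smallest vertex id seen so far (component label).
        Map<Integer, Integer> solutionSet = new HashMap<>();
        for (Map.Entry<Integer, List<Integer>> e : adjacency.entrySet()) {
            solutionSet.putIfAbsent(e.getKey(), e.getKey());
            for (Integer n : e.getValue()) {
                solutionSet.putIfAbsent(n, n);
            }
        }
        // Initial workset: every vertex announces its label once.
        List<Integer> workset = new ArrayList<>(solutionSet.keySet());

        for (int i = 0; i < maxIterations && !workset.isEmpty(); i++) {
            // Step function: compute improved labels (the solution-set delta).
            Map<Integer, Integer> delta = new HashMap<>();
            for (Integer v : workset) {
                int label = solutionSet.get(v);
                List<Integer> neighbors =
                        adjacency.getOrDefault(v, Collections.<Integer>emptyList());
                for (Integer n : neighbors) {
                    int current = delta.getOrDefault(n, solutionSet.get(n));
                    if (label < current) {
                        delta.put(n, label);
                    }
                }
            }
            // Merge the delta into the persistent solution set...
            solutionSet.putAll(delta);
            // ...and continue only with the vertices whose label changed.
            workset = new ArrayList<>(delta.keySet());
        }
        return solutionSet;
    }

    public static void main(String[] args) {
        // Bulk: doubling 1 for ten supersteps.
        Integer doubled = bulkIterate(1, x -> x * 2, 10);
        System.out.println(doubled);

        // Delta: an undirected graph with components {1, 2, 3} and {4, 5}.
        Map<Integer, List<Integer>> graph = new HashMap<>();
        graph.put(1, Arrays.asList(2));
        graph.put(2, Arrays.asList(1, 3));
        graph.put(3, Arrays.asList(2));
        graph.put(4, Arrays.asList(5));
        graph.put(5, Arrays.asList(4));
        System.out.println(deltaConnectedComponents(graph, 10));
    }
}
```

Running `main` prints 1024 and then the component labels: vertices 1 to 3 end up labelled 1 and vertices 4 and 5 labelled 4. In the delta case the third superstep only touches vertex 3, which is the benefit of keeping the solution set in a persistent hash table instead of recomputing it every superstep.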
PS: I forgot to mention that constant (loop-invariant) iteration input is also cached.
On Mon, Jul 18, 2016 at 11:27 AM, Ufuk Celebi wrote:
> the result of each iteration (super step) that is fed back to the
> iteration is cached.
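Assuming "constant iteration input" means data that enters the loop from outside and does not depend on the partial solution, the benefit of caching it can be illustrated with another plain-Java sketch. Again this is not Flink code; `ConstantInputSketch`, `CountingSource`, and `iterate` are hypothetical names. With `cache` enabled, the constant input is produced once before the first superstep; without it, the input is recomputed in every superstep.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical plain-Java model; NOT Flink code. "Constant" input does not depend
// on the partial solution, so it can be produced once and reused by every superstep.
class ConstantInputSketch {

    // Stand-in for a loop-invariant input that counts how often it is produced.
    static final class CountingSource implements Supplier<List<Integer>> {
        int evaluations = 0;

        @Override
        public List<Integer> get() {
            evaluations++;
            return Arrays.asList(1, 2, 3);
        }
    }

    // Each superstep adds the sum of the constant input to the partial solution.
    // With cache == true the constant input is materialized once, before the loop.
    static int iterate(Supplier<List<Integer>> constantInput, int initial,
                       int supersteps, boolean cache) {
        List<Integer> cached = cache ? constantInput.get() : null;
        int partial = initial;
        for (int i = 0; i < supersteps; i++) {
            List<Integer> input = cache ? cached : constantInput.get();
            int sum = 0;
            for (int x : input) {
                sum += x;
            }
            partial += sum;
        }
        return partial;
    }

    public static void main(String[] args) {
        CountingSource withCache = new CountingSource();
        CountingSource withoutCache = new CountingSource();
        int a = iterate(withCache, 0, 5, true);
        int b = iterate(withoutCache, 0, 5, false);
        System.out.println(a + " after " + withCache.evaluations + " evaluation(s)");
        System.out.println(b + " after " + withoutCache.evaluations + " evaluation(s)");
    }
}
```

With five supersteps both variants return 30, but the cached run produces the constant input once instead of five times.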
Thank you, Ufuk!

On Tue, Jul 19, 2016 at 5:51 AM, Ufuk Celebi wrote:
> PS: I forgot to mention that also constant iteration input is cached.