Intermediate Data Caching

Saliya Ekanayake
Hi,

I am trying to understand what intermediate caching support Flink provides. For example, with an iterative data set, what is cached between iterations? Is there any documentation on this?

Thank you,
Saliya

--
Saliya Ekanayake
Ph.D. Candidate | Research Assistant
School of Informatics and Computing | Digital Science Center
Indiana University, Bloomington

Re: Intermediate Data Caching

Ufuk Celebi
Hey Saliya,

The result of each iteration (superstep) that is fed back into the
iteration is cached. For the iterate operator, that is the last partial
solution; for the delta iterate operator, it is the current solution
set (https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/batch/iterations.html).
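To make the bulk-iteration case concrete, here is a minimal plain-Python simulation (not Flink's API; `bulk_iterate` and `halve` are hypothetical names) of the idea that the partial solution produced by one superstep is kept and fed back as the input of the next one.

```python
from typing import Callable, List


def bulk_iterate(initial: List[float],
                 step: Callable[[List[float]], List[float]],
                 max_supersteps: int) -> List[float]:
    """Hypothetical stand-in for a bulk iteration: only the latest
    partial solution is kept between supersteps."""
    partial_solution = list(initial)  # the cached feedback data set
    for _ in range(max_supersteps):
        # The whole partial solution is recomputed and replaces the cached one.
        partial_solution = step(partial_solution)
    return partial_solution


def halve(values: List[float]) -> List[float]:
    # Example step function: every superstep halves each element.
    return [v / 2 for v in values]


print(bulk_iterate([8.0, 4.0], halve, 3))  # [1.0, 0.5]
```

The point of the sketch is that a bulk iteration recomputes and replaces the entire data set each superstep, whereas a delta iteration keeps a keyed solution set and only updates the entries that change.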

Internally, this works via custom iteration operator implementations
for the head and tail tasks, which are co-located and share a hash table.
I don't think these internals are documented, so you would have to
look at the code. Most of the relevant implementations are in the
"org.apache.flink.runtime.iterative.task" package.
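A toy model of the delta-iteration case may help: a "head" step reads the workset and a solution set, and a "tail" step writes the changed entries back into the same in-memory table, which then also defines the next workset. This is only an assumption-laden illustration in plain Python (the function name `connected_components` and the head/tail split are hypothetical); it is not how the classes in "org.apache.flink.runtime.iterative.task" are written.

```python
from typing import Dict, List, Set, Tuple


def connected_components(vertices: List[int],
                         edges: List[Tuple[int, int]]) -> Dict[int, int]:
    """Min-label propagation as a toy delta iteration (hypothetical sketch)."""
    neighbors: Dict[int, Set[int]] = {v: set() for v in vertices}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    # Solution set: vertex -> component id, one table shared by "head" and
    # "tail" and kept across supersteps instead of being rebuilt.
    solution_set: Dict[int, int] = {v: v for v in vertices}
    # Workset: vertices whose label changed in the previous superstep.
    workset: Set[int] = set(vertices)

    while workset:
        # "Head": read the workset plus the shared table, propose smaller labels.
        candidates: Dict[int, int] = {}
        for v in workset:
            label = solution_set[v]
            for n in neighbors[v]:
                if label < candidates.get(n, solution_set[n]):
                    candidates[n] = label
        # "Tail": write the delta into the same table; changed keys form the
        # next workset, so the loop ends once nothing changes.
        workset = set()
        for v, label in candidates.items():
            if label < solution_set[v]:
                solution_set[v] = label
                workset.add(v)
    return solution_set


print(connected_components([1, 2, 3, 4, 5], [(1, 2), (2, 3), (4, 5)]))
# {1: 1, 2: 1, 3: 1, 4: 4, 5: 4}
```

Because both steps touch the same dict, no solution-set data has to be shipped between them; the real runtime's data structures and scheduling are more involved than this.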

Hope this helps...

Ufuk


On Sun, Jul 17, 2016 at 9:36 PM, Saliya Ekanayake <[hidden email]> wrote:

Re: Intermediate Data Caching

Ufuk Celebi
PS: I forgot to mention that constant (loop-invariant) iteration input is also cached.
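As a rough illustration of caching constant iteration input, the plain-Python sketch below evaluates a loop-invariant input once and reuses the cached copy in every superstep. `CachedInput` and `load_weights` are hypothetical names; this is not Flink's caching mechanism, only the idea behind it.

```python
from typing import Callable, Dict, Generic, Optional, TypeVar

T = TypeVar("T")


class CachedInput(Generic[T]):
    """Evaluates a loop-invariant input on first use, then serves the cached copy."""

    def __init__(self, producer: Callable[[], T]) -> None:
        self._producer = producer
        self._value: Optional[T] = None
        self.evaluations = 0  # how many times the producer actually ran

    def get(self) -> T:
        if self._value is None:
            self._value = self._producer()
            self.evaluations += 1
        return self._value


def load_weights() -> Dict[str, int]:
    # Stand-in for an expensive data set that does not depend on the iteration.
    return {"a": 1, "b": 2}


weights = CachedInput(load_weights)
totals = {"a": 0, "b": 0}  # partial solution, fed back each superstep
for _ in range(5):
    w = weights.get()  # same cached dict in every superstep
    totals = {k: totals[k] + w[k] for k in totals}

print(totals, weights.evaluations)  # {'a': 5, 'b': 10} 1
```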

On Mon, Jul 18, 2016 at 11:27 AM, Ufuk Celebi <[hidden email]> wrote:

Re: Intermediate Data Caching

Saliya Ekanayake
Thank you, Ufuk!

On Tue, Jul 19, 2016 at 5:51 AM, Ufuk Celebi <[hidden email]> wrote: