TTL for State Entries / FLINK-3089

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

TTL for State Entries / FLINK-3089

Johannes Schulte
Hi,

I am trying to achieve a stream-to-stream join with big windows and are searching for a way to clean up state of old keys. I am already using a RichCoProcessFunction

I found there is already an existing ticket


but I have doubts that a registration of a timer for every incoming event is feasible as the timers seem to reside in an in-memory queue. 

The task is somewhat similar to the following blog post: http://devblog.mediamath.com/real-time-streaming-attribution-using-apache-flink

Is the implementation of a custom window operator a necessity for achieving such functionality

Thanks a lot,

Johannes


Reply | Threaded
Open this post in threaded view
|

Re: TTL for State Entries / FLINK-3089

Ufuk Celebi
Looping in Aljoscha and Kostas who are the expert on this. :-)

On Mon, Mar 6, 2017 at 6:06 PM, Johannes Schulte
<[hidden email]> wrote:

> Hi,
>
> I am trying to achieve a stream-to-stream join with big windows and are
> searching for a way to clean up state of old keys. I am already using a
> RichCoProcessFunction
>
> I found there is already an existing ticket
>
> https://issues.apache.org/jira/browse/FLINK-3089
>
> but I have doubts that a registration of a timer for every incoming event is
> feasible as the timers seem to reside in an in-memory queue.
>
> The task is somewhat similar to the following blog post:
> http://devblog.mediamath.com/real-time-streaming-attribution-using-apache-flink
>
> Is the implementation of a custom window operator a necessity for achieving
> such functionality
>
> Thanks a lot,
>
> Johannes
>
>
Reply | Threaded
Open this post in threaded view
|

Re: TTL for State Entries / FLINK-3089

Aljoscha Krettek
Hi Johannes,
I think what you can do is not register a timer for every event but for
every key, with a certain granularity. When that timer fires you check
what you want to clean up for that key and maybe register another timer
for the future. This way, the size of your timer state is bounded by
your key cardinality and I think people have used Flink with
timers/windows with key cardinalities of several 100 millions.

Best,
Aljoscha

On Wed, Mar 8, 2017, at 14:37, Ufuk Celebi wrote:

> Looping in Aljoscha and Kostas who are the expert on this. :-)
>
> On Mon, Mar 6, 2017 at 6:06 PM, Johannes Schulte
> <[hidden email]> wrote:
> > Hi,
> >
> > I am trying to achieve a stream-to-stream join with big windows and are
> > searching for a way to clean up state of old keys. I am already using a
> > RichCoProcessFunction
> >
> > I found there is already an existing ticket
> >
> > https://issues.apache.org/jira/browse/FLINK-3089
> >
> > but I have doubts that a registration of a timer for every incoming event is
> > feasible as the timers seem to reside in an in-memory queue.
> >
> > The task is somewhat similar to the following blog post:
> > http://devblog.mediamath.com/real-time-streaming-attribution-using-apache-flink
> >
> > Is the implementation of a custom window operator a necessity for achieving
> > such functionality
> >
> > Thanks a lot,
> >
> > Johannes
> >
> >
Reply | Threaded
Open this post in threaded view
|

Re: TTL for State Entries / FLINK-3089

Johannes Schulte
Hey Aljoscha,

thank you for your reply. The amount and quality of response on this list are really great to see and a good way to learn.

I will try this and see how this works out.

Cheers,

Johannes

On Thu, Mar 9, 2017 at 3:55 PM, Aljoscha Krettek <[hidden email]> wrote:
Hi Johannes,
I think what you can do is not register a timer for every event but for
every key, with a certain granularity. When that timer fires you check
what you want to clean up for that key and maybe register another timer
for the future. This way, the size of your timer state is bounded by
your key cardinality and I think people have used Flink with
timers/windows with key cardinalities of several 100 millions.

Best,
Aljoscha

On Wed, Mar 8, 2017, at 14:37, Ufuk Celebi wrote:
> Looping in Aljoscha and Kostas who are the expert on this. :-)
>
> On Mon, Mar 6, 2017 at 6:06 PM, Johannes Schulte
> <[hidden email]> wrote:
> > Hi,
> >
> > I am trying to achieve a stream-to-stream join with big windows and are
> > searching for a way to clean up state of old keys. I am already using a
> > RichCoProcessFunction
> >
> > I found there is already an existing ticket
> >
> > https://issues.apache.org/jira/browse/FLINK-3089
> >
> > but I have doubts that a registration of a timer for every incoming event is
> > feasible as the timers seem to reside in an in-memory queue.
> >
> > The task is somewhat similar to the following blog post:
> > http://devblog.mediamath.com/real-time-streaming-attribution-using-apache-flink
> >
> > Is the implementation of a custom window operator a necessity for achieving
> > such functionality
> >
> > Thanks a lot,
> >
> > Johannes
> >
> >