[DISCUSS] Improving Trigger/Window API and Semantics

Aljoscha Krettek
Hi,
I’m also sending this to @user because the Trigger API concerns users directly.

I think some parts of the Trigger API need improvement. The issues are trigger testability, fire semantics and composite triggers, and lateness. I started a document to keep track of things (https://docs.google.com/document/d/1Xp-YBf87vLTduYSivgqWVEMjYUmkA-hyb4muX3KRl08/edit?usp=sharing). Please read it if you are interested and want to get involved. We’ll evolve the document together and come up with Jira issues for the subtasks.

Cheers,
Aljoscha

Re: [DISCUSS] Improving Trigger/Window API and Semantics

Aljoscha Krettek
Hi,
my previous message might be a bit hard to parse for people who are not deeply familiar with the Trigger implementation, so I’ll try to give a bit more explanation right here in the mail.

We have observed some basic problems that keep coming up for people on the mailing lists, and I want to try to address them.

The first problem is with the Trigger semantics and the confusion between triggers that purge the window contents and those that don’t. (For example, using a ContinuousEventTimeTrigger with an EventTimeWindows assigner is a bad idea because state will be kept indefinitely.) While working on this we should also tackle the issue of providing composite triggers such as Repeatedly (fires a child trigger repeatedly), Any (fires when any child trigger fires) and All (fires when all child triggers fire).
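To make the Any and All combinators a bit more concrete, here is a minimal sketch. It assumes a simplified, hypothetical SimpleTrigger interface that only reports whether it fires for each element; it is not Flink's Trigger interface, and EveryNth, AnyOf and AllOf are made-up names. The semantics chosen for All (every child has fired at least once since the composite last fired) are also an assumption. Repeatedly is left out to keep the sketch short.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified trigger used only for this sketch (not Flink's Trigger).
interface SimpleTrigger {
    // Called once per element; returns true if the trigger fires on this element.
    boolean onElement();
}

// Fires on every n-th element, then starts counting again.
final class EveryNth implements SimpleTrigger {
    private final int n;
    private int seen;

    EveryNth(int n) { this.n = n; }

    @Override
    public boolean onElement() {
        seen++;
        if (seen < n) {
            return false;
        }
        seen = 0;
        return true;
    }
}

// Any: fires when at least one child fires on the current element.
final class AnyOf implements SimpleTrigger {
    private final List<SimpleTrigger> children;

    AnyOf(SimpleTrigger... children) { this.children = Arrays.asList(children); }

    @Override
    public boolean onElement() {
        boolean fire = false;
        for (SimpleTrigger child : children) {
            // No short-circuit: every child has to observe every element.
            fire |= child.onElement();
        }
        return fire;
    }
}

// All (assumed semantics): fires once every child has fired at least once
// since the last time the composite fired.
final class AllOf implements SimpleTrigger {
    private final List<SimpleTrigger> children;
    private final boolean[] fired;

    AllOf(SimpleTrigger... children) {
        this.children = Arrays.asList(children);
        this.fired = new boolean[children.length];
    }

    @Override
    public boolean onElement() {
        boolean all = true;
        for (int i = 0; i < children.size(); i++) {
            fired[i] |= children.get(i).onElement();
            all &= fired[i];
        }
        if (all) {
            Arrays.fill(fired, false); // start a new round after firing
        }
        return all;
    }
}

final class CompositeTriggerDemo {
    // Returns the 1-based positions at which the trigger fires over `count` elements.
    static String firings(SimpleTrigger trigger, int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i <= count; i++) {
            if (trigger.onElement()) {
                sb.append(sb.length() == 0 ? "" : ",").append(i);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(firings(new AnyOf(new EveryNth(2), new EveryNth(3)), 6)); // 2,3,4,6
        System.out.println(firings(new AllOf(new EveryNth(2), new EveryNth(3)), 6)); // 3,6
    }
}
```

Note that the composite has to forward every element to every child, even after one child has already fired; otherwise the children's internal counters would drift apart.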

The second issue is lateness. Right now, it is possible to write custom triggers that can deal with late elements and can even behave differently based on the amount of lateness. There is, however, no API for dealing with lateness. We should address this.
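As an illustration of what "behaving differently based on the amount of lateness" could mean, here is a small sketch. Everything in it is hypothetical: the LatenessSketch class, the Decision enum and the maxLateness threshold are assumptions made for this example and are not part of any existing API. It treats an element as late when the watermark has already reached the maximum timestamp of the element's window.

```java
// Hypothetical helper for this sketch; none of these names exist in Flink.
final class LatenessSketch {
    enum Decision { WAIT_FOR_WATERMARK, FIRE_NOW, DROP }

    // An element counts as late when the watermark has already reached the
    // maximum timestamp of its window by the time the element arrives.
    static boolean isLate(long windowMaxTimestamp, long currentWatermark) {
        return currentWatermark >= windowMaxTimestamp;
    }

    // Amount of lateness in milliseconds; 0 for on-time elements.
    static long lateness(long windowMaxTimestamp, long currentWatermark) {
        return Math.max(0L, currentWatermark - windowMaxTimestamp);
    }

    // Example policy (an assumption): on-time elements wait for the watermark,
    // slightly late elements cause an immediate firing, very late ones are dropped.
    static Decision onElement(long windowMaxTimestamp, long currentWatermark, long maxLateness) {
        if (!isLate(windowMaxTimestamp, currentWatermark)) {
            return Decision.WAIT_FOR_WATERMARK;
        }
        return lateness(windowMaxTimestamp, currentWatermark) <= maxLateness
                ? Decision.FIRE_NOW
                : Decision.DROP;
    }
}
```

Today this kind of logic has to live inside each custom trigger, which is exactly the gap described above.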

The third issue is Trigger testability. We should introduce a test harness for triggers and change the processing-time triggers to use a clock provider instead of calling System.currentTimeMillis() directly. This will allow testing them deterministically.
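The clock-provider idea could look roughly like the following sketch. The TriggerClock interface, the ManualClock test double and the PeriodicProcessingTimeTrigger class are hypothetical names chosen for this example, not existing Flink classes; the trigger is deliberately reduced to a single onElement() method.

```java
// Hypothetical clock abstraction; all names here are assumptions for this sketch.
interface TriggerClock {
    long currentTimeMillis();
}

// Production implementation: delegates to the system clock.
final class SystemClock implements TriggerClock {
    @Override
    public long currentTimeMillis() {
        return System.currentTimeMillis();
    }
}

// Test implementation: time only moves when the test says so.
final class ManualClock implements TriggerClock {
    private long now;

    ManualClock(long start) { this.now = start; }

    void advance(long millis) { now += millis; }

    @Override
    public long currentTimeMillis() {
        return now;
    }
}

// Simplified processing-time trigger: fires when at least intervalMillis
// have passed since creation or since the last firing.
final class PeriodicProcessingTimeTrigger {
    private final TriggerClock clock;
    private final long intervalMillis;
    private long nextFireTime;

    PeriodicProcessingTimeTrigger(TriggerClock clock, long intervalMillis) {
        this.clock = clock;
        this.intervalMillis = intervalMillis;
        this.nextFireTime = clock.currentTimeMillis() + intervalMillis;
    }

    boolean onElement() {
        long now = clock.currentTimeMillis();
        if (now < nextFireTime) {
            return false;
        }
        nextFireTime = now + intervalMillis;
        return true;
    }
}
```

A test can then construct the trigger with a ManualClock, call advance() to move time forward by an exact amount, and check the firing behavior without sleeping or depending on the wall clock.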

All of these are expanded upon in the document I linked to before: https://docs.google.com/document/d/1Xp-YBf87vLTduYSivgqWVEMjYUmkA-hyb4muX3KRl08/edit?usp=sharing. I think all of this is very important for people working on event-time-based pipelines.

Feedback is very welcome and I hope that we can expand the document together and come up with good solutions.

Cheers,
Aljoscha
> On 21 Mar 2016, at 17:46, Aljoscha Krettek <[hidden email]> wrote:

Re: [DISCUSS] Improving Trigger/Window API and Semantics

Fabian Hueske-2
Thanks for the write-up Aljoscha.
I think it is a really good idea to separate the different aspects (firing, purging, lateness) a bit. At the moment, all of these need to be handled in the Trigger, and a custom trigger is necessary whenever you want some of these aspects handled slightly differently. This makes the Trigger interface and its implementations really hard to understand.

+1 for the suggested changes.
Are there plans to touch the Evictor interface as well? IMO, this needs a redesign as well.

Fabian

2016-03-21 19:21 GMT+01:00 Aljoscha Krettek <[hidden email]>:

Re: [DISCUSS] Improving Trigger/Window API and Semantics

Aljoscha Krettek
Hi,
Yes, I have some thoughts about Evictors as well, but I haven’t written them down yet. The basic idea is this:

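import org.apache.flink.streaming.api.windowing.windows.Window;
import org.apache.flink.streaming.runtime.streamrecord.StreamRecord;

interface Evictor<T, W extends Window> {
   // Gets the buffered elements and their count; returns the per-element eviction test.
   Predicate<T> getPredicate(Iterable<StreamRecord<T>> elements, int size, W window);
}

interface Predicate<T> {
   // Returns true if the element should be evicted from the buffer.
   boolean evict(StreamRecord<T> element);
}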

The evictor returns a predicate that is evaluated on every element in the buffer to decide whether to keep it. The predicate can keep internal state, so with the size it gets in getPredicate() it can do count-based eviction (just evict elements until you reach your desired quota). We can also do eviction based on event time, which was not possible before because you could only evict from the start of the buffer. What do you think?
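Here is a rough sketch of how count-based and event-time-based eviction could be expressed with this shape. To keep it self-contained it does not use Flink classes: TimestampedValue stands in for StreamRecord, the window is reduced to its maximum timestamp, and all class names (EvictorSketch, EvictionPredicate, CountEvictorSketch, TimeEvictorSketch, EvictorDemo) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for StreamRecord<T>: a value plus its event-time timestamp.
final class TimestampedValue<T> {
    final T value;
    final long timestamp;

    TimestampedValue(T value, long timestamp) {
        this.value = value;
        this.timestamp = timestamp;
    }
}

// Returns true if the given element should be evicted.
interface EvictionPredicate<T> {
    boolean evict(TimestampedValue<T> element);
}

// Simplified version of the proposed evictor; the window is just its max timestamp.
interface EvictorSketch<T> {
    EvictionPredicate<T> getPredicate(
            Iterable<TimestampedValue<T>> elements, int size, long windowMaxTimestamp);
}

// Count-based: evict from the front until at most maxCount elements remain.
final class CountEvictorSketch<T> implements EvictorSketch<T> {
    private final int maxCount;

    CountEvictorSketch(int maxCount) { this.maxCount = maxCount; }

    @Override
    public EvictionPredicate<T> getPredicate(
            Iterable<TimestampedValue<T>> elements, int size, long windowMaxTimestamp) {
        // Internal state of the predicate: how many elements still have to go.
        final int[] toEvict = { Math.max(0, size - maxCount) };
        return element -> {
            if (toEvict[0] > 0) {
                toEvict[0]--;
                return true;
            }
            return false;
        };
    }
}

// Event-time-based: evict everything older than keepMillis before the window's max timestamp.
final class TimeEvictorSketch<T> implements EvictorSketch<T> {
    private final long keepMillis;

    TimeEvictorSketch(long keepMillis) { this.keepMillis = keepMillis; }

    @Override
    public EvictionPredicate<T> getPredicate(
            Iterable<TimestampedValue<T>> elements, int size, long windowMaxTimestamp) {
        long cutoff = windowMaxTimestamp - keepMillis;
        return element -> element.timestamp < cutoff;
    }
}

final class EvictorDemo {
    // Applies the evictor to the buffer and returns the values that are kept.
    static <T> List<T> retained(
            EvictorSketch<T> evictor, List<TimestampedValue<T>> buffer, long windowMaxTimestamp) {
        EvictionPredicate<T> predicate =
                evictor.getPredicate(buffer, buffer.size(), windowMaxTimestamp);
        List<T> kept = new ArrayList<>();
        for (TimestampedValue<T> element : buffer) {
            if (!predicate.evict(element)) {
                kept.add(element.value);
            }
        }
        return kept;
    }
}
```

With a buffer whose timestamps arrive out of order (say 1, 5, 3, 9), the time-based evictor with a cutoff of 5 drops the first and third elements, which shows that eviction is no longer limited to a prefix of the buffer.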

Cheers,
Aljoscha

> On 22 Mar 2016, at 09:24, Fabian Hueske <[hidden email]> wrote: