Could watermark could be took into consideration after the channel become active from idle at once?

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Could watermark could be took into consideration after the channel become active from idle at once?

JING ZHANG
Hi community,
Now after a channel become active from idle, the watermark on this
channel would not be took into account when align watermarks util it
generates a watermark equals to or bigger than last emitted watermark.
It makes sense because it could prevent the newly active task resumed
from idle dragging down the entire job.

However, if the newly active task generate watermarks which are
smaller than but very closely to the last emitted one for a long time,
it could not be took into consideration in watermark alignment which
may lead to its data maybe dropped at the later window operator.

Is there any way to solve this problem? Or could we add a
configuration to define different watermark aligned behavior?

Any suggestion is appreciated. Thanks a lot.

Regards
JING
Reply | Threaded
Open this post in threaded view
|

Re: Could watermark could be took into consideration after the channel become active from idle at once?

Roman Khachatryan
Hi,

AFAIK, this behavior is not configurable.
However, for this to happen the channel must consistently generate
watermarks smaller than watermarks from ALL aligned channels (and its
elements must have a smaller timestamp). I'm not sure how likely it
is. Is it something you see in production?

Regards,
Roman

On Thu, May 20, 2021 at 4:11 PM 张静 <[hidden email]> wrote:

>
> Hi community,
> Now after a channel become active from idle, the watermark on this
> channel would not be took into account when align watermarks util it
> generates a watermark equals to or bigger than last emitted watermark.
> It makes sense because it could prevent the newly active task resumed
> from idle dragging down the entire job.
>
> However, if the newly active task generate watermarks which are
> smaller than but very closely to the last emitted one for a long time,
> it could not be took into consideration in watermark alignment which
> may lead to its data maybe dropped at the later window operator.
>
> Is there any way to solve this problem? Or could we add a
> configuration to define different watermark aligned behavior?
>
> Any suggestion is appreciated. Thanks a lot.
>
> Regards
> JING
Reply | Threaded
Open this post in threaded view
|

Re: Could watermark could be took into consideration after the channel become active from idle at once?

JING ZHANG
Hi, roman
   Thanks for reply very much.
   In our case, we see some data was dropped in window operator. We
found the root cause by adding a temporary metric about number of
aligned channels, found an active channel resumed from Idle was not
took into account for some time (not alway btw). We could walk around
by allow lateness on window operator, however I wonder does it make
sense to add a configuration to define different watermark aligned
behavior?

Regards
JING

Roman Khachatryan <[hidden email]> 于2021年5月21日周五 上午1:05写道:

>
> Hi,
>
> AFAIK, this behavior is not configurable.
> However, for this to happen the channel must consistently generate
> watermarks smaller than watermarks from ALL aligned channels (and its
> elements must have a smaller timestamp). I'm not sure how likely it
> is. Is it something you see in production?
>
> Regards,
> Roman
>
> On Thu, May 20, 2021 at 4:11 PM 张静 <[hidden email]> wrote:
> >
> > Hi community,
> > Now after a channel become active from idle, the watermark on this
> > channel would not be took into account when align watermarks util it
> > generates a watermark equals to or bigger than last emitted watermark.
> > It makes sense because it could prevent the newly active task resumed
> > from idle dragging down the entire job.
> >
> > However, if the newly active task generate watermarks which are
> > smaller than but very closely to the last emitted one for a long time,
> > it could not be took into consideration in watermark alignment which
> > may lead to its data maybe dropped at the later window operator.
> >
> > Is there any way to solve this problem? Or could we add a
> > configuration to define different watermark aligned behavior?
> >
> > Any suggestion is appreciated. Thanks a lot.
> >
> > Regards
> > JING
Reply | Threaded
Open this post in threaded view
|

Re: Could watermark could be took into consideration after the channel become active from idle at once?

liujiangang
We meet the same problem in our company. One stream always has data. The other stream is much smaller and can be idle. Once the smaller one becomes active, its data may be dropped in this case.

张静 [via Apache Flink User Mailing List archive.] <[hidden email]> 于2021年5月21日周五 上午10:24写道:
Hi, roman
   Thanks for reply very much.
   In our case, we see some data was dropped in window operator. We
found the root cause by adding a temporary metric about number of
aligned channels, found an active channel resumed from Idle was not
took into account for some time (not alway btw). We could walk around
by allow lateness on window operator, however I wonder does it make
sense to add a configuration to define different watermark aligned
behavior?

Regards
JING

Roman Khachatryan <[hidden email]> 于2021年5月21日周五 上午1:05写道:

>
> Hi,
>
> AFAIK, this behavior is not configurable.
> However, for this to happen the channel must consistently generate
> watermarks smaller than watermarks from ALL aligned channels (and its
> elements must have a smaller timestamp). I'm not sure how likely it
> is. Is it something you see in production?
>
> Regards,
> Roman
>
> On Thu, May 20, 2021 at 4:11 PM 张静 <[hidden email]> wrote:
> >
> > Hi community,
> > Now after a channel become active from idle, the watermark on this
> > channel would not be took into account when align watermarks util it
> > generates a watermark equals to or bigger than last emitted watermark.
> > It makes sense because it could prevent the newly active task resumed
> > from idle dragging down the entire job.
> >
> > However, if the newly active task generate watermarks which are
> > smaller than but very closely to the last emitted one for a long time,
> > it could not be took into consideration in watermark alignment which
> > may lead to its data maybe dropped at the later window operator.
> >
> > Is there any way to solve this problem? Or could we add a
> > configuration to define different watermark aligned behavior?
> >
> > Any suggestion is appreciated. Thanks a lot.
> >
> > Regards
> > JING



To start a new topic under Apache Flink User Mailing List archive., email [hidden email]
To unsubscribe from Apache Flink User Mailing List archive., click here.
NAML