multi-sql checkpoint fail

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

multi-sql checkpoint fail

forideal

Hello friend

I have two SQL, checkpoint fails all the time. One task is to open a sliding window for an hour, and then another task consumes the output data of the previous task. There will be no problem with the two tasks submitted separately.
-- first Calculation
-- second Write the calculation to redis

-- first
insert into
  dw_access_log
select
  time_key,
  query_nor,
  query_nor_counter,
  '1' as group_key
from(
    select
      HOP_START(
        event_time_fake,
        interval '1' MINUTE,
        interval '60' MINUTE
      ) as time_key,
      query_nor,
      count(1) as query_nor_counter
    from(
        select
          RED_JSON_VALUE(request, '$.query_nor') as query_nor,
          RED_JSON_VALUE(request, '$.target') as target,
          event_time_fake
        from
          (
            select
              red_pb_parser(body, 'request') as request,
              event_time_fake
            from
              access_log_source
          )
      )
    group by
      query_nor,
      HOP(   -- sliding window size one hour, step one minute
        event_time_fake,
        interval '1' MINUTE,
        interval '60' MINUTE
      )
  )
where
  query_nor_counter > 100;

-- second
insert into
  dw_sink_access_log
select
  'fix_key' as `key`,
  get_json_value(query_nor, query_nor_counter) as `value` -- agg_func
from
  dw_access_log
group by
  tumble (time_key_fake, interval '1' MINUTE),
  group_key
Picture Link:

Best, forideal




 

Reply | Threaded
Open this post in threaded view
|

Re: multi-sql checkpoint fail

tison
Hi,

Could you share the stack traces?

Best,
tison.


forideal <[hidden email]> 于2020年4月18日周六 上午12:33写道:

Hello friend

I have two SQL, checkpoint fails all the time. One task is to open a sliding window for an hour, and then another task consumes the output data of the previous task. There will be no problem with the two tasks submitted separately.
-- first Calculation
-- second Write the calculation to redis

-- first
insert into
  dw_access_log
select
  time_key,
  query_nor,
  query_nor_counter,
  '1' as group_key
from(
    select
      HOP_START(
        event_time_fake,
        interval '1' MINUTE,
        interval '60' MINUTE
      ) as time_key,
      query_nor,
      count(1) as query_nor_counter
    from(
        select
          RED_JSON_VALUE(request, '$.query_nor') as query_nor,
          RED_JSON_VALUE(request, '$.target') as target,
          event_time_fake
        from
          (
            select
              red_pb_parser(body, 'request') as request,
              event_time_fake
            from
              access_log_source
          )
      )
    group by
      query_nor,
      HOP(   -- sliding window size one hour, step one minute
        event_time_fake,
        interval '1' MINUTE,
        interval '60' MINUTE
      )
  )
where
  query_nor_counter > 100;

-- second
insert into
  dw_sink_access_log
select
  'fix_key' as `key`,
  get_json_value(query_nor, query_nor_counter) as `value` -- agg_func
from
  dw_access_log
group by
  tumble (time_key_fake, interval '1' MINUTE),
  group_key
Picture Link:

Best, forideal




 

Reply | Threaded
Open this post in threaded view
|

Re: multi-sql checkpoint fail

Jark Wu-3
Hi,

What's the statebackend are you using? Is it Heap statebackend?

Best,
Jark

On Sat, 18 Apr 2020 at 07:06, tison <[hidden email]> wrote:
Hi,

Could you share the stack traces?

Best,
tison.


forideal <[hidden email]> 于2020年4月18日周六 上午12:33写道:

Hello friend

I have two SQL, checkpoint fails all the time. One task is to open a sliding window for an hour, and then another task consumes the output data of the previous task. There will be no problem with the two tasks submitted separately.
-- first Calculation
-- second Write the calculation to redis

-- first
insert into
  dw_access_log
select
  time_key,
  query_nor,
  query_nor_counter,
  '1' as group_key
from(
    select
      HOP_START(
        event_time_fake,
        interval '1' MINUTE,
        interval '60' MINUTE
      ) as time_key,
      query_nor,
      count(1) as query_nor_counter
    from(
        select
          RED_JSON_VALUE(request, '$.query_nor') as query_nor,
          RED_JSON_VALUE(request, '$.target') as target,
          event_time_fake
        from
          (
            select
              red_pb_parser(body, 'request') as request,
              event_time_fake
            from
              access_log_source
          )
      )
    group by
      query_nor,
      HOP(   -- sliding window size one hour, step one minute
        event_time_fake,
        interval '1' MINUTE,
        interval '60' MINUTE
      )
  )
where
  query_nor_counter > 100;

-- second
insert into
  dw_sink_access_log
select
  'fix_key' as `key`,
  get_json_value(query_nor, query_nor_counter) as `value` -- agg_func
from
  dw_access_log
group by
  tumble (time_key_fake, interval '1' MINUTE),
  group_key
Picture Link:

Best, forideal




 

Reply | Threaded
Open this post in threaded view
|

Re:Re: multi-sql checkpoint fail

forideal
Hi Tison, Jark Wu:

   Thanks for your reply !!!

   What's the statebackend are you using? Is it Heap statebackend?
   
    rocksdb backend uses incremental checkpoint.

   Could you share the stack traces?
     I looked at the flame chart myself and found that it was stuck at the end of the window where the calculation took place.
 
   I found that checkpoint failed because of watermark. For the same operator, the difference between max watermark and min watermark is 30 minutes (my checkpoint interval is 10 minutes). This may be caused by the slow calculation of windows.


Best forideal




At 2020-04-18 21:51:13, "Jark Wu" <[hidden email]> wrote:

Hi,

What's the statebackend are you using? Is it Heap statebackend?

Best,
Jark

On Sat, 18 Apr 2020 at 07:06, tison <[hidden email]> wrote:
Hi,

Could you share the stack traces?

Best,
tison.


forideal <[hidden email]> 于2020年4月18日周六 上午12:33写道:

Hello friend

I have two SQL, checkpoint fails all the time. One task is to open a sliding window for an hour, and then another task consumes the output data of the previous task. There will be no problem with the two tasks submitted separately.
-- first Calculation
-- second Write the calculation to redis

-- first
insert into
  dw_access_log
select
  time_key,
  query_nor,
  query_nor_counter,
  '1' as group_key
from(
    select
      HOP_START(
        event_time_fake,
        interval '1' MINUTE,
        interval '60' MINUTE
      ) as time_key,
      query_nor,
      count(1) as query_nor_counter
    from(
        select
          RED_JSON_VALUE(request, '$.query_nor') as query_nor,
          RED_JSON_VALUE(request, '$.target') as target,
          event_time_fake
        from
          (
            select
              red_pb_parser(body, 'request') as request,
              event_time_fake
            from
              access_log_source
          )
      )
    group by
      query_nor,
      HOP(   -- sliding window size one hour, step one minute
        event_time_fake,
        interval '1' MINUTE,
        interval '60' MINUTE
      )
  )
where
  query_nor_counter > 100;

-- second
insert into
  dw_sink_access_log
select
  'fix_key' as `key`,
  get_json_value(query_nor, query_nor_counter) as `value` -- agg_func
from
  dw_access_log
group by
  tumble (time_key_fake, interval '1' MINUTE),
  group_key
Picture Link:

Best, forideal