Lack of KeyedBroadcastStateBootstrapFunction


Lack of KeyedBroadcastStateBootstrapFunction

Mark Niehe
Hey all,

I have another question about the State Processor API. I can't seem to find a way to create a KeyedBroadcastStateBootstrapFunction operator. The two options currently available to bootstrap a savepoint with state are KeyedStateBootstrapFunction and BroadcastStateBootstrapFunction. Because these are the only two options, it's not possible to bootstrap both keyed and broadcast state for the same operator. Are there any plans to add that functionality or did I miss it entirely when going through the API docs?
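For readers following the thread, the gap can be pictured with a small plain-Java model. This is only a sketch of the two state scopes, not Flink code: `OperatorStateModel`, `SubtaskState`, and `bootstrap` are hypothetical names, and the hash-modulo routing is a simplification of how Flink assigns keys to subtasks. A bootstrap function covering both kinds of state would have to produce, for one operator, per-key entries on exactly one subtask plus a full copy of the broadcast entries on every subtask.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model for illustration; none of these types exist in Flink.
final class OperatorStateModel {

    /** State held by one parallel subtask of a single operator. */
    static final class SubtaskState {
        final Map<String, Long> keyed = new HashMap<>();       // only the keys routed to this subtask
        final Map<String, String> broadcast = new HashMap<>(); // full copy on every subtask
    }

    /** Fills both kinds of state for one operator running with the given parallelism. */
    static List<SubtaskState> bootstrap(
            Map<String, Long> keyedRecords,
            Map<String, String> broadcastRecords,
            int parallelism) {
        List<SubtaskState> subtasks = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            SubtaskState state = new SubtaskState();
            state.broadcast.putAll(broadcastRecords); // broadcast state is replicated
            subtasks.add(state);
        }
        for (Map.Entry<String, Long> entry : keyedRecords.entrySet()) {
            // Assumption: simplified routing; real key assignment is more involved.
            int owner = Math.floorMod(entry.getKey().hashCode(), parallelism);
            subtasks.get(owner).keyed.put(entry.getKey(), entry.getValue());
        }
        return subtasks;
    }

    public static void main(String[] args) {
        Map<String, Long> balances = new HashMap<>();
        balances.put("alice", 10L);
        balances.put("bob", 25L);
        Map<String, String> rules = new HashMap<>();
        rules.put("limit", "100");

        List<SubtaskState> states = bootstrap(balances, rules, 2);
        for (int i = 0; i < states.size(); i++) {
            System.out.println("subtask " + i + " keyed=" + states.get(i).keyed
                    + " broadcast=" + states.get(i).broadcast);
        }
    }
}
```

In this model, KeyedStateBootstrapFunction corresponds to filling only the `keyed` maps and BroadcastStateBootstrapFunction to filling only the `broadcast` maps, which is why neither alone covers an operator that needs both.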

Thanks,
--
Mark Niehe ·  Software Engineer

Re: Lack of KeyedBroadcastStateBootstrapFunction

Dawid Wysakowicz-2

Hi,

I am not very familiar with the State Processor API, but from a brief look, I think you are right: it does not currently support mixing different kinds of state in a single operator, at least not in a nice way. You could probably implement KeyedBroadcastStateBootstrapFunction yourself and use it with KeyedOperatorTransformation#transform(org.apache.flink.state.api.SavepointWriterOperatorFactory). I understand this is probably not the easiest task.

I am not aware of any plans to support that out of the box, but I cc'ed Gordon and Seth, who, if I remember correctly, worked on that API. I hope they can give you some more insight.

Best,

Dawid


Fwd: Lack of KeyedBroadcastStateBootstrapFunction

Tzu-Li (Gordon) Tai
It seems like Seth's reply didn't make it to the mailing lists somehow.
Forwarding his reply below:

---------- Forwarded message ---------
From: Seth Wiesman <[hidden email]>
Date: Thu, Mar 26, 2020 at 5:16 AM
Subject: Re: Lack of KeyedBroadcastStateBootstrapFunction
To: Dawid Wysakowicz <[hidden email]>
Cc: <[hidden email]>, Tzu-Li (Gordon) Tai <[hidden email]>


As Dawid mentioned, you can implement your own operator and pass it to the transform method. Unfortunately, that is fairly low level and would require you to understand a fair amount of Flink internals.

The real problem is that the State Processor API does not support two-input operators. We originally skipped that because there were a number of open questions about how best to do it, and it wasn't clear that it would be a necessary feature. Typically, Flink users use two-input operators to do some sort of join, and when bootstrapping state you typically only want to pre-fill one side of that join. KeyedBroadcastState is clearly a good counter-argument to that.
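To illustrate why keyed broadcast state is a counter-example, here is a minimal plain-Java sketch of a two-input operator. It is a model only, not Flink API: `KeyedBroadcastModel`, `restoreKeyed`, and the "max" rule are hypothetical, and the two `process...` methods merely mirror the shape of a keyed side and a broadcast side. The keyed input reads the broadcast state on every element, so pre-filling only one side leaves the restored operator behaving differently from one that had seen both inputs.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java model of a two-input operator; not Flink API.
final class KeyedBroadcastModel {
    private final Map<String, Long> totals = new HashMap<>(); // keyed state: running total per key
    private final Map<String, Long> limits = new HashMap<>(); // broadcast state: rule name -> limit

    /** Pre-fills the keyed side, as a bootstrap step would (hypothetical helper). */
    void restoreKeyed(String key, long total) {
        totals.put(key, total);
    }

    /** Input 1 (broadcast side): stores or replaces a rule. */
    void processBroadcastElement(String rule, long limit) {
        limits.put(rule, limit);
    }

    /** Input 2 (keyed side): updates the key's total and checks it against the "max" rule. */
    boolean processElement(String key, long amount) {
        long total = totals.merge(key, amount, Long::sum);
        Long limit = limits.get("max");
        return limit != null && total > limit;
    }

    public static void main(String[] args) {
        // Only the keyed side pre-filled: the total is known, but no rule exists yet.
        KeyedBroadcastModel keyedOnly = new KeyedBroadcastModel();
        keyedOnly.restoreKeyed("alice", 90L);
        System.out.println(keyedOnly.processElement("alice", 20L)); // prints false

        // Both sides pre-filled: the same element now exceeds the limit.
        KeyedBroadcastModel both = new KeyedBroadcastModel();
        both.restoreKeyed("alice", 90L);
        both.processBroadcastElement("max", 100L);
        System.out.println(both.processElement("alice", 20L)); // prints true
    }
}
```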

I've opened a ticket for the feature if you would like to comment there.



Re: Lack of KeyedBroadcastStateBootstrapFunction

Mark Niehe
Hi Gordon and Seth,

Thanks for the explanation and for opening the ticket. I'll add some details to the ticket explaining what we're trying to do, which will hopefully add some context.

--
Mark Niehe ·  Software Engineer

Re: Lack of KeyedBroadcastStateBootstrapFunction

Tzu-Li (Gordon) Tai
Thanks! Looking forward to that.
