Re: Please help, I need to bootstrap keyed state into a stream

Posted by Marco Villalobos-2 on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Please-help-I-need-to-bootstrap-keyed-state-into-a-stream-tp37276p37374.html

Hi Seth,

Thank you for the advice. The solution you mentioned is exactly what I did.

I wrote a small tutorial that explains how to repeat that pattern.

You can read about my solution at https://github.com/minmay/flink-patterns/tree/master/bootstrap-keyed-state-into-stream

Regarding the NullPointerException when running locally, thank you for filing a ticket. It would be very nice to get that fixed.

Sincerely, 

Marco A. Villalobos



On Aug 12, 2020, at 9:40 AM, Seth Wiesman <[hidden email]> wrote:

Just to summarize the conversation so far:

The State Processor API reads data from a third-party system, such as JDBC in this example, and generates a savepoint that is written out to some DFS. This savepoint can then be used when starting a Flink streaming application. It is a two-step process: one job creates the savepoint, and another job starts the streaming application from that savepoint.

These jobs do not have to be a single application, and in general I recommend developing them as two separate jobs. The reason is that bootstrapping state is a one-time process, while your streaming application runs indefinitely. Keeping the two concerns separate will simplify your development and operations in the long term.

Concerning the NullPointerException:

The max parallelism must be at least 128. I've opened a ticket to track and resolve this issue.
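
For anyone applying this workaround, one option is to fail fast before the bootstrap job writes anything. The following is a minimal sketch in plain Java with no Flink dependency. The class name MaxParallelismCheck and the method requireSupported are hypothetical, and the lower bound of 128 comes from this thread, not from the Flink documentation. The validated value would be the max parallelism the bootstrap job uses when it creates the savepoint.

```java
final class MaxParallelismCheck {

    // Lower bound reported in this thread; an assumption, not a documented Flink constant.
    static final int MIN_MAX_PARALLELISM = 128;

    // Returns the value unchanged if it meets the bound, otherwise throws.
    static int requireSupported(int maxParallelism) {
        if (maxParallelism < MIN_MAX_PARALLELISM) {
            throw new IllegalArgumentException(
                "max parallelism " + maxParallelism
                    + " is below " + MIN_MAX_PARALLELISM);
        }
        return maxParallelism;
    }

    public static void main(String[] args) {
        // Validate the setting before handing it to the bootstrap job.
        System.out.println(requireSupported(128)); // prints 128
    }
}
```

Checking the value up front turns a hard-to-read NullPointerException into an explicit configuration error.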

Seth

On Mon, Aug 10, 2020 at 6:38 PM Marco Villalobos <[hidden email]> wrote:
I think there is a bug in Flink when running locally without a cluster.

My code worked in a cluster, but failed when run locally.

My code does not save null values in Map State.
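
For readers who want to rule out null values as the cause, as the quoted reply below suggests, here is a minimal sketch of such a guard. It is plain Java: java.util.Map stands in for Flink's MapState, and the class name NullSafeStateWrites and the method withoutNulls are hypothetical. In a real bootstrap function the cleaned map would be what gets written to the state.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

final class NullSafeStateWrites {

    // Copies only the entries whose key and value are both non-null.
    static <K, V> Map<K, V> withoutNulls(Map<K, V> source) {
        Map<K, V> cleaned = new LinkedHashMap<>();
        for (Map.Entry<K, V> entry : source.entrySet()) {
            if (entry.getKey() != null && entry.getValue() != null) {
                cleaned.put(entry.getKey(), entry.getValue());
            }
        }
        return cleaned;
    }

    public static void main(String[] args) {
        // A row as it might arrive from the bootstrap source, with one missing value.
        Map<String, Double> row = new HashMap<>();
        row.put("temperature", 21.5);
        row.put("humidity", null);

        System.out.println(withoutNulls(row)); // prints {temperature=21.5}
    }
}
```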

> On Aug 9, 2020, at 11:27 PM, Tzu-Li Tai <[hidden email]> wrote:
>
> Hi,
>
> For the NullPointerException, what seems to be happening is that you are
> setting NULL values in your MapState, which is not allowed by the API.
>
> Otherwise, the code that you showed for bootstrapping state seems to be
> fine.
>
>> I have yet to find a working example that shows how to do both
>> (bootstrapping state and starting a streaming application with that state)
>
> Not entirely sure what you mean here by "doing both".
> The savepoint written using the State Processor API (what you are doing in
> the bootstrap() method) is a savepoint that may be restored from as you
> would with a typical Flink streaming job restore.
> So, usually the bootstrapping part happens as a batch "offline" job, while
> you keep your streaming job as a separate job. What are you trying to
> achieve with having both written within the same job?
>
> Cheers,
> Gordon