Re: Please help, I need to bootstrap keyed state into a stream

Posted by Marco Villalobos-2 on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Please-help-I-need-to-bootstrap-keyed-state-into-a-stream-tp37276p37308.html

Thank you. Your instructions helped me solve this.

You can read about my solution at https://github.com/minmay/flink-patterns/tree/master/bootstrap-keyed-state-into-stream

On Aug 10, 2020, at 4:07 AM, orionemail <[hidden email]> wrote:

I was recently in the same situation as Marco. The docs do explain what you need to do, but without experience with Flink it might still not be obvious.

What I did initially:

I set up the job to run in a 'write a savepoint' mode by implementing a command-line switch that I could use when running the job:

flink run somejob.jar -d /some/path

When run with this switch, the job ran *only* the code required to set up a version of the state and write it to a savepoint.

This worked and I was on my way.
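
As a rough illustration of such a switch (a minimal sketch, not the code from the actual job): the class name JobLauncher, the helper names, and the returned labels are all made up here. A real job would build the state-writing pipeline or the streaming pipeline where the labels are returned.

```java
import java.util.Optional;

// Hypothetical launcher: decides between "write a savepoint" mode and normal streaming mode.
class JobLauncher {

    /** Returns the savepoint output path if "-d <path>" was passed, otherwise empty. */
    static Optional<String> savepointPath(String[] args) {
        // Stop one short of the end so that "-d" always has a value after it.
        for (int i = 0; i < args.length - 1; i++) {
            if ("-d".equals(args[i])) {
                return Optional.of(args[i + 1]);
            }
        }
        return Optional.empty();
    }

    /** Picks the mode; returns a label here so the decision is easy to check. */
    static String selectMode(String[] args) {
        return savepointPath(args)
                .map(path -> "bootstrap:" + path) // assumption: only the state-writing code would run here
                .orElse("streaming");             // assumption: the normal streaming job would run here
    }

    public static void main(String[] args) {
        System.out.println(selectMode(args));
    }
}
```

Arguments placed after the jar name, as in the command above, are passed to the program's main method, which is where this sketch reads them.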

However, I then decided to split this out into a new Flink jar whose sole purpose is to create a savepoint.  This is a cleaner approach in my case, and it also removes dependencies (my state was loaded from DynamoDB) that were only required in this one instance.

Rebuilding the state from this application is intended to be done only once, with checkpoints/savepoints being the main approach going forward.

Just remember to give your operators the same ID (uid) in both jobs, so that the savepoint is compatible with the streaming job.

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Monday, 10 August 2020 07:27, Tzu-Li Tai <[hidden email]> wrote:

Hi,

For the NullPointerException, what seems to be happening is that you are
setting NULL values in your MapState, which is not allowed by the API.
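
One way to avoid this is to filter out null values before writing them. The following is only a sketch under assumptions: a plain java.util.Map stands in for the state, and the class NullSafeState, the method putNonNull, and the sample keys are hypothetical. In a real bootstrap function the sink would wrap the state's own put call.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical helper: keeps null values out of whatever store the sink writes to.
class NullSafeState {

    /**
     * Passes every entry with a non-null value to the sink and
     * returns the number of entries skipped because their value was null.
     */
    static <K, V> int putNonNull(Map<K, V> source, BiConsumer<K, V> sink) {
        int skipped = 0;
        for (Map.Entry<K, V> entry : source.entrySet()) {
            if (entry.getValue() == null) {
                skipped++; // never hand a null value to the state
                continue;
            }
            sink.accept(entry.getKey(), entry.getValue());
        }
        return skipped;
    }

    public static void main(String[] args) {
        Map<String, Double> loaded = new HashMap<>();
        loaded.put("sensor-a", 1.5);
        loaded.put("sensor-b", null); // e.g. a missing attribute in the source data

        Map<String, Double> state = new HashMap<>(); // stand-in for the keyed map state
        int skipped = putNonNull(loaded, state::put);
        System.out.println(state + " skipped=" + skipped);
    }
}
```

Returning the number of skipped entries makes it easy to log how much of the source data was incomplete.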

Otherwise, the code that you showed for bootstrapping state seems to be
fine.

> I have yet to find a working example that shows how to do both
> (bootstrapping state and start a streaming application with that state)

Not entirely sure what you mean here by "doing both".
The savepoint written using the State Processor API (what you are doing in
the bootstrap() method) is a regular savepoint; you can restore from it as
you would in any typical Flink streaming job restore.
So, usually the bootstrapping part happens as a batch "offline" job, while
you keep your streaming job as a separate job. What are you trying to
achieve with having both written within the same job?

Cheers,
Gordon

