Re: Please help, I need to bootstrap keyed state into a stream
Posted by Marco Villalobos-2
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Please-help-I-need-to-bootstrap-keyed-state-into-a-stream-tp37276p37374.html
Hi Seth,
Thank you for the advice. The solution you mentioned is exactly what I did.
I wrote a small tutorial that explains how to repeat that pattern.
Regarding the NullPointerException when running locally, thank you for filing a ticket. It would be very nice to get that fixed.
Sincerely,
Marco A. Villalobos
Just to summarize the conversation so far:
The State Processor API reads data from a third-party system (JDBC in this example) and generates a savepoint that is written out to a DFS. This savepoint can then be used when starting a Flink streaming application. It is a two-step process: one job creates the savepoint, and another job starts the streaming application from that savepoint.
These two steps do not have to live in a single application, and in general I recommend developing them as two separate jobs. The reason is that bootstrapping state is a one-time process, while your streaming application runs forever. Keeping the two concerns separate will simplify your development and operations in the long term.
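For readers following along, here is a minimal sketch of the first step (the bootstrap job), assuming the DataSet-based State Processor API as shipped around Flink 1.11 and the flink-state-processor-api dependency on the classpath. The class name, operator uid, state name, savepoint path, and sample rows are all hypothetical, and fromElements stands in for the JDBC source discussed in this thread.

```java
import org.apache.flink.api.common.state.MapState;
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.runtime.state.memory.MemoryStateBackend;
import org.apache.flink.state.api.BootstrapTransformation;
import org.apache.flink.state.api.OperatorTransformation;
import org.apache.flink.state.api.Savepoint;
import org.apache.flink.state.api.functions.KeyedStateBootstrapFunction;

public class BootstrapJob {
    // Hypothetical values; the streaming job must use the same uid and state name.
    static final String OPERATOR_UID = "my-keyed-operator";
    static final String STATE_NAME = "values";
    static final String SAVEPOINT_PATH = "file:///tmp/bootstrap-savepoint";
    // The thread reports that max parallelism must be at least 128.
    static final int MAX_PARALLELISM = 128;

    // Writes each (key, name, value) row into keyed MapState.
    static class Bootstrapper
            extends KeyedStateBootstrapFunction<String, Tuple3<String, String, Double>> {
        private transient MapState<String, Double> state;

        @Override
        public void open(Configuration parameters) {
            state = getRuntimeContext().getMapState(
                    new MapStateDescriptor<>(STATE_NAME, String.class, Double.class));
        }

        @Override
        public void processElement(Tuple3<String, String, Double> row, Context ctx)
                throws Exception {
            // Skip null values; the thread notes MapState does not accept them.
            if (row.f2 != null) {
                state.put(row.f1, row.f2);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Stand-in for the JDBC input: (key, name, value) rows.
        DataSet<Tuple3<String, String, Double>> rows = env.fromElements(
                Tuple3.of("device-1", "temperature", 21.5),
                Tuple3.of("device-2", "temperature", 19.0));

        BootstrapTransformation<Tuple3<String, String, Double>> transformation =
                OperatorTransformation
                        .bootstrapWith(rows)
                        .keyBy(new KeySelector<Tuple3<String, String, Double>, String>() {
                            @Override
                            public String getKey(Tuple3<String, String, Double> row) {
                                return row.f0;
                            }
                        })
                        .transform(new Bootstrapper());

        Savepoint.create(new MemoryStateBackend(), MAX_PARALLELISM)
                .withOperator(OPERATOR_UID, transformation)
                .write(SAVEPOINT_PATH);

        // The savepoint is only written when the batch job executes.
        env.execute("bootstrap-keyed-state");
    }
}
```

The second step is then an ordinary streaming job whose stateful operator is assigned the same uid and declares a MapState descriptor with the same name and types. It is started from the written path like any other savepoint restore, for example with the -s (--fromSavepoint) option of flink run.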
Concerning the NullPointerException:
The max parallelism must be at least 128. I've opened a ticket to track and resolve this issue.
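Until that is resolved, a bootstrap job could fail fast on a too-small value rather than surface a NullPointerException later. The following is only a sketch: the 128 threshold is taken from the note above, and the class and method names are hypothetical, not part of Flink.

```java
public final class MaxParallelismGuard {
    // Lower bound reported in this thread; an assumption, not read from Flink itself.
    public static final int MIN_MAX_PARALLELISM = 128;

    private MaxParallelismGuard() {}

    // Returns the value unchanged if it is acceptable, otherwise throws.
    public static int requireSupported(int maxParallelism) {
        if (maxParallelism < MIN_MAX_PARALLELISM) {
            throw new IllegalArgumentException(
                    "max parallelism " + maxParallelism
                            + " is below the required minimum of " + MIN_MAX_PARALLELISM);
        }
        return maxParallelism;
    }

    public static void main(String[] args) {
        System.out.println(requireSupported(128));
        try {
            requireSupported(4);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The checked value would then be passed as the max parallelism argument when creating the savepoint.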
Seth
On Mon, Aug 10, 2020 at 6:38 PM Marco Villalobos <[hidden email]> wrote:
I think there is a bug in Flink when running locally without a cluster.
My code worked in a cluster, but failed when run locally.
My code does not save null values in Map State.
> On Aug 9, 2020, at 11:27 PM, Tzu-Li Tai <[hidden email]> wrote:
>
> Hi,
>
> For the NullPointerException, what seems to be happening is that you are
> setting NULL values in your MapState, which is not allowed by the API.
>
> Otherwise, the code that you showed for bootstrapping state seems to be
> fine.
>
>> I have yet to find a working example that shows how to do both
>> (bootstrapping state and start a streaming application with that state)
>
> Not entirely sure what you mean here by "doing both".
> The savepoint written using the State Processor API (what you are doing in
> the bootstrap() method) is a savepoint that may be restored from as you
> would with a typical Flink streaming job restore.
> So, usually the bootstrapping part happens as a batch "offline" job, while
> you keep your streaming job as a separate job. What are you trying to
> achieve with having both written within the same job?
>
> Cheers,
> Gordon