Please advise bootstrapping large state
Posted by
Marco Villalobos-2 on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Please-advise-bootstrapping-large-state-tp44460.html
I need to bootstrap state from PostgreSQL (approximately 200 GB of data), and I noticed that the State Processor API requires the DataSet API in order to bootstrap state for a DataStream job.
I wish there were a way to use the SQL API with a partitioned scan, but I don't know whether that is even possible with the DataSet API.
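To make the partitioned-scan idea concrete, here is a minimal sketch of how I imagine splitting the table by a numeric key, so that each parallel reader pulls one slice rather than the whole 200 GB. This is plain Java, not Flink API: `PartitionRanges`, `my_table`, and the `id` column are placeholder names I made up, and it assumes the table has a reasonably dense numeric key.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Flink: splits an inclusive id range into
// contiguous sub-ranges, each of which could back one
// "WHERE id BETWEEN ? AND ?" query issued by a separate parallel reader.
class PartitionRanges {

    /** Returns up to numPartitions inclusive [lower, upper] pairs covering [minId, maxId]. */
    static List<long[]> split(long minId, long maxId, int numPartitions) {
        if (numPartitions < 1 || minId > maxId) {
            throw new IllegalArgumentException("need numPartitions >= 1 and minId <= maxId");
        }
        long total = maxId - minId + 1;               // number of ids (assumes no overflow)
        long parts = Math.min(numPartitions, total);  // never create empty ranges
        long base = total / parts;                    // minimum ids per range
        long remainder = total % parts;               // first 'remainder' ranges get one extra id
        List<long[]> ranges = new ArrayList<>();
        long lower = minId;
        for (long i = 0; i < parts; i++) {
            long size = base + (i < remainder ? 1 : 0);
            long upper = lower + size - 1;
            ranges.add(new long[] {lower, upper});
            lower = upper + 1;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Placeholder table and column names, for illustration only.
        for (long[] r : split(1, 1_000_000, 4)) {
            System.out.println("SELECT * FROM my_table WHERE id BETWEEN " + r[0] + " AND " + r[1]);
        }
    }
}
```

Each pair would become the parameters of one bounded query, so no single reader has to hold the full result set. Whether the DataSet API can drive its JDBC input this way is exactly what I am unsure about.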
I have never used the DataSet API, and I am unsure how it manages memory or distributes load when handling large state.
Would it run out of memory if I map data from a JDBCInputFormat into a large DataSet and then use that to bootstrap state for my streaming job?
Any advice on how I should proceed with this would be greatly appreciated.
Thank you.