(DEPRECATED) Apache Flink User Mailing List archive.

Please advise bootstrapping large state

Posted by Marco Villalobos-2 on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Please-advise-bootstrapping-large-state-tp44460.html

I need to bootstrap state from PostgreSQL (approximately 200 GB of data), and I noticed that the State Processor API requires the DataSet API in order to bootstrap state for the DataStream API.

I wish there were a way to use the SQL API with a partitioned scan, but I don't know if that is even possible with the DataSet API.
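
To illustrate what I mean by a partitioned scan, here is a minimal sketch in plain Java (no Flink APIs involved). It only shows the idea of splitting a numeric key range into contiguous sub-ranges so that each parallel reader can run its own bounded query. The class name, the table `my_table`, and the `id` column are hypothetical and stand in for whatever the real schema looks like.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: splits an inclusive key range into contiguous sub-ranges,
// one per parallel reader. Table and column names are hypothetical.
class PartitionedScanSketch {

    /** Returns at most numPartitions inclusive [lower, upper] bounds covering [min, max]. */
    static List<long[]> splitRange(long min, long max, int numPartitions) {
        if (numPartitions < 1 || max < min) {
            throw new IllegalArgumentException("need numPartitions >= 1 and max >= min");
        }
        // Number of keys in the range; assumes the range is small enough not to overflow a long.
        long total = max - min + 1;
        // Ceiling division so that the sub-ranges cover the whole range.
        long size = (total + numPartitions - 1) / numPartitions;

        List<long[]> bounds = new ArrayList<>();
        for (long lower = min; lower <= max; lower += size) {
            long upper = Math.min(lower + size - 1, max);
            bounds.add(new long[] {lower, upper});
        }
        return bounds;
    }

    public static void main(String[] args) {
        // Each printed query could be executed by a separate parallel reader.
        for (long[] b : splitRange(1, 1_000_000, 4)) {
            System.out.println(
                "SELECT * FROM my_table WHERE id BETWEEN " + b[0] + " AND " + b[1]);
        }
    }
}
```

With a split like this, no single reader has to pull the full 200 GB through one query, which is the behavior I am hoping to get when bootstrapping.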

I have never used the DataSet API, and I am unsure how it manages memory or distributes load when handling large state.

Would it run out of memory if I map data from a JDBCInputFormat into a large DataSet and then use that to bootstrap state for my streaming job?

Any advice on how I should proceed with this would be greatly appreciated.

Thank you.