Read once input data?
Posted by
Saliya Ekanayake on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Read-once-input-data-tp4932.html
Hi,
I see that an InputFormat's open() and nextRecord() methods get called for each terminal operation on a given dataset using that particular InputFormat. Is it possible to avoid this - possibly using some caching technique in Flink?
For example, I have code like the following, and I see that each of the last two statements (reduce() and count()) causes the input format's methods above to be called. By the way, this is a custom input format I wrote to represent a binary matrix stored as Short values.
ShortMatrixInputFormat smif = new ShortMatrixInputFormat();
DataSet<Short[]> ds = env.createInput(smif, BasicArrayTypeInfo.SHORT_ARRAY_TYPE_INFO);
MapOperator<Short[], DoubleStatistics> op = ds.map(...);
op.reduce(...);
op.count();
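
To make the behaviour concrete, here is a minimal plain-Java sketch of it. This is not Flink code: `CountingSource` and `ReadOnceSketch` are hypothetical names used only for illustration, and the sketch assumes that every result-producing call re-reads the source from scratch, which is what I observe above. It contrasts that with reading once and reusing the in-memory rows.

```java
import java.util.ArrayList;
import java.util.List;

final class ReadOnceSketch {

    /** Hypothetical stand-in for an input format; counts how often it is read. */
    static final class CountingSource {
        private final short[][] rows;
        int openCalls = 0;

        CountingSource(short[][] rows) {
            this.rows = rows;
        }

        /** Simulates open() followed by nextRecord() until the input is exhausted. */
        List<short[]> readAll() {
            openCalls++;
            List<short[]> out = new ArrayList<>();
            for (short[] row : rows) {
                out.add(row);
            }
            return out;
        }
    }

    static short[][] sample() {
        return new short[][] {{1, 2}, {3, 4}, {5, 6}};
    }

    static long rowSum(short[] row) {
        long sum = 0;
        for (short v : row) {
            sum += v;
        }
        return sum;
    }

    /** Two independent passes: each one goes back to the source. */
    public static int opensWithoutCache() {
        CountingSource src = new CountingSource(sample());
        long total = src.readAll().stream().mapToLong(ReadOnceSketch::rowSum).sum();
        long count = src.readAll().size();
        System.out.println("no cache: total=" + total + ", count=" + count);
        return src.openCalls;
    }

    /** One read, then both results are computed from the cached rows. */
    public static int opensWithCache() {
        CountingSource src = new CountingSource(sample());
        List<short[]> cached = src.readAll();
        long total = cached.stream().mapToLong(ReadOnceSketch::rowSum).sum();
        long count = cached.size();
        System.out.println("cached:   total=" + total + ", count=" + count);
        return src.openCalls;
    }

    public static void main(String[] args) {
        System.out.println("reads without cache: " + opensWithoutCache());
        System.out.println("reads with cache:    " + opensWithCache());
    }
}
```

The cached variant only works here because the rows fit in memory. Whether Flink offers an equivalent for a distributed DataSet is exactly what I am asking.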
Thank you,
Saliya
--
Saliya Ekanayake
Ph.D. Candidate | Research Assistant
School of Informatics and Computing | Digital Science Center