Hi,
@Ted:
> Is it possible to prune (unneeded) field(s) so that heap requirement is
> lower ?
The XmlInputFormat [0] splits the raw data into smaller chunks, which
are then processed further. I don't think I can reduce the size of the
fields (Tuple2<LongWritable, Text>). The major difference from Mahout's
XmlInputFormat is the support for compressed files, which does not seem
to exist otherwise [1].
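For context, the input stage of the job is roughly the following sketch
(the constructor and the path are placeholders, not the actual code from
[0]; it only illustrates that every record is one complete XML block):

    // Sketch only. Assumption: the XmlInputFormat from [0] behaves like a Flink
    // FileInputFormat<Tuple2<LongWritable, Text>>, emitting one record per XML
    // block, so a single record can be as large as the largest element in the dump.
    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

    XmlInputFormat xmlFormat = new XmlInputFormat();   // placeholder constructor
    DataSet<Tuple2<LongWritable, Text>> chunks =
        env.readFile(xmlFormat, "hdfs:///dumps/pages-meta-history.xml.bz2");  // placeholder path

    // Downstream operators then parse each Text chunk; the key/value types stay
    // Hadoop's LongWritable (offset) and Text (raw XML).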
@Stephan, @Kurt
> - You would rather need MORE managed memory, not less, because the sorter
> uses that.
> I think the only way is adding more managed memory.
Ah, okay. Seems like I misunderstood that, but I tested with a managed
memory fraction of up to 0.8 on a 46 GB RAM allocation anyway. Does
that mean I have to scale the amount of RAM proportionally to the
dataset's size in this case? I had expected Flink to start
caching/spilling to disk and slowing down instead. Is that not the case?
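For reference, this is what my test setup boils down to (a sketch; on
the cluster the same keys live in flink-conf.yaml, here written
programmatically for a local run, and everything besides the 0.8
fraction is illustrative):

    // Sketch of the memory settings I tested. taskmanager.memory.fraction is the
    // share of the TaskManager's free heap that becomes managed memory (sorting,
    // hashing); the TaskManager itself ran with roughly 46 GB of RAM.
    Configuration conf = new Configuration();   // org.apache.flink.configuration.Configuration
    conf.setFloat("taskmanager.memory.fraction", 0.8f);
    ExecutionEnvironment env = ExecutionEnvironment.createLocalEnvironment(conf);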
> - We added the "large record handler" to the sorter for exactly these use
> cases.
Okay, so spilling to disk is theoretically possible and the crashes
should not occur then?
> [...] it is thrown during the combine phase, which only uses an in-memory sorter that doesn't have the large record handler mechanism.
Are there ways to circumvent this restriction (the sorting step?) or
otherwise optimize the process?
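For example, if the offending combine belongs to a groupBy().reduce(),
would switching it to a hash-based combine avoid the sort in that
phase? Something along these lines, assuming my Flink version supports
the hint (the types, key and function names are made up):

    // Hypothetical: use a hash-table combine instead of the sort-based one.
    // CombineHint lives in org.apache.flink.api.common.operators.base.ReduceOperatorBase.
    DataSet<PageRecord> merged = parsedPages         // PageRecord, parsedPages: placeholders
        .groupBy("title")                            // placeholder key field
        .reduce(new MergeRevisions())                // placeholder ReduceFunction
        .setCombineHint(CombineHint.HASH);           // combine without the in-memory sorter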
> Can you check in the code whether it is enabled? You'll have to go
> through a bit of the code to see that.
Although I'm not deeply familiar with Flink's internal source code,
I'll try my best to figure that out.
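If it helps the archives: from a first skim there seems to be a
configuration switch for the large record handler, so I would start by
checking/setting something like the following (the key name is my
assumption from reading ConfigConstants and may well be the wrong one):

    // Assumption: the sorter's large record handler appears to be controlled by
    // this key and to default to false; please correct me if this is the wrong switch.
    Configuration conf = new Configuration();
    conf.setBoolean("taskmanager.runtime.large-record-handler", true);
    // ... or the equivalent entry in flink-conf.yaml.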
Thanks,
Sebastian
[0] http://paste.gehaxelt.in/?336f8247fa50171e#DSH0poFcVIR29X7lb98qRhUG/jrkKkUrfkUs7ECSyeE=
[1] http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Reading-compressed-XML-data-td10985.html