Re: Reading from HBase problem
Posted by Hilmi Yildirim
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Reading-from-HBase-problem-tp1545p1556.html
Hi,
Now I tested the "count" method. It returns the same result as the
flatMap.groupBy(0).sum(1) approach.
Furthermore, the HBase table contains nearly 100 million rows, but the result
is 102 million. This means that the HBase input format reads more rows than the
table contains.
Best Regards,
Hilmi
On 08.06.2015 at 23:29, Fabian Hueske wrote:
Hi Hilmi,
I see two possible reasons:
1) The data source / InputFormat is not properly
working, so not all HBase records are read/forwarded,
or
2) The aggregation / count is buggy
Robert's suggestion will use an alternative mechanism to do
the count. In fact, you can count with groupBy(0).sum()
and accumulators at the same time.
If both counts are the same, this will indicate that the
aggregation is correct and hint that the HBase format is
faulty.
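For what it's worth, here is a minimal sketch (not from the original thread) of
producing both counts in one job: the flatMap registers a LongCounter
accumulator while still emitting tuples for the groupBy(0).sum(1) aggregation.
The source and sink are stand-ins, not the actual HBase TableInputFormat used here.

import org.apache.flink.api.common.JobExecutionResult;
import org.apache.flink.api.common.accumulators.LongCounter;
import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.io.DiscardingOutputFormat;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;

public class CountBothWays {

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Stand-in for the HBase source; replace with the input format you use.
        DataSet<String> rows = env.fromElements("row1", "row2", "row3");

        DataSet<Tuple2<String, Long>> aggregated = rows
            .flatMap(new RichFlatMapFunction<String, Tuple2<String, Long>>() {
                private final LongCounter rowCounter = new LongCounter();

                @Override
                public void open(Configuration parameters) {
                    // Register the accumulator once per parallel task.
                    getRuntimeContext().addAccumulator("rows-seen", rowCounter);
                }

                @Override
                public void flatMap(String row, Collector<Tuple2<String, Long>> out) {
                    rowCounter.add(1L);                  // accumulator-based count
                    out.collect(new Tuple2<>(row, 1L));  // record for groupBy/sum
                }
            })
            .groupBy(0)
            .sum(1);

        // A sink is only needed so the job has something to execute.
        aggregated.output(new DiscardingOutputFormat<Tuple2<String, Long>>());

        JobExecutionResult result = env.execute("count comparison");
        Long accumulatorCount = result.getAccumulatorResult("rows-seen");
        System.out.println("rows seen by flatMap: " + accumulatorCount);
    }
}

Comparing the accumulator value against the summed counts (or against
DataSet#count()) then narrows down whether the input format or the
aggregation is over-counting.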
In any case, it would be very good to know your findings.
Please keep us updated.
One more hint: if you want to do a full aggregate, you don't
have to use a "dummy" key like "a". Instead, you can work with
Tuple1<Long> and directly call sum(0) without doing the
groupBy().
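As an illustration of that hint, a sketch with a stand-in source (not the
actual HBase input from this thread): map every record to Tuple1<Long> and
call sum(0) directly as a full, non-grouped aggregate.

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple1;

public class FullAggregateCount {

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Stand-in for the rows read from HBase.
        DataSet<String> rows = env.fromElements("row1", "row2", "row3");

        // One Tuple1<Long> per row, then a full (non-grouped) sum over field 0.
        DataSet<Tuple1<Long>> total = rows
            .map(new MapFunction<String, Tuple1<Long>>() {
                @Override
                public Tuple1<Long> map(String row) {
                    return new Tuple1<Long>(1L);
                }
            })
            .sum(0);

        // Prints the total row count as a single tuple, e.g. (3).
        // On old Flink versions where print() is lazy, add env.execute() afterwards.
        total.print();
    }
}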
Best, Fabian