In case anyone is interested, I wrote a blog on how to analyze graphs stored in HBase with Apache Flink Gelly:
|
Interesting blog. From your experience, is there anything on the HBase side where you see room for improvement? Which HBase release are you using? Cheers On Thu, Jul 27, 2017 at 3:11 PM, Robert Yokota <[hidden email]> wrote:
|
One thing I really appreciate about HBase is its flexibility. It doesn't enforce a schema, but it also doesn't prevent you from building a schema layer on top. It is very customizable, allowing you to push arbitrary code to the server in the form of filters and coprocessors. Not having such higher-layer features built into HBase allows it to remain flexible, but this does have a downside. A new user coming to HBase who wants to work with things like query languages, schemas, secondary indices, and transactions can find it daunting to research which other projects in the HBase ecosystem can help them, how others have used those projects, and in which use cases each project is likely to succeed or not. Perhaps a good start would be something like an "HBase ecosystem" page on the website that lists projects like Phoenix, Tephra, and others in the HBase ecosystem. The Apache TinkerPop site has a listing of projects in its ecosystem at http://tinkerpop.apache.org. I think new users coming to HBase often aren't even aware of the larger ecosystem, and sometimes end up selecting alternative data stores as a result. P.S. I'm using HBase 1.1.2 On Thu, Jul 27, 2017 at 5:42 PM, Ted Yu <[hidden email]> wrote:
|
Also Google Cloud Bigtable has such a page at https://cloud.google.com/bigtable/docs/integrations On Thu, Jul 27, 2017 at 6:57 PM, Robert Yokota <[hidden email]> wrote:
|
Thank you for sharing! On 28 July 2017 at 05:01, Robert Yokota <[hidden email]> wrote:
|
Restarting this thread since it is relevant to us. We are thinking of using HBase/Cassandra to store graph data and then loading the data from there into Flink/Gelly. One of the issues we are concerned about is read performance. So far we have run our tests with data residing on HDFS, and that worked fine. Is there any guidance on reading from HBase for batch jobs? We are wondering whether anyone has experience with this approach, including any dos and don'ts. Thanks -- Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/ |
Have you checked the JanusGraph source code? It also uses HBase as a storage backend and combines it with Elasticsearch for indexing. Maybe you can draw some inspiration from the architecture there. Generally, with HBase a lot depends on how the data is written to regions, the order of the data, and the choice of row key. This in turn affects how the data is read, including whether Flink can take advantage of locality. There is of course more detail to it, and it depends on the use case. Generally, the HBase documentation is rather good.
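To make the row key point a bit more concrete, here is a minimal sketch of one possible key layout for an edge table. It is an illustration only, not how JanusGraph or the blog post lays out its data. The names `NUM_BUCKETS`, `salt_bucket`, `edge_row_key`, and `out_edge_scan_range` are hypothetical, and the sketch assumes vertex ids and labels never contain a zero byte. The idea is to prefix each key with a small hash-derived bucket so that writes spread across regions, while all outgoing edges of one vertex still sort next to each other and can be read with a single range scan.

```python
import hashlib

# Hypothetical number of salt buckets (for example, one per pre-split region).
NUM_BUCKETS = 16

# Separator between key parts; assumes ids and labels never contain it.
SEP = b"\x00"


def salt_bucket(vertex_id: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Derive a stable bucket number from the source vertex id."""
    digest = hashlib.md5(vertex_id.encode("utf-8")).digest()
    return digest[0] % num_buckets


def edge_row_key(src: str, label: str, dst: str) -> bytes:
    """Build a key of the form <bucket> SEP <src> SEP <label> SEP <dst>."""
    bucket = f"{salt_bucket(src):02x}".encode("ascii")
    return SEP.join([bucket, src.encode(), label.encode(), dst.encode()])


def out_edge_scan_range(src: str) -> tuple:
    """Return (start, stop) byte keys covering every outgoing edge of src.

    The range is half-open: start is inclusive, stop is exclusive.
    """
    bucket = f"{salt_bucket(src):02x}".encode("ascii")
    prefix = bucket + SEP + src.encode() + SEP
    # Bump the trailing separator byte to get the first key past the prefix.
    stop = prefix[:-1] + b"\x01"
    return prefix, stop


if __name__ == "__main__":
    start, stop = out_edge_scan_range("alice")
    for dst in ("bob", "carol"):
        key = edge_row_key("alice", "knows", dst)
        print(key, start <= key < stop)
```

The start and stop keys are the kind of bounds one would pass to an HBase scan. Whether salting helps depends on the workload: it spreads out write hotspots, but a batch job that reads the whole table touches every region regardless, so in that case the region splits and data order matter more than the salt.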
|