Graph Analytics on HBase With HGraphDB and Apache Flink Gelly

Robert Yokota
In case anyone is interested, I wrote a blog post on how to analyze graphs stored in HBase with Apache Flink Gelly:

Re: Graph Analytics on HBase With HGraphDB and Apache Flink Gelly

Ted Yu
Interesting blog.

From your experience, is there anything on the HBase side where you see room for improvement?

Which HBase release are you using?

Cheers

On Thu, Jul 27, 2017 at 3:11 PM, Robert Yokota wrote:

Re: Graph Analytics on HBase With HGraphDB and Apache Flink Gelly

Robert Yokota

One thing I really appreciate about HBase is its flexibility. It doesn't enforce a schema, but it also doesn't prevent you from building a schema layer on top. It is very customizable, allowing you to push arbitrary code to the server in the form of filters and coprocessors.
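As a rough illustration of what pushing code to the server buys you, the sketch below models one region as a sorted in-memory map and applies a caller-supplied predicate during the scan, so only matching rows would travel back to the client. `ToyRegion` and its methods are hypothetical and are not the HBase API. In real HBase, a filter is a class available on the region servers (a built-in filter or, for custom logic, a subclass of `FilterBase`) that is attached to a `Scan` with `Scan.setFilter`; coprocessors go further and run custom logic inside the region server itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.function.Predicate;

// Hypothetical in-memory stand-in for one region. This is NOT the HBase
// Filter API; it only models the idea of server-side filtering.
class ToyRegion {
    // Rows are kept sorted by row key, as HBase keeps them.
    private final NavigableMap<String, String> rows = new TreeMap<>();

    void put(String rowKey, String value) {
        rows.put(rowKey, value);
    }

    // Scans row keys in [startKey, stopKey) and applies the predicate
    // "on the server", so only matching row keys are returned to the caller.
    List<String> scan(String startKey, String stopKey, Predicate<String> valueFilter) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, String> row
                : rows.subMap(startKey, true, stopKey, false).entrySet()) {
            if (valueFilter.test(row.getValue())) {
                matches.add(row.getKey());
            }
        }
        return matches;
    }
}
```

For example, after putting rows `v1=person`, `v2=software` and `v3=person`, calling `scan("v", "w", value -> value.equals("person"))` returns `[v1, v3]`; the non-matching row is dropped before it leaves the "server".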

Not having such higher-layer features built into HBase allows it to remain flexible, but it does have a downside. One complaint is that a new user coming to HBase who does want things like query languages, schemas, secondary indices, and transactions can find it daunting to research which other projects in the HBase ecosystem can help, how others have used those projects, and which use cases each project suits.

Perhaps a good start would be an "HBase ecosystem" page on the HBase website listing projects such as Phoenix, Tephra, and others. The Apache TinkerPop site lists the projects in its ecosystem at http://tinkerpop.apache.org. I think new users coming to HBase often aren't even aware of the larger ecosystem, and sometimes end up choosing alternative data stores as a result.

P.S. I'm using HBase 1.1.2.

On Thu, Jul 27, 2017 at 5:42 PM, Ted Yu wrote:

Re: Graph Analytics on HBase With HGraphDB and Apache Flink Gelly

Robert Yokota
Also Google Cloud Bigtable has such a page at https://cloud.google.com/bigtable/docs/integrations

On Thu, Jul 27, 2017 at 6:57 PM, Robert Yokota wrote:

Re: Graph Analytics on HBase With HGraphDB and Apache Flink Gelly

Vasiliki Kalavri
Thank you for sharing!

On 28 July 2017 at 05:01, Robert Yokota wrote:

Re: Graph Analytics on HBase With HGraphDB and Apache Flink Gelly

santoshg
Restarting this thread since it is relevant to us. We are thinking of using
HBase/Cassandra to store graph data and then loading the data from there into
Flink/Gelly. One of our concerns is read performance. So far we have run our
tests with data residing on HDFS, and that worked fine.

Is there any guidance on reading from HBase for batch jobs? We are wondering
whether anyone has experience with this approach: do's and don'ts, etc.

Thanks



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Graph Analytics on HBase With HGraphDB and Apache Flink Gelly

Jörn Franke
Have you checked the JanusGraph source code? It also uses HBase as a storage backend:
It can combine HBase with Elasticsearch for indexing. Maybe you can draw inspiration from its architecture.
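One way to picture that split of responsibilities is the in-memory sketch below: a sorted primary store keyed by vertex id (standing in for HBase) plus a separate inverted index from property values to vertex ids (standing in for an external index such as Elasticsearch). This is a conceptual sketch under those assumptions, not JanusGraph's actual storage or index format, and `ToyGraphStore` is a hypothetical name. It shows why a property lookup can avoid a full table scan, and also the cost: every write has to update two structures that a real system must keep consistent.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Conceptual sketch: a primary store plus a separate inverted index.
// Names are hypothetical; this is not JanusGraph's storage format.
class ToyGraphStore {
    // Primary store: vertexId -> (property name -> value), sorted by id
    // the way HBase sorts rows by key.
    private final Map<String, Map<String, String>> vertices = new TreeMap<>();
    // Index: property name -> value -> ids of vertices with that value.
    private final Map<String, Map<String, Set<String>>> index = new HashMap<>();

    void addVertex(String id, Map<String, String> props) {
        vertices.put(id, new HashMap<>(props));
        // Both structures are updated on every write. This sketch only handles
        // inserting new vertices, not updates, deletes, or partial failures.
        for (Map.Entry<String, String> p : props.entrySet()) {
            index.computeIfAbsent(p.getKey(), k -> new HashMap<>())
                 .computeIfAbsent(p.getValue(), v -> new TreeSet<>())
                 .add(id);
        }
    }

    // Answers "which vertices have name=value?" from the index alone,
    // without scanning every vertex in the primary store.
    Set<String> findByProperty(String name, String value) {
        Set<String> ids = index.getOrDefault(name, Collections.emptyMap())
                               .getOrDefault(value, Collections.emptySet());
        return Collections.unmodifiableSet(ids);
    }

    Map<String, String> getVertex(String id) {
        return vertices.get(id);
    }
}
```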

Generally, with HBase a lot depends on how the data is written to regions, the order of the data, and choosing the right row key. This in turn affects how the data is read, including whether Flink can take advantage of data locality. There is of course more detail to it, and it depends on the use case. In general, the HBase documentation is rather good.
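To make the row-key point concrete, here is a hedged sketch of one common HBase pattern, not HGraphDB's or JanusGraph's actual layout: edge keys of the form `<bucket>|<srcId>|<label>|<dstId>`, where the bucket is a salt derived from the source id. Because HBase stores rows sorted by key and each region holds a contiguous key range, all out-edges of a vertex end up adjacent and can be read with one prefix scan, while the salt spreads vertices over several key ranges instead of concentrating them in one hot region. `bucketRanges` yields one [start, stop) range per bucket; assuming the table is pre-split at bucket boundaries, each range is the kind of unit a parallel batch reader (for example, one Flink input split) could consume. The class name, the bucket count of 4, and the `|` separator are all assumptions, and the sketch assumes ids and labels never contain `|`; production schemas usually use fixed-length or length-prefixed binary encodings instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;

// Hypothetical edge row-key layout "<bucket>|<srcId>|<label>|<dstId>".
// Assumes ids and labels never contain '|'. Not HGraphDB's actual format.
class EdgeKeys {
    static final int BUCKETS = 4; // assumed salt bucket count; must be <= 99

    // Deterministic salt from the source id: all out-edges of one vertex
    // share a bucket, while different vertices spread across buckets.
    static int bucket(String srcId) {
        return Math.floorMod(srcId.hashCode(), BUCKETS);
    }

    static String edgeKey(String srcId, String label, String dstId) {
        return String.format("%02d|%s|%s|%s", bucket(srcId), srcId, label, dstId);
    }

    // Common prefix of every out-edge key of srcId.
    static String outEdgePrefix(String srcId) {
        return String.format("%02d|%s|", bucket(srcId), srcId);
    }

    // Prefix scan over a sorted table: seek to the prefix, then read until the
    // first key that no longer matches (keys are sorted, so the run ends there).
    static List<String> scanPrefix(NavigableMap<String, String> table, String prefix) {
        List<String> keys = new ArrayList<>();
        for (String key : table.tailMap(prefix, true).keySet()) {
            if (!key.startsWith(prefix)) {
                break;
            }
            keys.add(key);
        }
        return keys;
    }

    // One [start, stop) key range per bucket, e.g. one unit of work per
    // parallel reader. Two-digit bucket ids keep the ranges in sorted order.
    static List<String[]> bucketRanges() {
        List<String[]> ranges = new ArrayList<>();
        for (int b = 0; b < BUCKETS; b++) {
            ranges.add(new String[] {String.format("%02d|", b), String.format("%02d|", b + 1)});
        }
        return ranges;
    }
}
```

Salting is a trade-off: a full scan now has to read every bucket's range, and any point lookup must recompute the bucket from the source id before building the key.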

On 4. Apr 2018, at 23:38, santoshg wrote: