Hi, I need some suggestions regarding accessing RDF triples from Flink. I'm trying to integrate Flink into a pipeline where the input for Flink comes from a SPARQL query on a Jena model. After modifying the triples with Flink, I will perform a SPARQL update using Jena to save my changes.
Any suggestion will be of great help. Regards, Ritesh |
Hi Ritesh, in my experience, reading triples from Jena models is evil because it has some problems with garbage collection. On 6 Apr 2016 00:51, "Ritesh Kumar Singh" <[hidden email]> wrote:
|
Hi Flavio,
Basically, any help with this would be appreciated. Regards, Ritesh On Wed, Apr 6, 2016 at 2:45 PM, Flavio Pompermaier <[hidden email]> wrote:
|
Hi Ritesh,
Jena offers an NQuadsInputFormat, which is a Hadoop InputFormat, so you can read the data efficiently with Flink. Unfortunately, I remember that I had some problems using it, so I just export my Jena model as N-Quads and then parse it efficiently with Flink as a text file. However, parsing with Sesame 4 is more efficient in terms of speed and garbage collection. What I do is convert every quad into a Tuple5, group the triples/quads by subject and then apply some logic. The quads grouped by subject are what we call an "entiton atom", and combining them leads to an "entiton molecule" (i.e. a graph rooted in some entiton atom). We presented our work at FlinkForward 2015 in Berlin. If you need code that reads N-Quads with Flink, I can share some; just write to me privately! Best, Flavio On Wed, Apr 6, 2016 at 3:57 PM, Ritesh Kumar Singh <[hidden email]> wrote:
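For readers following the thread, the "one quad per text line, turned into a 5-field tuple, grouped by subject" idea can be sketched roughly as below. This is not Flavio's code and uses neither Flink nor Sesame: it is a plain-Python illustration with a deliberately simplified N-Quads regex. The tuple layout `(subject, predicate, object, object_kind, graph)` and the helper names `parse_quad` and `group_by_subject` are assumptions made for the example; the thread does not say what the five fields of the Tuple5 are. In a Flink job the parsing would presumably live in a map function over the text file and the grouping in a group-by on the subject field.

```python
import re
from collections import defaultdict

# Simplified N-Quads term patterns (an assumption for illustration:
# they do not cover every corner of the N-Quads grammar).
IRI = r'<[^>]*>'
BNODE = r'_:\S+'
LITERAL = r'"(?:[^"\\]|\\.)*"(?:\^\^<[^>]*>|@[A-Za-z0-9-]+)?'

QUAD_RE = re.compile(
    rf'^\s*({IRI}|{BNODE})\s+({IRI})\s+({IRI}|{BNODE}|{LITERAL})'
    rf'(?:\s+({IRI}|{BNODE}))?\s*\.\s*$'
)


def parse_quad(line):
    """Parse one N-Quads line into a 5-tuple, or return None for blank/comment lines.

    Hypothetical layout: (subject, predicate, object, object_kind, graph).
    graph is None when the line is a plain triple in the default graph.
    """
    stripped = line.strip()
    if not stripped or stripped.startswith('#'):
        return None
    match = QUAD_RE.match(stripped)
    if match is None:
        raise ValueError(f'not a valid N-Quads line: {line!r}')
    subject, predicate, obj, graph = match.groups()
    if obj.startswith('<'):
        kind = 'iri'
    elif obj.startswith('_:'):
        kind = 'bnode'
    else:
        kind = 'literal'
    return (subject, predicate, obj, kind, graph)


def group_by_subject(quads):
    """Collect all quads that share a subject (field 0) into one list."""
    groups = defaultdict(list)
    for quad in quads:
        groups[quad[0]].append(quad)
    return dict(groups)
```

Each list returned by `group_by_subject` corresponds to what the message above calls the quads "grouped by subject"; whatever logic is applied afterwards would operate on one such list at a time.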
|