Multiple ElasticSearch sinks

Flavio Pompermaier
Hi all,

I have a Flink job that produces JSON objects, and I'd like to index them into different Elasticsearch indices depending on the "type" attribute of each object (e.g. "people", "places", etc.).
Has anyone tried something like this in Flink before?
I was thinking of using the EsHadoopOutputFormat, but it requires specifying the index name in the job configuration. In my use case, however, I only know the target indices once the computation finishes, so Flink can't know how many sinks there will be during the pre-flight phase.

My current solution is to implement my own mapPartition function that instantiates an Elasticsearch client and indexes each JSON document into the right index at the end of the job pipeline. Is there a better approach?

Best,
Flavio
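A minimal sketch of the per-partition approach described above, written in plain Java so it runs without Flink or Elasticsearch on the classpath. It assumes each JSON object has already been parsed into a `Map<String, Object>`. `PartitionIndexer`, `IndexClient`, `ClientFactory` and `bulkIndex` are hypothetical names, not Elasticsearch or Flink APIs. In a real job, the body of `indexPartition` would live inside a Flink `MapPartitionFunction.mapPartition(Iterable, Collector)`, and `IndexClient` would wrap an actual Elasticsearch client.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class PartitionIndexer {

    /** Hypothetical stand-in for a real Elasticsearch client (not an actual ES API). */
    public interface IndexClient extends AutoCloseable {
        void bulkIndex(String index, List<Map<String, Object>> docs);

        @Override
        void close(); // no checked exception, so try-with-resources stays simple
    }

    /** Hypothetical factory, so exactly one client is opened per partition. */
    public interface ClientFactory {
        IndexClient open();
    }

    private final ClientFactory factory;
    private final int batchSize;

    public PartitionIndexer(ClientFactory factory, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.factory = factory;
        this.batchSize = batchSize;
    }

    /** Routes a document to the index named by its "type" attribute. */
    static String targetIndex(Map<String, Object> doc) {
        Object type = doc.get("type");
        if (type == null) {
            throw new IllegalArgumentException("document has no \"type\" attribute");
        }
        // Elasticsearch index names must be lowercase.
        return type.toString().toLowerCase(Locale.ROOT);
    }

    /**
     * Indexes one partition: opens one client, buffers documents per target index,
     * flushes a buffer whenever it reaches batchSize, then flushes the remainders.
     * Returns the number of documents processed.
     */
    public int indexPartition(Iterable<Map<String, Object>> partition) {
        Map<String, List<Map<String, Object>>> buffers = new LinkedHashMap<>();
        int count = 0;
        try (IndexClient client = factory.open()) {
            for (Map<String, Object> doc : partition) {
                String index = targetIndex(doc);
                List<Map<String, Object>> buf =
                        buffers.computeIfAbsent(index, k -> new ArrayList<>());
                buf.add(doc);
                count++;
                if (buf.size() >= batchSize) {
                    client.bulkIndex(index, new ArrayList<>(buf)); // pass a copy, then reuse the buffer
                    buf.clear();
                }
            }
            // Flush whatever is left for every index discovered in this partition.
            for (Map.Entry<String, List<Map<String, Object>>> e : buffers.entrySet()) {
                if (!e.getValue().isEmpty()) {
                    client.bulkIndex(e.getKey(), new ArrayList<>(e.getValue()));
                }
            }
        } // client.close() runs here, even if a document fails routing
        return count;
    }
}
```

The set of target indices is discovered from the data at runtime (one buffer per distinct "type"), which is exactly what a statically configured sink cannot know during the pre-flight phase. Opening the client once per partition rather than once per record keeps connection overhead proportional to the parallelism, not to the data size.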

Re: Multiple ElasticSearch sinks

rmetzger0
Hi,
I don't know of anyone on our lists who has reported something like this before.
Since you don't know the types in advance, the mapPartition approach sounds good.

In reply to Flavio Pompermaier's message of Fri, Jul 10, 2015 at 5:02 PM.