Writing a DataSet to ElasticSearch

Niels Basjes
Hi,

I have a job in Flink 1.10.0 which creates data that I need to write to ElasticSearch.
Because it really is a batch job (and running it as a stream keeps giving OOM problems: the data is big, unordered, and needs a group-by), I'm trying to run it as a real batch.

To write a DataSet to some output that is not a file, an OutputFormat implementation is needed, which is passed to DataSet.output:
    public DataSink<T> output(OutputFormat<T> outputFormat)
The problem is that I have not been able to find an OutputFormat for ElasticSearch.
Adding ES as a Sink to a DataStream is trivial because a Sink is provided out of the box.

The only alternative I came up with is to write the output of my batch to a file and then load that (with a stream) into ES.

What is the proper solution?
Is there an OutputFormat for ES I can use that I overlooked?

--
Best regards / Met vriendelijke groeten,

Niels Basjes

Re: Writing a DataSet to ElasticSearch

rmetzger0
Hey Niels,

For the OOM problem: did you try the RocksDB state backend?

I don't think there's an ES OutputFormat.

I guess there's no way around implementing your own OutputFormat for ES, if you want to use the DataSet API. It should not be too hard to implement.
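
To illustrate what such a custom OutputFormat could look like, here is a minimal sketch of the buffering logic, not a tested connector. OutputFormatLike is a local stand-in that mirrors the open/writeRecord/close lifecycle of Flink's org.apache.flink.api.common.io.OutputFormat (the real interface also has configure(Configuration)), so the sketch compiles without Flink on the classpath. BulkSender, BufferingEsOutputFormat and maxBatchSize are hypothetical names; in a real job send() would issue a bulk request through an ElasticSearch client.

```java
import java.io.IOException;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Local stand-in for Flink's OutputFormat lifecycle (assumption: same
// method shapes as the real interface, minus configure(Configuration)).
interface OutputFormatLike<T> extends Serializable {
    void open(int taskNumber, int numTasks) throws IOException;
    void writeRecord(T record) throws IOException;
    void close() throws IOException;
}

// Hypothetical hook: a real implementation would send the documents
// to ElasticSearch as one bulk request.
interface BulkSender extends Serializable {
    void send(List<String> jsonDocs) throws IOException;
}

// Buffers JSON documents and hands them to the sender in batches.
class BufferingEsOutputFormat implements OutputFormatLike<String> {
    private final BulkSender sender;
    private final int maxBatchSize;
    // Created in open(), so it is never part of the serialized format.
    private transient List<String> buffer;

    BufferingEsOutputFormat(BulkSender sender, int maxBatchSize) {
        if (maxBatchSize < 1) {
            throw new IllegalArgumentException("maxBatchSize must be >= 1");
        }
        this.sender = sender;
        this.maxBatchSize = maxBatchSize;
    }

    @Override
    public void open(int taskNumber, int numTasks) {
        // Called once per parallel task before any record is written.
        buffer = new ArrayList<>(maxBatchSize);
    }

    @Override
    public void writeRecord(String jsonDoc) throws IOException {
        buffer.add(jsonDoc);
        if (buffer.size() >= maxBatchSize) {
            flush();
        }
    }

    @Override
    public void close() throws IOException {
        // Send whatever is left over from the last, partial batch.
        if (buffer != null && !buffer.isEmpty()) {
            flush();
        }
    }

    private void flush() throws IOException {
        // Pass a copy so the sender can keep the list after we clear ours.
        sender.send(new ArrayList<>(buffer));
        buffer.clear();
    }
}
```

Once the class implements the real Flink interface, it could be passed to dataSet.output(...). Flink serializes the format and ships it to the workers, so a non-serializable ES client is better created in open() and closed in close() than passed in through the constructor.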


In reply to Niels Basjes' message of Sun, Mar 1, 2020 at 1:42 PM.