Number of parallel connections for Elasticsearch Connector

Rex Fenley
Hello,

How many connections does the ES connector use to write to Elasticsearch? We have a single machine with 16 vCPUs running our job with a parallelism of 4. With -p 4, I'd expect there to be 4 parallel bulk request writers / connections to Elasticsearch. Is there a place in the code to confirm this?

Thanks!

--

Rex Fenley  |  Software Engineer - Mobile and Backend


Remind.com |  BLOG  |  FOLLOW US  |  LIKE US

Re: Number of parallel connections for Elasticsearch Connector

Rex Fenley

Does each subtask of an Elasticsearch sink have its own separate Bulk Processor to allow for parallel bulk writes?

Thanks!

Re: Number of parallel connections for Elasticsearch Connector

Yangze Guo
Hi, Rex.

> How many connections does the ES connector use to write to Elasticsearch?
I think the number is equal to your parallelism. Each subtask of an
Elasticsearch sink will have its own separate Bulk Processor, because
both the Client and the Bulk Processor are private fields of the sink
class [1]. The subtasks will be placed into different slots, and each
will have its own Elasticsearch sink instance.

[1] https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-elasticsearch-base/src/main/java/org/apache/flink/streaming/connectors/elasticsearch/ElasticsearchSinkBase.java#L204
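
To illustrate the point, here is a minimal plain-Java sketch. It is not Flink's actual code: SketchSink, FakeClient and FakeBulkProcessor are hypothetical stand-ins, and the assumption is only that the runtime creates and opens one sink instance per parallel subtask. Because the client and bulk processor are per-instance fields, a parallelism of 4 yields 4 independent client / bulk processor pairs.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Plain-Java illustration only; none of these classes are Flink or Elasticsearch APIs.
class SubtaskSinkSketch {

    /** Hypothetical stand-in for an Elasticsearch client (one connection). */
    static final class FakeClient {}

    /** Hypothetical stand-in for a bulk processor that buffers requests for one client. */
    static final class FakeBulkProcessor {
        private final FakeClient client;
        private final List<String> buffer = new ArrayList<>();

        FakeBulkProcessor(FakeClient client) { this.client = client; }

        void add(String request) { buffer.add(request); }

        int buffered() { return buffer.size(); }

        FakeClient client() { return client; }
    }

    /** Hypothetical sink: client and bulk processor are private, per-instance fields. */
    static final class SketchSink {
        private FakeClient client;
        private FakeBulkProcessor bulkProcessor;

        // Assumed to be called once per parallel subtask, so each subtask builds its own pair.
        void open() {
            client = new FakeClient();
            bulkProcessor = new FakeBulkProcessor(client);
        }

        void invoke(String record) { bulkProcessor.add(record); }

        FakeBulkProcessor bulkProcessor() { return bulkProcessor; }
    }

    /** Creates and opens one sink instance per subtask for the given parallelism. */
    static List<SketchSink> openSubtasks(int parallelism) {
        List<SketchSink> sinks = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            SketchSink sink = new SketchSink();
            sink.open();
            sinks.add(sink);
        }
        return sinks;
    }

    /** Counts the distinct client objects across all sinks (identity-based). */
    static int distinctClients(List<SketchSink> sinks) {
        Set<FakeClient> clients = new HashSet<>();
        for (SketchSink sink : sinks) {
            clients.add(sink.bulkProcessor().client());
        }
        return clients.size();
    }

    public static void main(String[] args) {
        List<SketchSink> sinks = openSubtasks(4); // corresponds to -p 4
        sinks.get(0).invoke("doc-1");             // only subtask 0 buffers this record
        System.out.println("distinct clients: " + distinctClients(sinks));
        System.out.println("buffered in subtask 1: " + sinks.get(1).bulkProcessor().buffered());
    }
}
```

Running main prints "distinct clients: 4" and "buffered in subtask 1: 0", i.e. nothing is shared between the subtasks in this sketch.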

Best,
Yangze Guo

On Sun, Jan 17, 2021 at 11:33 AM Rex Fenley <[hidden email]> wrote:

>
> I found the following, indicating that there is no concurrency for the Elasticsearch Connector: https://github.com/apache/flink/blob/97bfd049951f8d52a2e0aed14265074c4255ead0/flink-connectors/flink-connector-elasticsearch-base/src/main/java/org/apache/flink/streaming/connectors/elasticsearch/ElasticsearchSinkBase.java#L382
>
> Does each subtask of an Elasticsearch sink have its own separate Bulk Processor to allow for parallel bulk writes?
>
> Thanks!

Re: Number of parallel connections for Elasticsearch Connector

Rex Fenley
Great, thanks!
