R/W traffic estimation between Flink and Zookeeper

R/W traffic estimation between Flink and Zookeeper

Hao Sun
Hi, is there a way to estimate the read/write traffic between Flink and ZooKeeper?
I am looking for something like 1000 reads/sec or 1000 writes/sec, plus the size of the messages.

Thanks

Re: R/W traffic estimation between Flink and Zookeeper

Piotr Nowojski
Hi,

Are you asking how to measure records/s, or whether that rate is achievable? To measure it, you can check the numRecordsInPerSecond metric.

As for whether 1000 records/s is possible, it depends on many things, such as the state backend used, the state size, the complexity of your application, the size of the records, the number of machines, their hardware, and the network. In the very simplest cases it is possible to achieve millions of records per second per machine. It would be best to try it out for your particular use case at a small scale.

Piotrek

> On 11 Oct 2017, at 19:58, Hao Sun <[hidden email]> wrote:
>
> Hi Is there a way to estimate read/write traffic between flink and zk?
> I am looking for something like 1000 reads/sec or 1000 writes/sec. And the size of the message.
>
> Thanks


Re: R/W traffic estimation between Flink and Zookeeper

Hao Sun
Thanks Piotr. Does Flink read from or write to ZooKeeper every time it processes a record?
I thought only the JobManager (JM) uses ZK to keep some meta-level data, so I am not sure why `it depends on many things like state backend used, state size, complexity of your application, size of the records, number of machines, their hardware and the network.`

On Thu, Oct 12, 2017 at 1:35 AM Piotr Nowojski <[hidden email]> wrote:
Hi,

Are you asking how to measure records/s, or whether that rate is achievable? To measure it, you can check the numRecordsInPerSecond metric.

As for whether 1000 records/s is possible, it depends on many things, such as the state backend used, the state size, the complexity of your application, the size of the records, the number of machines, their hardware, and the network. In the very simplest cases it is possible to achieve millions of records per second per machine. It would be best to try it out for your particular use case at a small scale.

Piotrek


Re: R/W traffic estimation between Flink and Zookeeper

Stefan Richter
Hi,

I think ZooKeeper is only used as a metadata store in HA mode. Interactions with ZK are not part of Flink's per-record stream processing code paths. Things written to ZK can (depending on your job) include, for example, the job graph, Kafka offsets, or the metadata about available checkpoints to recover from. Some of those interactions happen only once per job; others happen periodically. In the big picture, interactions with ZK are rare, but this also depends on configuration parameters such as your checkpointing interval. For a typical job, I would estimate that ZK interactions occur less than once per second. As for typical message sizes, I would estimate somewhere between a few bytes and a few kilobytes for most messages, and somewhere in the low two-digit megabytes as a typical maximum size.

Best,
Stefan
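
The estimate above can be turned into a rough back-of-envelope calculation. The sketch below is only an illustration, not Flink's actual behavior: the function name `estimate_zk_traffic`, the number of ZK operations per checkpoint, and the average message size are all hypothetical inputs that you would replace with values observed on your own cluster.

```python
from dataclasses import dataclass


@dataclass
class ZkTrafficEstimate:
    ops_per_second: float    # ZK reads + writes per second (estimated)
    bytes_per_second: float  # payload bytes per second (estimated)


def estimate_zk_traffic(checkpoint_interval_s: float,
                        ops_per_checkpoint: int = 2,
                        avg_message_bytes: int = 1024) -> ZkTrafficEstimate:
    """Rough steady-state ZK traffic driven by periodic checkpoints.

    Assumptions (hypothetical, not taken from Flink's source code):
    - each completed checkpoint causes `ops_per_checkpoint` ZK operations;
    - each operation carries about `avg_message_bytes` bytes;
    - one-off writes at job submission (e.g. the job graph) are ignored.
    """
    if checkpoint_interval_s <= 0:
        raise ValueError("checkpoint_interval_s must be positive")
    ops_per_second = ops_per_checkpoint / checkpoint_interval_s
    return ZkTrafficEstimate(ops_per_second, ops_per_second * avg_message_bytes)


if __name__ == "__main__":
    # Compare a few checkpoint intervals (in seconds).
    for interval in (1, 10, 60):
        est = estimate_zk_traffic(interval)
        print(f"interval={interval}s: "
              f"{est.ops_per_second:.3f} ops/s, {est.bytes_per_second:.1f} B/s")
```

Under these assumed defaults, any checkpoint interval longer than two seconds keeps the rate below one ZK operation per second, which is in line with the "less than once per second" estimate for a typical job.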

On 15.11.2017, at 18:41, Hao Sun <[hidden email]> wrote:

Thanks Piotr. Does Flink read from or write to ZooKeeper every time it processes a record?
I thought only the JobManager (JM) uses ZK to keep some meta-level data, so I am not sure why `it depends on many things like state backend used, state size, complexity of your application, size of the records, number of machines, their hardware and the network.`


Re: R/W traffic estimation between Flink and Zookeeper

Hao Sun

Great, thanks for the info, Stefan.


On Thu, Nov 16, 2017, 01:59 Stefan Richter <[hidden email]> wrote:
Hi,

I think ZooKeeper is only used as a metadata store in HA mode. Interactions with ZK are not part of Flink's per-record stream processing code paths. Things written to ZK can (depending on your job) include, for example, the job graph, Kafka offsets, or the metadata about available checkpoints to recover from. Some of those interactions happen only once per job; others happen periodically. In the big picture, interactions with ZK are rare, but this also depends on configuration parameters such as your checkpointing interval. For a typical job, I would estimate that ZK interactions occur less than once per second. As for typical message sizes, I would estimate somewhere between a few bytes and a few kilobytes for most messages, and somewhere in the low two-digit megabytes as a typical maximum size.

Best,
Stefan
