(DEPRECATED) Apache Flink User Mailing List archive.

StateFun scalability

Martijn de Heus

StateFun scalability

Hi all,

I’ve been working with StateFun for a while as part of my university project. I am now trying to increase the number of StateFun workers and the parallelism, but this barely seems to increase the throughput of my system.

I have 5000 function instances in my system during my tests. When I increase the number of workers from 1 to 3, I see a significant increase in throughput, but going from 3 to 5 (or even to 7) gives no further increase. Each worker runs with 4 CPUs, and I made sure that Kafka and my deployed co-located functions are not causing any bottlenecks. I also have many partitions for the ingress topics.

I attached my flink-conf.yaml below. Is this expected behaviour for StateFun, or am I missing some configuration that could improve performance? And if this is expected for StateFun, what could be causing it?

Best regards,

Martijn

jobmanager.rpc.address: statefun-master
taskmanager.numberOfTaskSlots: 1
blob.server.port: 6124
jobmanager.rpc.port: 6123
taskmanager.rpc.port: 6122
classloader.parent-first-patterns.additional: org.apache.flink.statefun;org.apache.kafka;com.google.protobuf
state.checkpoints.dir: file:///checkpoint-dir
state.backend: rocksdb
state.backend.rocksdb.timer-service.factory: ROCKSDB
state.backend.incremental: true
execution.checkpointing.interval: 10sec
execution.checkpointing.mode: EXACTLY_ONCE
restart-strategy: fixed-delay
restart-strategy.fixed-delay.attempts: 2147483647
restart-strategy.fixed-delay.delay: 1sec
jobmanager.memory.process.size: 1g
taskmanager.memory.process.size: 1g
parallelism.default: 5
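
One thing worth checking in a configuration like the one above is the slot arithmetic. In Flink, the number of available slots is the number of TaskManagers times `taskmanager.numberOfTaskSlots`, and with default slot sharing a job running at parallelism p occupies p slots. The following is a minimal sketch of that check, not a diagnosis: it assumes that `parallelism.default` is the parallelism actually used by the job and that it was not changed between test runs, which the message does not state. The helper names `parse_flat_conf` and `slot_report` are made up for illustration.

```python
def parse_flat_conf(text):
    """Parse flat 'key: value' lines, as found in a flink-conf.yaml."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only, so values such as file:///... survive.
        key, _, value = line.partition(":")
        conf[key.strip()] = value.strip()
    return conf


def slot_report(conf, workers):
    """Compare the slots offered by `workers` TaskManagers with the job parallelism."""
    slots_per_worker = int(conf.get("taskmanager.numberOfTaskSlots", 1))
    parallelism = int(conf.get("parallelism.default", 1))
    total_slots = workers * slots_per_worker
    return {
        "total_slots": total_slots,
        "parallelism": parallelism,
        # Fewer slots than the parallelism: the job cannot be fully scheduled.
        "schedulable": parallelism <= total_slots,
        # More slots than the parallelism: the surplus does no work for this job.
        "idle_slots": max(total_slots - parallelism, 0),
    }


if __name__ == "__main__":
    conf = parse_flat_conf(
        "taskmanager.numberOfTaskSlots: 1\n"
        "parallelism.default: 5\n"
    )
    for workers in (1, 3, 5, 7):
        print(workers, slot_report(conf, workers))
```

Under these assumptions, any workers beyond the configured parallelism would contribute idle slots rather than extra throughput, so the parallelism has to be raised together with the worker count.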

Igal Shilman

Re: StateFun scalability

Hello Martijn,

Great to hear that you are exploring StateFun as part of your university project!

Can you please clarify:

- How do you measure throughput?

- By co-located functions, do you mean a remote function on the same machine?

- Can you share a little bit more about your functions? What are they doing?

- Do you use any kind of state?

- What kind of messages do you send? Are you using Protobuf for messages or something else?

Can you validate your setup against a vanilla Flink program (something like a WordCount)?

Thanks,

Igal

On Thu, Feb 4, 2021 at 9:51 PM Martijn de Heus <[hidden email]> wrote:

[original message and flink-conf.yaml quoted in full; see the first message above]
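
On the question of how throughput is measured, one way to make the number concrete is to record a timestamp for every message that reaches the egress, count messages per fixed time window, and then compare runs against ideal linear scaling. The sketch below assumes such timestamps (in seconds) are available from the test harness. The function names are hypothetical and it is not part of StateFun or Flink.

```python
from collections import Counter


def throughput_per_window(timestamps, window_s=1.0):
    """Return messages/second for each consecutive window of `window_s` seconds."""
    if not timestamps:
        return []
    start = min(timestamps)
    # Bucket every timestamp into the index of the window it falls in.
    counts = Counter(int((t - start) // window_s) for t in timestamps)
    n_windows = max(counts) + 1
    return [counts.get(i, 0) / window_s for i in range(n_windows)]


def scaling_efficiency(throughput_by_workers):
    """Map worker count -> achieved speedup divided by ideal linear speedup.

    The run with the fewest workers is the baseline. A value of 1.0 means
    perfectly linear scaling; values well below 1.0 indicate a bottleneck.
    """
    base_workers = min(throughput_by_workers)
    base = throughput_by_workers[base_workers]
    return {
        workers: (tp / base) / (workers / base_workers)
        for workers, tp in throughput_by_workers.items()
    }
```

Feeding `scaling_efficiency` the steady-state throughput of each run (for example the median of `throughput_per_window` after warm-up) gives one number per worker count, which makes a plateau between 3 and 5 workers easy to compare against the same measurement on a plain Flink job.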