Hi Jessy,
I have a stateless Flink application where the source and sink are two different Kafka topics. Is there any benefit in adding checkpointing for this application? Will it help in some way with rewind and replay when restarting from a failure?
If you want to make sure that your application has either AT_LEAST_ONCE or EXACTLY_ONCE semantics[1], you need to enable checkpointing. Flink needs to keep track of the Kafka offsets, which it stores in its state, to achieve those guarantees. So even though your transformations do not have state themselves, the source does have state.
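To illustrate, here is a minimal sketch against the 1.13 Kafka connector API from [1]; the topic names, bootstrap servers, group id and the string schemas are placeholders you would replace with your own:

import java.nio.charset.StandardCharsets;
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;
import org.apache.flink.streaming.connectors.kafka.KafkaSerializationSchema;
import org.apache.kafka.clients.producer.ProducerRecord;

public class StatelessKafkaPipeline {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Without this call there are no checkpoints, and therefore no
        // consistent offsets to restore from: checkpoint every 30 seconds
        // in exactly-once mode.
        env.enableCheckpointing(30_000, CheckpointingMode.EXACTLY_ONCE);

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "my-consumer-group");
        // EXACTLY_ONCE uses Kafka transactions; the producer's transaction
        // timeout must not exceed the broker's transaction.max.timeout.ms
        // (15 minutes by default).
        props.setProperty("transaction.timeout.ms", "900000");

        DataStream<String> stream = env.addSource(
                new FlinkKafkaConsumer<>(
                        "input-topic", new SimpleStringSchema(), props));

        // Records become visible downstream only once the transaction that
        // wrote them is committed, which happens on checkpoint completion.
        KafkaSerializationSchema<String> schema = (element, timestamp) ->
                new ProducerRecord<>(
                        "output-topic",
                        element.getBytes(StandardCharsets.UTF_8));

        stream.addSink(new FlinkKafkaProducer<>(
                "output-topic", schema, props,
                FlinkKafkaProducer.Semantic.EXACTLY_ONCE));

        env.execute("stateless-kafka-pipeline");
    }
}

With that in place, on recovery Flink rewinds the source to the offsets of the last completed checkpoint, and the sink only commits its Kafka transactions when a checkpoint completes, which is exactly the rewind-and-replay behaviour you are asking about.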
So my question is: is processing the incoming streams based on these rules stored in Flink state per key efficient or not (I am using RocksDB as the state backend)?
There is no single good answer to that question. It varies a lot depending on the volume of data etc. I'd recommend checking yourself whether you are happy with the performance. It's definitely an approach a lot of people have implemented and been happy with. There is also a blog post (rather oldish by now) which describes how you could implement such a pattern[2].
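For reference, the core of that pattern looks roughly like the sketch below. Event and Rule are hypothetical placeholder types standing in for your own, and matches() stands in for your real rule evaluation:

import java.util.Map;

import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.streaming.api.functions.co.KeyedBroadcastProcessFunction;
import org.apache.flink.util.Collector;

/** Hypothetical event type; your real events will differ. */
class Event {
    public String key;
    public long value;
}

/** Hypothetical rule type; a threshold check stands in for real logic. */
class Rule {
    public String id;
    public long threshold;

    boolean matches(Event e) {
        return e.value > threshold;
    }
}

public class RuleMatcher
        extends KeyedBroadcastProcessFunction<String, Event, Rule, String> {

    // Descriptor for the broadcast state that holds the current rules by id.
    public static final MapStateDescriptor<String, Rule> RULES_DESCRIPTOR =
            new MapStateDescriptor<>(
                    "rules",
                    BasicTypeInfo.STRING_TYPE_INFO,
                    TypeInformation.of(Rule.class));

    @Override
    public void processElement(
            Event event, ReadOnlyContext ctx, Collector<String> out)
            throws Exception {
        // Evaluate every currently known rule against the incoming keyed event.
        for (Map.Entry<String, Rule> entry :
                ctx.getBroadcastState(RULES_DESCRIPTOR).immutableEntries()) {
            if (entry.getValue().matches(event)) {
                out.collect(
                        "rule " + entry.getKey() + " matched key " + event.key);
            }
        }
    }

    @Override
    public void processBroadcastElement(
            Rule rule, Context ctx, Collector<String> out) throws Exception {
        // A new or updated rule arrived: every parallel instance receives it
        // and stores it, so all instances share the same view of the rules.
        ctx.getBroadcastState(RULES_DESCRIPTOR).put(rule.id, rule);
    }
}

You would wire it up with something like events.keyBy(e -> e.key).connect(rules.broadcast(RuleMatcher.RULES_DESCRIPTOR)).process(new RuleMatcher()). One thing to keep in mind with RocksDB: broadcast state is always kept on-heap at runtime; only your keyed state goes through RocksDB.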
Is there any benefit in using Apache Camel along with Flink?
I am not very familiar with Apache Camel, so I can't say much on this. As far as I know, Apache Camel is more of a routing system, whereas Flink is a data processing framework.
Best,
Dawid
[1] https://ci.apache.org/projects/flink/flink-docs-release-1.13/docs/connectors/datastream/kafka/#kafka-producers-and-fault-tolerance
[2] https://flink.apache.org/2019/06/26/broadcast-state.html
On 10/05/2021 11:13, Jessy Ping wrote:
Hi all,
Currently, we are exploring the various features of Flink and need some clarification on the below-mentioned questions.
- I have a stateless Flink application where the source and sink are two different Kafka topics. Is there any benefit in adding checkpointing for this application? Will it help in some way with rewind and replay when restarting from a failure?
- I have a stateful use case where events are processed based on a set of dynamic rules provided by an external system, say a Kafka source. Also, the actual events are distinguishable based on a key. A broadcast function is used for broadcasting the dynamic rules and storing them in Flink state.
So my question is: is processing the incoming streams based on these rules stored in Flink state per key efficient or not (I am using RocksDB as the state backend)?
What about using an external cache for this?
Is Stateful Functions a good contender here?
- Is there any benefit in using Apache Camel along with Flink?
Thanks
Jessy