Re: Help needed to increase throughput of simple flink app

Posted by Arvid Heise
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Help-needed-to-increase-throughput-of-simple-flink-app-tp39289p39405.html

The common solution is to use a schema registry, such as the Confluent Schema Registry [1]. Every record carries a small 5-byte prefix (a magic byte plus a 4-byte schema id) that identifies the schema, which the deserializer [2] fetches from the registry on demand. There are also resources on how to properly secure the communication if needed [3]. A minimal usage sketch follows the links below.

[1] https://docs.confluent.io/current/schema-registry/index.html
[2] https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/formats/avro/registry/confluent/ConfluentRegistryAvroDeserializationSchema.html
[3] https://docs.cloudera.com/csa/1.2.0/datastream-connectors/topics/csa-schema-registry.html
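For illustration, here is a minimal sketch of wiring ConfluentRegistryAvroDeserializationSchema into a Kafka source, assuming the Flink 1.11-era FlinkKafkaConsumer; the topic name, bootstrap servers, registry URL, and the tiny Event schema are all placeholders:

import java.util.Properties;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.formats.avro.registry.confluent.ConfluentRegistryAvroDeserializationSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class RegistryExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Reader schema; the registry resolves the (possibly newer) writer
        // schema identified by each record's 5-byte prefix against it.
        Schema readerSchema = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"Event\","
                + "\"fields\":[{\"name\":\"id\",\"type\":\"string\"}]}");

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "kafka:9092"); // placeholder
        props.setProperty("group.id", "my-group");            // placeholder

        FlinkKafkaConsumer<GenericRecord> consumer = new FlinkKafkaConsumer<>(
                "my-topic", // placeholder
                ConfluentRegistryAvroDeserializationSchema.forGeneric(
                        readerSchema, "http://schema-registry:8081"), // placeholder URL
                props);

        env.addSource(consumer).print();
        env.execute("registry-example");
    }
}

Because the registry delivers the writer schema per record and resolves it against the reader schema at runtime, backward-compatible upstream changes are handled without restarting the job, which also addresses the question quoted below.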

On Thu, Nov 12, 2020 at 10:11 AM ashwinkonale <[hidden email]> wrote:
Hi,
Thanks a lot for the reply, and you are both right. Serializing
GenericRecord without specifying the schema was indeed a HUGE bottleneck in
my app. I found it through JFR analysis and then read the blog post you
mentioned. Now I am able to pump in a lot more data per second (in my test
setup, at least). I am going to try this with Kafka.
But now it poses a problem: my app cannot handle schema changes
automatically, since Flink needs to know the schema at startup. If there is
a backward-compatible change upstream, new messages will not be read
properly. Do you know of any workarounds for this?
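(For reference, the schema-aware serialization fix described above is typically done by attaching GenericRecordAvroTypeInfo to streams of GenericRecord, so Flink uses the Avro serializer instead of falling back to Kryo. A minimal sketch; the helper name withAvroTypeInfo is hypothetical:

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.formats.avro.typeutils.GenericRecordAvroTypeInfo;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;

public final class AvroTypeInfoUtil {

    // Attaches Avro-aware TypeInformation to a stream of GenericRecord.
    // Without it, Flink classifies GenericRecord as a generic type and
    // serializes it with Kryo, which is the bottleneck described above.
    public static DataStream<GenericRecord> withAvroTypeInfo(
            SingleOutputStreamOperator<GenericRecord> stream, Schema schema) {
        return stream.returns(new GenericRecordAvroTypeInfo(schema));
    }
}

Usage: apply it after each operator that emits GenericRecord, e.g.
DataStream<GenericRecord> fast = AvroTypeInfoUtil.withAvroTypeInfo(source, schema);)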





--

Arvid Heise | Senior Java Developer


Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany