Hi,
Generally speaking, you can handle reading multiple Avro schema
versions as long as you have access to all of the versions involved.
This is usually done with a schema registry service. Flink ships
with a utility, ConfluentRegistryAvroDeserializationSchema,
that lets you read binary-encoded Avro messages by looking up
the writer schema in the Confluent Schema Registry. Maybe you can
reuse some of its logic to deserialize your Avro messages once
they are extracted from JSON.
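For the JSON case, the schema resolution itself boils down to plain
Avro. Below is a minimal sketch, assuming your messages use Avro's
JSON encoding and that you can tell which writer schema a given
message was produced with (e.g. a version field or a registry
lookup); the class and method names here are mine, not Flink's:

import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.JsonDecoder;

public class VersionedJsonAvroReader {

    // Hypothetical helper: decode a JSON-encoded Avro message that was
    // written with writerSchema, resolving it into the backward-compatible
    // readerSchema (typically your latest version).
    public static GenericRecord decode(String json,
                                       Schema writerSchema,
                                       Schema readerSchema) throws IOException {
        JsonDecoder decoder = DecoderFactory.get().jsonDecoder(writerSchema, json);
        GenericDatumReader<GenericRecord> reader =
                new GenericDatumReader<>(writerSchema, readerSchema);
        return reader.read(null, decoder);
    }
}

If you resolve every message to the same reader schema this way, the
mixed v1/v2 window during deployments becomes a non-issue downstream.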
Once you have your messages as Avro objects,
you could have a look at the StreamingFileSink and
ParquetAvroWriters classes. Together they can be used to write
Avro records to Parquet files.
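To make that concrete, here is a rough sketch of wiring a stream of
GenericRecords (already resolved to one reader schema as above) into
a Parquet bulk sink; the output path is just a placeholder:

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.core.fs.Path;
import org.apache.flink.formats.parquet.avro.ParquetAvroWriters;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

public class ParquetSinkWiring {

    // Attach a bulk-format Parquet sink; every record must match
    // readerSchema, so resolve all writer versions to it before this point.
    public static void attachSink(DataStream<GenericRecord> records,
                                  Schema readerSchema) {
        StreamingFileSink<GenericRecord> sink = StreamingFileSink
                .forBulkFormat(
                        new Path("hdfs:///data/out"),  // placeholder output directory
                        ParquetAvroWriters.forGenericRecord(readerSchema))
                .build();
        records.addSink(sink);
    }
}

Each Parquet file carries its own schema in the footer, which is one
more reason to resolve everything to a single schema before the sink
rather than letting mixed versions reach it.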
Best,
Dawid
On 27/11/2018 11:35, CPC wrote:
Hi everybody,
We are planning to use Flink for our Kafka-to-HDFS
ingestion. We consume Avro messages encoded as JSON and
then write them to HDFS as Parquet. But our Avro schema
changes from time to time in a backward-compatible way, and
because of deployments you can see message sequences like
v1 v1 v1 v1 v2 v2 v1 v2 v1 ..... v2 v2 v2
I mean that during a deployment there is a small time
window in which messages can arrive with mixed versions. I just want
to ask whether Flink/Avro/Parquet can handle this scenario, or
whether there is something we need to do as well. Thanks in advance.