Architecture question



I need to read Avro data from a Kafka topic and write it to a local file.

Each Avro record contains a date-time field, and I need to name the output
file after that field (20180103, for example).

I was thinking of using Flink to read and unpack the generic record, then pass
it to a sink that sorts records so that each one goes into the right file.

Does anyone have a high-level approach for this?

The bucketing sink looks promising. Are there any examples of Flink solving
this type of problem?



Re: Architecture question

Fabian Hueske-2

What you are looking for is a BucketingSink that works on event time (the timestamp is encoded in your data).
AFAIK, Flink's BucketingSink was designed to work on processing time, but you can implement a custom Bucketer that assigns each record to a bucket based on a timestamp in the data.
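
To illustrate the idea, here is a minimal sketch in plain Java (no Flink dependency) of deriving the bucket from the record's own timestamp instead of the wall clock. It assumes the Avro field holds epoch milliseconds and that days are cut in UTC; the class and method names (`EventTimeBucketNames`, `bucketName`, `bucketPath`) are made up for this example. In a real job I would expect this logic to sit inside your own implementation of `Bucketer#getBucketPath(Clock, Path, T)`, simply ignoring the `Clock` argument.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/** Derives a bucket name from a timestamp carried in the record itself. */
public class EventTimeBucketNames {

    // yyyyMMdd matches the "20180103" naming from the question.
    // Using UTC is an assumption; pick the zone your dates should be cut in.
    private static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("yyyyMMdd").withZone(ZoneOffset.UTC);

    /**
     * Hypothetical helper: eventTimeMillis is assumed to be the epoch
     * milliseconds read from the date-time field of the Avro record.
     */
    public static String bucketName(long eventTimeMillis) {
        return DAY.format(Instant.ofEpochMilli(eventTimeMillis));
    }

    /** Joins a base directory and the bucket name into one path string. */
    public static String bucketPath(String baseDir, long eventTimeMillis) {
        String base = baseDir.endsWith("/")
                ? baseDir.substring(0, baseDir.length() - 1)
                : baseDir;
        return base + "/" + bucketName(eventTimeMillis);
    }

    public static void main(String[] args) {
        long ts = Instant.parse("2018-01-03T10:15:30Z").toEpochMilli();
        System.out.println(bucketPath("/data/out", ts)); // prints /data/out/20180103
    }
}
```

If the field is a formatted string rather than a long, you would parse it into an `Instant` first and keep the rest unchanged.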
You might need to play around with the parameters for closing open buckets to get good behavior (similar to watermark tuning).
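
To show what that tuning trades off, here is a small stand-alone Java simulation of closing buckets after a period without writes. It is only a sketch of the concept, not Flink's implementation; `IdleBucketCloser` and its methods are hypothetical names. As far as I know, the corresponding knobs on BucketingSink are `setInactiveBucketThreshold` and `setInactiveBucketCheckInterval`, but please check them against the version you use.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Toy model: a bucket is closed once it has received no writes for a while. */
public class IdleBucketCloser {

    private final long inactiveThresholdMillis;
    // bucket name -> processing time of the last write into that bucket
    private final Map<String, Long> lastWrite = new HashMap<>();

    public IdleBucketCloser(long inactiveThresholdMillis) {
        this.inactiveThresholdMillis = inactiveThresholdMillis;
    }

    /** Records a write; a previously closed bucket is simply opened again. */
    public void onWrite(String bucket, long nowMillis) {
        lastWrite.put(bucket, nowMillis);
    }

    /** Closes and returns every bucket idle for at least the threshold. */
    public List<String> closeInactive(long nowMillis) {
        List<String> closed = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = lastWrite.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMillis - entry.getValue() >= inactiveThresholdMillis) {
                closed.add(entry.getKey());
                it.remove();
            }
        }
        return closed;
    }

    public boolean isOpen(String bucket) {
        return lastWrite.containsKey(bucket);
    }

    public static void main(String[] args) {
        IdleBucketCloser closer = new IdleBucketCloser(60_000L);
        closer.onWrite("20180103", 0L);
        closer.onWrite("20180104", 50_000L);
        System.out.println(closer.closeInactive(70_000L)); // prints [20180103]
        // A late record for 2018-01-03 reopens the bucket (a new part file).
        closer.onWrite("20180103", 80_000L);
        System.out.println(closer.isOpen("20180103")); // prints true
    }
}
```

A short threshold finalizes files quickly, but late records reopen the bucket and produce extra small files; a long threshold keeps files open longer and tolerates more lateness.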

Best, Fabian

2018-02-14 22:18 GMT+01:00 robert <[hidden email]>: