Architecture question

robert
I need to read Avro data from a Kafka topic and write it to the local file
system.

Inside the Avro record there is a date-time field. I need to name the output
file after the value of that field (20180103, for example).


I was thinking of using Flink to read and unpack the generic record, then pass
it to a sink that routes each record into the right file.

Does anyone have a high-level approach for this?

The bucketing sink looks promising. Are there any examples of Flink solving
this type of problem?

Thanks



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Architecture question

Fabian Hueske-2
Hi,

What you are looking for is a BucketingSink that works on event time (the timestamp is encoded in your data).
AFAIK, Flink's BucketingSink has been designed to work in processing time, but you can implement a Bucketer that creates buckets based on a timestamp in the data.
You might need to tune the parameters for closing open buckets to get good behavior (similar to watermark tuning).
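
As a rough sketch of the core of such a Bucketer, the date-to-bucket mapping could look like the code below. It uses only the JDK, so it can be tried without Flink. The class name EventDateBucket and the method bucketFor are hypothetical, and it assumes the date-time field is available as epoch milliseconds and that days are cut in UTC; if your Avro field is a string or a different logical type, the conversion would need to change.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: maps a record's event timestamp to a bucket name.
// Assumption: the timestamp is epoch milliseconds and days are cut in UTC.
final class EventDateBucket {
    private EventDateBucket() {}

    // Returns the bucket name in yyyyMMdd form, e.g. "20180103".
    static String bucketFor(long epochMillis) {
        LocalDate date = Instant.ofEpochMilli(epochMillis)
                .atZone(ZoneOffset.UTC)
                .toLocalDate();
        // BASIC_ISO_DATE formats a LocalDate as yyyyMMdd without separators.
        return date.format(DateTimeFormatter.BASIC_ISO_DATE);
    }

    public static void main(String[] args) {
        // 1514937600000 ms is 2018-01-03T00:00:00Z.
        System.out.println(bucketFor(1514937600000L)); // prints 20180103
    }
}
```

In a custom Bucketer, getBucketPath(clock, basePath, element) would then ignore the clock, extract the timestamp from the record, and return something like new Path(basePath, EventDateBucket.bucketFor(timestamp)). Because records for an older day can still arrive late, the sink's inactive-bucket settings (for example setInactiveBucketThreshold and setInactiveBucketCheckInterval on BucketingSink) are probably the parameters to experiment with.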

Best, Fabian

2018-02-14 22:18 GMT+01:00 robert <[hidden email]>: