Hi,
I am using a Data Source Generator, which very closely resembles the one here on the dataartisans github page https://github.com/dataArtisans/flink-training-exercises/blob/master/src/main/java/com/dataartisans/flinktraining/exercises/datastream_java/sources/TaxiRideSource.java This essentially reads from a file and then emits the data which is then consumed by my flink program. But when I run this, I get 4 instances of the same tuple, so if I have 10 samples, I am getting 40 samples. I am not really sure what's the problem here, if needed I can post my code as well. Thanks and Regards Biplob Biswas |
Can you please post your code as well? The duplication might happen in
your part of the code and not the TaxiRideSource. – Ufuk On Mon, Jun 6, 2016 at 10:30 AM, Biplob Biswas <[hidden email]> wrote: > Hi, > > I am using a Data Source Generator, which very closely resembles the one > here on the dataartisans github page > > https://github.com/dataArtisans/flink-training-exercises/blob/master/src/main/java/com/dataartisans/flinktraining/exercises/datastream_java/sources/TaxiRideSource.java > > This essentially reads from a file and then emits the data which is then > consumed by my flink program. > > But when I run this, I get 4 instances of the same tuple, so if I have 10 > samples, I am getting 40 samples. > > I am not really sure what's the problem here, if needed I can post my code > as well. > > Thanks and Regards > Biplob Biswas > > > > -- > View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Data-Source-Generator-emits-4-instances-of-the-same-tuple-tp7392.html > Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com. |
Hi,
I tried streaming the source data 2 ways 1. Is a simple straight forward way of sending data without using the serving speed concept http://pastebin.com/cTv0Pk5U 2. The one where I use the TaxiRide source which is exactly similar except loading the data in the proper data structures. http://pastebin.com/NenvXShH I hope to get a solution out of it. Thanks and Regards Biplob Biswas |
We solved this problem yesterday at the Flink Hackathon. The issue was that the source function was started with parallelism 4 and each function read the whole file.2016-06-06 16:53 GMT+02:00 Biplob Biswas <[hidden email]>: Hi, |
Yes Thanks a lot, also the fact that I was using ParallelSourceFunction was problematic. So as suggested by Fabian and Robert, I used Source Function and then in the flink job, i set the output of map with a parallelism of 4 to get the desired result.
Thanks again. |
Free forum by Nabble | Edit this page |