Data Source Generator emits 4 instances of the same tuple

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

Data Source Generator emits 4 instances of the same tuple

Biplob Biswas
Hi,

I am using a Data Source Generator, which very closely resembles the one here on the dataartisans github page

https://github.com/dataArtisans/flink-training-exercises/blob/master/src/main/java/com/dataartisans/flinktraining/exercises/datastream_java/sources/TaxiRideSource.java

This essentially reads from a file and then emits the data which is then consumed by my flink program.

But when I run this, I get 4 instances of the same tuple, so if I have 10 samples, I am getting 40 samples.

I am not really sure what's the problem here, if needed I can post my code as well.

Thanks and Regards
Biplob Biswas
Reply | Threaded
Open this post in threaded view
|

Re: Data Source Generator emits 4 instances of the same tuple

Ufuk Celebi
Can you please post your code as well? The duplication might happen in
your part of the code and not the TaxiRideSource.

– Ufuk


On Mon, Jun 6, 2016 at 10:30 AM, Biplob Biswas <[hidden email]> wrote:

> Hi,
>
> I am using a Data Source Generator, which very closely resembles the one
> here on the dataartisans github page
>
> https://github.com/dataArtisans/flink-training-exercises/blob/master/src/main/java/com/dataartisans/flinktraining/exercises/datastream_java/sources/TaxiRideSource.java
>
> This essentially reads from a file and then emits the data which is then
> consumed by my flink program.
>
> But when I run this, I get 4 instances of the same tuple, so if I have 10
> samples, I am getting 40 samples.
>
> I am not really sure what's the problem here, if needed I can post my code
> as well.
>
> Thanks and Regards
> Biplob Biswas
>
>
>
> --
> View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Data-Source-Generator-emits-4-instances-of-the-same-tuple-tp7392.html
> Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.
Reply | Threaded
Open this post in threaded view
|

Re: Data Source Generator emits 4 instances of the same tuple

Biplob Biswas
Hi,

I tried streaming the source data 2 ways

1. Is a simple straight forward way of sending data without using the serving speed concept
http://pastebin.com/cTv0Pk5U


2. The one where I use the TaxiRide source which is exactly similar except loading the data in the proper data structures.
http://pastebin.com/NenvXShH


I hope to get a solution out of it.

Thanks and Regards
Biplob Biswas

Reply | Threaded
Open this post in threaded view
|

Re: Data Source Generator emits 4 instances of the same tuple

Fabian Hueske-2
We solved this problem yesterday at the Flink Hackathon.
The issue was that the source function was started with parallelism 4 and each function read the whole file.

Cheers, Fabian

2016-06-06 16:53 GMT+02:00 Biplob Biswas <[hidden email]>:
Hi,

I tried streaming the source data 2 ways

1. Is a simple straight forward way of sending data without using the
serving speed concept
http://pastebin.com/cTv0Pk5U


2. The one where I use the TaxiRide source which is exactly similar except
loading the data in the proper data structures.
http://pastebin.com/NenvXShH


I hope to get a solution out of it.

Thanks and Regards
Biplob Biswas





--
View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Data-Source-Generator-emits-4-instances-of-the-same-tuple-tp7392p7405.html
Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.

Reply | Threaded
Open this post in threaded view
|

Re: Data Source Generator emits 4 instances of the same tuple

Biplob Biswas
Yes Thanks a lot, also the fact that I was using ParallelSourceFunction was problematic. So as suggested by Fabian and Robert, I used Source Function and then in the flink job, i set the output of map with a parallelism of 4 to get the desired result.

Thanks again.