Ververica Flink training resources

Piper Piper
Hi Flink community,

I have two questions regarding the Ververica Flink Training resources.

1. In the official Flink documentation, the links to the GitHub repositories for the exercises in the "Learn Flink" section are broken. If possible, please provide the correct links for the exercises.

2. The schema of the Taxi Fares dataset matches the old dataset (nycTaxiFares.gz). However, the schema of the Taxi Ride dataset given on the Ververica GitHub site does not seem to match the data in the old file (nycTaxiRides.gz). Please advise.

Given Schema: rideId, taxiId, driverId, isStart, startTime, endTime, startLon, startLat, endLon, endLat, passengerCnt

nycTaxiRides.gz sample line (after extracting to file nycTaxiRides4): 6,START,2013-01-01 00:00:00,1970-01-01 00:00:00,-73.866135,40.771091,-73.961334,40.764912,6,2013000006,2013000006
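
To make the mismatch concrete, here is a minimal Java sketch that splits the sample line and rearranges it into the order of the given schema. The old column order used here (rideId, isStart, startTime, endTime, startLon, startLat, endLon, endLat, passengerCnt, taxiId, driverId) is an assumption inferred from the sample line. Since the last two values in the sample are identical, the sample alone cannot confirm which of them is taxiId and which is driverId. The class and method names are hypothetical and are not part of the training repository.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the training repository.
class OldRideLineSketch {

    // ASSUMPTION: column order of the old nycTaxiRides.gz file, inferred from the sample line.
    static final String[] OLD_ORDER = {
        "rideId", "isStart", "startTime", "endTime",
        "startLon", "startLat", "endLon", "endLat",
        "passengerCnt", "taxiId", "driverId"
    };

    // Column order of the schema quoted in the post.
    static final String[] GIVEN_ORDER = {
        "rideId", "taxiId", "driverId", "isStart", "startTime", "endTime",
        "startLon", "startLat", "endLon", "endLat", "passengerCnt"
    };

    // Splits one old-format line and returns its values keyed by field name,
    // iterating in the order of the given schema.
    static Map<String, String> parseOldLine(String line) {
        String[] tokens = line.split(",");
        if (tokens.length != OLD_ORDER.length) {
            throw new IllegalArgumentException(
                "expected " + OLD_ORDER.length + " fields, got " + tokens.length);
        }
        Map<String, String> byName = new HashMap<>();
        for (int i = 0; i < OLD_ORDER.length; i++) {
            byName.put(OLD_ORDER[i], tokens[i].trim());
        }
        Map<String, String> reordered = new LinkedHashMap<>();
        for (String name : GIVEN_ORDER) {
            reordered.put(name, byName.get(name));
        }
        return reordered;
    }

    public static void main(String[] args) {
        String sample = "6,START,2013-01-01 00:00:00,1970-01-01 00:00:00,"
            + "-73.866135,40.771091,-73.961334,40.764912,6,2013000006,2013000006";
        parseOldLine(sample).forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

Under that assumption, the second column of the old file is the START/END flag rather than taxiId, which is why the line does not fit the given schema position by position.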

Thank you!

Piper

Re: Ververica Flink training resources

David Anderson-3
Piper,

1. Thanks for reporting the problem with the broken links. I've just fixed this.

2. The exercises were recently rewritten so that they no longer use the old file-based datasets. Now they use data generators that are included in the project. As part of this update, the schema was modified slightly (so that the TaxiRide and TaxiFare types can be serialized with Flink's POJO serializer). Is this causing a problem? 
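
For readers unfamiliar with the POJO requirement, the following is a rough sketch, not the actual TaxiRide class from the training repository. It shows a class shaped to fit the rules documented for Flink's POJO serializer: a public class (a public static nested class also qualifies), a public no-argument constructor, and non-static, non-transient fields that are either public and non-final or exposed through getters and setters. The field types are illustrative guesses, and `looksLikePojo` is a hypothetical helper that checks only a simplified subset of those rules using plain reflection. It does not call Flink and ignores the getter/setter alternative and inherited fields.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Hypothetical illustration; none of these names come from the training repository.
class PojoRulesSketch {

    // Stand-in for a ride event using the field names of the given schema.
    // ASSUMPTION: the field types below are guesses chosen for the example.
    public static class RideSketch {
        public long rideId;
        public long taxiId;
        public long driverId;
        public boolean isStart;
        public long startTime;   // assumed: epoch milliseconds
        public long endTime;     // assumed: epoch milliseconds
        public float startLon;
        public float startLat;
        public float endLon;
        public float endLat;
        public short passengerCnt;

        // A public no-argument constructor is one of the POJO requirements.
        public RideSketch() {}
    }

    // Simplified check: public class, not a non-static inner class,
    // public no-arg constructor, and only public non-final instance fields.
    static boolean looksLikePojo(Class<?> c) {
        int classMods = c.getModifiers();
        if (!Modifier.isPublic(classMods)) {
            return false;
        }
        if (c.isMemberClass() && !Modifier.isStatic(classMods)) {
            return false;
        }
        try {
            c.getConstructor();
        } catch (NoSuchMethodException e) {
            return false;
        }
        for (Field f : c.getDeclaredFields()) {
            int m = f.getModifiers();
            if (Modifier.isStatic(m) || Modifier.isTransient(m) || f.isSynthetic()) {
                continue;
            }
            if (!Modifier.isPublic(m) || Modifier.isFinal(m)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("RideSketch passes the simplified check: "
            + looksLikePojo(RideSketch.class));
    }
}
```

A type that fails these rules is generally handled by Flink's slower generic fallback serialization, which is the usual motivation for reshaping event classes this way.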

Best,
David

Re: Ververica Flink training resources

Piper Piper
Hi David

1. Thank you for fixing the links!

2. I downloaded the repo and data files in the middle of the rewriting, so the schema mentioned in the repo did not match the files. The new exercises are running well but I could not adjust the servingspeedfactor to speed up the serving of data events. I'm guessing this feature was removed in the new repo.

Best,
Piper

Re: Ververica Flink training resources

David Anderson-3
Piper,

I'm happy to know that the exercises are working for you.
 
> The new exercises are running well but I could not adjust the servingspeedfactor to speed up the serving of data events. I'm guessing this feature was removed in the new repo.

That's right. The feature of adjusting the serving speed wasn't needed for the exercises, and was sometimes a point of confusion during training. It seemed best to remove this distraction.

Best,
David 
