(DEPRECATED) Apache Flink User Mailing List archive.

Yahoo! Streaming Benchmark with Flink

Eric Fukuda

Yahoo! Streaming Benchmark with Flink

Hi,

I have two questions about the blog post on the Yahoo! Streaming Benchmark with Flink [1].

The first is about the join operation that associates ad_ids with campaign_ids. In flink.benchmark.state.AdvertisingTopologyFlinkStateHighKeyCard, I don't see this being done. Is there a reason for this?

The second is about the Akka actor. Reading flink.benchmark.state.QueryableWindowOperator or flink.benchmark.state.QueryableWindowOperatorEvicting, it looks like the Akka actor is being prepared but not used in the actual processing (processElement()). Is this correct? And how do I enable Akka in the job?

[1] http://data-artisans.com/extending-the-yahoo-streaming-benchmark/

Thanks,

Eric

Till Rohrmann

Re: Yahoo! Streaming Benchmark with Flink

Hi Eric,

Concerning your first question: I think that AdvertisingTopologyFlinkStateHighKeyCard models a different scenario, one that counts the number of ads per campaign for a large number of campaigns. In this scenario, the input data already contains the campaign id for each ad, so no join is needed. I think this is the job for the section "Winning Twitter Hack Week: Eliminating the key-value store bottleneck" of the blog post.
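To illustrate why no join is needed when each event already carries its campaign id, here is a minimal sketch in plain Java. It is not the benchmark's code: the class name, the AdEvent shape, and the "campaign@windowStart" key format are all assumptions made for this example, and it uses an in-memory map rather than Flink operators.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: counts ads per campaign and tumbling window.
final class CampaignWindowCount {

    // Assumed event shape: the campaign id travels with the event itself,
    // so no ad_id -> campaign_id lookup (join) is required.
    static final class AdEvent {
        final String adId;
        final String campaignId;
        final long eventTimeMs;

        AdEvent(String adId, String campaignId, long eventTimeMs) {
            this.adId = adId;
            this.campaignId = campaignId;
            this.eventTimeMs = eventTimeMs;
        }
    }

    // Groups events by (campaign id, window start) and counts them.
    static Map<String, Long> countPerCampaignWindow(List<AdEvent> events, long windowSizeMs) {
        Map<String, Long> counts = new HashMap<>();
        for (AdEvent e : events) {
            // Align the event time to the start of its tumbling window.
            long windowStart = e.eventTimeMs - (e.eventTimeMs % windowSizeMs);
            counts.merge(e.campaignId + "@" + windowStart, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<AdEvent> events = Arrays.asList(
                new AdEvent("ad1", "c1", 1_000L),
                new AdEvent("ad2", "c1", 4_000L),
                new AdEvent("ad3", "c2", 2_000L),
                new AdEvent("ad4", "c1", 12_000L));
        // With 10-second windows: c1@0 -> 2, c2@0 -> 1, c1@10000 -> 1
        System.out.println(countPerCampaignWindow(events, 10_000L));
    }
}
```

In the original benchmark scenario, by contrast, the events would carry only an ad id, and the campaign id would have to be looked up before the grouping step.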

Concerning your second question: the response actor is registered with the registration service, which exposes the Akka URL of this actor under the index of the running task. When you run AkkaStateQuery, the registration service is queried to retrieve the Akka URL, and a query state request is then sent to the response actor via the QueryActor. That is how the actor comes into play.

At the moment, the registration service is implemented using ZooKeeper. This means that the Akka URL is written to ZooKeeper, from where it can be retrieved.
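The register-then-look-up flow described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the Registry interface, the InMemoryRegistry class, resolveTarget, and the example Akka URL are hypothetical names invented here, and a concurrent map stands in for ZooKeeper. It does not reproduce the benchmark's actual classes or any ZooKeeper or Akka API.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a registration service keyed by task index.
final class RegistrySketch {

    // Assumed contract: each running task publishes its actor URL under its index.
    interface Registry {
        void register(int taskIndex, String actorUrl);
        Optional<String> lookup(int taskIndex);
    }

    // In-memory stand-in for the ZooKeeper-backed implementation.
    static final class InMemoryRegistry implements Registry {
        private final Map<Integer, String> urls = new ConcurrentHashMap<>();

        @Override
        public void register(int taskIndex, String actorUrl) {
            urls.put(taskIndex, actorUrl);
        }

        @Override
        public Optional<String> lookup(int taskIndex) {
            return Optional.ofNullable(urls.get(taskIndex));
        }
    }

    // Query side: resolve the actor URL to which a state query would be sent.
    static String resolveTarget(Registry registry, int taskIndex) {
        return registry.lookup(taskIndex).orElseThrow(
                () -> new IllegalStateException("no actor registered for task " + taskIndex));
    }

    public static void main(String[] args) {
        Registry registry = new InMemoryRegistry();
        // Operator side: done once when the task starts (the URL is a made-up example).
        registry.register(0, "akka.tcp://flink@host-a:6123/user/response-0");
        // Client side: look up the URL before sending the query.
        System.out.println(resolveTarget(registry, 0));
    }
}
```

The sketch also shows why the actor does not need to appear in processElement(): registration happens on the operator side at startup, while the lookup and the query are driven by a separate client.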

I hope this answers your questions.

Cheers,

Till

(In reply to Eric Fukuda's message of Fri, Oct 28, 2016 at 2:47 AM.)

Eric Fukuda

Re: Yahoo! Streaming Benchmark with Flink

Thanks Till, your reply answered my questions perfectly.

Regards,

Eric

(In reply to Till Rohrmann's message of Fri, Oct 28, 2016 at 11:00 AM.)