How to use broadcast variables in data stream

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

How to use broadcast variables in data stream

zhen li
Hi,all:
    I want to use some other broadcast  resources such as list or map  in the flatmap function or customer triggers, but I don’t find some api to satisfy.
    Anyone can help?
    thanks  



Reply | Threaded
Open this post in threaded view
|

Re: How to use broadcast variables in data stream

Fabian Hueske-2
Hi,

if the list is static and not too large, you can pass it as a parameter to the function.
Function objects are serialized (using Java's default serialization) and shipped to the workers for execution.

If the data is dynamic, you might want to have a look at Broadcast state [1].

Best, Fabian


2018-06-21 4:38 GMT+02:00 zhen li <[hidden email]>:
Hi,all:
    I want to use some other broadcast  resources such as list or map  in the flatmap function or customer triggers, but I don’t find some api to satisfy.
    Anyone can help?
    thanks   




Reply | Threaded
Open this post in threaded view
|

Re: How to use broadcast variables in data stream

zhen li
Thanks for your reply.
But  broadcast state seems not supported in Flink-1.3 .
I found this in Flink-1.3:
Broadcasting
DataStream → DataStream

Broadcasts elements to every partition.

dataStream.broadcast();
But I don’t know how to convert it to list and get it in stream context .

Reply | Threaded
Open this post in threaded view
|

Re: How to use broadcast variables in data stream

Fabian Hueske-2
That's correct.

Broadcast state was added with Flink 1.5. You can use DataStream.broadcast() in Flink 1.3 but it has a few limitations.
For example, you cannot connect a keyed and a broadcasted stream.

2018-06-21 11:58 GMT+02:00 zhen li <[hidden email]>:
Thanks for your reply.
But  broadcast state seems not supported in Flink-1.3 .
I found this in Flink-1.3:
Broadcasting
DataStream → DataStream

Broadcasts elements to every partition.

dataStream.broadcast();
But I don’t know how to convert it to list and get it in stream context .


Reply | Threaded
Open this post in threaded view
|

Re: How to use broadcast variables in data stream

zhen li
Hi,Fabian:
    I use connected stream to solve this problem,one is config stream that load some config data from redis.another is real data stream. Due to the config data is vary big, then the config stream is slowly than the real data stream. When use some config to deal data in flatmap2, it arise the NullPointerException. Is there any config to make one stream finished and then start the second? or any other ways to solve this problem?
.

在 2018年6月21日,下午6:31,Fabian Hueske <[hidden email]> 写道:

That's correct.

Broadcast state was added with Flink 1.5. You can use DataStream.broadcast() in Flink 1.3 but it has a few limitations.
For example, you cannot connect a keyed and a broadcasted stream.

2018-06-21 11:58 GMT+02:00 zhen li <[hidden email]>:
Thanks for your reply.
But  broadcast state seems not supported in Flink-1.3 .
I found this in Flink-1.3:
Broadcasting
DataStream → DataStream

Broadcasts elements to every partition.

dataStream.broadcast();
But I don’t know how to convert it to list and get it in stream context .



Reply | Threaded
Open this post in threaded view
|

Re: How to use broadcast variables in data stream

Rong Rong
Hi Zhen,

This might be a rather inefficient solution. We have encountered situations that we need to have some daily config update pushed to our flink streaming application, where the state is very large (but keyed). We end-up having a service to push that data into a separated kafka stream (which basically has one burst of messages each day), they used a stream-stream join to consolidate the configuration with the actual real data stream. 

This would only work if your real-stream and your configuration are basically sharing the same keyed dimension though.

--
Rong

On Sat, Jun 30, 2018 at 2:37 AM zhen li <[hidden email]> wrote:
Hi,Fabian:
    I use connected stream to solve this problem,one is config stream that load some config data from redis.another is real data stream. Due to the config data is vary big, then the config stream is slowly than the real data stream. When use some config to deal data in flatmap2, it arise the NullPointerException. Is there any config to make one stream finished and then start the second? or any other ways to solve this problem?
.

在 2018年6月21日,下午6:31,Fabian Hueske <[hidden email]> 写道:

That's correct.

Broadcast state was added with Flink 1.5. You can use DataStream.broadcast() in Flink 1.3 but it has a few limitations.
For example, you cannot connect a keyed and a broadcasted stream.

2018-06-21 11:58 GMT+02:00 zhen li <[hidden email]>:
Thanks for your reply.
But  broadcast state seems not supported in Flink-1.3 .
I found this in Flink-1.3:
Broadcasting
DataStream → DataStream

Broadcasts elements to every partition.

dataStream.broadcast();
But I don’t know how to convert it to list and get it in stream context .