Adding to Fabian's and Sebastian's answers:

Variable in Closure (global variable)
------------------------------------------------------
- Happens when you reference some variable in the program from a function. The variable becomes part of the function's closure.
- The variable is distributed with the CODE. It is part of the function object and is distributed with the TaskDeployment messages.
- Data needs to be available in the driver program (it cannot be a Flink DataSet, which lives distributed in the cluster).
- Should be used for constants, config parameters, or simple scalar values.

Summary: Small data that is available on the client (driver program)

Broadcast set
------------------------------------------------------
- Refers to data that is produced by a Flink operation (DataSet) and that lives in the cluster, rather than on the client (or in the driver program).
- Data distribution is part of the distributed data flow and happens through the Flink network stack.
- Can be much larger than the closure variables.
- Should be used when you want to make an intermediate result of a Flink computation accessible to all functions.

Greetings,
Stephan

On Mon, Apr 27, 2015 at 10:23 AM, Fabian Hueske <[hidden email]> wrote:

In general, I would avoid using static variables in Flink programs.
You should also be aware that the value of a static variable is only accessible within the same JVM.
Flink is a distributed system and runs in multiple JVMs. So if you set a value in one JVM, it is not visible in another JVM (on a different node).

Best, Fabian

2015-04-26 9:54 GMT+02:00 Sebastian <[hidden email]>:

Hi Hung,
A broadcast variable can also refer to an intermediate result of a Flink computation.
Best,
Sebastian
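
To illustrate the point made in the replies above (a value held in the function object travels with the function, while a static variable stays in the JVM where it was set), here is a minimal plain-Java sketch. It does not use Flink at all: Java serialization stands in for shipping the function to a worker, and resetting the static mimics a fresh worker JVM. The names ClosureDemo, DivBy, ship and receive are made up for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class ClosureDemo {
    // Static ("global") state. Every JVM starts with the initial value 1;
    // static fields are never written into the serialized byte stream.
    public static int globalNum = 1;

    // Hypothetical user function: the instance field plays the role of a closure variable.
    public static class DivBy implements Serializable {
        private static final long serialVersionUID = 1L;
        private final int num;

        public DivBy(int num) { this.num = num; }

        public double viaField(double v) { return v / num; }

        public double viaStatic(double v) { return v / globalNum; }
    }

    // Assumption: serializing to bytes stands in for sending the function to a worker.
    public static byte[] ship(DivBy f) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(f);
        }
        return bytes.toByteArray();
    }

    public static DivBy receive(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (DivBy) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        globalNum = 10;                          // set at runtime in the "driver"
        byte[] shipped = ship(new DivBy(10));    // the field value 10 is part of the bytes

        globalNum = 1;                           // mimic a worker JVM that never saw the assignment
        DivBy onWorker = receive(shipped);

        System.out.println(onWorker.viaField(50.0));   // prints 5.0
        System.out.println(onWorker.viaStatic(50.0));  // prints 50.0
    }
}
```

The field survives the round trip because it is part of the object, whereas the static reflects whatever the local JVM happens to hold.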
On 25.04.2015 21:10, HungChang wrote:
Hi,
What would be the difference between using a global variable and broadcasting
it?
A toy example:
// Using a global (static) variable:
{...
private static int num = 10;

public class DivByTen implements FlatMapFunction<Tuple1<Double>,
        Tuple1<Double>> {
    @Override
    public void flatMap(Tuple1<Double> value, Collector<Tuple1<Double>> out) {
        // Tuple1 exposes its single field as f0.
        out.collect(new Tuple1<Double>(value.f0 / num));
    }
}
}
// Using broadcasting:
{...
public static class DivByTen extends
        RichFlatMapFunction<Tuple1<Double>, Tuple1<Double>> {
    private int num;

    @Override
    public void open(Configuration parameters) throws Exception {
        super.open(parameters);
        // Read the broadcast set registered under the name "num".
        num = getRuntimeContext().<Integer>getBroadcastVariable("num").get(0);
    }

    @Override
    public void flatMap(Tuple1<Double> value, Collector<Tuple1<Double>> out)
            throws Exception {
        out.collect(new Tuple1<Double>(value.f0 / num));
    }
}
}
Best regards,
Hung
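
The broadcast pattern in the question (fetch a named data set once in open(), then use it for every record) can be modeled in plain Java. This is only a toy sketch: ToyContext, DivByNum and run are hypothetical stand-ins invented for this illustration, not Flink classes, and a simple in-memory map replaces the distribution that Flink does over its network stack.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BroadcastModel {
    // Hypothetical stand-in for a runtime context; this is NOT Flink's class.
    public static final class ToyContext {
        private final Map<String, List<?>> broadcastSets = new HashMap<>();

        // Make a data set available to functions under a name.
        public void register(String name, List<?> data) {
            broadcastSets.put(name, data);
        }

        @SuppressWarnings("unchecked")
        public <T> List<T> getBroadcastVariable(String name) {
            return (List<T>) broadcastSets.get(name);
        }
    }

    // Hypothetical user function mirroring the open()/per-record split of the question.
    public static final class DivByNum {
        private int num;

        // Called once before any record is processed.
        public void open(ToyContext ctx) {
            num = ctx.<Integer>getBroadcastVariable("num").get(0);
        }

        // Called once per record.
        public double map(double value) {
            return value / num;
        }
    }

    // The divisor is passed in as a data set, i.e. it could be the result of an earlier computation.
    public static List<Double> run(List<Integer> computedDivisor, List<Double> records) {
        ToyContext ctx = new ToyContext();
        ctx.register("num", computedDivisor);

        DivByNum f = new DivByNum();
        f.open(ctx);

        List<Double> out = new ArrayList<>();
        for (double r : records) {
            out.add(f.map(r));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList(10), Arrays.asList(50.0, 25.0))); // prints [5.0, 2.5]
    }
}
```

Unlike a closure variable, the divisor here does not have to exist when the function object is created. It only has to be registered before open() runs, which is what lets it be an intermediate result.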
--
View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Difference-between-using-a-global-variable-and-broadcasting-a-variable-tp1128.html
Sent from the Apache Flink User Mailing List archive at Nabble.com.