http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Difference-between-using-a-global-variable-and-broadcasting-a-variable-tp1128p1140.html
Adding to Fabian's and Sebastian's answer:
Variable in Closure (global variable)
------------------------------------------------------
- Happens when you reference some variable in the program from a function. The variable becomes part of the Function's closure.
- The variable is distributed with the CODE. It is part of the function object and is shipped with the TaskDeployment messages.
- Data needs to be available in the driver program (it cannot be a Flink DataSet, which lives distributed across the cluster).
- Should be used for constants or config parameters or simple scalar values.
Summary: Small data that is available on the client (driver program)
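
To make the "distributed with the CODE" point concrete, here is a minimal sketch in plain Java, without Flink. It assumes that the function object is shipped via ordinary Java serialization; the names ClosureShippingSketch, Scale, scaleBy and roundTrip are made up for this illustration and are not Flink API. The captured variable travels inside the serialized function object:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class ClosureShippingSketch {

    /** Hypothetical stand-in for a user function interface; it must be Serializable to be shipped. */
    public interface Scale extends Serializable {
        double apply(double value);
    }

    /** Builds a function whose closure captures a constant that exists only on the client. */
    public static Scale scaleBy(final double factor) {
        return new Scale() {
            @Override
            public double apply(double value) {
                // 'factor' is stored as a field of this anonymous function object
                return value * factor;
            }
        };
    }

    /** Turns the function object, including its captured variables, into bytes. */
    public static byte[] serialize(Serializable function) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(function);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Rebuilds the function object from bytes, as a worker would after receiving it. */
    public static Scale deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Scale) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Simulates client -> worker shipping of a function in a single JVM. */
    public static Scale roundTrip(Scale function) {
        return deserialize(serialize(function));
    }

    public static void main(String[] args) {
        Scale shipped = roundTrip(scaleBy(2.5));
        // The deserialized copy still knows the factor; no separate data transfer was needed.
        System.out.println(shipped.apply(4.0));
    }
}
```

Since every byte of the captured value ends up in the shipped function object, this only makes sense for small values such as constants and config parameters.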
Broadcast set
------------------------------------------------------
- Refers to data that is produced by a Flink operation (DataSet) and that lives in the cluster, rather than on the client (or in the driver program)
- Data distribution is part of the distributed data flow and happens through the Flink network stack
- Can be much larger than the closure variables.
- Should be used when you want to make an intermediate result of a Flink computation accessible to all functions.
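
For comparison, a rough sketch of both mechanisms side by side in the Java DataSet API. It needs the flink-java dependency on the classpath. The class name BroadcastSetSketch, the broadcast name "values" and the sample data are made up for illustration, and fromElements only stands in for a real intermediate result computed by Flink:

```java
import java.util.List;

import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.configuration.Configuration;

public class BroadcastSetSketch {

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Closure variable: a plain scalar that exists in the client program.
        final int offset = 100;

        // Stand-in for an intermediate result; as a DataSet it lives in the cluster.
        DataSet<Integer> toBroadcast = env.fromElements(1, 2, 3);

        DataSet<Integer> input = env.fromElements(10, 20, 30);

        DataSet<Integer> result = input
            .map(new RichMapFunction<Integer, Integer>() {
                private int broadcastSum;

                @Override
                public void open(Configuration parameters) throws Exception {
                    // The broadcast set arrives through the data flow and is read once per task.
                    List<Integer> values = getRuntimeContext().getBroadcastVariable("values");
                    int sum = 0;
                    for (int v : values) {
                        sum += v;
                    }
                    broadcastSum = sum;
                }

                @Override
                public Integer map(Integer value) {
                    // 'offset' came with the function object, 'broadcastSum' came from the broadcast set.
                    return value + offset + broadcastSum;
                }
            })
            .withBroadcastSet(toBroadcast, "values");

        // In versions where print() does not trigger execution itself, env.execute() is needed as well.
        result.print();
    }
}
```

The name passed to withBroadcastSet must match the one used in getBroadcastVariable. The function has to be a Rich function so that it has access to the runtime context.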
Greetings,
Stephan