Difference between using a global variable and broadcasting a variable

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

Difference between using a global variable and broadcasting a variable

Hung
This post was updated on .
Hi,

What would be the difference between using global variable and broadcasting it?

A toy example:

// Using global
{
private static int num = 10;
{main...}

public class DivByTen implements FlatMapFunction<Tuple1<Double>, Tuple1<Double>> {
  @Override
  public void flatMap(Tuple1<Double>value, Collector<Tuple1<Double>> out) {
     out.collect(new Tuple1<Double>(value/ num));
  }
}}

// Using broadcasting :
{
{main...}
public static class DivByTen extends
                        RichGMapFunction<Tuple1<Double>, Tuple1<Double>>{

                private long num;

                @Override
                public void open(Configuration parameters) throws Exception {
                        super.open(parameters);
                        num = getRuntimeContext().<Integer> getBroadcastVariable(
                                        "num").get(0);
                }

                @Override
                public void map(Tuple1<Double>value, Collector<Tuple1<Double>> out)) throws Exception{
                        out.collect(new Tuple1<Double>(value/num));
                }
        }
}

Best regards,

Hung
Reply | Threaded
Open this post in threaded view
|

Re: Difference between using a global variable and broadcasting a variable

Sebastian Schelter-2
Hi Hung,

A broadcast variable can also refer to an intermediate result of a Flink
computation.

Best,
Sebastian

On 25.04.2015 21:10, HungChang wrote:

> Hi,
>
> What would be the difference between using global variable and broadcasting
> it?
>
> A toy example:
>
> // Using global
> {{...
> private static int num = 10;
> }
>
> public class DivByTen implements FlatMapFunction<Tuple1&lt;Double>,
> Tuple1<Double>> {
>    @Override
>    public void flatMap(Tuple1<Double>value, Collector<Tuple1&lt;Double>> out)
> {
>       out.collect(new Tuple1<Double>(value/ num));
>    }
> }}
>
> // Using broadcasting :
> {...
> public static class DivByTen extends
> RichGMapFunction<Tuple1&lt;Double>, Tuple1<Double>>{
>
> private long num;
>
> @Override
> public void open(Configuration parameters) throws Exception {
> super.open(parameters);
> num = getRuntimeContext().<Integer> getBroadcastVariable(
> "num").get(0);
> }
>
> @Override
> public void map(Tuple1<Double>value, Collector<Tuple1&lt;Double>> out))
> throws Exception{
> out.collect(new Tuple1<Double>(value/num));
> }
> }
> }
>
> Best regards,
>
> Hung
>
>
>
> --
> View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Difference-between-using-a-global-variable-and-broadcasting-a-variable-tp1128.html
> Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.
>
Reply | Threaded
Open this post in threaded view
|

Re: Difference between using a global variable and broadcasting a variable

Fabian Hueske-2
You should also be aware that the value of a static variable is only accessible within the same JVM.
Flink is a distributed system and runs in multiple JVMs. So if you set a value in one JVM it is not visible in another JVM (on a different node).

In general, I would avoid to use static variables in Flink programs.

Best, Fabian

2015-04-26 9:54 GMT+02:00 Sebastian <[hidden email]>:
Hi Hung,

A broadcast variable can also refer to an intermediate result of a Flink computation.

Best,
Sebastian


On <a href="tel:25.04.2015%2021" value="+12504201521" target="_blank">25.04.2015 21:10, HungChang wrote:
Hi,

What would be the difference between using global variable and broadcasting
it?

A toy example:

// Using global
{{...
private static int num = 10;
}

public class DivByTen implements FlatMapFunction<Tuple1&lt;Double>,
Tuple1<Double>> {
   @Override
   public void flatMap(Tuple1<Double>value, Collector<Tuple1&lt;Double>> out)
{
      out.collect(new Tuple1<Double>(value/ num));
   }
}}

// Using broadcasting :
{...
public static class DivByTen extends
                        RichGMapFunction<Tuple1&lt;Double>, Tuple1<Double>>{

                private long num;

                @Override
                public void open(Configuration parameters) throws Exception {
                        super.open(parameters);
                        num = getRuntimeContext().<Integer> getBroadcastVariable(
                                        "num").get(0);
                }

                @Override
                public void map(Tuple1<Double>value, Collector<Tuple1&lt;Double>> out))
throws Exception{                       
                        out.collect(new Tuple1<Double>(value/num));
                }
        }
}

Best regards,

Hung



--
View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Difference-between-using-a-global-variable-and-broadcasting-a-variable-tp1128.html
Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.


Reply | Threaded
Open this post in threaded view
|

Re: Difference between using a global variable and broadcasting a variable

Stephan Ewen
Adding to Fabian's and Sebastian's answer:


Variable in Closure (global variable)
------------------------------------------------------
 - Happens when you reference some variable in the program from a function. The variable becomes part of the Function's closure.
 - The variable is distributed with the CODE. It is part of the function object and is distributed with by the TaskDeployment messages.
 - Data needs to be available in the driver program (cannot be a Flink DataSet, which lives distributedly)
 - Should be used for constants or config parameters or simple scalar values.

Summary: Small data that is available on the client (driver program)



Broadcast set
------------------------------------------------------
 - Refers to data that is produced by a Flink operation (DataSet) and that lives in the cluster, rather than on the client (or in the driver program)
 - Data distribution is part of the distributed data flow and happens through the Flink network stack
 - Can be much larger than the closure variables.
 - Should be used when you want to make an intermediate result of a Flink computation accessible to all functions.


Greetings,
Stephan



On Mon, Apr 27, 2015 at 10:23 AM, Fabian Hueske <[hidden email]> wrote:
You should also be aware that the value of a static variable is only accessible within the same JVM.
Flink is a distributed system and runs in multiple JVMs. So if you set a value in one JVM it is not visible in another JVM (on a different node).

In general, I would avoid to use static variables in Flink programs.

Best, Fabian

2015-04-26 9:54 GMT+02:00 Sebastian <[hidden email]>:
Hi Hung,

A broadcast variable can also refer to an intermediate result of a Flink computation.

Best,
Sebastian


On <a href="tel:25.04.2015%2021" value="+12504201521" target="_blank">25.04.2015 21:10, HungChang wrote:
Hi,

What would be the difference between using global variable and broadcasting
it?

A toy example:

// Using global
{{...
private static int num = 10;
}

public class DivByTen implements FlatMapFunction<Tuple1&lt;Double>,
Tuple1<Double>> {
   @Override
   public void flatMap(Tuple1<Double>value, Collector<Tuple1&lt;Double>> out)
{
      out.collect(new Tuple1<Double>(value/ num));
   }
}}

// Using broadcasting :
{...
public static class DivByTen extends
                        RichGMapFunction<Tuple1&lt;Double>, Tuple1<Double>>{

                private long num;

                @Override
                public void open(Configuration parameters) throws Exception {
                        super.open(parameters);
                        num = getRuntimeContext().<Integer> getBroadcastVariable(
                                        "num").get(0);
                }

                @Override
                public void map(Tuple1<Double>value, Collector<Tuple1&lt;Double>> out))
throws Exception{                       
                        out.collect(new Tuple1<Double>(value/num));
                }
        }
}

Best regards,

Hung



--
View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Difference-between-using-a-global-variable-and-broadcasting-a-variable-tp1128.html
Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.



Reply | Threaded
Open this post in threaded view
|

Re: Difference between using a global variable and broadcasting a variable

Stephan Ewen
Hi!

I put a quick summary into the wiki. For future reference.


Greetings,
Stephan


On Mon, Apr 27, 2015 at 11:10 AM, Stephan Ewen <[hidden email]> wrote:
Adding to Fabian's and Sebastian's answer:


Variable in Closure (global variable)
------------------------------------------------------
 - Happens when you reference some variable in the program from a function. The variable becomes part of the Function's closure.
 - The variable is distributed with the CODE. It is part of the function object and is distributed with by the TaskDeployment messages.
 - Data needs to be available in the driver program (cannot be a Flink DataSet, which lives distributedly)
 - Should be used for constants or config parameters or simple scalar values.

Summary: Small data that is available on the client (driver program)



Broadcast set
------------------------------------------------------
 - Refers to data that is produced by a Flink operation (DataSet) and that lives in the cluster, rather than on the client (or in the driver program)
 - Data distribution is part of the distributed data flow and happens through the Flink network stack
 - Can be much larger than the closure variables.
 - Should be used when you want to make an intermediate result of a Flink computation accessible to all functions.


Greetings,
Stephan



On Mon, Apr 27, 2015 at 10:23 AM, Fabian Hueske <[hidden email]> wrote:
You should also be aware that the value of a static variable is only accessible within the same JVM.
Flink is a distributed system and runs in multiple JVMs. So if you set a value in one JVM it is not visible in another JVM (on a different node).

In general, I would avoid to use static variables in Flink programs.

Best, Fabian

2015-04-26 9:54 GMT+02:00 Sebastian <[hidden email]>:
Hi Hung,

A broadcast variable can also refer to an intermediate result of a Flink computation.

Best,
Sebastian


On <a href="tel:25.04.2015%2021" value="+12504201521" target="_blank">25.04.2015 21:10, HungChang wrote:
Hi,

What would be the difference between using global variable and broadcasting
it?

A toy example:

// Using global
{{...
private static int num = 10;
}

public class DivByTen implements FlatMapFunction<Tuple1&lt;Double>,
Tuple1<Double>> {
   @Override
   public void flatMap(Tuple1<Double>value, Collector<Tuple1&lt;Double>> out)
{
      out.collect(new Tuple1<Double>(value/ num));
   }
}}

// Using broadcasting :
{...
public static class DivByTen extends
                        RichGMapFunction<Tuple1&lt;Double>, Tuple1<Double>>{

                private long num;

                @Override
                public void open(Configuration parameters) throws Exception {
                        super.open(parameters);
                        num = getRuntimeContext().<Integer> getBroadcastVariable(
                                        "num").get(0);
                }

                @Override
                public void map(Tuple1<Double>value, Collector<Tuple1&lt;Double>> out))
throws Exception{                       
                        out.collect(new Tuple1<Double>(value/num));
                }
        }
}

Best regards,

Hung



--
View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Difference-between-using-a-global-variable-and-broadcasting-a-variable-tp1128.html
Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.