I call Pandas UDF N times, do I have to initiate the UDF N times?

Posted by Yik San Chan on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/I-call-Pandas-UDF-N-times-do-I-have-to-initiate-the-UDF-N-times-tp43576.html

Hi community,

I am using PyFlink with a Pandas UDF in my job.

The job executes a SQL query like this:

```
SELECT
LABEL_ENCODE(a),
LABEL_ENCODE(b),
LABEL_ENCODE(c)
...
```

And my LABEL_ENCODE UDF is defined below:

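```
import logging

from pyflink.table.udf import ScalarFunction, udf


class LabelEncode(ScalarFunction):
  def open(self, function_context):
    # Logged on every open() call; load_encoder() performs I/O.
    logging.info("LabelEncode.open")
    self.encoder = load_encoder()

  def eval(self, x):
    ...

labelEncode = udf(LabelEncode(), ...)
```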

When I run the job, the TaskManager log shows "LabelEncode.open" printed 3 times, which is exactly the number of times the LABEL_ENCODE UDF is called in the query.

Since every call to LabelEncode.open triggers I/O (inside load_encoder()), I wonder whether I can initialize the UDF only once and use it 3 times?
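
To make the goal concrete, here is a minimal sketch of one possible workaround: cache the loaded encoder at module level so that repeated open() calls reuse it. It is only a sketch under assumptions, not confirmed PyFlink behavior. It assumes all three UDF instances are opened in the same Python worker process (not verified), and `load_encoder`, `get_encoder`, `LOAD_COUNT`, and `LabelEncodeSketch` are hypothetical stand-ins so the example runs without PyFlink.

```python
import functools

LOAD_COUNT = 0  # counts real loads, for demonstration only


def load_encoder():
    """Hypothetical stand-in for the real, I/O-heavy loader."""
    global LOAD_COUNT
    LOAD_COUNT += 1
    return {"a": 0, "b": 1, "c": 2}


@functools.lru_cache(maxsize=None)
def get_encoder():
    """Load the encoder on the first call; later calls return the cached object."""
    return load_encoder()


class LabelEncodeSketch:
    """Plain-Python stand-in for the ScalarFunction (assumption, not PyFlink API)."""

    def open(self, function_context=None):
        # Each instance still runs open(), but the I/O happens only once per process.
        self.encoder = get_encoder()

    def eval(self, x):
        return self.encoder[x]


# Simulate three UDF instances being opened in one process.
instances = [LabelEncodeSketch() for _ in range(3)]
for inst in instances:
    inst.open()
```

With this pattern open() would still be logged 3 times, but the expensive load would run only once per process. If the instances end up in different processes, each process would load its own copy.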

Thank you!

Best,
Yik San