Migration to Flink 1.6.0, issues with StreamExecutionEnvironment.registerCachedFile

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Migration to Flink 1.6.0, issues with StreamExecutionEnvironment.registerCachedFile

Subramanya Suresh
Hi, 
We are running into some trouble with StreamExecutionEnvironment.registerCachedFile (works perfectly fine in 1.4.2). 

  • We register some CSV files in HDFS with executionEnvironment.registerCachedFile("hdfs:///myPath/myCsv",  myCSV.csv)
  • In a UDF (ScalarFunction), in the open function, we do a FunctionContext.getCachedFile("myCSV) to load the CSV in a singleton. 
We are running into a 
java.lang.IllegalArgumentException: File with name 'myCSV.csv' is not available. Did you forget to register the file?

Sincerely, 

--

Reply | Threaded
Open this post in threaded view
|

Re: Migration to Flink 1.6.0, issues with StreamExecutionEnvironment.registerCachedFile

Fabian Hueske-2
Hi,

The functionality of the SQL ScalarFunction is backed by Flink's distributed cache and just passes on the function call.
I tried it locally on my machine and it works for me.

What is your setup? Are you running on Yarn?

Maybe Chesnay or Dawid (added to CC) can help to track the problem down.

Best, Fabian

2018-09-18 6:10 GMT+02:00 Subramanya Suresh <[hidden email]>:
Hi, 
We are running into some trouble with StreamExecutionEnvironment.registerCachedFile (works perfectly fine in 1.4.2). 

  • We register some CSV files in HDFS with executionEnvironment.registerCachedFile("hdfs:///myPath/myCsv",  myCSV.csv)
  • In a UDF (ScalarFunction), in the open function, we do a FunctionContext.getCachedFile("myCSV) to load the CSV in a singleton. 
We are running into a 
java.lang.IllegalArgumentException: File with name 'myCSV.csv' is not available. Did you forget to register the file?

Sincerely, 

--


Reply | Threaded
Open this post in threaded view
|

Re: Migration to Flink 1.6.0, issues with StreamExecutionEnvironment.registerCachedFile

Subramanya Suresh
Yes it works locally for me as well. We are running on YARN where it fails on 1.6.0 though (works fine with 1.4.2). 

Regards, 

On Tue, Sep 18, 2018 at 1:47 AM, Fabian Hueske <[hidden email]> wrote:
Hi,

The functionality of the SQL ScalarFunction is backed by Flink's distributed cache and just passes on the function call.
I tried it locally on my machine and it works for me.

What is your setup? Are you running on Yarn?

Maybe Chesnay or Dawid (added to CC) can help to track the problem down.

Best, Fabian

2018-09-18 6:10 GMT+02:00 Subramanya Suresh <[hidden email]>:
Hi, 
We are running into some trouble with StreamExecutionEnvironment.registerCachedFile (works perfectly fine in 1.4.2). 

  • We register some CSV files in HDFS with executionEnvironment.registerCachedFile("hdfs:///myPath/myCsv",  myCSV.csv)
  • In a UDF (ScalarFunction), in the open function, we do a FunctionContext.getCachedFile("myCSV) to load the CSV in a singleton. 
We are running into a 
java.lang.IllegalArgumentException: File with name 'myCSV.csv' is not available. Did you forget to register the file?

Sincerely, 

--





--

Reply | Threaded
Open this post in threaded view
|

Re: Migration to Flink 1.6.0, issues with StreamExecutionEnvironment.registerCachedFile

Dawid Wysakowicz-2

Hi Subramanya,

I could reproduce this behavior running a job in YARN cluster. This works in standalone cluster just fine. We've changed a little bit how the cache entries are distributed in 1.6.0. I am investigating this problem right now. Would you like to create a JIRA bug for it?

Best,
Dawid

On 19/09/18 06:41, Subramanya Suresh wrote:
Yes it works locally for me as well. We are running on YARN where it fails on 1.6.0 though (works fine with 1.4.2). 

Regards, 

On Tue, Sep 18, 2018 at 1:47 AM, Fabian Hueske <[hidden email]> wrote:
Hi,

The functionality of the SQL ScalarFunction is backed by Flink's distributed cache and just passes on the function call.
I tried it locally on my machine and it works for me.

What is your setup? Are you running on Yarn?

Maybe Chesnay or Dawid (added to CC) can help to track the problem down.

Best, Fabian

2018-09-18 6:10 GMT+02:00 Subramanya Suresh <[hidden email]>:
Hi, 
We are running into some trouble with StreamExecutionEnvironment.registerCachedFile (works perfectly fine in 1.4.2). 

  • We register some CSV files in HDFS with executionEnvironment.registerCachedFile("hdfs:///myPath/myCsv",  myCSV.csv)
  • In a UDF (ScalarFunction), in the open function, we do a FunctionContext.getCachedFile("myCSV) to load the CSV in a singleton. 
We are running into a 
java.lang.IllegalArgumentException: File with name 'myCSV.csv' is not available. Did you forget to register the file?

                            
Sincerely, 

--





--



signature.asc (849 bytes) Download Attachment