Pyflink 1.10.0 issue on cluster


rookieCOder
I'm coding with PyFlink 1.10.0 and have built a cluster with Flink 1.10.0.
I define the source and the sink as follows:
<http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/t2674/%E6%97%A0%E6%A0%87%E9%A2%98.png>
When I run this code on the master only, it works. When I run it on a cluster with 1 master and 1 slave, I submit the job on the master like this:
sudo flink-1.10.0/bin/flink run -py main.py
The following error then occurs:
Caused by: java.io.FileNotFoundException: The provided file path
/opt/raw_data/input_json_65adbe54-cfdc-11ea-9d47-020012970011 does not exist.
This file is stored on the master's local file system. It seems that the slaves
read their own file systems instead of the master's. Or maybe I have overlooked
something else (perhaps some Flink configuration when starting the cluster).
How can I avoid this error?



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: Pyflink 1.10.0 issue on cluster

Xingbo Huang
Hi rookieCOder,
You need to make sure that your files can be read by every slave, so one solution is to put your files on HDFS.

Best,
Xingbo
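
A minimal sketch of a pre-submission check that follows from this advice, assuming the source and sink paths are passed around as plain strings. The helper name `is_shared_path`, the scheme list, and the host `namenode:8020` are hypothetical and not part of Flink. A path without a scheme, or with `file://`, is treated here as node-local, although it can still work if the same path is mounted on every node (for example over NFS).

```python
from urllib.parse import urlparse

# Assumption: URI schemes that every node in the cluster can resolve.
# Adjust this set to the file systems actually configured for your cluster.
SHARED_SCHEMES = {"hdfs", "s3", "s3a"}


def is_shared_path(path: str) -> bool:
    """Return True if the path's URI scheme is in SHARED_SCHEMES."""
    scheme = urlparse(path).scheme.lower()
    return scheme in SHARED_SCHEMES


if __name__ == "__main__":
    candidates = [
        "/opt/raw_data/input.json",                  # no scheme: node-local
        "file:///opt/raw_data/input.json",           # explicit local file system
        "hdfs://namenode:8020/raw_data/input.json",  # hypothetical HDFS location
    ]
    for candidate in candidates:
        kind = "shared" if is_shared_path(candidate) else "node-local"
        print(f"{candidate}: {kind}")
```

The check only looks at the URI scheme, so it cannot tell whether a local-looking path is in fact a shared mount.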

(In reply to rookieCOder's message of Monday, July 27, 2020, 5:49 PM.)


Re: Pyflink 1.10.0 issue on cluster

rookieCOder
Hi Xingbo,
Thanks for your reply.
So the point is that simply pointing the source or the sink at the master's
local file system causes the error, because the slaves cannot read the
source/sink files? And the simplest solution is to make sure that the slaves
have access to the master's files (via NFS or HDFS)?


Re: Pyflink 1.10.0 issue on cluster

Xingbo Huang
Yes. You are right.

Best,
Xingbo
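
If the files are shared through a mount such as NFS, one way to confirm the setup is to run a small check on every node before submitting the job. This is only a sketch: the function names `check_readable` and `report` are hypothetical, and the path must be replaced with your own.

```python
import os
import socket


def check_readable(path: str) -> bool:
    """Return True if `path` exists on this machine and this process may read it."""
    return os.path.exists(path) and os.access(path, os.R_OK)


def report(path: str) -> str:
    """Build a one-line status message that names the current host."""
    status = "readable" if check_readable(path) else "missing or unreadable"
    return f"{socket.gethostname()}: {path} is {status}"


if __name__ == "__main__":
    # Hypothetical path; use the path given to the source or sink.
    print(report("/opt/raw_data"))
```

Run it as the same user that runs the Flink processes on each node, since read permissions can differ between users.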

(In reply to rookieCOder's message of Monday, July 27, 2020, 6:30 PM.)


Re: Pyflink 1.10.0 issue on cluster

rookieCOder
Hi,
I have another question.
Suppose I use a user-defined function (UDF) in PyFlink that depends only on
library A, and all Flink does is apply the UDF to tables.
Does that mean I only need to install library A on the slaves?


Re: Pyflink 1.10.0 issue on cluster

Xingbo Huang
Hi,
You can use the `set_python_requirements` method to specify your requirements.txt; refer to the documentation [1] for details.

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/python/dependency_management.html#python-dependency

Best,
Xingbo
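
A minimal sketch of how this might look, assuming `t_env` is a PyFlink `TableEnvironment` created elsewhere in the job script. The helper names `write_requirements` and `configure_udf_dependencies` and the package specifier `library-a==1.0.0` are hypothetical placeholders; only `set_python_requirements` comes from the answer above.

```python
from pathlib import Path


def write_requirements(path, packages):
    """Write one requirement specifier per line and return the file path as a string."""
    target = Path(path)
    target.write_text("\n".join(packages) + "\n", encoding="utf-8")
    return str(target)


def configure_udf_dependencies(t_env, requirements_path, cache_dir=None):
    """Register a requirements file with the table environment.

    Assumption: `t_env` is a PyFlink TableEnvironment. `cache_dir` is an
    optional directory of pre-downloaded packages, useful when the cluster
    nodes cannot reach a package index.
    """
    t_env.set_python_requirements(requirements_path, cache_dir)


# Usage inside a job script (not executed here because it needs PyFlink):
#   req = write_requirements("requirements.txt", ["library-a==1.0.0"])
#   configure_udf_dependencies(t_env, req)
```

With this approach the dependency is declared in the job itself instead of being installed by hand on each slave; see the linked documentation for the exact behavior in your Flink version.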

(In reply to rookieCOder's message of Monday, July 27, 2020, 8:29 PM.)
