Hi,

I have a PyFlink job that consists of a few Python files, a config YAML and two connector jars. Here's a simplified structure of the code layout:

flink/
├── deps
│   ├── jar
│   │   ├── flink-connector-kafka_2.11-1.12.2.jar
│   │   └── kafka-clients-2.4.1.jar
│   └── pip
│       └── requirements.txt
├── conf
│   └── job.yaml
└── job
    ├── some_file_x.py
    ├── some_file_y.py
    └── main.py

I'm able to execute this job locally, i.e. by invoking something like:

python main.py --config <path_to_job_yaml>

I'm loading the jars inside the Python code, using env.add_jars(...).

Now, the next step is to submit this job to a Flink cluster running on K8s. I'm looking for any best practices in packaging and specifying dependencies that people tend to follow. As per the documentation here [1], various Python files, including the conf YAML, can be specified using the --pyFiles option, and Java dependencies can be specified using the --jarfile option. So, how can I specify third-party Python package dependencies?

According to another piece of documentation here [2], I should be able to specify the requirements.txt directly inside the code and submit it via the --pyFiles option. Is that right? Are there any other best practices folks use to package and submit jobs?

Thanks,
Sumeet
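For reference, loading the jars inside the code as described above might look roughly like the following sketch (the absolute paths are illustrative assumptions, not taken from the actual job):

```python
from pyflink.datastream import StreamExecutionEnvironment

env = StreamExecutionEnvironment.get_execution_environment()

# add_jars() expects file:// URLs; these paths are placeholders.
env.add_jars(
    "file:///opt/flink/deps/jar/flink-connector-kafka_2.11-1.12.2.jar",
    "file:///opt/flink/deps/jar/kafka-clients-2.4.1.jar",
)
```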
Hi Sumeet,

Is there a problem with the documented approaches for submitting the Python program (i.e. they are not working), or are you asking in general?

Given the documentation, I would assume that you can configure the requirements.txt via `set_python_requirements`. I am also pulling in Dian, who might be able to tell you more about the Python deployment options.

If you are not running on a session cluster, then you can also create a K8s image which contains your user code. That way you ship your job when deploying the cluster.

Cheers,
Till

On Wed, Apr 28, 2021 at 10:17 AM Sumeet Malhotra <[hidden email]> wrote:
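A minimal sketch of the `set_python_requirements` approach Till mentions (assuming a DataStream API job; the path is an illustrative assumption):

```python
from pyflink.datastream import StreamExecutionEnvironment

env = StreamExecutionEnvironment.get_execution_environment()

# Ship requirements.txt with the job; the listed packages are
# installed on the workers before any Python user code runs there.
env.set_python_requirements("deps/pip/requirements.txt")
```

The optional second argument, requirements_cache_dir, points at a directory of pre-downloaded packages for clusters without internet access.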
Hi Till,

There's no problem with the documented approach. I was just asking whether there are any standardized ways of organizing, packaging and deploying Python code on a Flink cluster.

Thanks,
Sumeet

On Thu, Apr 29, 2021 at 12:37 PM Till Rohrmann <[hidden email]> wrote:
Alright. Then let's see what Dian recommends.

Cheers,
Till

On Thu, Apr 29, 2021 at 9:25 AM Sumeet Malhotra <[hidden email]> wrote:
Hi Sumeet,
There are multiple ways to specify Python dependencies, and you only need to use one of them. For requirements.txt specifically, there are three options:

- API inside the code: set_python_requirements
- command line option: -pyreq [1]
- configuration: python.requirements

So you don't need to specify it both inside the code and via the command line options.

PS: It seems that -pyreq is missing from the latest CLI documentation; however, it is actually still supported, and you can refer to the 1.11 documentation for now. I'll try to add it back ASAP.

Regards,
Dian
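To illustrate the three options Dian lists, here is a sketch (only one option is needed; the paths are illustrative assumptions):

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

# Option 1: API inside the code.
env_settings = EnvironmentSettings.new_instance().in_streaming_mode().build()
t_env = TableEnvironment.create(env_settings)
t_env.set_python_requirements("deps/pip/requirements.txt")

# Option 2: command line option, e.g.
#   flink run -py job/main.py -pyreq deps/pip/requirements.txt
#
# Option 3: configuration, e.g. in flink-conf.yaml:
#   python.requirements: deps/pip/requirements.txt
```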
Hi Sumeet,

FYI: the documentation about the CLI options of PyFlink has already been updated [1].

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.12/deployment/cli.html#submitting-pyflink-jobs

Regards,
Dian

On Thu, Apr 29, 2021 at 4:46 PM Dian Fu <[hidden email]> wrote:
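Following that page, a full submission for the layout from the first message might look roughly like this (a sketch; the flags and paths are assumptions based on Flink 1.12, and --jarfile takes a single jar, so multiple jars may need to be merged or added via env.add_jars):

```
./bin/flink run \
  --python job/main.py \
  --pyFiles job/,conf/job.yaml \
  --pyRequirements deps/pip/requirements.txt \
  --jarfile deps/jar/flink-connector-kafka_2.11-1.12.2.jar
```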
Thanks for updating the documentation, Dian. Appreciate it.

..Sumeet

On Sun, May 2, 2021 at 10:53 AM Dian Fu <[hidden email]> wrote: