http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Running-a-Python-streaming-job-with-Java-dependencies-tp21817.html
Hi,
I'm trying to run a job with Flink's new Python streaming API but I'm running into issues with Java imports.
I have a Jython project in IntelliJ with a lot of Java dependencies configured through Maven. I can't figure out how to make Flink "see" these dependencies.
from org.apache.flink.streaming.api.functions.source import SourceFunction
from org.apache.flink.api.common.functions import FlatMapFunction, ReduceFunction
from org.apache.flink.api.java.functions import KeySelector
from org.apache.flink.streaming.api.windowing.time.Time import milliseconds
# Added an extra import, this fails with an ImportError
import com.google.gson.GsonBuilder
class Generator(SourceFunction):
def __init__(self, num_iters):
self._running = True
self._num_iters = num_iters
# ... rest of the file is as in the documentation
This runs without any exceptions when run from IntelliJ (assuming com.google.gson is added in the POM), but when I try to run it as a Flink job with this command:
./pyflink-stream.sh ~/flink-python/MinimalExample.py - --local
it fails to find the dependency:
Starting execution of program
Failed to run plan: null
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/var/folders/t1/gcltcjcn5zdgqfqrc32xk90x85xkg9/T/flink_streaming_plan_0bfab09c-baeb-414f-a718-01a5c71b3507/MinimalExample.py", line 7, in <module>
ImportError: No module named google
How can I point pyflink-stream.sh to these Maven dependencies? I've tried modifying the script to add my .m2/ directory to the classpath (using flink run -C), but that didn't make any difference:
bin=`dirname "$0"`
bin=`cd "$bin"; pwd`
. "$bin"/config.sh
"$FLINK_BIN_DIR"/flink run -C "file:///Users/jmalt/.m2/" --class org.apache.flink.streaming.python.api.PythonStreamBinder -v "$FLINK_ROOT_DIR"/opt/flink-streaming-python*.jar "$@"
Thanks,
Joe Malt
Engineering Intern, Stream Processing
Yelp