Hello everyone, I am trying to run Flink on Raspberry Pis. My first test for word count in a single node worked. I just have to decrease the Heap memory of the jobmanager.heap.mb and taskmanager.heap.mb to 512. My second test is to add 2 slave nodes I got the error: "Java HotSpot(TM) Client VM warning: G1 GC is disabled in this release." at the file log/flink-root-taskexecutor-0-*.out. This link (https://blog.sflow.com/2016/06/raspberry-pi-real-time-network-analytics.html) says that in order to Raspberry Pi ARM architecture works with JVM it is necessary to configure the JVM as: -Xms600M -Xmx600M -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode then I set this variables on the path inside the file flink-conf.yaml env.java.opts: "-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode" env.java.opts.jobmanager: "-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode" env.java.opts.taskmanager: "-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode" and the error "Java HotSpot(TM) Client VM warning: G1 GC is disabled in this release." is not showing anymore. However, the connection from the master node to the slave node is still not possible. Does anybody know how I must configure flink to deal with that? This is the error stack trace: 2017-05-25 12:40:26,421 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Socket Stream -> Flat Map (1/1) (b81b6492fc0860367be422d0b0bf4358) switched from DEPLOYING to RUNNING. 2017-05-25 12:40:26,891 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Socket Stream -> Flat Map (1/1) (b81b6492fc0860367be422d0b0bf4358) switched from RUNNING to FAILED. java.net.ConnectException: Connection refused at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) at java.net.Socket.connect(Socket.java:589) at org.apache.flink.streaming.api.functions.source.SocketTextStreamFunction.run(SocketTextStreamFunction.java:96) at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:87) at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:56) at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:99) at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:306) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:703) at java.lang.Thread.run(Thread.java:745) 2017-05-25 12:40:26,898 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Job Socket Window WordCount (71c6d7796eccf6587d9d1deda0490e09) switched from state RUNNING to FAILING. java.net.ConnectException: Connection refused at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) at java.net.Socket.connect(Socket.java:589) at org.apache.flink.streaming.api.functions.source.SocketTextStreamFunction.run(SocketTextStreamFunction.java:96) at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:87) at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:56) at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:99) at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:306) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:703) at java.lang.Thread.run(Thread.java:745) 2017-05-25 12:40:26,921 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Window(TumblingProcessingTimeWindows(5000), ProcessingTimeTrigger, ReduceFunction$1, PassThroughWindowFunction) -> Sink: Print to Std. Out (1/1) (aa1a0e7ee3a1d3ad8f99b2608bd64c5b) switched from RUNNING to CANCELING. 2017-05-25 12:40:26,975 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Window(TumblingProcessingTimeWindows(5000), ProcessingTimeTrigger, ReduceFunction$1, PassThroughWindowFunction) -> Sink: Print to Std. Out (1/1) (aa1a0e7ee3a1d3ad8f99b2608bd64c5b) switched from CANCELING to CANCELED. Thanks, Felipe |
Did you start job manager and task manager on the same resbery pi ? On Mon, 6 Aug 2018, 12:01 Felipe Gutierrez, <[hidden email]> wrote:
|
yes. when I execute the jps command on the master node I see TaskManagerRunner and StandaloneSessionClusterEntrypoint (which I believe it is the jobManager). On the slave nodes I see TaskManagerRunner when I run jps command On Mon, Aug 6, 2018 at 12:13 PM miki haiat <[hidden email]> wrote:
|
Hi Felipe, From the exception information, it seems that you did not start the socket server, the socket source needs to connect to the socket server. Please make sure the socket server has started and is available. Thanks, vino. 2018-08-06 18:45 GMT+08:00 Felipe Gutierrez <[hidden email]>:
|
do you mean "nc -l 9000"? If so, I did start before. the task manager running on the master can connect to the job manager. but the task manager on the slave node cannot. The second time that I start the WordCount task it recognizes only one task manager (from the master) and runs my task. But the task manager from the slave does not process anything and it is started. here is the error stack trace from the slave node: 2017-05-30 05:10:39,853 INFO org.apache.flink.runtime.state.heap.HeapKeyedStateBackend - Initializing heap keyed state backend with stream factory. 2017-05-30 05:10:39,977 INFO org.apache.flink.runtime.taskmanager.Task - Source: Socket Stream -> Flat Map (1/1) (d5e3d87395995d3977d2f472de896e23) switched from RUNNING to FAILED. java.net.ConnectException: Connection refused at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) at java.net.Socket.connect(Socket.java:589) at org.apache.flink.streaming.api.functions.source.SocketTextStreamFunction.run(SocketTextStreamFunction.java:96) at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:87) at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:56) at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:99) at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:306) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:703) at java.lang.Thread.run(Thread.java:745) 2017-05-30 05:10:40,016 INFO org.apache.flink.runtime.taskmanager.Task - Freeing task resources for Source: Socket Stream -> Flat Map (1/1) (d5e3d87395995d3977d2f472de896e23). On Mon, Aug 6, 2018 at 2:17 PM vino yang <[hidden email]> wrote:
|
Hi,
nc exits after the first connection is closed. Are you re-running the nc command every time the job finishes? The stacktrace you copied does not indicate that a TaskManager cannot connect to the JobManager. I can only see that the SocketTextStreamFunction (from the SocketWindowWordCount job?) cannot open the connection to the address that you specified. Can you try to run examples/streaming/WordCount.jar. It is a simpler job which does not rely on external dependencies. If all the above fails, can you tell us how you submit the job? Can you post the full command? Can you also post the full JobManager & TaskManager logs? Best, Gary On Mon, Aug 6, 2018 at 4:10 PM, Felipe Gutierrez <[hidden email]> wrote:
|
yes. with this example (examples/streaming/WordCount.jar) my cluster worked. the file log/*out from the master is still empty and the file log/*out from the slave node has my result. The dashboard also shows that the job is completed. So, like you said there are some external dependencies that I didn`t include in my deploy. Do you have any clue? I am following the original quickstart (https://ci.apache.org/projects/flink/flink-docs-master/quickstart/setup_quickstart.html) Kind Regards, Felipe On Mon, Aug 6, 2018 at 5:01 PM Gary Yao <[hidden email]> wrote:
|
Hi Felipe, You got the result? And the web UI shown the job is completed? If it throws the exception you provided, the job's status should be failed. Thanks, vino. 2018-08-06 23:42 GMT+08:00 Felipe Gutierrez <[hidden email]>:
|
Hi Vino, the UI shows the job as completed. I had run "./bin/flink run examples/streaming/WordCount.jar" and I get no error. When I start netcat "nc -l 9000" and in other terminal I run "./bin/flink run examples/streaming/SocketWindowWordCount.jar --port 9000" I have this exception. Starting execution of program ------------------------------------------------------------ The program finished with the following exception: org.apache.flink.client.program.ProgramInvocationException: java.net.ConnectException: Connection refused at org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:264) at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:464) at org.apache.flink.streaming.api.environment.StreamContextEnvironment.execute(StreamContextEnvironment.java:66) at org.apache.flink.streaming.examples.socket.SocketWindowWordCount.main(SocketWindowWordCount.java:92) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:528) at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:420) at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:404) at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:785) at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:279) at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:214) at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1025) at org.apache.flink.client.cli.CliFrontend.lambda$main$9(CliFrontend.java:1101) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836) at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41) at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1101) Caused by: java.net.ConnectException: Connection refused at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350) at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206) at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188) at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) at java.net.Socket.connect(Socket.java:589) at org.apache.flink.streaming.api.functions.source.SocketTextStreamFunction.run(SocketTextStreamFunction.java:96) at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:87) at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:56) at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:99) at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:306) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:703) at java.lang.Thread.run(Thread.java:745) On Mon, Aug 6, 2018 at 5:54 PM vino yang <[hidden email]> wrote:
|
Hi,
Can you try submitting with: ./bin/flink run examples/streaming/SocketWindowWordCount.jar --hostname <IP> --port 9000 where IP is the IP of the node where you started nc? If not specified, the default hostname is localhost. This problematic is if the source operator is scheduled on a different physical machine. Best, Gary On Mon, Aug 6, 2018 at 6:01 PM, Felipe Gutierrez <[hidden email]> wrote:
|
Worked! this was exactly the problem. I have to set the IP otherwise it does not accept the jobs that I submit. Even if I set the IP and localhost at the /etc/hosts file and the command "ping localhost" returns my IP, it does not work. It is mandatory to use --hostname <IP>. Thanks Gary. Best Regards, Felipe On Mon, Aug 6, 2018 at 9:57 PM Gary Yao <[hidden email]> wrote:
|
Free forum by Nabble | Edit this page |