print() method does not always print on the taskmanager.out file

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

print() method does not always print on the taskmanager.out file

Felipe Gutierrez
Hello,

I am studying the parallelism of tasks on DataStream. So, I have configured Flink to execute on my machine (master node) and one virtual machine (worker node).  The master has 4 cores (taskmanager.numberOfTaskSlots: 4) and the worker only 2 cores (taskmanager.numberOfTaskSlots: 2). I don't need to set this on the 'conf/flink-conf.yaml', this was just to ensure that I am relating the properties with the right concepts.

When I create a application with parallelism of 1, 2, or 4, sometimes I can see the output of the "print()" method, other times no. I checke the output files of the task managers ("flink-flink-taskexecutor-0-master.out" or "flink-flink-taskexecutor-0-worker.out") and I cancel the job and start it again. All of sudden I can see the output on the .out file.
I was thinking that it was because I am creating a job with more parallelism that the cluster supports, but this behavior also happens when I set the parallelism of my job to less than the slots available.

I guess if I see on the Flink dashboar X Task slots available and when I deploy my Job, the Job is running and the slots available decreased according to the number of parallelims of my Job, everything should be correct, doesn't it? I also created a Dummy Sink just to print the output, but the behavior is the same.


Thanks,
Felipe
--
-- Felipe Gutierrez
-- skype: felipe.o.gutierrez
Reply | Threaded
Open this post in threaded view
|

Re: print() method does not always print on the taskmanager.out file

Chesnay Schepler
This kind of sounds like a Outputstream flushing issue. Try calling "System.out.flush()" now and then in your sink and report back.

On 04/04/2019 18:04, Felipe Gutierrez wrote:
Hello,

I am studying the parallelism of tasks on DataStream. So, I have configured Flink to execute on my machine (master node) and one virtual machine (worker node).  The master has 4 cores (taskmanager.numberOfTaskSlots: 4) and the worker only 2 cores (taskmanager.numberOfTaskSlots: 2). I don't need to set this on the 'conf/flink-conf.yaml', this was just to ensure that I am relating the properties with the right concepts.

When I create a application with parallelism of 1, 2, or 4, sometimes I can see the output of the "print()" method, other times no. I checke the output files of the task managers ("flink-flink-taskexecutor-0-master.out" or "flink-flink-taskexecutor-0-worker.out") and I cancel the job and start it again. All of sudden I can see the output on the .out file.
I was thinking that it was because I am creating a job with more parallelism that the cluster supports, but this behavior also happens when I set the parallelism of my job to less than the slots available.

I guess if I see on the Flink dashboar X Task slots available and when I deploy my Job, the Job is running and the slots available decreased according to the number of parallelims of my Job, everything should be correct, doesn't it? I also created a Dummy Sink just to print the output, but the behavior is the same.


Thanks,
Felipe
--
-- Felipe Gutierrez
-- skype: felipe.o.gutierrez


Reply | Threaded
Open this post in threaded view
|

Re: print() method does not always print on the taskmanager.out file

Felipe Gutierrez
no. It did not work.

I also created a Sink that is a MQTT publisher (https://github.com/felipegutierrez/explore-flink/blob/master/src/main/java/org/sense/flink/mqtt/MqttSensorPublisher.java) and on my eclipse it works. When I deploy my job on my Flink cluster it does not work. It might be something wrong with my cluster configuration.

Something that I did was comment the line "# 127.0.1.1 ubuntu16-worker01" on the "/etc/hosts" file in order to the JobManager find the TaskManager. I commented on this line also on the master node. The master is my machine and the worker is a virtual machine.


--
-- Felipe Gutierrez
-- skype: felipe.o.gutierrez


On Fri, Apr 5, 2019 at 2:50 PM Chesnay Schepler <[hidden email]> wrote:
This kind of sounds like a Outputstream flushing issue. Try calling "System.out.flush()" now and then in your sink and report back.

On 04/04/2019 18:04, Felipe Gutierrez wrote:
Hello,

I am studying the parallelism of tasks on DataStream. So, I have configured Flink to execute on my machine (master node) and one virtual machine (worker node).  The master has 4 cores (taskmanager.numberOfTaskSlots: 4) and the worker only 2 cores (taskmanager.numberOfTaskSlots: 2). I don't need to set this on the 'conf/flink-conf.yaml', this was just to ensure that I am relating the properties with the right concepts.

When I create a application with parallelism of 1, 2, or 4, sometimes I can see the output of the "print()" method, other times no. I checke the output files of the task managers ("flink-flink-taskexecutor-0-master.out" or "flink-flink-taskexecutor-0-worker.out") and I cancel the job and start it again. All of sudden I can see the output on the .out file.
I was thinking that it was because I am creating a job with more parallelism that the cluster supports, but this behavior also happens when I set the parallelism of my job to less than the slots available.

I guess if I see on the Flink dashboar X Task slots available and when I deploy my Job, the Job is running and the slots available decreased according to the number of parallelims of my Job, everything should be correct, doesn't it? I also created a Dummy Sink just to print the output, but the behavior is the same.


Thanks,
Felipe
--
-- Felipe Gutierrez
-- skype: felipe.o.gutierrez


Reply | Threaded
Open this post in threaded view
|

Re: print() method does not always print on the taskmanager.out file

Felipe Gutierrez
I guess there is something to do with the parallelism of the cluster. When I set "taskmanager.numberOfTaskSlots" to 1 and do not use "setParallelism()" I can see the logs. And on Eclipse I can see the logs.

Does anybody have a clue?
Thanks
--
-- Felipe Gutierrez
-- skype: felipe.o.gutierrez


On Fri, Apr 5, 2019 at 5:10 PM Felipe Gutierrez <[hidden email]> wrote:
no. It did not work.

I also created a Sink that is a MQTT publisher (https://github.com/felipegutierrez/explore-flink/blob/master/src/main/java/org/sense/flink/mqtt/MqttSensorPublisher.java) and on my eclipse it works. When I deploy my job on my Flink cluster it does not work. It might be something wrong with my cluster configuration.

Something that I did was comment the line "# 127.0.1.1 ubuntu16-worker01" on the "/etc/hosts" file in order to the JobManager find the TaskManager. I commented on this line also on the master node. The master is my machine and the worker is a virtual machine.


--
-- Felipe Gutierrez
-- skype: felipe.o.gutierrez


On Fri, Apr 5, 2019 at 2:50 PM Chesnay Schepler <[hidden email]> wrote:
This kind of sounds like a Outputstream flushing issue. Try calling "System.out.flush()" now and then in your sink and report back.

On 04/04/2019 18:04, Felipe Gutierrez wrote:
Hello,

I am studying the parallelism of tasks on DataStream. So, I have configured Flink to execute on my machine (master node) and one virtual machine (worker node).  The master has 4 cores (taskmanager.numberOfTaskSlots: 4) and the worker only 2 cores (taskmanager.numberOfTaskSlots: 2). I don't need to set this on the 'conf/flink-conf.yaml', this was just to ensure that I am relating the properties with the right concepts.

When I create a application with parallelism of 1, 2, or 4, sometimes I can see the output of the "print()" method, other times no. I checke the output files of the task managers ("flink-flink-taskexecutor-0-master.out" or "flink-flink-taskexecutor-0-worker.out") and I cancel the job and start it again. All of sudden I can see the output on the .out file.
I was thinking that it was because I am creating a job with more parallelism that the cluster supports, but this behavior also happens when I set the parallelism of my job to less than the slots available.

I guess if I see on the Flink dashboar X Task slots available and when I deploy my Job, the Job is running and the slots available decreased according to the number of parallelims of my Job, everything should be correct, doesn't it? I also created a Dummy Sink just to print the output, but the behavior is the same.


Thanks,
Felipe
--
-- Felipe Gutierrez
-- skype: felipe.o.gutierrez


Reply | Threaded
Open this post in threaded view
|

Re: print() method does not always print on the taskmanager.out file

Felipe Gutierrez
Ok, I figured out what was happening. I was not passing the IP of the virtual machine which generates the source using MQTT protocol. So, I was seeing results only if the operator was placed on the machine that was generating the data (the virtual machine). If the operator was placed on the other machine I cannot see the output.

Thanks for your help anyway Chesnay!
--
-- Felipe Gutierrez
-- skype: felipe.o.gutierrez


On Fri, Apr 5, 2019 at 6:01 PM Felipe Gutierrez <[hidden email]> wrote:
I guess there is something to do with the parallelism of the cluster. When I set "taskmanager.numberOfTaskSlots" to 1 and do not use "setParallelism()" I can see the logs. And on Eclipse I can see the logs.

Does anybody have a clue?
Thanks
--
-- Felipe Gutierrez
-- skype: felipe.o.gutierrez


On Fri, Apr 5, 2019 at 5:10 PM Felipe Gutierrez <[hidden email]> wrote:
no. It did not work.

I also created a Sink that is a MQTT publisher (https://github.com/felipegutierrez/explore-flink/blob/master/src/main/java/org/sense/flink/mqtt/MqttSensorPublisher.java) and on my eclipse it works. When I deploy my job on my Flink cluster it does not work. It might be something wrong with my cluster configuration.

Something that I did was comment the line "# 127.0.1.1 ubuntu16-worker01" on the "/etc/hosts" file in order to the JobManager find the TaskManager. I commented on this line also on the master node. The master is my machine and the worker is a virtual machine.


--
-- Felipe Gutierrez
-- skype: felipe.o.gutierrez


On Fri, Apr 5, 2019 at 2:50 PM Chesnay Schepler <[hidden email]> wrote:
This kind of sounds like a Outputstream flushing issue. Try calling "System.out.flush()" now and then in your sink and report back.

On 04/04/2019 18:04, Felipe Gutierrez wrote:
Hello,

I am studying the parallelism of tasks on DataStream. So, I have configured Flink to execute on my machine (master node) and one virtual machine (worker node).  The master has 4 cores (taskmanager.numberOfTaskSlots: 4) and the worker only 2 cores (taskmanager.numberOfTaskSlots: 2). I don't need to set this on the 'conf/flink-conf.yaml', this was just to ensure that I am relating the properties with the right concepts.

When I create a application with parallelism of 1, 2, or 4, sometimes I can see the output of the "print()" method, other times no. I checke the output files of the task managers ("flink-flink-taskexecutor-0-master.out" or "flink-flink-taskexecutor-0-worker.out") and I cancel the job and start it again. All of sudden I can see the output on the .out file.
I was thinking that it was because I am creating a job with more parallelism that the cluster supports, but this behavior also happens when I set the parallelism of my job to less than the slots available.

I guess if I see on the Flink dashboar X Task slots available and when I deploy my Job, the Job is running and the slots available decreased according to the number of parallelims of my Job, everything should be correct, doesn't it? I also created a Dummy Sink just to print the output, but the behavior is the same.


Thanks,
Felipe
--
-- Felipe Gutierrez
-- skype: felipe.o.gutierrez