How to submit two jobs sequentially and view their outputs in .out file?

Komal Mariam
Dear all,

Thank you for your help with my previous queries. Unfortunately, I'm stuck on another one and would really appreciate your input.

I can't seem to get any output in "flink-taskexecutor-0.out" from my second job after submitting the first one on my 3-node Flink standalone cluster.

Say I want to test out two jobs sequentially. (I do not want to run them concurrently/in parallel).

After submitting "job1.jar" via the command line, I press Ctrl + C to exit from it (as it runs indefinitely). After that I try to submit a second jar file with the same properties (group id, topic, etc.), the only difference being the query written in the main function.

The first job produces the expected output in "flink-taskexecutor-0.out", but the second one doesn't.

The only way I can see the second job's output is to restart the cluster after job1 and then submit job2, since restarting produces a fresh .out file.

But I want to submit two jobs sequentially and see their outputs without having to restart my cluster. Is there any way to do this?

Additional info:
For both jobs I'm using the DataStream API, and I have set:
 StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
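
To illustrate, both jobs have roughly this shape (a minimal sketch; the Kafka topic, group id, and broker address below are placeholders, and my real code applies a query before printing):

 import java.util.Properties;
 import org.apache.flink.api.common.serialization.SimpleStringSchema;
 import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
 import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

 public class Job1 {
     public static void main(String[] args) throws Exception {
         StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

         Properties props = new Properties();
         props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder
         props.setProperty("group.id", "test-group");              // same group id in both jobs

         env.addSource(new FlinkKafkaConsumer<>("test-topic", new SimpleStringSchema(), props))
            .print();         // print() is what ends up in a TaskManager's .out file
         env.execute("Job1"); // job2 is identical apart from the query in main()
     }
 }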

Best Regards,
Komal

Re: How to submit two jobs sequentially and view their outputs in .out file?

vino yang
Hi Komal,

Since you are using the Flink standalone deployment mode, the tasks that print to STDOUT may be deployed on any TaskManager in the cluster, so the output can land in a different TaskManager's .out file each time. Did you check the other TaskManagers' .out files?
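
For example, on each node you could look at all the TaskManager .out files (assuming a default standalone setup where the logs live under FLINK_HOME/log; the actual file names also include the user and host):

 tail -n 100 $FLINK_HOME/log/flink-*-taskexecutor-*.out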

Best,
Vino

Re: How to submit two jobs sequentially and view their outputs in .out file?

Komal Mariam
Hi Theo,

I want to interrupt/cancel my current job once it has produced the desired results (even though it would run indefinitely), because the next job requires the cluster's full resources.

Due to a technical issue we cannot access the web UI, so we are working with the CLI for now.

I found a less crude way than Ctrl + C: running ./bin/flink cancel <job id>, one of the commands listed here: https://ci.apache.org/projects/flink/flink-docs-stable/ops/cli.html
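
For reference, the sequence I use now (the job ID comes from the list command; job2.jar stands for my second jar):

 ./bin/flink list                # shows running jobs with their job IDs
 ./bin/flink cancel <job id>     # cancels the running job
 ./bin/flink run job2.jar        # then submit the next job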

------------

Hello Vino,

Thank you! That's exactly what's happening. Is there any way to force it to write to a specific TaskManager's .out file?


Best Regards,
Komal



Re: How to submit two jobs sequentially and view their outputs in .out file?

vino yang
Hi Komal,

> Thank you! That's exactly what's happening. Is there any way to force it to write to a specific TaskManager's .out file?

No. I am curious why the two jobs depend on stdout. Could you introduce another coordination mechanism instead of stdout? IMO, stdout is not always available.
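
For example, instead of print(), the job could write its results to a deterministic location with a file sink, so the output does not depend on which TaskManager runs the task. A minimal sketch, assuming results is a DataStream<String> and the output path is a placeholder:

 import org.apache.flink.api.common.serialization.SimpleStringEncoder;
 import org.apache.flink.core.fs.Path;
 import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

 // write each result as a line of text under a fixed directory
 results.addSink(
     StreamingFileSink
         .forRowFormat(new Path("/tmp/job1-output"),
                       new SimpleStringEncoder<String>("UTF-8"))
         .build());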

Best,
Vino

Re: How to submit two jobs sequentially and view their outputs in .out file?

Piotr Nowojski
Hi,

I would suggest the same thing as Vino did: it might be possible to use stdout somehow, but it's a better idea to coordinate in some other way. For example, produce a (side) output with a control message from the first job once it finishes, and use that to control the second job.
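
A rough sketch of that idea (the control topic, the end-of-results condition, the helper isFinalResult, and the input stream and kafkaProps are assumptions for illustration, not a working implementation):

 import org.apache.flink.api.common.serialization.SimpleStringSchema;
 import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
 import org.apache.flink.streaming.api.functions.ProcessFunction;
 import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;
 import org.apache.flink.util.Collector;
 import org.apache.flink.util.OutputTag;

 final OutputTag<String> controlTag = new OutputTag<String>("control") {};

 SingleOutputStreamOperator<String> results =
     input.process(new ProcessFunction<String, String>() {
         @Override
         public void processElement(String value, Context ctx, Collector<String> out) {
             out.collect(value);                      // normal result
             if (isFinalResult(value)) {              // application-specific condition
                 ctx.output(controlTag, "JOB1_DONE"); // control message for job 2
             }
         }
     });

 // the second job consumes "control-topic" and starts its real work
 // only after it has seen the JOB1_DONE message
 results.getSideOutput(controlTag)
     .addSink(new FlinkKafkaProducer<>("control-topic", new SimpleStringSchema(), kafkaProps));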

Piotrek
