Stopping a job


M Singh
Hi:

I am running a job which consumes data from Kinesis and sends data to another Kinesis queue. I am using an older version of Flink (1.6), and when I try to stop the job I get an exception:

Caused by: java.util.concurrent.ExecutionException: org.apache.flink.runtime.rest.util.RestClientException: [Job termination (STOP) failed: This job is not stoppable.]



I wanted to find out what a stoppable job is, and whether it is possible to make a job stoppable if it is reading from and writing to Kinesis.

Thanks


Re: Stopping a job

Arvid Heise-3

Hi,

could you check if this SO thread [1] helps you already?

--

Arvid Heise | Senior Java Developer


Follow us @VervericaData

--

Join Flink Forward - The Apache Flink Conference

Stream Processing | Event Driven | Real Time

--

Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--

Ververica GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji (Toni) Cheng   
Re: Stopping a job

M Singh
Hi Arvid:

I checked the link, and it indicates that only the Storm SpoutSource, TwitterSource and NifiSource support stop.

Does this mean that FlinkKinesisConsumer is not stoppable?

Also, can you please point me to the Stoppable interface mentioned in the link? I found the following, but I am not sure whether TwitterSource implements it:



Re: Stopping a job

Arvid Heise-3
Yes, it seems as if FlinkKinesisConsumer does not implement it.

Here are the links to the respective javadoc [1] and code [2]. Note that in later releases (1.9+) this interface has been removed. Stop is now implemented through a cancel() on source level.

In general, I don't think that stop is needed in a Kinesis-to-Kinesis use case anyway, since no additional consistency is expected over a normal cancel.
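
To make the "stoppable" idea concrete, here is a minimal plain-Java sketch. It does not use Flink: `SimpleSource`, `Stoppable`, `StoppableCountingSource`, `PlainSource` and `StoppableCheck` are hypothetical stand-ins with simplified signatures, and the rule that a job is stoppable only when every source opts in is an assumption based on the error message above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Simplified stand-ins for the pre-1.9 source interfaces; NOT Flink's real signatures.
interface SimpleSource {
    void run(Consumer<String> out);
    void cancel();
}

// Hypothetical interface playing the role of the old "stoppable" interface.
interface Stoppable {
    void stop();
}

// A source that opts in to stop by implementing both interfaces.
final class StoppableCountingSource implements SimpleSource, Stoppable {
    private volatile boolean running = true;
    private final int limit;

    StoppableCountingSource(int limit) {
        this.limit = limit;
    }

    @Override
    public void run(Consumer<String> out) {
        int i = 0;
        while (running && i < limit) {
            out.accept("record-" + i);
            i++;
        }
    }

    // In this sketch both methods simply flip the same flag.
    @Override
    public void cancel() { running = false; }

    @Override
    public void stop() { running = false; }
}

// A source that never opted in, analogous to a consumer without the interface.
final class PlainSource implements SimpleSource {
    private volatile boolean running = true;

    @Override
    public void run(Consumer<String> out) {
        if (running) {
            out.accept("only-record");
        }
    }

    @Override
    public void cancel() { running = false; }
}

final class StoppableCheck {
    // Assumption: a job counts as stoppable only if every source implements Stoppable.
    static boolean isStoppable(List<SimpleSource> sources) {
        for (SimpleSource s : sources) {
            if (!(s instanceof Stoppable)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<SimpleSource> mixed =
                Arrays.asList(new StoppableCountingSource(3), new PlainSource());
        System.out.println("stoppable: " + isStoppable(mixed));
    }
}
```

Under these assumptions, a single source that does not opt in is enough to make the whole job non-stoppable, which matches the situation with the Kinesis consumer described here.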


Re: Stopping a job

M Singh

Hi Arvid:   

Thanks for the links.  

A few questions:

1. Is there any particular interface in 1.9+ that identifies a source as stoppable?
2. Is there any distinction between stop and cancel in 1.9+?
3. Is there any list of sources which are documented as stoppable besides the ones listed in your SO link?
4. In 1.9+ there is a flink stop command and a flink cancel command (https://ci.apache.org/projects/flink/flink-docs-stable/ops/cli.html#stop). So it appears that flink stop will take a savepoint and then call cancel, and cancel will just cancel the job (it looks like cancel with savepoint is deprecated in 1.10).

Thanks again for your help.



Re: Stopping a job

Arvid Heise-3
It was before I joined the dev team, so the following is somewhat speculative:

The concept of stoppable functions never really took off, as it was a rather clumsy approach. There is no fundamental difference between stopping and cancelling at the (sub)task level. Indeed, if you look at the Twitter source in 1.6 [1], cancel() and stop() do exactly the same thing. I'd assume that this is true for all sources.

So what is the difference between cancel and stop then? It is more about how you terminate the whole DAG. On cancelling, cancel() is called on all tasks more or less simultaneously. Stopping is more of a fine-grained cancel, where you first stop the sources and then let the tasks close themselves once all upstream tasks are done. Just before closing the tasks, you also take a snapshot. Thus, the difference should not be visible in user code but only in the Flink code itself (task/checkpoint coordinator).

So for your question:
1. No, since stop() and cancel() are the same thing at the task/UDF level.
2. Yes, stop is more graceful and creates a snapshot. [2]
3. Not that I am aware of. In the whole Flink code base, there are no more (see javadoc). You could of course check whether there are some in Bahir. But it shouldn't really matter: there is no huge difference between stopping and cancelling if you wait for a checkpoint to finish.
4. Okay, you answered your second question ;) Yes, cancel with savepoint = stop now, to make it easier for new users.
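
The DAG-level difference described above can be sketched with a toy model. This is not Flink code: `ToyPipeline` and its methods are hypothetical, the pipeline is reduced to one source feeding one task, and the "snapshot" is just the source position returned by `stop()`.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model of a source -> task pipeline; all names are illustrative.
final class ToyPipeline {
    private final Deque<Integer> inFlight = new ArrayDeque<>(); // emitted, not yet processed
    private final List<Integer> processed = new ArrayList<>();
    private int nextRecord = 0;
    private boolean sourceRunning = true;

    // The source emits one record, unless it has already been shut down.
    void emit() {
        if (sourceRunning) {
            inFlight.add(nextRecord++);
        }
    }

    // The downstream task handles one in-flight record, if there is one.
    void processOne() {
        Integer r = inFlight.poll();
        if (r != null) {
            processed.add(r);
        }
    }

    // cancel: source and task are torn down together; in-flight records stay unprocessed.
    void cancel() {
        sourceRunning = false;
        inFlight.clear();
    }

    // stop: shut down the source first, let the task drain, then "snapshot".
    // Returns the source position as a stand-in for the snapshot.
    int stop() {
        sourceRunning = false;
        while (!inFlight.isEmpty()) {
            processOne();
        }
        return nextRecord;
    }

    List<Integer> processed() {
        return processed;
    }
}

final class StopVsCancelDemo {
    public static void main(String[] args) {
        ToyPipeline cancelled = new ToyPipeline();
        ToyPipeline stopped = new ToyPipeline();
        for (int i = 0; i < 3; i++) {
            cancelled.emit();
            stopped.emit();
        }
        cancelled.cancel();
        int snapshot = stopped.stop();
        System.out.println("processed after cancel: " + cancelled.processed());
        System.out.println("processed after stop:   " + stopped.processed()
                + ", snapshot position " + snapshot);
    }
}
```

In a real job the records left behind by a cancel would normally be replayed from the last checkpoint on restart, so the model only illustrates the ordering of the shutdown, not data loss.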


Re: Stopping a job

Kostas Kloudas-2
What Arvid said is correct. 
The only thing I have to add is that "stop" also allows exactly-once sinks to push out their buffered data to their final destination (e.g. Filesystem). In other words, it takes side effects into account, so it guarantees exactly-once end-to-end, assuming that you are using exactly-once sources and sinks. Cancel with savepoint, on the other hand, did not necessarily do so; committing side effects followed a "best-effort" approach.

For more information you can check [1].
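
As a rough illustration of the side-effect point, here is a plain-Java sketch. `BufferingSink` and `Termination` are hypothetical names, not Flink classes, and the boolean `commitDelivered` is an assumed simplification of whether the commit notification reaches the sink before the job goes away.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sink that buffers records until it is told to commit.
final class BufferingSink {
    private final List<String> pending = new ArrayList<>();
    private final List<String> committed = new ArrayList<>();

    // Records are buffered first and are not yet visible at the destination.
    void write(String record) {
        pending.add(record);
    }

    // Push everything buffered so far to the final destination.
    void commit() {
        committed.addAll(pending);
        pending.clear();
    }

    List<String> pending() { return pending; }

    List<String> committed() { return committed; }
}

final class Termination {
    // stop: the sink always gets the chance to commit before the job ends.
    static void stop(BufferingSink sink) {
        sink.commit();
    }

    // cancel with savepoint (old behaviour, as described): committing is best-effort.
    static void cancelWithSavepoint(BufferingSink sink, boolean commitDelivered) {
        if (commitDelivered) {
            sink.commit();
        }
    }

    public static void main(String[] args) {
        BufferingSink a = new BufferingSink();
        a.write("r1");
        stop(a);
        System.out.println("stop -> committed " + a.committed());

        BufferingSink b = new BufferingSink();
        b.write("r1");
        cancelWithSavepoint(b, false);
        System.out.println("cancel -> committed " + b.committed()
                + ", still pending " + b.pending());
    }
}
```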

Cheers,
Kostas 


Re: Stopping a job

Senthil Kumar

I am just stating this for completeness.

When a job is cancelled, Flink sends an interrupt signal to the thread running the Source.run method.

For some reason (unknown to me), this does not happen when a stop command is issued.

We ran into some minor issues because of this behavior.
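
To show why the missing interrupt can matter, here is a small plain-Java sketch, independent of Flink. `BlockingSource` and `InterruptDemo` are hypothetical names; the source is assumed to sit in a blocking call, so flipping a flag alone cannot wake it up.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical source whose run loop blocks while waiting for input.
final class BlockingSource implements Runnable {
    private final BlockingQueue<String> input = new LinkedBlockingQueue<>();
    private final CountDownLatch aboutToBlock = new CountDownLatch(1);
    private volatile boolean running = true;

    @Override
    public void run() {
        while (running) {
            try {
                aboutToBlock.countDown();
                input.take(); // nothing is ever enqueued, so this blocks
            } catch (InterruptedException e) {
                return; // only an interrupt gets the thread out of take()
            }
        }
    }

    // Lets the caller wait until the loop has passed the flag check.
    void awaitBlocking() throws InterruptedException {
        aboutToBlock.await();
    }

    void requestShutdown() {
        running = false;
    }
}

final class InterruptDemo {
    // Returns true if the source thread terminated within the timeout.
    static boolean terminates(boolean interrupt, long timeoutMs) {
        try {
            BlockingSource source = new BlockingSource();
            Thread t = new Thread(source, "source-thread");
            t.setDaemon(true); // do not keep the JVM alive if the thread stays blocked
            t.start();
            source.awaitBlocking();
            source.requestShutdown(); // what both termination paths do in this sketch
            if (interrupt) {
                t.interrupt(); // the extra step described for cancel
            }
            t.join(timeoutMs);
            return !t.isAlive();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("flag + interrupt terminates: " + terminates(true, 1000));
        System.out.println("flag only terminates:        " + terminates(false, 200));
    }
}
```

A source written like this would need its own wake-up mechanism (for example a poll with a timeout instead of a blocking take) to react to a stop that arrives without an interrupt.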

 

Re: Stopping a job

M Singh
Thanks Kostas, Arvid, and Senthil for your help.

On Monday, June 8, 2020, 12:47:56 PM EDT, Senthil Kumar <[hidden email]> wrote:


I am just stating this for completeness.

 

When a job is cancelled, Flink sends an Interrupt signal to the Thread running the Source.run method

 

For some reason (unknown to me), this does not happen when a Stop command is issued.

 

We ran into some minor issues because of said behavior.

 

From: Kostas Kloudas <[hidden email]>
Date: Monday, June 8, 2020 at 2:35 AM
To: Arvid Heise <[hidden email]>
Cc: M Singh <[hidden email]>, User-Flink <[hidden email]>
Subject: Re: Stopping a job

 

What Arvid said is correct. 

The only thing I have to add is that "stop" allows also exactly-once sinks to push out their buffered data to their final destination (e.g. Filesystem). In other words, it takes into account side-effects, so it guarantees exactly-once end-to-end, assuming that you are using exactly-once sources and sinks. Cancel with savepoint on the other hand did not necessarily and committing side-effects is was following a "best-effort" approach.

 

For more information you can check [1].

 

Cheers,

Kostas 

 

 

On Mon, Jun 8, 2020 at 10:23 AM Arvid Heise <[hidden email]> wrote:

It was before I joined the dev team, so the following are kind of speculative:

 

The concept of stoppable functions never really took off as it was a bit of a clumsy approach. There is no fundamental difference between stopping and cancelling on (sub)task level. Indeed if you look in the twitter source of 1.6 [1], cancel() and stop() are doing the exact same thing. I'd assume that this is probably true for all sources.

 

So what is the difference between cancel and stop then? It's more the way on how you terminate the whole DAG. On cancelling, you cancel() on all tasks more or less simultaneously. If you want to stop, it's more a fine-grain cancel, where you stop first the sources and then let the tasks close themselves when all upstream tasks are done. Just before closing the tasks, you also take a snapshot. Thus, the difference should not be visible in user code but only in the Flink code itself (task/checkpoint coordinator)

 

So for your question:

1. No, as on task level stop() and cancel() are the same thing on UDF level.

2. Yes, stop will be more graceful and creates a snapshot. [2] 

3. Not that I am aware of. In the whole flink code base, there are no more (see javadoc). You could of course check if there are some in Bahir. But it shouldn't really matter. There is no huge difference between stopping and cancelling if you wait for a checkpoint to finish.

4. Okay you answered your second question ;) Yes cancel with savepoint = stop now to make it easier for new users.

 

 

On Sun, Jun 7, 2020 at 1:04 AM M Singh <[hidden email]> wrote:

 

Hi Arvid:   

 

Thanks for the links.  

 

A few questions:

 

1. Is there any particular interface in 1.9+ that identifies the source as stoppable ?

2. Is there any distinction b/w stop and cancel  in 1.9+ ?

3. Is there any list of sources which are documented as stoppable besides the one listed in your SO link ?

4. In 1.9+ there is flink stop command and a flink cancel command. (https://ci.apache.org/projects/flink/flink-docs-stable/ops/cli.html#stop).  So it appears that flink stop will take a savepoint and the call cancel, and cancel will just cancel the job (looks like cancel with savepoint is deprecated in 1.10).  

 

Thanks again for your help.

 

 

 

On Saturday, June 6, 2020, 02:18:57 PM EDT, Arvid Heise <[hidden email]> wrote:

 

 

Yes, it seems as if FlinkKinesisConsumer does not implement it.

 

Here are the links to the respective javadoc [1] and code [2]. Note that in later releases (1.9+) this interface has been removed. Stop is now implemented through a cancel() on source level.

 

In general, I don't think that in a Kinesis to Kinesis use case, stop is needed anyways, since there is no additional consistency expected over a normal cancel.

 

 

On Sat, Jun 6, 2020 at 8:03 PM M Singh <[hidden email]> wrote:

Hi Arvid:

 

I check the link and it indicates that only Storm SpoutSource, TwitterSource and NifiSource support stop.   

 

Does this mean that FlinkKinesisConsumer is not stoppable ?


Also, can you please point me to the Stoppable interface mentioned in the link ?  I found the following but am not sure if TwitterSource implements it :

 

 

 

On Friday, June 5, 2020, 02:48:49 PM EDT, Arvid Heise <[hidden email]> wrote:

 

 

Hi,

 

could you check if this SO thread [1] helps you already?

 

 

On Thu, Jun 4, 2020 at 7:43 PM M Singh <[hidden email]> wrote:

Hi:

 

I am running a job which consumes data from Kinesis and send data to another Kinesis queue.  I am using an older version of Flink (1.6), and when I try to stop the job I get an exception 

 

Caused by: java.util.concurrent.ExecutionException: org.apache.flink.runtime.rest.util.RestClientException: [Job termination (STOP) failed: This job is not stoppable.]

 

 

I wanted to find out what is a stoppable job and it possible to make a job stoppable if is reading/writing to kinesis ?

 

Thanks

 



Re: Stopping a job

Kostas Kloudas-2
Hi all, 

Just for future reference, there is an ongoing discussion on the topic at another thread found in [1].
So please post any relevant comments there :)

Cheers,
Kostas


On Tue, Jun 9, 2020 at 7:36 AM M Singh <[hidden email]> wrote:
Thanks Kostas, Arvid, and Senthil for your help.

On Monday, June 8, 2020, 12:47:56 PM EDT, Senthil Kumar <[hidden email]> wrote:


I am just stating this for completeness.

When a job is cancelled, Flink sends an interrupt to the thread running the Source.run method.

For some reason (unknown to me), this does not happen when a Stop command is issued.

We ran into some minor issues because of this behavior.
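
One way to avoid depending on that interrupt is for the source loop to check a flag on every iteration and to block only with a timeout. The following is a minimal sketch in plain Java with no Flink dependency; `PollingSource` and its `run`, `cancel`, and `emitted` methods are hypothetical names that only mirror the shape of a source function, not Flink's actual classes:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a source; not a Flink class. */
public class PollingSource {
    // volatile so a write from the controlling thread is visible to the run() thread
    private volatile boolean running = true;
    private final BlockingQueue<String> input;
    private final List<String> emitted = new ArrayList<>();

    public PollingSource(BlockingQueue<String> input) {
        this.input = input;
    }

    /** Emits records until cancel() is called; never blocks indefinitely. */
    public void run() {
        while (running) {
            try {
                // Bounded wait: the flag is re-checked roughly every 50 ms,
                // so the loop exits even if nobody interrupts this thread.
                String record = input.poll(50, TimeUnit.MILLISECONDS);
                if (record != null) {
                    synchronized (emitted) {
                        emitted.add(record);
                    }
                }
            } catch (InterruptedException e) {
                // An interrupt (the cancel path described above) also ends the loop.
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    /** Used for both stop and cancel in this sketch; it only flips the flag. */
    public void cancel() {
        running = false;
    }

    /** Returns a copy of the records emitted so far. */
    public List<String> emitted() {
        synchronized (emitted) {
            return new ArrayList<>(emitted);
        }
    }

    public static void main(String[] args) throws Exception {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        queue.add("a");
        queue.add("b");
        PollingSource source = new PollingSource(queue);
        Thread worker = new Thread(source::run);
        worker.start();
        Thread.sleep(200);
        source.cancel(); // no interrupt is sent
        worker.join(1000);
        System.out.println("alive=" + worker.isAlive() + " emitted=" + source.emitted());
    }
}
```

With this shape, the loop ends the same way whether or not the thread is interrupted, at the cost of waking up periodically while idle.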

 

From: Kostas Kloudas <[hidden email]>
Date: Monday, June 8, 2020 at 2:35 AM
To: Arvid Heise <[hidden email]>
Cc: M Singh <[hidden email]>, User-Flink <[hidden email]>
Subject: Re: Stopping a job

 

What Arvid said is correct.

The only thing I have to add is that "stop" also allows exactly-once sinks to push out their buffered data to their final destination (e.g. Filesystem). In other words, it takes side-effects into account, so it guarantees exactly-once end-to-end, assuming that you are using exactly-once sources and sinks. Cancel with savepoint, on the other hand, did not necessarily do so; committing side-effects followed a "best-effort" approach.

For more information you can check [1].

Cheers,

Kostas

On Mon, Jun 8, 2020 at 10:23 AM Arvid Heise <[hidden email]> wrote:

It was before I joined the dev team, so the following is somewhat speculative:

The concept of stoppable functions never really took off as it was a bit of a clumsy approach. There is no fundamental difference between stopping and cancelling on (sub)task level. Indeed, if you look at the Twitter source of 1.6 [1], cancel() and stop() do exactly the same thing. I'd assume that this is probably true for all sources.

So what is the difference between cancel and stop then? It's more about how you terminate the whole DAG. On cancelling, you call cancel() on all tasks more or less simultaneously. Stopping is more of a fine-grained cancel, where you first stop the sources and then let the tasks close themselves when all upstream tasks are done. Just before closing the tasks, you also take a snapshot. Thus, the difference should not be visible in user code but only in the Flink code itself (task/checkpoint coordinator).
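
The ordering difference can be illustrated with a toy model. This is only a sketch in plain Java of the description above, not Flink's task or checkpoint coordinator code; the class `Termination`, the log messages, and the linear three-task pipeline are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a linear pipeline such as source -> map -> sink. Not Flink code. */
public class Termination {

    /** Cancel: every task is cancelled at roughly the same time; no snapshot is taken. */
    public static List<String> cancel(List<String> tasksInTopologicalOrder) {
        List<String> log = new ArrayList<>();
        for (String task : tasksInTopologicalOrder) {
            // The loop order is incidental; conceptually these happen simultaneously.
            log.add("cancel " + task);
        }
        return log;
    }

    /**
     * Stop: only the source is stopped directly. Each downstream task takes a
     * snapshot and closes once its upstream task is done.
     */
    public static List<String> stop(List<String> tasksInTopologicalOrder) {
        List<String> log = new ArrayList<>();
        for (int i = 0; i < tasksInTopologicalOrder.size(); i++) {
            String task = tasksInTopologicalOrder.get(i);
            if (i == 0) {
                log.add("stop " + task);
            } else {
                log.add("upstream done for " + task);
            }
            log.add("snapshot " + task);
            log.add("close " + task);
        }
        return log;
    }

    public static void main(String[] args) {
        List<String> pipeline = List.of("src", "map", "sink");
        System.out.println("cancel: " + cancel(pipeline));
        System.out.println("stop:   " + stop(pipeline));
    }
}
```

In the stop log, the sink closes only after everything upstream has closed, which is what gives sinks the chance to flush buffered data before the job ends.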

 

So for your questions:

1. No, since stop() and cancel() are the same thing on the task (UDF) level.

2. Yes, stop will be more graceful and creates a snapshot. [2]

3. Not that I am aware of. In the whole Flink code base, there are no more (see javadoc). You could of course check if there are some in Bahir. But it shouldn't really matter. There is no huge difference between stopping and cancelling if you wait for a checkpoint to finish.

4. Okay, you answered your second question ;) Yes, cancel with savepoint = stop now, to make it easier for new users.


On Sun, Jun 7, 2020 at 1:04 AM M Singh <[hidden email]> wrote:

 

Hi Arvid:

Thanks for the links.

A few questions:

1. Is there any particular interface in 1.9+ that identifies the source as stoppable?

2. Is there any distinction between stop and cancel in 1.9+?

3. Is there any list of sources which are documented as stoppable besides the ones listed in your SO link?

4. In 1.9+ there is a flink stop command and a flink cancel command (https://ci.apache.org/projects/flink/flink-docs-stable/ops/cli.html#stop). So it appears that flink stop will take a savepoint and then call cancel, and cancel will just cancel the job (it looks like cancel with savepoint is deprecated in 1.10).

Thanks again for your help.


On Saturday, June 6, 2020, 02:18:57 PM EDT, Arvid Heise <[hidden email]> wrote:

 

 

Yes, it seems as if FlinkKinesisConsumer does not implement it.

Here are the links to the respective javadoc [1] and code [2]. Note that in later releases (1.9+) this interface has been removed. Stop is now implemented through a cancel() on source level.

In general, I don't think that stop is needed in a Kinesis-to-Kinesis use case anyway, since there is no additional consistency expected over a normal cancel.


On Sat, Jun 6, 2020 at 8:03 PM M Singh <[hidden email]> wrote:

Hi Arvid:

I checked the link and it indicates that only Storm SpoutSource, TwitterSource and NifiSource support stop.

Does this mean that FlinkKinesisConsumer is not stoppable?

Also, can you please point me to the Stoppable interface mentioned in the link? I found the following but am not sure if TwitterSource implements it:


On Friday, June 5, 2020, 02:48:49 PM EDT, Arvid Heise <[hidden email]> wrote:

Hi,

could you check if this SO thread [1] helps you already?
