User program failures cause JobManager to be shutdown

User program failures cause JobManager to be shutdown

Dongwon Kim-2
Hi,

I tried to run a program by uploading a jar through the Flink web UI. When I intentionally pass a wrong parameter to my program, the JobManager dies. Below are all the log messages I can get from the JobManager; it dies right after printing the second line:

2019-12-05 04:47:58,623 WARN  org.apache.flink.runtime.webmonitor.handlers.JarRunHandler    - Configuring the job submission via query parameters is deprecated. Please migrate to submitting a JSON request instead.
2019-12-05 04:47:59,133 ERROR com.skt.apm.http.HTTPClient                                   - Cannot connect:http://52.141.38.11:8380/api/spec/poc_asset_model_01/model/imbalance/models: com.fasterxml.jackson.databind.exc.MismatchedInputException: Cannot deserialize instance of `java.util.ArrayList` out of START_OBJECT token
 at [Source: (String)"{"code":"GB0001","resource":"msg.comm.unknown.error","details":"NullPointerException: "}"; line: 1, column: 1]
2019-12-05 04:47:59,166 INFO  org.apache.flink.runtime.blob.BlobServer                      - Stopped BLOB server at 0.0.0.0:6124

The second line is obviously from my program, and it shouldn't cause the JobManager to shut down. Is this the intended behavior?

Best,

Dongwon

Re: User program failures cause JobManager to be shutdown

r_khachatryan
Hi Dongwon,

Could you please provide the Flink version you are running and the JobManager
logs?

Regards,
Roman


--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: User program failures cause JobManager to be shutdown

Dongwon Kim-2
Hi Roman,

We're using the latest version, 1.9.1, and those two lines are all I saw after running the job from the web UI.

Best,

Dongwon


Re: User program failures cause JobManager to be shutdown

r_khachatryan
Hi Dongwon,

I wasn't able to reproduce your problem with the Flink 1.9.1 JobManager, even with various kinds of errors in the job.
I suggest you try it on a fresh Flink installation without any other jobs submitted.

Regards,
Roman



Re: User program failures cause JobManager to be shutdown

rmetzger0
Hi Dongwon,

What does your main() method / client do when it receives wrong program parameters? Does it call System.exit(), or something like that?

By the way, the HTTP address from the error message is publicly accessible. Not sure if this is internal data or not.


Re: User program failures cause JobManager to be shutdown

Dongwon Kim-2
Hi Robert and Roman,

Thank you for taking a look at this.

> what is your main() method / client doing when it's receiving wrong program parameters? Does it call System.exit(), or something like that?

I just found that our HTTP client is programmed to call System.exit(1). I should advise my colleagues not to call System.exit() in Flink applications.
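
A minimal sketch of the alternative to System.exit(1), assuming a hypothetical argument check (`parseEndpoint`) and a hypothetical entry point (`jobMain`); these names are illustrative and not taken from the program discussed in this thread. The idea is to throw an exception on bad input so that whoever invoked the entry point can report the failure and keep running:

```java
public class ArgCheckSketch {

    /** Hypothetical validation step: expects exactly one http(s) endpoint argument. */
    public static String parseEndpoint(String[] args) {
        if (args.length != 1 || !args[0].startsWith("http")) {
            // Throwing (rather than calling System.exit(1)) leaves the hosting JVM alive
            // and hands the error to whoever invoked the entry point.
            throw new IllegalArgumentException("Expected exactly one http(s) endpoint argument");
        }
        return args[0];
    }

    /** Hypothetical stand-in for the user program's main(String[]) method. */
    public static void jobMain(String[] args) {
        String endpoint = parseEndpoint(args);
        System.out.println("Would build the job using " + endpoint);
    }

    public static void main(String[] args) {
        // This driver plays the role of the process that invokes the user program.
        jobMain(new String[] {"http://localhost:8380/models"});
        try {
            jobMain(new String[0]); // wrong parameters
        } catch (IllegalArgumentException e) {
            System.out.println("Submission rejected: " + e.getMessage());
        }
        System.out.println("Caller is still running");
    }
}
```

The driver in main() only simulates a caller; how a real cluster surfaces such an exception depends on the deployment.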

p.s. Just out of curiosity, is there no way for the web app to intercept System.exit() and prevent the job manager from being shut down?

Best,

- Dongwon


Re: User program failures cause JobManager to be shutdown

Dongwon Kim-2
FYI, we run a session cluster in which multiple jobs are managed by one job manager. When a job calls System.exit() like this, all the other jobs fail too, because the job manager shuts down and all the task managers are thrown into chaos (they can no longer connect to the job manager).

I searched for a way to prevent System.exit() calls from terminating the JVM and found [1]. Could that be a possible solution to the problem?
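
One commonly cited way to do this in plain Java is a custom SecurityManager whose checkExit() throws. The sketch below is only an illustration under that assumption: the class and method names (`ExitGuardSketch`, `runGuarded`, `NoExitSecurityManager`) are hypothetical, and nothing here is Flink code. Note that SecurityManager is deprecated for removal in recent JDKs, and newer JVMs may refuse to install one, in which case `runGuarded` throws UnsupportedOperationException before running anything.

```java
import java.security.Permission;

@SuppressWarnings({"removal", "deprecation"})
public class ExitGuardSketch {

    /** Raised in place of JVM termination when guarded code calls System.exit(). */
    public static final class ExitInterceptedException extends SecurityException {
        public final int status;

        ExitInterceptedException(int status) {
            super("System.exit(" + status + ") intercepted");
            this.status = status;
        }
    }

    /** Vetoes exit requests and permits every other checked operation. */
    static final class NoExitSecurityManager extends SecurityManager {
        @Override
        public void checkExit(int status) {
            throw new ExitInterceptedException(status);
        }

        @Override
        public void checkPermission(Permission perm) {
            // Permit everything else, including restoring the previous manager.
        }

        @Override
        public void checkPermission(Permission perm, Object context) {
            // Same as above for context-based checks.
        }
    }

    /**
     * Runs userCode with the guard installed for the duration of the call.
     * Returns the intercepted exit status, or null if userCode returned normally.
     * Throws UnsupportedOperationException if the JVM refuses to install the guard.
     */
    public static Integer runGuarded(Runnable userCode) {
        SecurityManager previous = System.getSecurityManager();
        System.setSecurityManager(new NoExitSecurityManager());
        try {
            userCode.run();
            return null;
        } catch (ExitInterceptedException e) {
            return e.status;
        } finally {
            System.setSecurityManager(previous);
        }
    }

    public static void main(String[] args) {
        try {
            Integer status = runGuarded(() -> System.exit(1));
            System.out.println("Intercepted exit status: " + status);
        } catch (UnsupportedOperationException e) {
            System.out.println("This JVM does not allow installing a security manager");
        }
    }
}
```

The guard is JVM-wide while installed, so an exit requested by any other thread during that window would be vetoed as well.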

Best,
- Dongwon


Re: User program failures cause JobManager to be shutdown

r_khachatryan
Hi Dongwon,

This should work but it could also interfere with Flink itself exiting in case of a fatal error.

Regards,
Roman



Re: User program failures cause JobManager to be shutdown

rmetzger0
I guess we could manage the security only when calling the user's main() method.

This problem actually exists for all user code in Flink: you can also kill TaskManagers like this.
If we are going to add something like this to Flink, I would only log that System.exit() has been called by the user code, not intercept and ignore the call.
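
To make the "log, don't ignore" idea concrete, here is a rough sketch assuming a SecurityManager-based hook. The class name `ExitLoggingSecurityManager` is hypothetical, an in-memory list stands in for a real logger, and this is not how Flink implements anything. checkExit() records the call stack of whoever requested the exit and then returns normally, so the exit still goes ahead:

```java
import java.security.Permission;
import java.util.ArrayList;
import java.util.List;

@SuppressWarnings({"removal", "deprecation"})
public class ExitLoggingSketch {

    /** Records who requested a JVM exit but does not veto it. */
    public static final class ExitLoggingSecurityManager extends SecurityManager {
        private final List<String> records;

        public ExitLoggingSecurityManager(List<String> records) {
            this.records = records;
        }

        @Override
        public void checkExit(int status) {
            StringBuilder record = new StringBuilder("System.exit(" + status + ") requested");
            for (StackTraceElement frame : Thread.currentThread().getStackTrace()) {
                record.append(System.lineSeparator()).append("    at ").append(frame);
            }
            records.add(record.toString());
            // Returning normally means the exit is allowed to proceed.
        }

        @Override
        public void checkPermission(Permission perm) {
            // Permit every other checked operation.
        }

        @Override
        public void checkPermission(Permission perm, Object context) {
            // Permit every other checked operation.
        }
    }

    public static void main(String[] args) {
        List<String> records = new ArrayList<>();
        // Calling the check directly shows what would be recorded without ending this JVM.
        new ExitLoggingSecurityManager(records).checkExit(1);
        records.forEach(System.out::println);
    }
}
```

To take effect for real System.exit() calls, such a manager would have to be installed with System.setSecurityManager(), which is deprecated and may be refused by newer JVMs.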


Re: User program failures cause JobManager to be shutdown

Dongwon Kim-2
Hi Robert and Roman, 
Yeah, letting users know System.exit() is called would be much more appropriate than just intercepting and ignoring.

Best,
Dongwon


Re: User program failures cause JobManager to be shutdown

rmetzger0
Hey Dongwon,
This does not mean it will be implemented anytime soon :)

On Mon, Dec 9, 2019 at 2:25 AM Dongwon Kim <[hidden email]> wrote:
Hi Robert and Roman, 
Yeah, letting users know System.exit() is called would be much more appropriate than just intercepting and ignoring.

Best,
Dongwon

On Sat, Dec 7, 2019 at 11:29 PM Robert Metzger <[hidden email]> wrote:
I guess we could manage the security only when calling the user's main() method.

This problem actually exists for all usercode in Flink: You can also kill TaskManagers like this.
If we are going to add something like this to Flink, I would only log that System.exit() has been called by the user code, not intercept and ignore the call.

On Fri, Dec 6, 2019 at 10:31 AM Khachatryan Roman <[hidden email]> wrote:
Hi Dongwon,

This should work but it could also interfere with Flink itself exiting in case of a fatal error.

Regards,
Roman


On Fri, Dec 6, 2019 at 2:54 AM Dongwon Kim <[hidden email]> wrote:
FYI, we've launched a session cluster where multiple jobs are managed by a job manager. If that happens, all the other jobs also fail because the job manager is shut down and all the task managers get into chaos (failing to connect to the job manager).

I just searched a way to prevent System.exit() calls from terminating JVMs and found [1]. Can it be a possible solution to the problem?

Best,
- Dongwon

On Fri, Dec 6, 2019 at 10:39 AM Dongwon Kim <[hidden email]> wrote:
Hi Robert and Roman,

Thank you for taking a look at this.

what is your main() method / client doing when it's receiving wrong program parameters? Does it call System.exit(), or something like that?

I just found that our HTTP client is programmed to call System.exit(1). I should guide not to call System.exit() in Flink applications. 

p.s. Just out of curiosity, is there no way for the web app to intercept System.exit() and prevent the job manager from being shutting down?

Best,

- Dongwon

On Fri, Dec 6, 2019 at 3:59 AM Robert Metzger <[hidden email]> wrote:
Hi Dongwon,

what is your main() method / client doing when it's receiving wrong program parameters? Does it call System.exit(), or something like that?

By the way, the http address from the error message is publicly available. Not sure if this is internal data or not.

On Thu, Dec 5, 2019 at 6:32 PM Khachatryan Roman <[hidden email]> wrote:
Hi Dongwon,

I wasn't able to reproduce your problem with Flink JobManager 1.9.1 with various kinds of errors in the job.
I suggest you try it on a fresh Flink installation without any other jobs submitted.

Regards,
Roman


On Thu, Dec 5, 2019 at 3:48 PM Dongwon Kim <[hidden email]> wrote:
Hi Roman,

We're using the latest version 1.9.1 and those two lines are all I've seen after executing the job on the web ui.

Best,

Dongwon

On Thu, Dec 5, 2019 at 11:36 PM r_khachatryan <[hidden email]> wrote:
Hi Dongwon,

Could you please provide the Flink version you are running and the job manager
logs?

Regards,
Roman


eastcirclek wrote
> Hi,
>
> I tried to run a program by uploading a jar on Flink UI. When I
> intentionally enter a wrong parameter to my program, JobManager dies.
> Below
> is all log messages I can get from JobManager; JobManager dies as soon as
> spitting the second line:
>
> 2019-12-05 04:47:58,623 WARN  org.apache.flink.runtime.webmonitor.handlers.JarRunHandler    - Configuring the job submission via query parameters is deprecated. Please migrate to submitting a JSON request instead.
> 2019-12-05 04:47:59,133 ERROR com.skt.apm.http.HTTPClient                                   - Cannot connect:http://52.141.38.11:8380/api/spec/poc_asset_model_01/model/imbalance/models: com.fasterxml.jackson.databind.exc.MismatchedInputException: Cannot deserialize instance of `java.util.ArrayList` out of START_OBJECT token
>  at [Source: (String)"{"code":"GB0001","resource":"msg.comm.unknown.error","details":"NullPointerException: "}"; line: 1, column: 1]
> 2019-12-05 04:47:59,166 INFO  org.apache.flink.runtime.blob.BlobServer                      - Stopped BLOB server at 0.0.0.0:6124
>
>
> The second line is obviously from my program and it shouldn't cause
> JobManager to be shut down. Is it intended behavior?
>
> Best,
>
> Dongwon






Re: User program failures cause JobManager to be shutdown

Dongwon Kim-2
Hi Robert,

Yeah, I know. For the moment, I warned my colleagues not to call System.exit() :-) But it needs to be implemented for the sake of Flink usability as you described in the issue.
Thanks a lot for taking care of this issue.
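
For readers hitting the same problem, here is a minimal sketch of the workaround discussed in this thread: library code signals failure by throwing an exception instead of calling System.exit(1). All class and method names below (`ModelLookupException`, `ModelClient`, `parseModels`) are hypothetical stand-ins, not the actual HTTP client from the report and not part of Flink. Earlier in the thread Roman reported that ordinary errors in the job did not bring the JobManager down, so an exception should surface as a failed submission rather than a terminated JVM.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical exception type; the name is illustrative only. */
class ModelLookupException extends RuntimeException {
    ModelLookupException(String message) {
        super(message);
    }
}

/** Hypothetical stand-in for the HTTP client mentioned in the thread. */
class ModelClient {
    /**
     * Parses a comma-separated list of model names.
     * A JSON object (an error payload) is treated as a failure.
     */
    static List<String> parseModels(String responseBody) {
        if (responseBody == null || responseBody.startsWith("{")) {
            // Before: log the error and call System.exit(1), which kills the
            // whole JVM, including a JobManager that hosts other jobs.
            // After: throw, so only this submission fails.
            throw new ModelLookupException("Unexpected response: " + responseBody);
        }
        return Arrays.asList(responseBody.split(","));
    }
}

class ExitFreeClientDemo {
    public static void main(String[] args) {
        System.out.println(ModelClient.parseModels("a,b"));
        try {
            ModelClient.parseModels("{\"code\":\"GB0001\"}");
        } catch (ModelLookupException e) {
            // In a real job this would simply propagate out of main().
            System.out.println("Submission would fail with: " + e.getMessage());
        }
    }
}
```

The demo catches the exception only to print it; in an actual job the exception would be left to propagate out of main().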

Best,

Dongwon

On Dec 9, 2019, at 9:55 PM, Robert Metzger <[hidden email]> wrote:


Hey Dongwon,
This does not mean it will be implemented anytime soon :)

On Mon, Dec 9, 2019 at 2:25 AM Dongwon Kim <[hidden email]> wrote:
Hi Robert and Roman, 
Yeah, letting users know System.exit() is called would be much more appropriate than just intercepting and ignoring.

Best,
Dongwon

On Sat, Dec 7, 2019 at 11:29 PM Robert Metzger <[hidden email]> wrote:
I guess we could apply the security manager only while calling the user's main() method.

This problem actually exists for all user code in Flink: you can also kill TaskManagers like this.
If we are going to add something like this to Flink, I would only log that System.exit() has been called by the user code, not intercept and ignore the call.
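
A rough sketch of what "log, but do not block" could look like, scoped to the user's main() call as suggested above. It assumes the approach is based on java.lang.SecurityManager; the link [1] is not reproduced in this archive, so that is a guess, and `ExitLoggingSecurityManager` and `GuardedMain` are hypothetical names, not Flink classes. Note that SecurityManager is deprecated for removal since Java 17 and newer JDKs refuse to install one at runtime, so the sketch falls back to running unguarded in that case.

```java
import java.security.Permission;

/** Hypothetical manager that records System.exit() calls without blocking them. */
@SuppressWarnings({"deprecation", "removal"})
class ExitLoggingSecurityManager extends SecurityManager {
    // Last exit status seen, or null if no exit was attempted.
    volatile Integer lastExitStatus;

    @Override
    public void checkExit(int status) {
        lastExitStatus = status;
        System.err.println("User code called System.exit(" + status + ")");
        // Returning normally lets the exit proceed.
        // Throwing SecurityException here would block it instead.
    }

    @Override
    public void checkPermission(Permission perm) {
        // Permit everything else, including restoring the previous manager.
    }

    @Override
    public void checkPermission(Permission perm, Object context) {
        // Permit everything else.
    }
}

@SuppressWarnings({"deprecation", "removal"})
class GuardedMain {
    /** Installs the manager only while userMain runs, then restores the old one. */
    static void runGuarded(Runnable userMain) {
        SecurityManager previous = System.getSecurityManager();
        boolean installed = false;
        try {
            System.setSecurityManager(new ExitLoggingSecurityManager());
            installed = true;
        } catch (UnsupportedOperationException e) {
            // Newer JDKs reject installing a security manager; run unguarded.
        }
        try {
            userMain.run();
        } finally {
            if (installed) {
                System.setSecurityManager(previous);
            }
        }
    }
}
```

Restoring the previous manager in the finally block limits the hook to the user's main(), which also addresses Roman's concern about interfering with Flink's own exit on fatal errors.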
