best practices on getting flink job logs from Hadoop history server?

best practices on getting flink job logs from Hadoop history server?

Yu Yang
Hi,

We run Flink jobs through YARN on Hadoop clusters. One challenge we are facing is simplifying access to the Flink job logs.

Flink job logs are accessible via "yarn logs $application_id" (the relevant commands are sketched below). That approach has a few limitations:
  1. It is not straightforward to find the YARN application id for a given Flink job id.
  2. It is difficult to find the container id that corresponds to a given Flink subtask.
  3. For jobs with many tasks, "yarn logs ..." is inefficient, as it mixes the logs from all task managers.
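For reference, the commands involved look roughly like this (the application and container ids below are placeholders):

  # list finished YARN applications to locate the one backing a Flink job
  yarn application -list -appStates FINISHED
  # fetch the aggregated logs of the whole application (mixes all containers)
  yarn logs -applicationId application_1567000000000_0001
  # fetch the log of a single container, once its id is known
  yarn logs -applicationId application_1567000000000_0001 -containerId container_1567000000000_0001_01_000002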
Any suggestions on the best practice for getting the logs of completed Flink jobs that ran on YARN?

Regards, 
-Yu


Re: best practices on getting flink job logs from Hadoop history server?

Zhu Zhu
Hi Yu,

Regarding #2,
Currently we search for the task deployment entries in the JM log, which record the container and machine each task is deployed to.
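For example (the exact message format depends on the Flink version, and the file name below is a placeholder):

  # find the container and host each subtask was deployed to
  grep "Deploying" jobmanager.log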

Regarding #3,
You can find the application logs aggregated per machine on DFS; the path depends on your YARN config.
Each per-machine log may still include multiple TM logs, but it can be much smaller than the log generated by "yarn logs ...".
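For example, with log aggregation enabled, the per-machine logs typically live under a path like the following, where /tmp/logs is the default of yarn.nodemanager.remote-app-log-dir and the user name and application id are placeholders:

  # one file per node, each holding the logs of the containers that ran there
  hdfs dfs -ls /tmp/logs/hadoop/logs/application_1567000000000_0001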

Thanks,
Zhu Zhu


Re: best practices on getting flink job logs from Hadoop history server?

Yun Tang
Hi Yu,

Regarding #1, if you have the client-side job log, you can find your application id in the following output:

The Flink YARN client has been started in detached mode. In order to stop Flink on YARN, use the following command or a YARN web interface to stop it:
yarn application -kill {appId}
Please also note that the temporary files of the YARN session in the home directory will not be removed.
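If only the client log file is at hand, a search along these lines should surface the id (the message text and file name are assumptions based on the YARN client's usual output):

  # the YARN client logs the application id on submission
  grep "Submitted application" flink-client.log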

Best
Yun Tang


Re: best practices on getting flink job logs from Hadoop history server?

Yu Yang
Hi Yun Tang & Zhu Zhu, 

Thanks for the reply! With your current approach, we still need to search the job manager log / YARN client log to find the job id/vertex id --> YARN container id mapping. I am wondering how we can propagate this kind of information to the Flink execution graph so that it can be stored under the Flink history server's archived execution graph. Any suggestions about that?

-Yu


Re: best practices on getting flink job logs from Hadoop history server?

Yang Wang
I think the best way to view the logs is the Flink history server.
However, it currently only supports the job graph and exceptions. Maybe
the Flink history server needs to be enhanced so that we can view
logs just as we can while the cluster is running.
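For reference, archiving for the history server is controlled by these options in flink-conf.yaml (the HDFS path below is a placeholder):

  # where finished jobs write their archived execution graph
  jobmanager.archive.fs.dir: hdfs:///completed-jobs/
  # where the history server picks those archives up
  historyserver.archive.fs.dir: hdfs:///completed-jobs/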


Best,
Yang
