History Server Not Showing Any Jobs - File Not Found?


History Server Not Showing Any Jobs - File Not Found?

Hailu, Andreas

Hi,

I’m trying to set up the History Server, but none of my applications are showing up in the Web UI. Looking at the console, I see that all of the calls to /overview return the following 404 response: {"errors":["File not found."]}.

I’ve set up my configuration as follows.

JobManager archive directory:

jobmanager.archive.fs.dir: hdfs:///user/p2epda/lake/delp_qa/flink_hs/

-bash-4.1$ hdfs dfs -ls /user/p2epda/lake/delp_qa/flink_hs
Found 44282 items
-rw-r-----   3 delp datalake_admin_dev      50569 2020-03-21 23:17 /user/p2epda/lake/delp_qa/flink_hs/000144dba9dc0f235768a46b2f26e936
-rw-r-----   3 delp datalake_admin_dev      49578 2020-03-03 08:45 /user/p2epda/lake/delp_qa/flink_hs/000347625d8128ee3fd0b672018e38a5
-rw-r-----   3 delp datalake_admin_dev      50842 2020-03-24 15:19 /user/p2epda/lake/delp_qa/flink_hs/0004be6ce01ba9677d1eb619ad0fa757
...

The History Server fetches the archived jobs from the same location:

historyserver.archive.fs.dir: hdfs:///user/p2epda/lake/delp_qa/flink_hs/

So I can confirm that there are indeed archived applications that I should be able to view in the History Server. I haven’t been able to find out what file the overview service is looking for from the repository. Any suggestions as to what I could look into next?

 

Best,

Andreas
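
For anyone reproducing this, the same 404 can be checked outside the browser by requesting the endpoints directly. The sketch below is only an illustration: `decode_response` and `probe` are hypothetical helper names, and the base URL you pass in is an assumption about your own setup (8082 is the documented default for historyserver.web.port).

```python
import json
import urllib.error
import urllib.request


def decode_response(status, raw):
    """Return (status, body), parsing the body as JSON when possible."""
    text = raw.decode("utf-8", errors="replace")
    try:
        body = json.loads(text)
    except ValueError:
        body = text  # not JSON (e.g. an HTML error page); keep the raw text
    return status, body


def probe(base_url, paths=("/overview", "/jobs/overview"), timeout=5.0):
    """Request each path once and return {path: (status, body)}."""
    results = {}
    for path in paths:
        url = base_url.rstrip("/") + path
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                results[path] = decode_response(resp.status, resp.read())
        except urllib.error.HTTPError as err:
            # 4xx/5xx replies still carry a body such as {"errors": [...]}.
            results[path] = decode_response(err.code, err.read())
    return results
```

Something like `probe("http://localhost:8082")` would then show the status and error body per path. If the server is not reachable at all, `urllib.error.URLError` is raised instead, which is a different problem from the 404 described here.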




Your Personal Data: We may collect and process information about you that may be subject to data protection laws. For more information about how we use and disclose your personal data, how we protect your information, our legal basis to use your information, your rights and who you can contact, please refer to: www.gs.com/privacy-notices

Re: History Server Not Showing Any Jobs - File Not Found?

Chesnay Schepler
Which Flink version are you using?
Have you checked the history server logs after enabling debug logging?


RE: History Server Not Showing Any Jobs - File Not Found?

Hailu, Andreas

Hi Chesnay, thanks for responding. We’re using Flink 1.9.1. I enabled DEBUG level logging and this is something relevant I see:

 

2020-04-22 13:25:52,566 [Flink-HistoryServer-ArchiveFetcher-thread-1] DEBUG DFSInputStream - Connecting to datanode 10.79.252.101:1019
2020-04-22 13:25:52,567 [Flink-HistoryServer-ArchiveFetcher-thread-1] DEBUG SaslDataTransferClient - SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2020-04-22 13:25:52,567 [Flink-HistoryServer-ArchiveFetcher-thread-1] DEBUG SaslDataTransferClient - SASL client skipping handshake in secured configuration with privileged port for addr = /10.79.252.101, datanodeId = DatanodeInfoWithStorage[10.79.252.101:1019,DS-7f4ec55d-7c5f-4a0e-b817-d9e635480b21,DISK]
2020-04-22 13:25:52,571 [Flink-HistoryServer-ArchiveFetcher-thread-1] DEBUG DFSInputStream - DFSInputStream has been closed already
2020-04-22 13:25:52,573 [nioEventLoopGroup-3-6] DEBUG HistoryServerStaticFileServerHandler - Unable to load requested file /jobs/overview.json from classloader
2020-04-22 13:25:52,576 [IPC Parameter Sending Thread #0] DEBUG Client$Connection$3 - IPC Client (1578587450) connection to d279536-002.dc.gs.com/10.59.61.87:8020 from [hidden email] sending #1391

 

Aside from that, it looks like a lot of logging around datanodes and block location metadata. Did I miss something in my classpath, perhaps? If so, do you have a suggestion on what I could try?

 

// ah
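
Since most of the DEBUG output is HDFS client noise, a small filter can pull out just the static file handler's misses. This is a minimal sketch that assumes the log line layout shown above (timestamp, thread in brackets, level, logger, message). `MISS` and `missing_files` are hypothetical names, and the pattern would need adjusting for a different log4j layout.

```python
import re

# Matches handler lines of the form seen in the excerpt above, e.g.
# "... DEBUG HistoryServerStaticFileServerHandler - Unable to load
#  requested file /jobs/overview.json from classloader"
MISS = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"\[(?P<thread>[^\]]+)\]\s+DEBUG\s+HistoryServerStaticFileServerHandler - "
    r"Unable to load requested file (?P<path>\S+) from classloader"
)


def missing_files(lines):
    """Yield (timestamp, requested_path) for each handler miss in the log."""
    for line in lines:
        match = MISS.match(line.strip())
        if match:
            yield match.group("ts"), match.group("path")
```

Feeding it an open log file (`missing_files(open(path))`, with `path` being wherever your History Server log lives) gives one tuple per request the handler could not serve, which makes it easier to see whether only the overview files are affected.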

 


RE: History Server Not Showing Any Jobs - File Not Found?

Hailu, Andreas
In reply to this post by Chesnay Schepler

I’m having a further look at the code in HistoryServerStaticFileServerHandler - is there an assumption about where overview.json is supposed to be located?

 

// ah

 


Re: History Server Not Showing Any Jobs - File Not Found?

Chesnay Schepler

overview.json is a generated file that is placed in the local directory controlled by historyserver.web.tmpdir.

Have you configured this option to point to some non-local filesystem? (Or if not, is the java.io.tmpdir property pointing somewhere funny?)


RE: History Server Not Showing Any Jobs - File Not Found?

Hailu, Andreas

My machine’s /tmp directory is not large enough to hold the archived files, so I changed java.io.tmpdir to point at another, significantly larger location. I hadn’t set anything for historyserver.web.tmpdir, so I suspected it was still pointing at /tmp. I just tried setting historyserver.web.tmpdir to the same location as java.io.tmpdir, but I’m afraid I’m still seeing the following issue:

 

2020-04-27 09:37:42,904 [nioEventLoopGroup-3-4] DEBUG HistoryServerStaticFileServerHandler - Unable to load requested file /overview.json from classloader
2020-04-27 09:37:42,906 [nioEventLoopGroup-3-6] DEBUG HistoryServerStaticFileServerHandler - Unable to load requested file /jobs/overview.json from classloader

flink-conf.yaml for reference:

jobmanager.archive.fs.dir: hdfs:///user/p2epda/lake/delp_qa/flink_hs/
historyserver.archive.fs.dir: hdfs:///user/p2epda/lake/delp_qa/flink_hs/
historyserver.web.tmpdir: /local/scratch/flink_historyserver_tmpdir/

 

Did you have anything else in mind when you said “pointing somewhere funny”?

 

// ah

 


Re: History Server Not Showing Any Jobs - File Not Found?

Chesnay Schepler
If historyserver.web.tmpdir is not set then java.io.tmpdir is used, so that should be fine.

What are the contents of /local/scratch/flink_historyserver_tmpdir?
I assume there are already archives in HDFS?
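
The two checks discussed so far (is the web directory on a local filesystem, and which directory is used when the option is unset) can be written down as a small sketch. It assumes flink-conf.yaml holds only flat `key: value` lines, and all three function names are hypothetical. `java_io_tmpdir` has to be passed in by the caller, because Python cannot see the JVM's property.

```python
from urllib.parse import urlparse


def parse_flink_conf(text):
    """Parse flat 'key: value' lines, ignoring blanks and '#' comments."""
    conf = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        # Split on the first colon only, so values like hdfs:///... stay intact.
        key, sep, value = line.partition(":")
        if sep and key.strip() and value.strip():
            conf[key.strip()] = value.strip()
    return conf


def effective_web_tmpdir(conf, java_io_tmpdir):
    """Configured historyserver.web.tmpdir, else the JVM's java.io.tmpdir."""
    return conf.get("historyserver.web.tmpdir") or java_io_tmpdir


def is_local_path(value):
    """True when the value has no URI scheme or uses file://."""
    return urlparse(value).scheme in ("", "file")
```

With the configuration quoted in this thread, `effective_web_tmpdir` returns /local/scratch/flink_historyserver_tmpdir/ and `is_local_path` accepts it, while the hdfs:/// archive directories are reported as non-local, which is what you want for the archive settings but not for the web tmpdir.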


RE: History Server Not Showing Any Jobs - File Not Found?

Hailu, Andreas

bash-4.1$ ls -l /local/scratch/flink_historyserver_tmpdir/
total 8
drwxrwxr-x 3 p2epdlsuat p2epdlsuat 4096 Apr 21 10:43 flink-web-history-7fbb97cc-9f38-4844-9bcf-6272fe6828e9
drwxrwxr-x 3 p2epdlsuat p2epdlsuat 4096 Apr 21 10:22 flink-web-history-95b3f928-c60f-4351-9926-766c6ad3ee76

There are just two directories in here. I don’t see cache directories from my attempts today, which is interesting. Looking a little deeper into them:

bash-4.1$ ls -lr /local/scratch/flink_historyserver_tmpdir/flink-web-history-7fbb97cc-9f38-4844-9bcf-6272fe6828e9
total 1756
drwxrwxr-x 2 p2epdlsuat p2epdlsuat 1789952 Apr 21 10:44 jobs
bash-4.1$ ls -lr /local/scratch/flink_historyserver_tmpdir/flink-web-history-7fbb97cc-9f38-4844-9bcf-6272fe6828e9/jobs
total 0
-rw-rw-r-- 1 p2epdlsuat p2epdlsuat 0 Apr 21 10:43 overview.json

There are indeed archives already in HDFS – I’ve included some in my initial mail, but here they are again just for reference:

-bash-4.1$ hdfs dfs -ls /user/p2epda/lake/delp_qa/flink_hs
Found 44282 items
-rw-r-----   3 delp datalake_admin_dev      50569 2020-03-21 23:17 /user/p2epda/lake/delp_qa/flink_hs/000144dba9dc0f235768a46b2f26e936
-rw-r-----   3 delp datalake_admin_dev      49578 2020-03-03 08:45 /user/p2epda/lake/delp_qa/flink_hs/000347625d8128ee3fd0b672018e38a5
-rw-r-----   3 delp datalake_admin_dev      50842 2020-03-24 15:19 /user/p2epda/lake/delp_qa/flink_hs/0004be6ce01ba9677d1eb619ad0fa757
...

 

 

// ah
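
The zero-byte overview.json in the listing is easy to miss once there are several cache directories. Here is a minimal sketch for finding such files under the web tmpdir, assuming the layout seen in the listing (an overview.json somewhere below a flink-web-history-* directory). `find_overviews` and `empty_overviews` are hypothetical helper names.

```python
from pathlib import Path


def find_overviews(web_tmpdir):
    """Return sorted [(path, size_in_bytes)] for every overview.json found."""
    root = Path(web_tmpdir)
    return sorted((str(p), p.stat().st_size) for p in root.rglob("overview.json"))


def empty_overviews(web_tmpdir):
    """Return the paths of overview.json files that are zero bytes long."""
    return [path for path, size in find_overviews(web_tmpdir) if size == 0]
```

Running `empty_overviews("/local/scratch/flink_historyserver_tmpdir/")` on the machine above should list the jobs/overview.json shown in the `ls` output. An empty result after a restart would suggest the file is being regenerated with content.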

 


Re: History Server Not Showing Any Jobs - File Not Found?

Chesnay Schepler
Hmm... let's see if I can reproduce the issue locally.

Are the archives from the same version the history server runs on? (Which I suppose would be 1.9.1?)

Just for the sake of narrowing things down, it would also be interesting to check if it works with the archives residing in the local filesystem.


RE: History Server Not Showing Any Jobs - File Not Found?

Hailu, Andreas

Hi Chesnay, yes – they were created using Flink 1.9.1, as we’ve only just started archiving them in the past couple of weeks. Could you clarify how you’d like me to try local filesystem archives? As in changing jobmanager.archive.fs.dir and historyserver.web.tmpdir to the same local directory?

 

// ah

 


Re: History Server Not Showing Any Jobs - File Not Found?

Chesnay Schepler
Yes, exactly; I want to rule out that HDFS is (somehow) the problem.

So far I haven't been able to reproduce the issue locally.

On 01/05/2020 22:31, Hailu, Andreas wrote:

Hi Chesnay, yes – they were created using Flink 1.9.1 as we’ve only just started to archive them in the past couple weeks. Could you clarify on how you want to try local filesystem archives? As in changing jobmanager.archive.fs.dir and historyserver.web.tmpdir to the same local directory?

 

// ah

 

From: Chesnay Schepler [hidden email]
Sent: Wednesday, April 29, 2020 8:26 AM
To: Hailu, Andreas [Engineering] [hidden email]; [hidden email]
Subject: Re: History Server Not Showing Any Jobs - File Not Found?

 

hmm...let's see if I can reproduce the issue locally.

 

Are the archives from the same version the history server runs on? (Which I supposed would be 1.9.1?)

 

Just for the sake of narrowing things down, it would also be interesting to check if it works with the archives residing in the local filesystem.

 

On 27/04/2020 18:35, Hailu, Andreas wrote:

bash-4.1$ ls -l /local/scratch/flink_historyserver_tmpdir/

total 8

drwxrwxr-x 3 p2epdlsuat p2epdlsuat 4096 Apr 21 10:43 flink-web-history-7fbb97cc-9f38-4844-9bcf-6272fe6828e9

drwxrwxr-x 3 p2epdlsuat p2epdlsuat 4096 Apr 21 10:22 flink-web-history-95b3f928-c60f-4351-9926-766c6ad3ee76

 

There are just two directories in here. I don’t see cache directories from my attempts today, which is interesting. Looking a little deeper into them:

 

bash-4.1$ ls -lr /local/scratch/flink_historyserver_tmpdir/flink-web-history-7fbb97cc-9f38-4844-9bcf-6272fe6828e9

total 1756

drwxrwxr-x 2 p2epdlsuat p2epdlsuat 1789952 Apr 21 10:44 jobs

bash-4.1$ ls -lr /local/scratch/flink_historyserver_tmpdir/flink-web-history-7fbb97cc-9f38-4844-9bcf-6272fe6828e9/jobs

total 0

-rw-rw-r-- 1 p2epdlsuat p2epdlsuat 0 Apr 21 10:43 overview.json

 

There are indeed archives already in HDFS – I’ve included some in my initial mail, but here they are again just for reference:

-bash-4.1$ hdfs dfs -ls /user/p2epda/lake/delp_qa/flink_hs

Found 44282 items

-rw-r-----   3 delp datalake_admin_dev      50569 2020-03-21 23:17 /user/p2epda/lake/delp_qa/flink_hs/000144dba9dc0f235768a46b2f26e936

-rw-r-----   3 delp datalake_admin_dev      49578 2020-03-03 08:45 /user/p2epda/lake/delp_qa/flink_hs/000347625d8128ee3fd0b672018e38a5

-rw-r-----   3 delp datalake_admin_dev      50842 2020-03-24 15:19 /user/p2epda/lake/delp_qa/flink_hs/0004be6ce01ba9677d1eb619ad0fa757

...

 

 

// ah

 

From: Chesnay Schepler [hidden email]
Sent: Monday, April 27, 2020 10:28 AM
To: Hailu, Andreas [Engineering] [hidden email]; [hidden email]
Subject: Re: History Server Not Showing Any Jobs - File Not Found?

 

If historyserver.web.tmpdir is not set then java.io.tmpdir is used, so that should be fine.

 

What are the contents of /local/scratch/flink_historyserver_tmpdir?

I assume there are already archives in HDFS?

 

On 27/04/2020 16:02, Hailu, Andreas wrote:

My machine’s /tmp directory is not large enough to support the archived files, so I changed my java.io.tmpdir to be in some other location which is significantly larger. I hadn’t set anything for historyserver.web.tmpdir, so I suspect it was still pointing at /tmp. I just tried setting historyserver.web.tmpdir to the same location as my java.io.tmpdir location, but I’m afraid I’m still seeing the following issue:

 

2020-04-27 09:37:42,904 [nioEventLoopGroup-3-4] DEBUG HistoryServerStaticFileServerHandler - Unable to load requested file /overview.json from classloader

2020-04-27 09:37:42,906 [nioEventLoopGroup-3-6] DEBUG HistoryServerStaticFileServerHandler - Unable to load requested file /jobs/overview.json from classloader

 

flink-conf.yaml for reference:

jobmanager.archive.fs.dir: hdfs:///user/p2epda/lake/delp_qa/flink_hs/

historyserver.archive.fs.dir: hdfs:///user/p2epda/lake/delp_qa/flink_hs/

historyserver.web.tmpdir: /local/scratch/flink_historyserver_tmpdir/

 

Did you have anything else in mind when you said pointing somewhere funny?

 

// ah

 

From: Chesnay Schepler [hidden email]
Sent: Monday, April 27, 2020 5:56 AM
To: Hailu, Andreas [Engineering] [hidden email]; [hidden email]
Subject: Re: History Server Not Showing Any Jobs - File Not Found?

 

overview.json is a generated file that is placed in the local directory controlled by historyserver.web.tmpdir.

Have you configured this option to point to some non-local filesystem? (Or if not, is the java.io.tmpdir property pointing somewhere funny?)

On 24/04/2020 18:24, Hailu, Andreas wrote:

I’m having a further look at the code in HistoryServerStaticFileServerHandler - is there an assumption about where overview.json is supposed to be located?

 

// ah

 

From: Hailu, Andreas [Engineering]
Sent: Wednesday, April 22, 2020 1:32 PM
To: 'Chesnay Schepler' [hidden email]; Hailu, Andreas [Engineering] [hidden email]; [hidden email]
Subject: RE: History Server Not Showing Any Jobs - File Not Found?

 

Hi Chesnay, thanks for responding. We’re using Flink 1.9.1. I enabled DEBUG level logging and this is something relevant I see:

 

2020-04-22 13:25:52,566 [Flink-HistoryServer-ArchiveFetcher-thread-1] DEBUG DFSInputStream - Connecting to datanode 10.79.252.101:1019

2020-04-22 13:25:52,567 [Flink-HistoryServer-ArchiveFetcher-thread-1] DEBUG SaslDataTransferClient - SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false

2020-04-22 13:25:52,567 [Flink-HistoryServer-ArchiveFetcher-thread-1] DEBUG SaslDataTransferClient - SASL client skipping handshake in secured configuration with privileged port for addr = /10.79.252.101, datanodeId = DatanodeI

nfoWithStorage[10.79.252.101:1019,DS-7f4ec55d-7c5f-4a0e-b817-d9e635480b21,DISK]

2020-04-22 13:25:52,571 [Flink-HistoryServer-ArchiveFetcher-thread-1] DEBUG DFSInputStream - DFSInputStream has been closed already

2020-04-22 13:25:52,573 [nioEventLoopGroup-3-6] DEBUG HistoryServerStaticFileServerHandler - Unable to load requested file /jobs/overview.json from classloader

2020-04-22 13:25:52,576 [IPC Parameter Sending Thread #0] DEBUG Client$Connection$3 - IPC Client (1578587450) connection to d279536-002.dc.gs.com/10.59.61.87:8020 from [hidden email] sending #1391

 

Aside from that, it looks like a lot of logging around datanodes and block location metadata. Did I miss something in my classpath, perhaps? If so, do you have a suggestion on what I could try?

 

// ah

 

From: Chesnay Schepler <[hidden email]>
Sent: Wednesday, April 22, 2020 2:16 AM
To: Hailu, Andreas [Engineering] <[hidden email]>; [hidden email]
Subject: Re: History Server Not Showing Any Jobs - File Not Found?

 

Which Flink version are you using?

Have you checked the history server logs after enabling debug logging?

 


 






RE: History Server Not Showing Any Jobs - File Not Found?

Hailu, Andreas

Hi Chesnay, apologies for not getting back to you sooner. I did what you suggested: I downloaded a few files from my jobmanager.archive.fs.dir HDFS directory to a local directory, /local/scratch/hailua_p2epdlsuat/historyserver/archived/. I then changed my historyserver.archive.fs.dir to file:///local/scratch/hailua_p2epdlsuat/historyserver/archived/ and that seemed to work: I’m able to see the history of the applications I downloaded. So this points to a problem with sourcing the history from HDFS.
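For anyone repeating this local-filesystem test, here is a minimal sketch of how a small sample of archives could be picked out. `pick_archive_paths` is a hypothetical helper name, and it assumes the usual `hdfs dfs -ls` layout with the file path in the last column; the directories in the commented-out wiring are the ones from this thread.

```shell
# Hypothetical helper: read `hdfs dfs -ls` output on stdin and print the
# paths of the first N regular files (lines whose mode string starts with "-").
pick_archive_paths() {
  awk -v n="$1" '$1 ~ /^-/ && count < n { print $NF; count++ }'
}

# Example wiring, commented out because it needs a live HDFS client:
# local_dir=/local/scratch/hailua_p2epdlsuat/historyserver/archived
# mkdir -p "$local_dir"
# hdfs dfs -ls /user/p2epda/lake/delp_qa/flink_hs \
#   | pick_archive_paths 5 \
#   | while read -r path; do hdfs dfs -get "$path" "$local_dir/" </dev/null; done
```

After copying, historyserver.archive.fs.dir would point at the local directory with a file:// URI, as described above.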

 

Do you think this could be classpath-related? This is what we use for our HADOOP_CLASSPATH variable:

/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop/lib/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop-hdfs/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop-hdfs/lib/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop-mapreduce/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop-mapreduce/lib/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop-yarn/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop-yarn/lib/*:/gns/software/ep/da/dataproc/dataproc-prod/lakeRmProxy.jar:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop/bin::/gns/mw/dbclient/postgres/jdbc/pg-jdbc-9.3.v01/postgresql-9.3-1100-jdbc4.jar

 

You can see we have references to Hadoop mapred/yarn/hdfs libs in there.
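A long classpath like this is easier to review one entry per line. The sketch below is only an inspection aid, not a diagnosis: `list_classpath_entries` is a hypothetical helper name, and it simply numbers the colon-separated entries and labels empty ones (the value above contains a `::` sequence after `hadoop/bin`, which yields one empty entry).

```shell
# Hypothetical helper: print one classpath entry per line, numbered, and
# label empty entries (such as the one produced by a "::" sequence).
list_classpath_entries() {
  printf '%s\n' "$1" \
    | tr ':' '\n' \
    | awk '{ printf "%d\t%s\n", NR, ($0 == "" ? "<empty entry>" : $0) }'
}

# Typical call:
# list_classpath_entries "$HADOOP_CLASSPATH"
```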

 

// ah

 


Re: History Server Not Showing Any Jobs - File Not Found?

Chesnay Schepler
If it were a class-loading issue, I would expect to see an exception of some kind. Maybe double-check that flink-shaded-hadoop is not in the lib directory. (Usually I would ask for the full classpath that the HS is started with, but as it turns out this isn't getting logged :( See FLINK-18008.)
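The lib-directory check can be scripted roughly as follows. This is a sketch under assumptions: `find_shaded_hadoop_jars` is a hypothetical helper name, the jar is assumed to follow a `flink-shaded-hadoop*.jar` naming pattern, and `$FLINK_HOME` in the usage line stands for wherever the distribution is installed.

```shell
# Hypothetical helper: list any flink-shaded-hadoop jars directly inside a
# Flink lib directory. Prints the matching paths; returns 0 if at least one
# jar is found and 1 otherwise.
find_shaded_hadoop_jars() {
  local lib_dir="$1"
  local found
  found=$(find "$lib_dir" -maxdepth 1 -type f -name 'flink-shaded-hadoop*.jar')
  [ -n "$found" ] && printf '%s\n' "$found"
}

# Typical call (FLINK_HOME is an assumption about the local setup):
# find_shaded_hadoop_jars "$FLINK_HOME/lib" || echo "no flink-shaded-hadoop jar in lib"
```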

The fact that overview.json and jobs/overview.json are missing indicates that something goes wrong directly on startup. What is supposed to happen is that the HS starts, fetches all currently available archives, and then creates these files.
So it seems like the download gets stuck for some reason.
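One quick way to see whether the generated files were ever filled in is to look for empty overview.json files under the directory configured as historyserver.web.tmpdir (in this thread, /local/scratch/flink_historyserver_tmpdir/). A minimal sketch, with `find_empty_overview_files` as a hypothetical helper name:

```shell
# Hypothetical helper: print every zero-length overview.json found anywhere
# under the given directory (for example the historyserver.web.tmpdir).
find_empty_overview_files() {
  find "$1" -type f -name 'overview.json' -size 0
}

# Typical call, using the tmpdir from this thread:
# find_empty_overview_files /local/scratch/flink_historyserver_tmpdir/
```

No output means no empty overview.json was found; it does not by itself show that the files exist, so listing the directory is still worthwhile.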

Can you use jstack to create a thread dump, and see what the Flink-HistoryServer-ArchiveFetcher is doing?
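A full thread dump is long, so a small filter helps to isolate the fetcher thread. The sketch below assumes the standard jstack text layout (each thread starts with a line beginning with its quoted name and ends at the next blank line); `extract_thread` is a hypothetical helper name and `$HS_PID` stands for the History Server's process id.

```shell
# Hypothetical helper: read a jstack thread dump on stdin and print only the
# stacks of threads whose quoted name starts with the given prefix.
extract_thread() {
  awk -v prefix="$1" '
    /^"/ { printing = (index($0, "\"" prefix) == 1) }  # thread header line
    /^$/ { printing = 0 }                               # blank line ends a thread
    printing { print }
  '
}

# Typical call (HS_PID is an assumption: the History Server JVM process id):
# jstack "$HS_PID" | extract_thread "Flink-HistoryServer-ArchiveFetcher"
```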

I will also file a JIRA for adding more logging statements, like when fetching starts/stops.




RE: History Server Not Showing Any Jobs - File Not Found?

Hailu, Andreas

Just created a dump, here’s what I see:

 

"Flink-HistoryServer-ArchiveFetcher-thread-1" #19 daemon prio=5 os_prio=0 tid=0x00007f93a5a2c000 nid=0x5692 runnable [0x00007f934a0d3000]

   java.lang.Thread.State: RUNNABLE

        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)

        at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)

        at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)

        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)

        - locked <0x00000005df986960> (a sun.nio.ch.Util$2)

        - locked <0x00000005df986948> (a java.util.Collections$UnmodifiableSet)

        - locked <0x00000005df928390> (a sun.nio.ch.EPollSelectorImpl)

        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)

        at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:335)

        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)

        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)

        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.readChannelFully(PacketReceiver.java:258)

        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:209)

        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:171)

        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:102)

        at org.apache.hadoop.hdfs.RemoteBlockReader2.readNextPacket(RemoteBlockReader2.java:201)

        at org.apache.hadoop.hdfs.RemoteBlockReader2.read(RemoteBlockReader2.java:152)

        - locked <0x00000005ceade5e0> (a org.apache.hadoop.hdfs.RemoteBlockReader2)

        at org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:781)

        at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:837)

        - eliminated <0x00000005cead3688> (a org.apache.hadoop.hdfs.DFSInputStream)

        at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:897)

        - locked <0x00000005cead3688> (a org.apache.hadoop.hdfs.DFSInputStream)

        at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:945)

        - locked <0x00000005cead3688> (a org.apache.hadoop.hdfs.DFSInputStream)

        at java.io.DataInputStream.read(DataInputStream.java:149)

        at org.apache.flink.runtime.fs.hdfs.HadoopDataInputStream.read(HadoopDataInputStream.java:94)

        at java.io.InputStream.read(InputStream.java:101)

        at org.apache.flink.util.IOUtils.copyBytes(IOUtils.java:69)

        at org.apache.flink.util.IOUtils.copyBytes(IOUtils.java:91)

        at org.apache.flink.runtime.history.FsJobArchivist.getArchivedJsons(FsJobArchivist.java:110)

        at org.apache.flink.runtime.webmonitor.history.HistoryServerArchiveFetcher$JobArchiveFetcherTask.run(HistoryServerArchiveFetcher.java:169)

        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)

        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)

        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)

        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

        at java.lang.Thread.run(Thread.java:745)
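To make a stack like this quicker to read, the JDK frames and lock annotations can be filtered out so that only the Flink and Hadoop frames remain. This is a sketch, with `app_frames` as a hypothetical helper name; it expects a single thread's stack on stdin.

```shell
# Hypothetical helper: from one thread's stack on stdin, keep only the
# "at ..." frames in org.apache.flink or org.apache.hadoop packages,
# in their original order, with the leading "at " removed.
app_frames() {
  grep -E '^[[:space:]]*at org\.apache\.(flink|hadoop)\.' \
    | sed -E 's/^[[:space:]]*at //'
}
```

Applied to the dump above and read from the bottom up, the remaining frames run from HistoryServerArchiveFetcher$JobArchiveFetcherTask.run through FsJobArchivist.getArchivedJsons and IOUtils.copyBytes into DFSInputStream.read and SocketIOWithTimeout, so at the moment of the dump the fetcher was reading an archive from an HDFS datanode. A single dump cannot show whether that read is stuck or merely slow.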

 

What problems could including the flink-shaded-hadoop jar introduce?

 

// ah

 

From: Chesnay Schepler <[hidden email]>
Sent: Thursday, May 28, 2020 9:26 AM
To: Hailu, Andreas [Engineering] <[hidden email]>; [hidden email]
Subject: Re: History Server Not Showing Any Jobs - File Not Found?

 

If it were a class-loading issue I would think that we'd see an exception of some kind. Maybe double-check that flink-shaded-hadoop is not in the lib directory. (usually I would ask for the full classpath that the HS is started with, but as it turns out this isn't getting logged :( (FLINK-18008))

 

The fact that overview.json and jobs/overview.json are missing indicates that something goes wrong directly on startup. What is supposed to happen is that the HS starts, fetches all currently available archives and then creates these files.

So it seems like the download gets stuck for some reason.

 

Can you use jstack to create a thread dump, and see what the Flink-HistoryServer-ArchiveFetcher is doing?
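A rough sketch of how that one thread could be pulled out of a dump. The helper name fetcher_stack is made up for illustration, and it assumes the usual jstack layout, where each thread's block starts with its quoted name and ends at a blank line.

```shell
# Hypothetical helper: print only the ArchiveFetcher thread's block
# from a jstack thread dump read on stdin.
fetcher_stack() {
  awk '
    /^"Flink-HistoryServer-ArchiveFetcher/ { printing = 1 }  # block starts at the thread header
    printing && /^[[:space:]]*$/           { exit }          # a blank line ends the block
    printing                               { print }
  '
}

# Usage (HS_PID is assumed to hold the History Server process id):
#   jstack "$HS_PID" | fetcher_stack
```

Taking two or three dumps a few seconds apart and comparing the top frames should show whether the thread is making progress or sitting in the same read call.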

 

I will also file a JIRA for adding more logging statements, like when fetching starts/stops.

 

On 27/05/2020 20:57, Hailu, Andreas wrote:

Hi Chesnay, apologies for not getting back to you sooner here. So I did what you suggested - I downloaded a few files from my jobmanager.archive.fs.dir HDFS directory to a locally available directory named /local/scratch/hailua_p2epdlsuat/historyserver/archived/. I then changed my historyserver.archive.fs.dir to file:///local/scratch/hailua_p2epdlsuat/historyserver/archived/ and that seemed to work. I’m able to see the history of the applications I downloaded. So this points to a problem with sourcing the history from HDFS.

 

Do you think this could be classpath related? This is what we use for our HADOOP_CLASSPATH var:

/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop/lib/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop-hdfs/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop-hdfs/lib/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop-mapreduce/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop-mapreduce/lib/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop-yarn/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop-yarn/lib/*:/gns/software/ep/da/dataproc/dataproc-prod/lakeRmProxy.jar:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop/bin::/gns/mw/dbclient/postgres/jdbc/pg-jdbc-9.3.v01/postgresql-9.3-1100-jdbc4.jar

 

You can see we have references to Hadoop mapred/yarn/hdfs libs in there.

 

// ah

 

From: Chesnay Schepler [hidden email]
Sent: Sunday, May 3, 2020 6:00 PM
To: Hailu, Andreas [Engineering] [hidden email]; [hidden email]
Subject: Re: History Server Not Showing Any Jobs - File Not Found?

 

yes, exactly; I want to rule out that (somehow) HDFS is the problem.

 

I couldn't reproduce the issue locally myself so far.

 

On 01/05/2020 22:31, Hailu, Andreas wrote:

Hi Chesnay, yes – they were created using Flink 1.9.1 as we’ve only just started to archive them in the past couple of weeks. Could you clarify how you want to try local filesystem archives? As in changing jobmanager.archive.fs.dir and historyserver.web.tmpdir to the same local directory?

 

// ah

 

From: Chesnay Schepler [hidden email]
Sent: Wednesday, April 29, 2020 8:26 AM
To: Hailu, Andreas [Engineering] [hidden email]; [hidden email]
Subject: Re: History Server Not Showing Any Jobs - File Not Found?

 

hmm...let's see if I can reproduce the issue locally.

 

Are the archives from the same version the history server runs on? (Which I suppose would be 1.9.1?)

 

Just for the sake of narrowing things down, it would also be interesting to check if it works with the archives residing in the local filesystem.

 

On 27/04/2020 18:35, Hailu, Andreas wrote:

bash-4.1$ ls -l /local/scratch/flink_historyserver_tmpdir/

total 8

drwxrwxr-x 3 p2epdlsuat p2epdlsuat 4096 Apr 21 10:43 flink-web-history-7fbb97cc-9f38-4844-9bcf-6272fe6828e9

drwxrwxr-x 3 p2epdlsuat p2epdlsuat 4096 Apr 21 10:22 flink-web-history-95b3f928-c60f-4351-9926-766c6ad3ee76

 

There are just two directories in here. I don’t see cache directories from my attempts today, which is interesting. Looking a little deeper into them:

 

bash-4.1$ ls -lr /local/scratch/flink_historyserver_tmpdir/flink-web-history-7fbb97cc-9f38-4844-9bcf-6272fe6828e9

total 1756

drwxrwxr-x 2 p2epdlsuat p2epdlsuat 1789952 Apr 21 10:44 jobs

bash-4.1$ ls -lr /local/scratch/flink_historyserver_tmpdir/flink-web-history-7fbb97cc-9f38-4844-9bcf-6272fe6828e9/jobs

total 0

-rw-rw-r-- 1 p2epdlsuat p2epdlsuat 0 Apr 21 10:43 overview.json

 

There are indeed archives already in HDFS – I’ve included some in my initial mail, but here they are again just for reference:

-bash-4.1$ hdfs dfs -ls /user/p2epda/lake/delp_qa/flink_hs

Found 44282 items

-rw-r-----   3 delp datalake_admin_dev      50569 2020-03-21 23:17 /user/p2epda/lake/delp_qa/flink_hs/000144dba9dc0f235768a46b2f26e936

-rw-r-----   3 delp datalake_admin_dev      49578 2020-03-03 08:45 /user/p2epda/lake/delp_qa/flink_hs/000347625d8128ee3fd0b672018e38a5

-rw-r-----   3 delp datalake_admin_dev      50842 2020-03-24 15:19 /user/p2epda/lake/delp_qa/flink_hs/0004be6ce01ba9677d1eb619ad0fa757

...

 

 

// ah

 

From: Chesnay Schepler [hidden email]
Sent: Monday, April 27, 2020 10:28 AM
To: Hailu, Andreas [Engineering] [hidden email]; [hidden email]
Subject: Re: History Server Not Showing Any Jobs - File Not Found?

 

If historyserver.web.tmpdir is not set then java.io.tmpdir is used, so that should be fine.

 

What are the contents of /local/scratch/flink_historyserver_tmpdir?

I assume there are already archives in HDFS?

 

On 27/04/2020 16:02, Hailu, Andreas wrote:

My machine’s /tmp directory is not large enough to support the archived files, so I changed my java.io.tmpdir to be in some other location which is significantly larger. I hadn’t set anything for historyserver.web.tmpdir, so I suspect it was still pointing at /tmp. I just tried setting historyserver.web.tmpdir to the same location as my java.io.tmpdir location, but I’m afraid I’m still seeing the following issue:

 

2020-04-27 09:37:42,904 [nioEventLoopGroup-3-4] DEBUG HistoryServerStaticFileServerHandler - Unable to load requested file /overview.json from classloader

2020-04-27 09:37:42,906 [nioEventLoopGroup-3-6] DEBUG HistoryServerStaticFileServerHandler - Unable to load requested file /jobs/overview.json from classloader

 

flink-conf.yaml for reference:

jobmanager.archive.fs.dir: hdfs:///user/p2epda/lake/delp_qa/flink_hs/

historyserver.archive.fs.dir: hdfs:///user/p2epda/lake/delp_qa/flink_hs/

historyserver.web.tmpdir: /local/scratch/flink_historyserver_tmpdir/

 

Did you have anything else in mind when you said pointing somewhere funny?

 

// ah

 

From: Chesnay Schepler [hidden email]
Sent: Monday, April 27, 2020 5:56 AM
To: Hailu, Andreas [Engineering] [hidden email]; [hidden email]
Subject: Re: History Server Not Showing Any Jobs - File Not Found?

 

overview.json is a generated file that is placed in the local directory controlled by historyserver.web.tmpdir.

Have you configured this option to point to some non-local filesystem? (Or if not, is the java.io.tmpdir property pointing somewhere funny?)
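To check what has actually been generated under historyserver.web.tmpdir, something like the following might help. It is only a sketch: the helper name overview_status is hypothetical, and it simply walks the directory it is given, reporting each overview.json with its size so that empty files stand out.

```shell
# Hypothetical helper: list every overview.json below the given directory
# together with its size in bytes, flagging empty files.
overview_status() {
  find "$1" -type f -name 'overview.json' | while IFS= read -r f; do
    size="$(wc -c < "$f" | tr -d ' ')"
    if [ "$size" -eq 0 ]; then
      printf 'EMPTY %s\n' "$f"
    else
      printf '%s bytes %s\n' "$size" "$f"
    fi
  done
}

# Usage (the path is whatever historyserver.web.tmpdir or java.io.tmpdir points at):
#   overview_status /local/scratch/flink_historyserver_tmpdir
```

No output at all would mean no overview.json exists anywhere below that directory.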

On 24/04/2020 18:24, Hailu, Andreas wrote:

I’m having a further look at the code in HistoryServerStaticFileServerHandler - is there an assumption about where overview.json is supposed to be located?

 

// ah

 

From: Hailu, Andreas [Engineering]
Sent: Wednesday, April 22, 2020 1:32 PM
To: 'Chesnay Schepler' [hidden email]; Hailu, Andreas [Engineering] [hidden email]; [hidden email]
Subject: RE: History Server Not Showing Any Jobs - File Not Found?

 

Hi Chesnay, thanks for responding. We’re using Flink 1.9.1. I enabled DEBUG level logging and this is something relevant I see:

 

2020-04-22 13:25:52,566 [Flink-HistoryServer-ArchiveFetcher-thread-1] DEBUG DFSInputStream - Connecting to datanode 10.79.252.101:1019

2020-04-22 13:25:52,567 [Flink-HistoryServer-ArchiveFetcher-thread-1] DEBUG SaslDataTransferClient - SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false

2020-04-22 13:25:52,567 [Flink-HistoryServer-ArchiveFetcher-thread-1] DEBUG SaslDataTransferClient - SASL client skipping handshake in secured configuration with privileged port for addr = /10.79.252.101, datanodeId = DatanodeInfoWithStorage[10.79.252.101:1019,DS-7f4ec55d-7c5f-4a0e-b817-d9e635480b21,DISK]

2020-04-22 13:25:52,571 [Flink-HistoryServer-ArchiveFetcher-thread-1] DEBUG DFSInputStream - DFSInputStream has been closed already

2020-04-22 13:25:52,573 [nioEventLoopGroup-3-6] DEBUG HistoryServerStaticFileServerHandler - Unable to load requested file /jobs/overview.json from classloader

2020-04-22 13:25:52,576 [IPC Parameter Sending Thread #0] DEBUG Client$Connection$3 - IPC Client (1578587450) connection to d279536-002.dc.gs.com/10.59.61.87:8020 from [hidden email] sending #1391

 

Aside from that, it looks like a lot of logging around datanodes and block location metadata. Did I miss something in my classpath, perhaps? If so, do you have a suggestion on what I could try?

 

// ah

 

From: Chesnay Schepler <[hidden email]>
Sent: Wednesday, April 22, 2020 2:16 AM
To: Hailu, Andreas [Engineering] <[hidden email]>; [hidden email]
Subject: Re: History Server Not Showing Any Jobs - File Not Found?

 

Which Flink version are you using?

Have you checked the history server logs after enabling debug logging?

 

On 21/04/2020 17:16, Hailu, Andreas [Engineering] wrote:

Hi,

 

I’m trying to set up the History Server, but none of my applications are showing up in the Web UI. Looking at the console, I see that all of the calls to /overview return the following 404 response: {"errors":["File not found."]}.

 

I’ve set up my configuration as follows:

 

JobManager Archive directory:

jobmanager.archive.fs.dir: hdfs:///user/p2epda/lake/delp_qa/flink_hs/

-bash-4.1$ hdfs dfs -ls /user/p2epda/lake/delp_qa/flink_hs

Found 44282 items

-rw-r-----   3 delp datalake_admin_dev      50569 2020-03-21 23:17 /user/p2epda/lake/delp_qa/flink_hs/000144dba9dc0f235768a46b2f26e936

-rw-r-----   3 delp datalake_admin_dev      49578 2020-03-03 08:45 /user/p2epda/lake/delp_qa/flink_hs/000347625d8128ee3fd0b672018e38a5

-rw-r-----   3 delp datalake_admin_dev      50842 2020-03-24 15:19 /user/p2epda/lake/delp_qa/flink_hs/0004be6ce01ba9677d1eb619ad0fa757

...

...

 

History Server will fetch the archived jobs from the same location:

historyserver.archive.fs.dir: hdfs:///user/p2epda/lake/delp_qa/flink_hs/

 

So I’m able to confirm that there are indeed archived applications that I should be able to view in the histserver. I’m not able to find out what file the overview service is looking for from the repository – any suggestions as to what I could look into next?

 

Best,

Andreas

 




Re: History Server Not Showing Any Jobs - File Not Found?

Chesnay Schepler
Looks like it is indeed stuck on downloading the archive.

I searched a bit in the Hadoop JIRA and found several similar instances:

It is supposed to be fixed in 2.6.0 though :/

If hadoop is available from the HADOOP_CLASSPATH and flink-shaded-hadoop in /lib then you basically don't know what Hadoop version is actually being used,
which could lead to incompatibilities and dependency clashes.
If flink-shaded-hadoop 2.4/2.5 is on the classpath, maybe that is being used and runs into HDFS-7005.
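A quick way to check both places, as a sketch under a few assumptions: the helper names are made up, FLINK_HOME is assumed to point at the Flink installation, and the Hadoop version is only inferred from jar file names. Only wildcard classpath entries ("/some/dir/*") are scanned, since those are the ones that pull in every jar of a directory.

```shell
# Hypothetical helper: list any flink-shaded-hadoop jars in a Flink lib directory.
shaded_hadoop_jars() {
  find "$1" -maxdepth 1 -type f -name 'flink-shaded-hadoop*.jar'
}

# Hypothetical helper: list hadoop-hdfs jars reachable through the wildcard
# entries of a classpath string such as $HADOOP_CLASSPATH.
hdfs_jars_on_classpath() {
  printf '%s\n' "$1" | tr ':' '\n' | while IFS= read -r entry; do
    case "$entry" in
      */\*) dir="${entry%/\*}" ;;   # "/path/lib/*" -> "/path/lib"
      *)    continue ;;             # skip empty entries and single-jar entries
    esac
    [ -d "$dir" ] || continue       # skip directories that do not exist
    find "$dir" -maxdepth 1 -name 'hadoop-hdfs*.jar'
  done
}

# Usage:
#   shaded_hadoop_jars "$FLINK_HOME/lib"
#   hdfs_jars_on_classpath "$HADOOP_CLASSPATH"
```

If the first command prints nothing and the second shows only one Hadoop version, a mixed-version classpath becomes less likely, although Hadoop classes bundled inside other jars would not show up this way.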



RE: History Server Not Showing Any Jobs - File Not Found?

Hailu, Andreas

Okay, I will look further to see if we’re mistakenly using a version that’s pre-2.6.0. However, I don’t see flink-shaded-hadoop in my /lib directory for flink-1.9.1. These are the files within /lib:

 

flink-dist_2.11-1.9.1.jar

flink-table-blink_2.11-1.9.1.jar

flink-table_2.11-1.9.1.jar

log4j-1.2.17.jar

slf4j-log4j12-1.7.15.jar

 

// ah

 

From: Chesnay Schepler <[hidden email]>
Sent: Thursday, May 28, 2020 11:00 AM
To: Hailu, Andreas [Engineering] <[hidden email]>; [hidden email]
Subject: Re: History Server Not Showing Any Jobs - File Not Found?

 

Looks like it is indeed stuck on downloading the archive.

 

I searched a bit in the Hadoop JIRA and found several similar instances:

 

It is supposed to be fixed in 2.6.0 though :/

 

If hadoop is available from the HADOOP_CLASSPATH and flink-shaded-hadoop in /lib then you basically don't know what Hadoop version is actually being used,

which could lead to incompatibilities and dependency clashes.

If flink-shaded-hadoop 2.4/2.5 is on the classpath, maybe that is being used and runs into HDFS-7005.

 

On 28/05/2020 16:27, Hailu, Andreas wrote:

Just created a dump, here’s what I see:

 

"Flink-HistoryServer-ArchiveFetcher-thread-1" #19 daemon prio=5 os_prio=0 tid=0x00007f93a5a2c000 nid=0x5692 runnable [0x00007f934a0d3000]

   java.lang.Thread.State: RUNNABLE

        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)

        at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)

        at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)

        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)

        - locked <0x00000005df986960> (a sun.nio.ch.Util$2)

        - locked <0x00000005df986948> (a java.util.Collections$UnmodifiableSet)

        - locked <0x00000005df928390> (a sun.nio.ch.EPollSelectorImpl)

        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)

        at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:335)

        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)

        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)

        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.readChannelFully(PacketReceiver.java:258)

        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:209)

        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:171)

        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:102)

        at org.apache.hadoop.hdfs.RemoteBlockReader2.readNextPacket(RemoteBlockReader2.java:201)

        at org.apache.hadoop.hdfs.RemoteBlockReader2.read(RemoteBlockReader2.java:152)

        - locked <0x00000005ceade5e0> (a org.apache.hadoop.hdfs.RemoteBlockReader2)

        at org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:781)

        at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:837)

        - eliminated <0x00000005cead3688> (a org.apache.hadoop.hdfs.DFSInputStream)

        at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:897)

        - locked <0x00000005cead3688> (a org.apache.hadoop.hdfs.DFSInputStream)

        at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:945)

        - locked <0x00000005cead3688> (a org.apache.hadoop.hdfs.DFSInputStream)

        at java.io.DataInputStream.read(DataInputStream.java:149)

        at org.apache.flink.runtime.fs.hdfs.HadoopDataInputStream.read(HadoopDataInputStream.java:94)

        at java.io.InputStream.read(InputStream.java:101)

        at org.apache.flink.util.IOUtils.copyBytes(IOUtils.java:69)

        at org.apache.flink.util.IOUtils.copyBytes(IOUtils.java:91)

        at org.apache.flink.runtime.history.FsJobArchivist.getArchivedJsons(FsJobArchivist.java:110)

        at org.apache.flink.runtime.webmonitor.history.HistoryServerArchiveFetcher$JobArchiveFetcherTask.run(HistoryServerArchiveFetcher.java:169)

        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)

        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)

        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)

        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

        at java.lang.Thread.run(Thread.java:745)

 

What problems could the flink-shaded-hadoop jar being included introduce?

 

// ah

 

From: Chesnay Schepler [hidden email]
Sent: Thursday, May 28, 2020 9:26 AM
To: Hailu, Andreas [Engineering] [hidden email]; [hidden email]
Subject: Re: History Server Not Showing Any Jobs - File Not Found?

 

If it were a class-loading issue I would think that we'd see an exception of some kind. Maybe double-check that flink-shaded-hadoop is not in the lib directory. (usually I would ask for the full classpath that the HS is started with, but as it turns out this isn't getting logged :( (FLINK-18008))

 

The fact that overview.json and jobs/overview.json are missing indicates that something goes wrong directly on startup. What is supposed to happen is that the HS starts, fetches all currently available archives and then creates these files.

So it seems like the download gets stuck for some reason.

 

Can you use jstack to create a thread dump, and see what the Flink-HistoryServer-ArchiveFetcher is doing?

 

I will also file a JIRA for adding more logging statements, like when fetching starts/stops.
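
For the thread dump requested above, a small filter can pull just the fetcher thread out of the jstack output. This is a sketch under assumptions: the helper name thread_stack is hypothetical, and it relies on jstack printing each thread as a block that starts with the quoted thread name and ends at a blank line.

```shell
# Hypothetical helper: print the stack of the first thread whose name
# starts with the given prefix. Reads a jstack dump on stdin.
thread_stack() {
  awk -v name="$1" '
    # A thread header line begins with the quoted thread name.
    !found && index($0, "\"" name) == 1 { found = 1; printing = 1 }
    # A blank line ends the block of the current thread.
    printing && /^[[:space:]]*$/ { printing = 0 }
    printing { print }
  '
}

# Example (the PID is a placeholder that has to be looked up first,
# for instance with jps or ps):
#   jstack <history-server-pid> | thread_stack "Flink-HistoryServer-ArchiveFetcher"
```

Taking two or three dumps a few seconds apart and comparing the filtered output is one way to tell a thread that is making slow progress from one that is stuck in the same read.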

 

On 27/05/2020 20:57, Hailu, Andreas wrote:

Hi Chesnay, apologies for not getting back to you sooner here. So I did what you suggested - I downloaded a few files from my jobmanager.archive.fs.dir HDFS directory to a locally available directory named /local/scratch/hailua_p2epdlsuat/historyserver/archived/. I then changed my historyserver.archive.fs.dir to file:///local/scratch/hailua_p2epdlsuat/historyserver/archived/ and that seemed to work. I’m able to see the history of the applications I downloaded. So this points to a problem with sourcing the history from HDFS.
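
The local-filesystem test described above can be scripted roughly as follows. This is a sketch, not the exact commands that were run: the helper name sample_archive_paths is hypothetical, and it assumes that file entries in the hdfs dfs -ls output start with "-" and that the path is the last column and contains no spaces.

```shell
# Hypothetical helper: print the path column of the first N file entries
# from `hdfs dfs -ls` output read on stdin.
sample_archive_paths() {
  awk -v limit="$1" '/^-/ && seen < limit { print $NF; seen++ }'
}

# Example, using the directories mentioned in this thread:
#   hdfs dfs -ls /user/p2epda/lake/delp_qa/flink_hs \
#     | sample_archive_paths 5 \
#     | xargs -I{} hdfs dfs -get {} /local/scratch/hailua_p2epdlsuat/historyserver/archived/
```

After the copy, historyserver.archive.fs.dir is pointed at the file:// URI of the local directory and the History Server is restarted.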

 

Do you think this could be classpath related? This is what we use for our HADOOP_CLASSPATH var:

/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop/lib/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop-hdfs/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop-hdfs/lib/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop-mapreduce/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop-mapreduce/lib/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop-yarn/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop-yarn/lib/*:/gns/software/ep/da/dataproc/dataproc-prod/lakeRmProxy.jar:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop/bin::/gns/mw/dbclient/postgres/jdbc/pg-jdbc-9.3.v01/postgresql-9.3-1100-jdbc4.jar

 

You can see we have references to Hadoop mapred/yarn/hdfs libs in there.

 

// ah

 

From: Chesnay Schepler [hidden email]
Sent: Sunday, May 3, 2020 6:00 PM
To: Hailu, Andreas [Engineering] [hidden email]; [hidden email]
Subject: Re: History Server Not Showing Any Jobs - File Not Found?

 

yes, exactly; I want to rule out that (somehow) HDFS is the problem.

 

I couldn't reproduce the issue locally myself so far.

 

On 01/05/2020 22:31, Hailu, Andreas wrote:

Hi Chesnay, yes – they were created using Flink 1.9.1, as we’ve only just started to archive them in the past couple of weeks. Could you clarify how you want to try local filesystem archives? As in changing jobmanager.archive.fs.dir and historyserver.web.tmpdir to the same local directory?

 

// ah

 

From: Chesnay Schepler [hidden email]
Sent: Wednesday, April 29, 2020 8:26 AM
To: Hailu, Andreas [Engineering] [hidden email]; [hidden email]
Subject: Re: History Server Not Showing Any Jobs - File Not Found?

 

hmm...let's see if I can reproduce the issue locally.

 

Are the archives from the same version the history server runs on? (Which I suppose would be 1.9.1?)

 

Just for the sake of narrowing things down, it would also be interesting to check if it works with the archives residing in the local filesystem.

 

On 27/04/2020 18:35, Hailu, Andreas wrote:

bash-4.1$ ls -l /local/scratch/flink_historyserver_tmpdir/

total 8

drwxrwxr-x 3 p2epdlsuat p2epdlsuat 4096 Apr 21 10:43 flink-web-history-7fbb97cc-9f38-4844-9bcf-6272fe6828e9

drwxrwxr-x 3 p2epdlsuat p2epdlsuat 4096 Apr 21 10:22 flink-web-history-95b3f928-c60f-4351-9926-766c6ad3ee76

 

There are just two directories in here. I don’t see cache directories from my attempts today, which is interesting. Looking a little deeper into them:

 

bash-4.1$ ls -lr /local/scratch/flink_historyserver_tmpdir/flink-web-history-7fbb97cc-9f38-4844-9bcf-6272fe6828e9

total 1756

drwxrwxr-x 2 p2epdlsuat p2epdlsuat 1789952 Apr 21 10:44 jobs

bash-4.1$ ls -lr /local/scratch/flink_historyserver_tmpdir/flink-web-history-7fbb97cc-9f38-4844-9bcf-6272fe6828e9/jobs

total 0

-rw-rw-r-- 1 p2epdlsuat p2epdlsuat 0 Apr 21 10:43 overview.json

 

There are indeed archives already in HDFS – I’ve included some in my initial mail, but here they are again just for reference:

-bash-4.1$ hdfs dfs -ls /user/p2epda/lake/delp_qa/flink_hs

Found 44282 items

-rw-r-----   3 delp datalake_admin_dev      50569 2020-03-21 23:17 /user/p2epda/lake/delp_qa/flink_hs/000144dba9dc0f235768a46b2f26e936

-rw-r-----   3 delp datalake_admin_dev      49578 2020-03-03 08:45 /user/p2epda/lake/delp_qa/flink_hs/000347625d8128ee3fd0b672018e38a5

-rw-r-----   3 delp datalake_admin_dev      50842 2020-03-24 15:19 /user/p2epda/lake/delp_qa/flink_hs/0004be6ce01ba9677d1eb619ad0fa757

...

 

 

// ah

 

From: Chesnay Schepler [hidden email]
Sent: Monday, April 27, 2020 10:28 AM
To: Hailu, Andreas [Engineering] [hidden email]; [hidden email]
Subject: Re: History Server Not Showing Any Jobs - File Not Found?

 

If historyserver.web.tmpdir is not set then java.io.tmpdir is used, so that should be fine.

 

What are the contents of /local/scratch/flink_historyserver_tmpdir?

I assume there are already archives in HDFS?

 

On 27/04/2020 16:02, Hailu, Andreas wrote:

My machine’s /tmp directory is not large enough to support the archived files, so I changed my java.io.tmpdir to be in some other location which is significantly larger. I hadn’t set anything for historyserver.web.tmpdir, so I suspect it was still pointing at /tmp. I just tried setting historyserver.web.tmpdir to the same location as my java.io.tmpdir location, but I’m afraid I’m still seeing the following issue:

 

2020-04-27 09:37:42,904 [nioEventLoopGroup-3-4] DEBUG HistoryServerStaticFileServerHandler - Unable to load requested file /overview.json from classloader

2020-04-27 09:37:42,906 [nioEventLoopGroup-3-6] DEBUG HistoryServerStaticFileServerHandler - Unable to load requested file /jobs/overview.json from classloader

 

flink-conf.yaml for reference:

jobmanager.archive.fs.dir: hdfs:///user/p2epda/lake/delp_qa/flink_hs/

historyserver.archive.fs.dir: hdfs:///user/p2epda/lake/delp_qa/flink_hs/

historyserver.web.tmpdir: /local/scratch/flink_historyserver_tmpdir/

 

Did you have anything else in mind when you said pointing somewhere funny?

 

// ah

 

From: Chesnay Schepler [hidden email]
Sent: Monday, April 27, 2020 5:56 AM
To: Hailu, Andreas [Engineering] [hidden email]; [hidden email]
Subject: Re: History Server Not Showing Any Jobs - File Not Found?

 

overview.json is a generated file that is placed in the local directory controlled by historyserver.web.tmpdir.

Have you configured this option to point to some non-local filesystem? (Or if not, is the java.io.tmpdir property pointing somewhere funny?)
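
A quick way to check whether the generated files exist is to scan the web tmpdir. The sketch below assumes the layout visible in the listings in this thread (one flink-web-history-<id> directory per run, with overview.json at its root and under jobs/); the function name check_overviews is hypothetical.

```shell
# Hypothetical helper: report whether the generated overview files are
# missing, empty, or present under a History Server web tmpdir.
check_overviews() (
  tmpdir=$1
  for run in "$tmpdir"/flink-web-history-*; do
    # Skip the unexpanded pattern when no run directory exists.
    [ -d "$run" ] || continue
    for file in "$run/overview.json" "$run/jobs/overview.json"; do
      if [ ! -e "$file" ]; then
        echo "missing: $file"
      elif [ ! -s "$file" ]; then
        echo "empty: $file"
      else
        echo "ok: $file"
      fi
    done
  done
)

# Example:
#   check_overviews /local/scratch/flink_historyserver_tmpdir
```

A "missing" or "empty" result for every run directory would suggest the fetcher never finished its first pass over the archives.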

On 24/04/2020 18:24, Hailu, Andreas wrote:

I’m having a further look at the code in HistoryServerStaticFileServerHandler - is there an assumption about where overview.json is supposed to be located?

 

// ah

 

From: Hailu, Andreas [Engineering]
Sent: Wednesday, April 22, 2020 1:32 PM
To: 'Chesnay Schepler' [hidden email]; Hailu, Andreas [Engineering] [hidden email]; [hidden email]
Subject: RE: History Server Not Showing Any Jobs - File Not Found?

 

Hi Chesnay, thanks for responding. We’re using Flink 1.9.1. I enabled DEBUG level logging and this is something relevant I see:

 

2020-04-22 13:25:52,566 [Flink-HistoryServer-ArchiveFetcher-thread-1] DEBUG DFSInputStream - Connecting to datanode 10.79.252.101:1019

2020-04-22 13:25:52,567 [Flink-HistoryServer-ArchiveFetcher-thread-1] DEBUG SaslDataTransferClient - SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false

2020-04-22 13:25:52,567 [Flink-HistoryServer-ArchiveFetcher-thread-1] DEBUG SaslDataTransferClient - SASL client skipping handshake in secured configuration with privileged port for addr = /10.79.252.101, datanodeId = DatanodeInfoWithStorage[10.79.252.101:1019,DS-7f4ec55d-7c5f-4a0e-b817-d9e635480b21,DISK]

2020-04-22 13:25:52,571 [Flink-HistoryServer-ArchiveFetcher-thread-1] DEBUG DFSInputStream - DFSInputStream has been closed already

2020-04-22 13:25:52,573 [nioEventLoopGroup-3-6] DEBUG HistoryServerStaticFileServerHandler - Unable to load requested file /jobs/overview.json from classloader

2020-04-22 13:25:52,576 [IPC Parameter Sending Thread #0] DEBUG Client$Connection$3 - IPC Client (1578587450) connection to d279536-002.dc.gs.com/10.59.61.87:8020 from [hidden email] sending #1391

 

Aside from that, it looks like a lot of logging around datanodes and block location metadata. Did I miss something in my classpath, perhaps? If so, do you have a suggestion on what I could try?

 

// ah

 

From: Chesnay Schepler <[hidden email]>
Sent: Wednesday, April 22, 2020 2:16 AM
To: Hailu, Andreas [Engineering] <[hidden email]>; [hidden email]
Subject: Re: History Server Not Showing Any Jobs - File Not Found?

 

Which Flink version are you using?

Have you checked the history server logs after enabling debug logging?

 

On 21/04/2020 17:16, Hailu, Andreas [Engineering] wrote:

Hi,

 

I’m trying to set up the History Server, but none of my applications are showing up in the Web UI. Looking at the console, I see that all of the calls to /overview return the following 404 response: {"errors":["File not found."]}.

 

I’ve set up my configuration as follows:

 

JobManager Archive directory:

jobmanager.archive.fs.dir: hdfs:///user/p2epda/lake/delp_qa/flink_hs/

-bash-4.1$ hdfs dfs -ls /user/p2epda/lake/delp_qa/flink_hs

Found 44282 items

-rw-r-----   3 delp datalake_admin_dev      50569 2020-03-21 23:17 /user/p2epda/lake/delp_qa/flink_hs/000144dba9dc0f235768a46b2f26e936

-rw-r-----   3 delp datalake_admin_dev      49578 2020-03-03 08:45 /user/p2epda/lake/delp_qa/flink_hs/000347625d8128ee3fd0b672018e38a5

-rw-r-----   3 delp datalake_admin_dev      50842 2020-03-24 15:19 /user/p2epda/lake/delp_qa/flink_hs/0004be6ce01ba9677d1eb619ad0fa757

...

...

 

History Server will fetch the archived jobs from the same location:

historyserver.archive.fs.dir: hdfs:///user/p2epda/lake/delp_qa/flink_hs/

 

So I’m able to confirm that there are indeed archived applications that I should be able to view in the histserver. I’m not able to find out what file the overview service is looking for from the repository – any suggestions as to what I could look into next?

 

Best,

Andreas

 




RE: History Server Not Showing Any Jobs - File Not Found?

Hailu, Andreas
In reply to this post by Chesnay Schepler

May I also ask what version of flink-hadoop you’re using and the number of jobs you’re storing the history for? As of writing we have roughly 101,000 application history files. I’m curious to know if we’re encountering some kind of resource problem.

 

// ah

 

From: Hailu, Andreas [Engineering]
Sent: Thursday, May 28, 2020 12:18 PM
To: 'Chesnay Schepler' <[hidden email]>; [hidden email]
Subject: RE: History Server Not Showing Any Jobs - File Not Found?

 

Okay, I will look further to see if we’re mistakenly using a version that’s pre-2.6.0. However, I don’t see flink-shaded-hadoop in my /lib directory for flink-1.9.1. These are the files within /lib:

flink-dist_2.11-1.9.1.jar
flink-table-blink_2.11-1.9.1.jar
flink-table_2.11-1.9.1.jar
log4j-1.2.17.jar
slf4j-log4j12-1.7.15.jar

 

// ah

 

From: Chesnay Schepler <[hidden email]>
Sent: Thursday, May 28, 2020 11:00 AM
To: Hailu, Andreas [Engineering] <[hidden email]>; [hidden email]
Subject: Re: History Server Not Showing Any Jobs - File Not Found?

 

Looks like it is indeed stuck on downloading the archive.

 

I searched a bit in the Hadoop JIRA and found several similar instances:

 

It is supposed to be fixed in 2.6.0 though :/

 

If hadoop is available from the HADOOP_CLASSPATH and flink-shaded-hadoop in /lib then you basically don't know what Hadoop version is actually being used,

which could lead to incompatibilities and dependency clashes.

If flink-shaded-hadoop 2.4/2.5 is on the classpath, maybe that is being used and runs into HDFS-7005.

 

On 28/05/2020 16:27, Hailu, Andreas wrote:

Just created a dump, here’s what I see:

 

"Flink-HistoryServer-ArchiveFetcher-thread-1" #19 daemon prio=5 os_prio=0 tid=0x00007f93a5a2c000 nid=0x5692 runnable [0x00007f934a0d3000]

   java.lang.Thread.State: RUNNABLE

        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)

        at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)

        at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)

        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)

        - locked <0x00000005df986960> (a sun.nio.ch.Util$2)

        - locked <0x00000005df986948> (a java.util.Collections$UnmodifiableSet)

        - locked <0x00000005df928390> (a sun.nio.ch.EPollSelectorImpl)

        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)

        at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:335)

        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)

        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)

        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.readChannelFully(PacketReceiver.java:258)

        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:209)

        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:171)

        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:102)

        at org.apache.hadoop.hdfs.RemoteBlockReader2.readNextPacket(RemoteBlockReader2.java:201)

        at org.apache.hadoop.hdfs.RemoteBlockReader2.read(RemoteBlockReader2.java:152)

        - locked <0x00000005ceade5e0> (a org.apache.hadoop.hdfs.RemoteBlockReader2)

        at org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:781)

        at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:837)

        - eliminated <0x00000005cead3688> (a org.apache.hadoop.hdfs.DFSInputStream)

        at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:897)

        - locked <0x00000005cead3688> (a org.apache.hadoop.hdfs.DFSInputStream)

       at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:945)

        - locked <0x00000005cead3688> (a org.apache.hadoop.hdfs.DFSInputStream)

        at java.io.DataInputStream.read(DataInputStream.java:149)

        at org.apache.flink.runtime.fs.hdfs.HadoopDataInputStream.read(HadoopDataInputStream.java:94)

        at java.io.InputStream.read(InputStream.java:101)

        at org.apache.flink.util.IOUtils.copyBytes(IOUtils.java:69)

        at org.apache.flink.util.IOUtils.copyBytes(IOUtils.java:91)

        at org.apache.flink.runtime.history.FsJobArchivist.getArchivedJsons(FsJobArchivist.java:110)

        at org.apache.flink.runtime.webmonitor.history.HistoryServerArchiveFetcher$JobArchiveFetcherTask.run(HistoryServerArchiveFetcher.java:169)

        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)

        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)

        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)

        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

        at java.lang.Thread.run(Thread.java:745)

 

What problems could the flink-shaded-hadoop jar being included introduce?

 

// ah

 

From: Chesnay Schepler [hidden email]
Sent: Thursday, May 28, 2020 9:26 AM
To: Hailu, Andreas [Engineering] [hidden email]; [hidden email]
Subject: Re: History Server Not Showing Any Jobs - File Not Found?

 

If it were a class-loading issue I would think that we'd see an exception of some kind. Maybe double-check that flink-shaded-hadoop is not in the lib directory. (usually I would ask for the full classpath that the HS is started with, but as it turns out this isn't getting logged :( (FLINK-18008))

 

The fact that overview.json and jobs/overview.json are missing indicates that something goes wrong directly on startup. What is supposed to happens is that the HS starts, fetches all currently available archives and then creates these files.

So it seems like the download gets stuck for some reason.

 

Can you use jstack to create a thread dump, and see what the Flink-HistoryServer-ArchiveFetcher is doing?

 

I will also file a JIRA for adding more logging statements, like when fetching starts/stops.

 

On 27/05/2020 20:57, Hailu, Andreas wrote:

Hi Chesney, apologies for not getting back to you sooner here. So I did what you suggested - I downloaded a few files from my jobmanager.archive.fs.dir HDFS directory to a locally available directory named /local/scratch/hailua_p2epdlsuat/historyserver/archived/. I then changed my historyserver.archive.fs.dir to file:///local/scratch/hailua_p2epdlsuat/historyserver/archived/ and that seemed to work. I’m able to see the history of the applications I downloaded. So this points to a problem with sourcing the history from HDFS.

 

Do you think this could be classpath related? This is what we use for our HADOOP_CLASSPATH var:

/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop/lib/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop-hdfs/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop-hdfs/lib/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop-mapreduce/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop-mapreduce/lib/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop-yarn/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop-yarn/lib/*:/gns/software/ep/da/dataproc/dataproc-prod/lakeRmProxy.jar:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop/bin::/gns/mw/dbclient/postgres/jdbc/pg-jdbc-9.3.v01/postgresql-9.3-1100-jdbc4.jar

 

You can see we have references to Hadoop mapred/yarn/hdfs libs in there.

 

// ah

 

From: Chesnay Schepler [hidden email]
Sent: Sunday, May 3, 2020 6:00 PM
To: Hailu, Andreas [Engineering] [hidden email]; [hidden email]
Subject: Re: History Server Not Showing Any Jobs - File Not Found?

 

yes, exactly; I want to rule out that (somehow) HDFS is the problem.

 

I couldn't reproduce the issue locally myself so far.

 

On 01/05/2020 22:31, Hailu, Andreas wrote:

Hi Chesnay, yes – they were created using Flink 1.9.1 as we’ve only just started to archive them in the past couple weeks. Could you clarify on how you want to try local filesystem archives? As in changing jobmanager.archive.fs.dir and historyserver.web.tmpdir to the same local directory?

 

// ah

 

From: Chesnay Schepler [hidden email]
Sent: Wednesday, April 29, 2020 8:26 AM
To: Hailu, Andreas [Engineering] [hidden email]; [hidden email]
Subject: Re: History Server Not Showing Any Jobs - File Not Found?

 

hmm...let's see if I can reproduce the issue locally.

 

Are the archives from the same version the history server runs on? (Which I supposed would be 1.9.1?)

 

Just for the sake of narrowing things down, it would also be interesting to check if it works with the archives residing in the local filesystem.

 

On 27/04/2020 18:35, Hailu, Andreas wrote:

bash-4.1$ ls -l /local/scratch/flink_historyserver_tmpdir/

total 8

drwxrwxr-x 3 p2epdlsuat p2epdlsuat 4096 Apr 21 10:43 flink-web-history-7fbb97cc-9f38-4844-9bcf-6272fe6828e9

drwxrwxr-x 3 p2epdlsuat p2epdlsuat 4096 Apr 21 10:22 flink-web-history-95b3f928-c60f-4351-9926-766c6ad3ee76

 

There are just two directories in here. I don’t see cache directories from my attempts today, which is interesting. Looking a little deeper into them:

 

bash-4.1$ ls -lr /local/scratch/flink_historyserver_tmpdir/flink-web-history-7fbb97cc-9f38-4844-9bcf-6272fe6828e9

total 1756

drwxrwxr-x 2 p2epdlsuat p2epdlsuat 1789952 Apr 21 10:44 jobs

bash-4.1$ ls -lr /local/scratch/flink_historyserver_tmpdir/flink-web-history-7fbb97cc-9f38-4844-9bcf-6272fe6828e9/jobs

total 0

-rw-rw-r-- 1 p2epdlsuat p2epdlsuat 0 Apr 21 10:43 overview.json

 

There are indeed archives already in HDFS – I’ve included some in my initial mail, but here they are again just for reference:

-bash-4.1$ hdfs dfs -ls /user/p2epda/lake/delp_qa/flink_hs

Found 44282 items

-rw-r-----   3 delp datalake_admin_dev      50569 2020-03-21 23:17 /user/p2epda/lake/delp_qa/flink_hs/000144dba9dc0f235768a46b2f26e936

-rw-r-----   3 delp datalake_admin_dev      49578 2020-03-03 08:45 /user/p2epda/lake/delp_qa/flink_hs/000347625d8128ee3fd0b672018e38a5

-rw-r-----   3 delp datalake_admin_dev      50842 2020-03-24 15:19 /user/p2epda/lake/delp_qa/flink_hs/0004be6ce01ba9677d1eb619ad0fa757

...

 

 

// ah

 

From: Chesnay Schepler [hidden email]
Sent: Monday, April 27, 2020 10:28 AM
To: Hailu, Andreas [Engineering] [hidden email]; [hidden email]
Subject: Re: History Server Not Showing Any Jobs - File Not Found?

 

If historyserver.web.tmpdir is not set then java.io.tmpdir is used, so that should be fine.

 

What are the contents of /local/scratch/flink_historyserver_tmpdir?

I assume there are already archives in HDFS?

 

On 27/04/2020 16:02, Hailu, Andreas wrote:

My machine’s /tmp directory is not large enough to support the archived files, so I changed my java.io.tmpdir to be in some other location which is significantly larger. I hadn’t set anything for historyserver.web.tmpdir, so I suspect it was still pointing at /tmp. I just tried setting historyserver.web.tmpdir to the same location as my java.io.tmpdir location, but I’m afraid I’m still seeing the following issue:

 

2020-04-27 09:37:42,904 [nioEventLoopGroup-3-4] DEBUG HistoryServerStaticFileServerHandler - Unable to load requested file /overview.json from classloader

2020-04-27 09:37:42,906 [nioEventLoopGroup-3-6] DEBUG HistoryServerStaticFileServerHandler - Unable to load requested file /jobs/overview.json from classloader

 

flink-conf.yaml for reference:

jobmanager.archive.fs.dir: hdfs:///user/p2epda/lake/delp_qa/flink_hs/

historyserver.archive.fs.dir: hdfs:///user/p2epda/lake/delp_qa/flink_hs/

historyserver.web.tmpdir: /local/scratch/flink_historyserver_tmpdir/

 

Did you have anything else in mind when you said pointing somewhere funny?

 

// ah

 

From: Chesnay Schepler [hidden email]
Sent: Monday, April 27, 2020 5:56 AM
To: Hailu, Andreas [Engineering] [hidden email]; [hidden email]
Subject: Re: History Server Not Showing Any Jobs - File Not Found?

 

overview.json is a generated file that is placed in the local directory controlled by historyserver.web.tmpdir.

Have you configured this option to point to some non-local filesystem? (Or if not, is the java.io.tmpdir property pointing somewhere funny?)

On 24/04/2020 18:24, Hailu, Andreas wrote:

I’m having a further look at the code in HistoryServerStaticFileServerHandler - is there an assumption about where overview.json is supposed to be located?

 

// ah

 

From: Hailu, Andreas [Engineering]
Sent: Wednesday, April 22, 2020 1:32 PM
To: 'Chesnay Schepler' [hidden email]; Hailu, Andreas [Engineering] [hidden email]; [hidden email]
Subject: RE: History Server Not Showing Any Jobs - File Not Found?

 

Hi Chesnay, thanks for responding. We’re using Flink 1.9.1. I enabled DEBUG level logging and this is something relevant I see:

 

2020-04-22 13:25:52,566 [Flink-HistoryServer-ArchiveFetcher-thread-1] DEBUG DFSInputStream - Connecting to datanode 10.79.252.101:1019

2020-04-22 13:25:52,567 [Flink-HistoryServer-ArchiveFetcher-thread-1] DEBUG SaslDataTransferClient - SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false

2020-04-22 13:25:52,567 [Flink-HistoryServer-ArchiveFetcher-thread-1] DEBUG SaslDataTransferClient - SASL client skipping handshake in secured configuration with privileged port for addr = /10.79.252.101, datanodeId = DatanodeI

nfoWithStorage[10.79.252.101:1019,DS-7f4ec55d-7c5f-4a0e-b817-d9e635480b21,DISK]

2020-04-22 13:25:52,571 [Flink-HistoryServer-ArchiveFetcher-thread-1] DEBUG DFSInputStream - DFSInputStream has been closed already

2020-04-22 13:25:52,573 [nioEventLoopGroup-3-6] DEBUG HistoryServerStaticFileServerHandler - Unable to load requested file /jobs/overview.json from classloader

2020-04-22 13:25:52,576 [IPC Parameter Sending Thread #0] DEBUG Client$Connection$3 - IPC Client (1578587450) connection to d279536-002.dc.gs.com/10.59.61.87:8020 from [hidden email] sending #1391

 

Aside from that, it looks like a lot of logging around datanodes and block location metadata. Did I miss something in my classpath, perhaps? If so, do you have a suggestion on what I could try?

 

// ah

 

From: Chesnay Schepler <[hidden email]>
Sent: Wednesday, April 22, 2020 2:16 AM
To: Hailu, Andreas [Engineering] <[hidden email]>; [hidden email]
Subject: Re: History Server Not Showing Any Jobs - File Not Found?

 

Which Flink version are you using?

Have you checked the history server logs after enabling debug logging?

 

On 21/04/2020 17:16, Hailu, Andreas [Engineering] wrote:

Hi,

 

I’m trying to set up the History Server, but none of my applications are showing up in the Web UI. Looking at the console, I see that all of the calls to /overview return the following 404 response: {"errors":["File not found."]}.

 

I’ve set up my configuration as follows:

 

JobManager Archive directory:

jobmanager.archive.fs.dir: hdfs:///user/p2epda/lake/delp_qa/flink_hs/

-bash-4.1$ hdfs dfs -ls /user/p2epda/lake/delp_qa/flink_hs

Found 44282 items

-rw-r-----   3 delp datalake_admin_dev      50569 2020-03-21 23:17 /user/p2epda/lake/delp_qa/flink_hs/000144dba9dc0f235768a46b2f26e936

-rw-r-----   3 delp datalake_admin_dev      49578 2020-03-03 08:45 /user/p2epda/lake/delp_qa/flink_hs/000347625d8128ee3fd0b672018e38a5

-rw-r-----   3 delp datalake_admin_dev      50842 2020-03-24 15:19 /user/p2epda/lake/delp_qa/flink_hs/0004be6ce01ba9677d1eb619ad0fa757

...

...

 

History Server will fetch the archived jobs from the same location:

historyserver.archive.fs.dir: hdfs:///user/p2epda/lake/delp_qa/flink_hs/

 

So I’m able to confirm that there are indeed archived applications that I should be able to view in the histserver. I’m not able to find out what file the overview service is looking for from the repository – any suggestions as to what I could look into next?

 

Best,

Andreas

 




Re: History Server Not Showing Any Jobs - File Not Found?

Chesnay Schepler
oh I'm not using the HistoryServer; I just wrote it ;)
Are these archives all in the same location? So we're roughly looking at 5 GB of archives then?

That could indeed "just" be a resource problem. The HistoryServer downloads all archives eagerly rather than on demand.
The next step would be to move some of the archives into a separate HDFS directory and try again.

(Note that if you configure "historyserver.web.tmpdir" to point to some permanent directory, subsequent (re)starts of the HistoryServer can re-use this directory, so you only have to download things once.)
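
As a rough sanity check on the 5 GB figure, here is a back-of-the-envelope estimate in Python. It assumes every archive is about as large as the three files shown in the hdfs dfs -ls output earlier in the thread (roughly 50 KB each) and uses the count of roughly 101,000 files from the quoted message below. The averaging and the variable names are illustrative assumptions, and real archive sizes may differ considerably.

```python
# Sizes in bytes of the three archives visible in the hdfs dfs -ls listing.
sample_sizes = [50569, 49578, 50842]
# "roughly 101,000 application history files" (from the thread).
archive_count = 101_000

# Assumption: the three samples are representative of all archives.
avg_bytes = sum(sample_sizes) / len(sample_sizes)
estimate_gb = avg_bytes * archive_count / 1e9  # decimal gigabytes

print(f"average archive size: {avg_bytes / 1e3:.1f} kB")
print(f"estimated total:      {estimate_gb:.2f} GB")
```

Under that assumption the total comes out at about 5 GB. If a few archives are much larger than the samples, the real total can be far higher, so measuring the directory directly is more reliable than this estimate.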

On 29/05/2020 00:43, Hailu, Andreas wrote:

May I also ask what version of flink-hadoop you’re using and the number of jobs you’re storing the history for? As of writing we have roughly 101,000 application history files. I’m curious to know if we’re encountering some kind of resource problem.

 

// ah

 

From: Hailu, Andreas [Engineering]
Sent: Thursday, May 28, 2020 12:18 PM
To: 'Chesnay Schepler' [hidden email]; [hidden email]
Subject: RE: History Server Not Showing Any Jobs - File Not Found?

 

Okay, I will look further to see if we’re mistakenly using a version that’s pre-2.6.0. However, I don’t see flink-shaded-hadoop in my /lib directory for flink-1.9.1.

 

flink-dist_2.11-1.9.1.jar

flink-table-blink_2.11-1.9.1.jar

flink-table_2.11-1.9.1.jar

log4j-1.2.17.jar

slf4j-log4j12-1.7.15.jar

 

Those are the files within /lib.

 

// ah

 

From: Chesnay Schepler <[hidden email]>
Sent: Thursday, May 28, 2020 11:00 AM
To: Hailu, Andreas [Engineering] <[hidden email]>; [hidden email]
Subject: Re: History Server Not Showing Any Jobs - File Not Found?

 

Looks like it is indeed stuck on downloading the archive.

 

I searched a bit in the Hadoop JIRA and found several similar instances:

 

It is supposed to be fixed in 2.6.0 though :/

 

If hadoop is available from the HADOOP_CLASSPATH and flink-shaded-hadoop is in /lib, then you basically don't know which Hadoop version is actually being used, which could lead to incompatibilities and dependency clashes.

If flink-shaded-hadoop 2.4/2.5 is on the classpath, maybe that is being used and runs into HDFS-7005.
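
The check described above can be sketched in Python. This is only an illustration: the function names are made up, and the shaded jar name in the usage note is a hypothetical example rather than something taken from this thread. It flags any flink-shaded-hadoop jar in a /lib listing and splits a colon-separated HADOOP_CLASSPATH into its entries so they can be reviewed one by one.

```python
from typing import List


def find_shaded_hadoop(jar_names: List[str]) -> List[str]:
    """Return the jar names that look like a bundled flink-shaded-hadoop jar."""
    return [name for name in jar_names if name.startswith("flink-shaded-hadoop")]


def classpath_entries(hadoop_classpath: str) -> List[str]:
    """Split a colon-separated classpath, dropping empty entries (e.g. from '::')."""
    return [entry for entry in hadoop_classpath.split(":") if entry]
```

For example, find_shaded_hadoop(["flink-dist_2.11-1.9.1.jar", "flink-shaded-hadoop-2-uber-2.4.1-7.0.jar"]) returns only the second (hypothetical) name. An empty result means no shaded Hadoop jar is sitting next to the Hadoop jars that come from HADOOP_CLASSPATH.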

 

On 28/05/2020 16:27, Hailu, Andreas wrote:

Just created a dump, here’s what I see:

 

"Flink-HistoryServer-ArchiveFetcher-thread-1" #19 daemon prio=5 os_prio=0 tid=0x00007f93a5a2c000 nid=0x5692 runnable [0x00007f934a0d3000]

   java.lang.Thread.State: RUNNABLE

        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)

        at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)

        at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)

        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)

        - locked <0x00000005df986960> (a sun.nio.ch.Util$2)

        - locked <0x00000005df986948> (a java.util.Collections$UnmodifiableSet)

        - locked <0x00000005df928390> (a sun.nio.ch.EPollSelectorImpl)

        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)

        at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:335)

        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)

        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)

        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.readChannelFully(PacketReceiver.java:258)

        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:209)

        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:171)

        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:102)

        at org.apache.hadoop.hdfs.RemoteBlockReader2.readNextPacket(RemoteBlockReader2.java:201)

        at org.apache.hadoop.hdfs.RemoteBlockReader2.read(RemoteBlockReader2.java:152)

        - locked <0x00000005ceade5e0> (a org.apache.hadoop.hdfs.RemoteBlockReader2)

        at org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:781)

        at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:837)

        - eliminated <0x00000005cead3688> (a org.apache.hadoop.hdfs.DFSInputStream)

        at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:897)

        - locked <0x00000005cead3688> (a org.apache.hadoop.hdfs.DFSInputStream)

        at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:945)

        - locked <0x00000005cead3688> (a org.apache.hadoop.hdfs.DFSInputStream)

        at java.io.DataInputStream.read(DataInputStream.java:149)

        at org.apache.flink.runtime.fs.hdfs.HadoopDataInputStream.read(HadoopDataInputStream.java:94)

        at java.io.InputStream.read(InputStream.java:101)

        at org.apache.flink.util.IOUtils.copyBytes(IOUtils.java:69)

        at org.apache.flink.util.IOUtils.copyBytes(IOUtils.java:91)

        at org.apache.flink.runtime.history.FsJobArchivist.getArchivedJsons(FsJobArchivist.java:110)

        at org.apache.flink.runtime.webmonitor.history.HistoryServerArchiveFetcher$JobArchiveFetcherTask.run(HistoryServerArchiveFetcher.java:169)

        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)

        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)

        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)

        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

        at java.lang.Thread.run(Thread.java:745)

 

What problems could the flink-shaded-hadoop jar being included introduce?

 

// ah

 

From: Chesnay Schepler [hidden email]
Sent: Thursday, May 28, 2020 9:26 AM
To: Hailu, Andreas [Engineering] [hidden email]; [hidden email]
Subject: Re: History Server Not Showing Any Jobs - File Not Found?

 

If it were a class-loading issue I would think that we'd see an exception of some kind. Maybe double-check that flink-shaded-hadoop is not in the lib directory. (usually I would ask for the full classpath that the HS is started with, but as it turns out this isn't getting logged :( (FLINK-18008))

 

The fact that overview.json and jobs/overview.json are missing indicates that something goes wrong directly on startup. What is supposed to happen is that the HS starts, fetches all currently available archives and then creates these files.

So it seems like the download gets stuck for some reason.

 

Can you use jstack to create a thread dump, and see what the Flink-HistoryServer-ArchiveFetcher is doing?

 

I will also file a JIRA for adding more logging statements, like when fetching starts/stops.

 

On 27/05/2020 20:57, Hailu, Andreas wrote:

Hi Chesnay, apologies for not getting back to you sooner here. So I did what you suggested - I downloaded a few files from my jobmanager.archive.fs.dir HDFS directory to a locally available directory named /local/scratch/hailua_p2epdlsuat/historyserver/archived/. I then changed my historyserver.archive.fs.dir to file:///local/scratch/hailua_p2epdlsuat/historyserver/archived/ and that seemed to work. I’m able to see the history of the applications I downloaded. So this points to a problem with sourcing the history from HDFS.

 

Do you think this could be classpath related? This is what we use for our HADOOP_CLASSPATH var:

/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop/lib/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop-hdfs/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop-hdfs/lib/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop-mapreduce/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop-mapreduce/lib/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop-yarn/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop-yarn/lib/*:/gns/software/ep/da/dataproc/dataproc-prod/lakeRmProxy.jar:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop/bin::/gns/mw/dbclient/postgres/jdbc/pg-jdbc-9.3.v01/postgresql-9.3-1100-jdbc4.jar

 

You can see we have references to Hadoop mapred/yarn/hdfs libs in there.

 

// ah

 

From: Chesnay Schepler [hidden email]
Sent: Sunday, May 3, 2020 6:00 PM
To: Hailu, Andreas [Engineering] [hidden email]; [hidden email]
Subject: Re: History Server Not Showing Any Jobs - File Not Found?

 

yes, exactly; I want to rule out that (somehow) HDFS is the problem.

 

I couldn't reproduce the issue locally myself so far.

 

On 01/05/2020 22:31, Hailu, Andreas wrote:

Hi Chesnay, yes – they were created using Flink 1.9.1 as we’ve only just started to archive them in the past couple of weeks. Could you clarify how you want to try local filesystem archives? Do you mean changing jobmanager.archive.fs.dir and historyserver.web.tmpdir to the same local directory?

 

// ah

 

From: Chesnay Schepler [hidden email]
Sent: Wednesday, April 29, 2020 8:26 AM
To: Hailu, Andreas [Engineering] [hidden email]; [hidden email]
Subject: Re: History Server Not Showing Any Jobs - File Not Found?

 

hmm...let's see if I can reproduce the issue locally.

 

Are the archives from the same version the history server runs on? (Which I suppose would be 1.9.1?)

 

Just for the sake of narrowing things down, it would also be interesting to check if it works with the archives residing in the local filesystem.

 

On 27/04/2020 18:35, Hailu, Andreas wrote:

bash-4.1$ ls -l /local/scratch/flink_historyserver_tmpdir/

total 8

drwxrwxr-x 3 p2epdlsuat p2epdlsuat 4096 Apr 21 10:43 flink-web-history-7fbb97cc-9f38-4844-9bcf-6272fe6828e9

drwxrwxr-x 3 p2epdlsuat p2epdlsuat 4096 Apr 21 10:22 flink-web-history-95b3f928-c60f-4351-9926-766c6ad3ee76

 

There are just two directories in here. I don’t see cache directories from my attempts today, which is interesting. Looking a little deeper into them:

 

bash-4.1$ ls -lr /local/scratch/flink_historyserver_tmpdir/flink-web-history-7fbb97cc-9f38-4844-9bcf-6272fe6828e9

total 1756

drwxrwxr-x 2 p2epdlsuat p2epdlsuat 1789952 Apr 21 10:44 jobs

bash-4.1$ ls -lr /local/scratch/flink_historyserver_tmpdir/flink-web-history-7fbb97cc-9f38-4844-9bcf-6272fe6828e9/jobs

total 0

-rw-rw-r-- 1 p2epdlsuat p2epdlsuat 0 Apr 21 10:43 overview.json
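
The listing above shows a jobs/overview.json of zero bytes. A small Python sketch for spotting such files across all cache directories at once. It assumes only that the generated files are named overview.json somewhere below the configured tmpdir; the function name is illustrative.

```python
import os
from typing import List, Tuple


def find_overview_files(tmpdir: str) -> List[Tuple[str, int]]:
    """Return (path, size in bytes) for every overview.json below tmpdir."""
    found = []
    for root, _dirs, files in os.walk(tmpdir):
        for name in files:
            if name == "overview.json":
                path = os.path.join(root, name)
                found.append((path, os.path.getsize(path)))
    return sorted(found)


if __name__ == "__main__":
    # Directory taken from the historyserver.web.tmpdir setting in this thread.
    for path, size in find_overview_files("/local/scratch/flink_historyserver_tmpdir"):
        status = "EMPTY" if size == 0 else "ok"
        print(f"{status:5} {size:>10} {path}")
```

An overview.json reported as EMPTY, or no overview.json at all, would be consistent with the fetcher never having finished a complete pass over the archives.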

 

There are indeed archives already in HDFS – I’ve included some in my initial mail, but here they are again just for reference:

-bash-4.1$ hdfs dfs -ls /user/p2epda/lake/delp_qa/flink_hs

Found 44282 items

-rw-r-----   3 delp datalake_admin_dev      50569 2020-03-21 23:17 /user/p2epda/lake/delp_qa/flink_hs/000144dba9dc0f235768a46b2f26e936

-rw-r-----   3 delp datalake_admin_dev      49578 2020-03-03 08:45 /user/p2epda/lake/delp_qa/flink_hs/000347625d8128ee3fd0b672018e38a5

-rw-r-----   3 delp datalake_admin_dev      50842 2020-03-24 15:19 /user/p2epda/lake/delp_qa/flink_hs/0004be6ce01ba9677d1eb619ad0fa757

...

 

 

// ah

 

From: Chesnay Schepler [hidden email]
Sent: Monday, April 27, 2020 10:28 AM
To: Hailu, Andreas [Engineering] [hidden email]; [hidden email]
Subject: Re: History Server Not Showing Any Jobs - File Not Found?

 

If historyserver.web.tmpdir is not set then java.io.tmpdir is used, so that should be fine.

 

What are the contents of /local/scratch/flink_historyserver_tmpdir?

I assume there are already archives in HDFS?

 

On 27/04/2020 16:02, Hailu, Andreas wrote:

My machine’s /tmp directory is not large enough to support the archived files, so I changed my java.io.tmpdir to be in some other location which is significantly larger. I hadn’t set anything for historyserver.web.tmpdir, so I suspect it was still pointing at /tmp. I just tried setting historyserver.web.tmpdir to the same location as my java.io.tmpdir location, but I’m afraid I’m still seeing the following issue:

 

2020-04-27 09:37:42,904 [nioEventLoopGroup-3-4] DEBUG HistoryServerStaticFileServerHandler - Unable to load requested file /overview.json from classloader

2020-04-27 09:37:42,906 [nioEventLoopGroup-3-6] DEBUG HistoryServerStaticFileServerHandler - Unable to load requested file /jobs/overview.json from classloader

 

flink-conf.yaml for reference:

jobmanager.archive.fs.dir: hdfs:///user/p2epda/lake/delp_qa/flink_hs/

historyserver.archive.fs.dir: hdfs:///user/p2epda/lake/delp_qa/flink_hs/

historyserver.web.tmpdir: /local/scratch/flink_historyserver_tmpdir/

 

Did you have anything else in mind when you said pointing somewhere funny?

 

// ah

 

From: Chesnay Schepler [hidden email]
Sent: Monday, April 27, 2020 5:56 AM
To: Hailu, Andreas [Engineering] [hidden email]; [hidden email]
Subject: Re: History Server Not Showing Any Jobs - File Not Found?

 

overview.json is a generated file that is placed in the local directory controlled by historyserver.web.tmpdir.

Have you configured this option to point to some non-local filesystem? (Or if not, is the java.io.tmpdir property pointing somewhere funny?)

On 24/04/2020 18:24, Hailu, Andreas wrote:

I’m having a further look at the code in HistoryServerStaticFileServerHandler - is there an assumption about where overview.json is supposed to be located?

 

// ah

 

From: Hailu, Andreas [Engineering]
Sent: Wednesday, April 22, 2020 1:32 PM
To: 'Chesnay Schepler' [hidden email]; Hailu, Andreas [Engineering] [hidden email]; [hidden email]
Subject: RE: History Server Not Showing Any Jobs - File Not Found?

 

Hi Chesnay, thanks for responding. We’re using Flink 1.9.1. I enabled DEBUG level logging and this is something relevant I see:

 

2020-04-22 13:25:52,566 [Flink-HistoryServer-ArchiveFetcher-thread-1] DEBUG DFSInputStream - Connecting to datanode 10.79.252.101:1019

2020-04-22 13:25:52,567 [Flink-HistoryServer-ArchiveFetcher-thread-1] DEBUG SaslDataTransferClient - SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false

2020-04-22 13:25:52,567 [Flink-HistoryServer-ArchiveFetcher-thread-1] DEBUG SaslDataTransferClient - SASL client skipping handshake in secured configuration with privileged port for addr = /10.79.252.101, datanodeId = DatanodeInfoWithStorage[10.79.252.101:1019,DS-7f4ec55d-7c5f-4a0e-b817-d9e635480b21,DISK]

2020-04-22 13:25:52,571 [Flink-HistoryServer-ArchiveFetcher-thread-1] DEBUG DFSInputStream - DFSInputStream has been closed already

2020-04-22 13:25:52,573 [nioEventLoopGroup-3-6] DEBUG HistoryServerStaticFileServerHandler - Unable to load requested file /jobs/overview.json from classloader

2020-04-22 13:25:52,576 [IPC Parameter Sending Thread #0] DEBUG Client$Connection$3 - IPC Client (1578587450) connection to d279536-002.dc.gs.com/10.59.61.87:8020 from [hidden email] sending #1391

 

Aside from that, it looks like a lot of logging around datanodes and block location metadata. Did I miss something in my classpath, perhaps? If so, do you have a suggestion on what I could try?

 

// ah

 

From: Chesnay Schepler <[hidden email]>
Sent: Wednesday, April 22, 2020 2:16 AM
To: Hailu, Andreas [Engineering] <[hidden email]>; [hidden email]
Subject: Re: History Server Not Showing Any Jobs - File Not Found?

 

Which Flink version are you using?

Have you checked the history server logs after enabling debug logging?

 

On 21/04/2020 17:16, Hailu, Andreas [Engineering] wrote:

Hi,

 

I’m trying to set up the History Server, but none of my applications are showing up in the Web UI. Looking at the console, I see that all of the calls to /overview return the following 404 response: {"errors":["File not found."]}.

 

I’ve set up my configuration as follows:

 

JobManager Archive directory:

jobmanager.archive.fs.dir: hdfs:///user/p2epda/lake/delp_qa/flink_hs/

-bash-4.1$ hdfs dfs -ls /user/p2epda/lake/delp_qa/flink_hs

Found 44282 items

-rw-r-----   3 delp datalake_admin_dev      50569 2020-03-21 23:17 /user/p2epda/lake/delp_qa/flink_hs/000144dba9dc0f235768a46b2f26e936

-rw-r-----   3 delp datalake_admin_dev      49578 2020-03-03 08:45 /user/p2epda/lake/delp_qa/flink_hs/000347625d8128ee3fd0b672018e38a5

-rw-r-----   3 delp datalake_admin_dev      50842 2020-03-24 15:19 /user/p2epda/lake/delp_qa/flink_hs/0004be6ce01ba9677d1eb619ad0fa757

...

...

 

History Server will fetch the archived jobs from the same location:

historyserver.archive.fs.dir: hdfs:///user/p2epda/lake/delp_qa/flink_hs/

 

So I’m able to confirm that there are indeed archived applications that I should be able to view in the histserver. I’m not able to find out what file the overview service is looking for from the repository – any suggestions as to what I could look into next?

 

Best,

Andreas

 




RE: History Server Not Showing Any Jobs - File Not Found?

Hailu, Andreas

Yes, these are all in the same directory, and we’re at 67G right now. I’ll try with incrementally smaller directories and let you know what I find.
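
One possible way to choose the "incrementally smaller" directory sizes is to halve the number of archives on each attempt until the History Server comes up. A small Python sketch follows. The halving strategy, the function name, and the lower bound of 1,000 files are assumptions for illustration, not something prescribed in this thread.

```python
from typing import List


def halving_sizes(total: int, minimum: int) -> List[int]:
    """Return successive half-sized batch sizes, stopping below minimum."""
    sizes = []
    size = total // 2
    while size >= minimum:
        sizes.append(size)
        size //= 2
    return sizes


# Starting from roughly 101,000 archives, stop once a batch would drop below 1,000.
print(halving_sizes(101_000, 1_000))
```

Each size is the number of archives to copy into a separate test directory for one attempt. The first size at which the overview files get generated narrows down where the resource limit lies.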

 

// ah

 

From: Chesnay Schepler <[hidden email]>
Sent: Friday, May 29, 2020 3:11 AM
To: Hailu, Andreas [Engineering] <[hidden email]>; [hidden email]
Subject: Re: History Server Not Showing Any Jobs - File Not Found?

 

oh I'm not using the HistoryServer; I just wrote it ;)

Are these archives all in the same location? So we're roughly looking at 5 GB of archives then?

 

That could indeed "just" be a resource problem. The HistoryServer downloads all archives eagerly rather than on demand.

The next step would be to move some of the archives into a separate HDFS directory and try again.

 

(Note that if you configure "historyserver.web.tmpdir" to point to some permanent directory, subsequent (re)starts of the HistoryServer can re-use this directory, so you only have to download things once.)

 

On 29/05/2020 00:43, Hailu, Andreas wrote:

May I also ask what version of flink-hadoop you’re using and the number of jobs you’re storing the history for? As of writing we have roughly 101,000 application history files. I’m curious to know if we’re encountering some kind of resource problem.

 

// ah

 

From: Hailu, Andreas [Engineering]
Sent: Thursday, May 28, 2020 12:18 PM
To: 'Chesnay Schepler' [hidden email]; [hidden email]
Subject: RE: History Server Not Showing Any Jobs - File Not Found?

 

Okay, I will look further to see if we’re mistakenly using a version that’s pre-2.6.0. However, I don’t see flink-shaded-hadoop in my /lib directory for flink-1.9.1.

 

flink-dist_2.11-1.9.1.jar

flink-table-blink_2.11-1.9.1.jar

flink-table_2.11-1.9.1.jar

log4j-1.2.17.jar

slf4j-log4j12-1.7.15.jar

 

Those are the files within /lib.

 

// ah

 

From: Chesnay Schepler <[hidden email]>
Sent: Thursday, May 28, 2020 11:00 AM
To: Hailu, Andreas [Engineering] <[hidden email]>; [hidden email]
Subject: Re: History Server Not Showing Any Jobs - File Not Found?

 

Looks like it is indeed stuck on downloading the archive.

 

I searched a bit in the Hadoop JIRA and found several similar instances:

 

It is supposed to be fixed in 2.6.0 though :/

 

If hadoop is available from the HADOOP_CLASSPATH and flink-shaded-hadoop is in /lib, then you basically don't know which Hadoop version is actually being used, which could lead to incompatibilities and dependency clashes.

If flink-shaded-hadoop 2.4/2.5 is on the classpath, maybe that is being used and runs into HDFS-7005.

 

On 28/05/2020 16:27, Hailu, Andreas wrote:

Just created a dump, here’s what I see:

 

"Flink-HistoryServer-ArchiveFetcher-thread-1" #19 daemon prio=5 os_prio=0 tid=0x00007f93a5a2c000 nid=0x5692 runnable [0x00007f934a0d3000]

   java.lang.Thread.State: RUNNABLE

        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)

        at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)

        at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)

        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)

        - locked <0x00000005df986960> (a sun.nio.ch.Util$2)

        - locked <0x00000005df986948> (a java.util.Collections$UnmodifiableSet)

        - locked <0x00000005df928390> (a sun.nio.ch.EPollSelectorImpl)

        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)

        at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:335)

        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)

        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)

        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.readChannelFully(PacketReceiver.java:258)

        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:209)

        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:171)

        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:102)

        at org.apache.hadoop.hdfs.RemoteBlockReader2.readNextPacket(RemoteBlockReader2.java:201)

        at org.apache.hadoop.hdfs.RemoteBlockReader2.read(RemoteBlockReader2.java:152)

        - locked <0x00000005ceade5e0> (a org.apache.hadoop.hdfs.RemoteBlockReader2)

        at org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:781)

        at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:837)

        - eliminated <0x00000005cead3688> (a org.apache.hadoop.hdfs.DFSInputStream)

        at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:897)

        - locked <0x00000005cead3688> (a org.apache.hadoop.hdfs.DFSInputStream)

        at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:945)

        - locked <0x00000005cead3688> (a org.apache.hadoop.hdfs.DFSInputStream)

        at java.io.DataInputStream.read(DataInputStream.java:149)

        at org.apache.flink.runtime.fs.hdfs.HadoopDataInputStream.read(HadoopDataInputStream.java:94)

        at java.io.InputStream.read(InputStream.java:101)

        at org.apache.flink.util.IOUtils.copyBytes(IOUtils.java:69)

        at org.apache.flink.util.IOUtils.copyBytes(IOUtils.java:91)

        at org.apache.flink.runtime.history.FsJobArchivist.getArchivedJsons(FsJobArchivist.java:110)

        at org.apache.flink.runtime.webmonitor.history.HistoryServerArchiveFetcher$JobArchiveFetcherTask.run(HistoryServerArchiveFetcher.java:169)

        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)

        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)

        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)

        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

        at java.lang.Thread.run(Thread.java:745)

 

What problems could the flink-shaded-hadoop jar being included introduce?

 

// ah

 

From: Chesnay Schepler [hidden email]
Sent: Thursday, May 28, 2020 9:26 AM
To: Hailu, Andreas [Engineering] [hidden email]; [hidden email]
Subject: Re: History Server Not Showing Any Jobs - File Not Found?

 

If it were a class-loading issue I would think that we'd see an exception of some kind. Maybe double-check that flink-shaded-hadoop is not in the lib directory. (usually I would ask for the full classpath that the HS is started with, but as it turns out this isn't getting logged :( (FLINK-18008))

 

The fact that overview.json and jobs/overview.json are missing indicates that something goes wrong directly on startup. What is supposed to happen is that the HS starts, fetches all currently available archives and then creates these files.

So it seems like the download gets stuck for some reason.

 

Can you use jstack to create a thread dump, and see what the Flink-HistoryServer-ArchiveFetcher is doing?

 

I will also file a JIRA for adding more logging statements, like when fetching starts/stops.

 

On 27/05/2020 20:57, Hailu, Andreas wrote:

Hi Chesnay, apologies for not getting back to you sooner here. So I did what you suggested - I downloaded a few files from my jobmanager.archive.fs.dir HDFS directory to a locally available directory named /local/scratch/hailua_p2epdlsuat/historyserver/archived/. I then changed my historyserver.archive.fs.dir to file:///local/scratch/hailua_p2epdlsuat/historyserver/archived/ and that seemed to work. I’m able to see the history of the applications I downloaded. So this points to a problem with sourcing the history from HDFS.

 

Do you think this could be classpath related? This is what we use for our HADOOP_CLASSPATH var:

/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop/lib/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop-hdfs/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop-hdfs/lib/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop-mapreduce/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop-mapreduce/lib/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop-yarn/*:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop-yarn/lib/*:/gns/software/ep/da/dataproc/dataproc-prod/lakeRmProxy.jar:/gns/software/infra/big-data/hadoop/hdp-2.6.5.0/hadoop/bin::/gns/mw/dbclient/postgres/jdbc/pg-jdbc-9.3.v01/postgresql-9.3-1100-jdbc4.jar

 

You can see we have references to Hadoop mapred/yarn/hdfs libs in there.

 

// ah

 

From: Chesnay Schepler [hidden email]
Sent: Sunday, May 3, 2020 6:00 PM
To: Hailu, Andreas [Engineering] [hidden email]; [hidden email]
Subject: Re: History Server Not Showing Any Jobs - File Not Found?

 

yes, exactly; I want to rule out that (somehow) HDFS is the problem.

 

I couldn't reproduce the issue locally myself so far.

 

On 01/05/2020 22:31, Hailu, Andreas wrote:

Hi Chesnay, yes – they were created using Flink 1.9.1 as we’ve only just started to archive them in the past couple weeks. Could you clarify on how you want to try local filesystem archives? As in changing jobmanager.archive.fs.dir and historyserver.web.tmpdir to the same local directory?

 

// ah

 

From: Chesnay Schepler [hidden email]
Sent: Wednesday, April 29, 2020 8:26 AM
To: Hailu, Andreas [Engineering] [hidden email]; [hidden email]
Subject: Re: History Server Not Showing Any Jobs - File Not Found?

 

hmm...let's see if I can reproduce the issue locally.

 

Are the archives from the same version the history server runs on? (Which I suppose would be 1.9.1?)

 

Just for the sake of narrowing things down, it would also be interesting to check if it works with the archives residing in the local filesystem.

 

On 27/04/2020 18:35, Hailu, Andreas wrote:

bash-4.1$ ls -l /local/scratch/flink_historyserver_tmpdir/

total 8

drwxrwxr-x 3 p2epdlsuat p2epdlsuat 4096 Apr 21 10:43 flink-web-history-7fbb97cc-9f38-4844-9bcf-6272fe6828e9

drwxrwxr-x 3 p2epdlsuat p2epdlsuat 4096 Apr 21 10:22 flink-web-history-95b3f928-c60f-4351-9926-766c6ad3ee76

 

There are just two directories in here. I don’t see cache directories from my attempts today, which is interesting. Looking a little deeper into them:

 

bash-4.1$ ls -lr /local/scratch/flink_historyserver_tmpdir/flink-web-history-7fbb97cc-9f38-4844-9bcf-6272fe6828e9

total 1756

drwxrwxr-x 2 p2epdlsuat p2epdlsuat 1789952 Apr 21 10:44 jobs

bash-4.1$ ls -lr /local/scratch/flink_historyserver_tmpdir/flink-web-history-7fbb97cc-9f38-4844-9bcf-6272fe6828e9/jobs

total 0

-rw-rw-r-- 1 p2epdlsuat p2epdlsuat 0 Apr 21 10:43 overview.json

 

There are indeed archives already in HDFS – I’ve included some in my initial mail, but here they are again just for reference:

-bash-4.1$ hdfs dfs -ls /user/p2epda/lake/delp_qa/flink_hs

Found 44282 items

-rw-r-----   3 delp datalake_admin_dev      50569 2020-03-21 23:17 /user/p2epda/lake/delp_qa/flink_hs/000144dba9dc0f235768a46b2f26e936

-rw-r-----   3 delp datalake_admin_dev      49578 2020-03-03 08:45 /user/p2epda/lake/delp_qa/flink_hs/000347625d8128ee3fd0b672018e38a5

-rw-r-----   3 delp datalake_admin_dev      50842 2020-03-24 15:19 /user/p2epda/lake/delp_qa/flink_hs/0004be6ce01ba9677d1eb619ad0fa757

...

 

 

// ah

 

From: Chesnay Schepler [hidden email]
Sent: Monday, April 27, 2020 10:28 AM
To: Hailu, Andreas [Engineering] [hidden email]; [hidden email]
Subject: Re: History Server Not Showing Any Jobs - File Not Found?

 

If historyserver.web.tmpdir is not set then java.io.tmpdir is used, so that should be fine.

 

What are the contents of /local/scratch/flink_historyserver_tmpdir?

I assume there are already archives in HDFS?

 

On 27/04/2020 16:02, Hailu, Andreas wrote:

My machine’s /tmp directory is not large enough to support the archived files, so I changed my java.io.tmpdir to be in some other location which is significantly larger. I hadn’t set anything for historyserver.web.tmpdir, so I suspect it was still pointing at /tmp. I just tried setting historyserver.web.tmpdir to the same location as my java.io.tmpdir location, but I’m afraid I’m still seeing the following issue:

 

2020-04-27 09:37:42,904 [nioEventLoopGroup-3-4] DEBUG HistoryServerStaticFileServerHandler - Unable to load requested file /overview.json from classloader

2020-04-27 09:37:42,906 [nioEventLoopGroup-3-6] DEBUG HistoryServerStaticFileServerHandler - Unable to load requested file /jobs/overview.json from classloader

 

flink-conf.yaml for reference:

jobmanager.archive.fs.dir: hdfs:///user/p2epda/lake/delp_qa/flink_hs/

historyserver.archive.fs.dir: hdfs:///user/p2epda/lake/delp_qa/flink_hs/

historyserver.web.tmpdir: /local/scratch/flink_historyserver_tmpdir/

 

Did you have anything else in mind when you said pointing somewhere funny?

 

// ah

 

From: Chesnay Schepler [hidden email]
Sent: Monday, April 27, 2020 5:56 AM
To: Hailu, Andreas [Engineering] [hidden email]; [hidden email]
Subject: Re: History Server Not Showing Any Jobs - File Not Found?

 

overview.json is a generated file that is placed in the local directory controlled by historyserver.web.tmpdir.

Have you configured this option to point to some non-local filesystem? (Or if not, is the java.io.tmpdir property pointing somewhere funny?)

On 24/04/2020 18:24, Hailu, Andreas wrote:

I’m having a further look at the code in HistoryServerStaticFileServerHandler - is there an assumption about where overview.json is supposed to be located?

 

// ah

 

From: Hailu, Andreas [Engineering]
Sent: Wednesday, April 22, 2020 1:32 PM
To: 'Chesnay Schepler' [hidden email]; Hailu, Andreas [Engineering] [hidden email]; [hidden email]
Subject: RE: History Server Not Showing Any Jobs - File Not Found?

 

Hi Chesnay, thanks for responding. We’re using Flink 1.9.1. I enabled DEBUG level logging and this is something relevant I see:

 

2020-04-22 13:25:52,566 [Flink-HistoryServer-ArchiveFetcher-thread-1] DEBUG DFSInputStream - Connecting to datanode 10.79.252.101:1019

2020-04-22 13:25:52,567 [Flink-HistoryServer-ArchiveFetcher-thread-1] DEBUG SaslDataTransferClient - SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false

2020-04-22 13:25:52,567 [Flink-HistoryServer-ArchiveFetcher-thread-1] DEBUG SaslDataTransferClient - SASL client skipping handshake in secured configuration with privileged port for addr = /10.79.252.101, datanodeId = DatanodeInfoWithStorage[10.79.252.101:1019,DS-7f4ec55d-7c5f-4a0e-b817-d9e635480b21,DISK]

2020-04-22 13:25:52,571 [Flink-HistoryServer-ArchiveFetcher-thread-1] DEBUG DFSInputStream - DFSInputStream has been closed already

2020-04-22 13:25:52,573 [nioEventLoopGroup-3-6] DEBUG HistoryServerStaticFileServerHandler - Unable to load requested file /jobs/overview.json from classloader

2020-04-22 13:25:52,576 [IPC Parameter Sending Thread #0] DEBUG Client$Connection$3 - IPC Client (1578587450) connection to d279536-002.dc.gs.com/10.59.61.87:8020 from [hidden email] sending #1391

 

Aside from that, it looks like a lot of logging around datanodes and block location metadata. Did I miss something in my classpath, perhaps? If so, do you have a suggestion on what I could try?

 

// ah

 

From: Chesnay Schepler <[hidden email]>
Sent: Wednesday, April 22, 2020 2:16 AM
To: Hailu, Andreas [Engineering] <[hidden email]>; [hidden email]
Subject: Re: History Server Not Showing Any Jobs - File Not Found?

 

Which Flink version are you using?

Have you checked the history server logs after enabling debug logging?

 

On 21/04/2020 17:16, Hailu, Andreas [Engineering] wrote:

Hi,

 

I’m trying to set up the History Server, but none of my applications are showing up in the Web UI. Looking at the console, I see that all of the calls to /overview return the following 404 response: {"errors":["File not found."]}.

 

I’ve set up my configuration as follows:

 

JobManager Archive directory:

jobmanager.archive.fs.dir: hdfs:///user/p2epda/lake/delp_qa/flink_hs/

-bash-4.1$ hdfs dfs -ls /user/p2epda/lake/delp_qa/flink_hs

Found 44282 items

-rw-r-----   3 delp datalake_admin_dev      50569 2020-03-21 23:17 /user/p2epda/lake/delp_qa/flink_hs/000144dba9dc0f235768a46b2f26e936

-rw-r-----   3 delp datalake_admin_dev      49578 2020-03-03 08:45 /user/p2epda/lake/delp_qa/flink_hs/000347625d8128ee3fd0b672018e38a5

-rw-r-----   3 delp datalake_admin_dev      50842 2020-03-24 15:19 /user/p2epda/lake/delp_qa/flink_hs/0004be6ce01ba9677d1eb619ad0fa757

...

...

 

History Server will fetch the archived jobs from the same location:

historyserver.archive.fs.dir: hdfs:///user/p2epda/lake/delp_qa/flink_hs/

 

So I’m able to confirm that there are indeed archived applications that I should be able to view in the histserver. I’m not able to find out what file the overview service is looking for from the repository – any suggestions as to what I could look into next?

 

Best,

Andreas

 




RE: History Server Not Showing Any Jobs - File Not Found?

Hailu, Andreas
In reply to this post by Chesnay Schepler

So I created a new HDFS directory with just 1 archive and pointed the server to monitor that directory, et voila – I’m able to see the applications in the UI. So it must have been really churning trying to fetch all of those initial archives :)

 

I have a couple of follow up questions if you please:

1.      What is the upper limit of the number of archives the history server can support? Does it attempt to download every archive and load them all into memory?

2.      Retention: we have on the order of 100K applications per day in our production environment. Is there any native retention policy? E.g. only keep the latest X archives in the dir - or is this something we need to manage ourselves?
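If retention does end up being managed outside of Flink, a "keep the latest X" policy could be sketched roughly as below. This is only an illustration of the do-it-ourselves option raised in the question, not a statement about what Flink supports natively. The function name stale_archives is hypothetical, and it assumes the `hdfs dfs -ls` column layout shown in this thread (date in column 6, time in column 7, path last).

```shell
#!/usr/bin/env bash
# Read `hdfs dfs -ls` output on stdin and print the paths of all archives
# except the newest $1, judged by the modification date and time columns.
stale_archives() {
  local keep="$1"
  awk '/^-/ { print $6, $7, $NF }' |      # regular files only: date time path
    LC_ALL=C sort -r |                    # newest first (ISO dates sort as text)
    awk -v keep="$keep" 'NR > keep { print $3 }'
}

# Usage (assumption: GNU xargs; review the list before deleting anything):
#   hdfs dfs -ls /user/p2epda/lake/delp_qa/flink_hs | stale_archives 10000 > stale.txt
#   xargs -r -n 100 hdfs dfs -rm < stale.txt
```

Writing the candidate list to a file first keeps the destructive step separate, so it can be inspected or run in batches.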

 

Thanks.

 

// ah

 

From: Hailu, Andreas [Engineering]
Sent: Friday, May 29, 2020 8:46 AM
To: 'Chesnay Schepler' <[hidden email]>; [hidden email]
Subject: RE: History Server Not Showing Any Jobs - File Not Found?

 

Yes, these are all in the same directory, and we’re at 67G right now. I’ll try with incrementally smaller directories and let you know what I find.

 

// ah

 

From: Chesnay Schepler <[hidden email]>
Sent: Friday, May 29, 2020 3:11 AM
To: Hailu, Andreas [Engineering] <[hidden email]>; [hidden email]
Subject: Re: History Server Not Showing Any Jobs - File Not Found?

 

oh I'm not using the HistoryServer; I just wrote it ;)

Are these archives all in the same location? So we're roughly looking at 5 GB of archives then?

 

That could indeed "just" be a resource problem. The HistoryServer eagerly downloads all archives, and not on-demand.

The next step would be to move some of the archives into a separate HDFS directory and try again.
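One way to carve out such a subset, as a rough sketch: pick the first N archive paths from the directory listing and copy them into a scratch HDFS directory, then point historyserver.archive.fs.dir at that directory. The helper name pick_archives and the SRC/DST variables are hypothetical; the parsing assumes the `hdfs dfs -ls` output format shown in this thread.

```shell
#!/usr/bin/env bash
# Read `hdfs dfs -ls` output on stdin and print the first N archive paths
# (N >= 1). Regular files are the lines whose permission string starts
# with "-"; the path is the last column.
pick_archives() {
  local n="$1"
  awk -v n="$n" '/^-/ { print $NF; if (++c >= n) exit }'
}

# Usage (SRC and DST are placeholders for your own directories):
#   SRC=/user/p2epda/lake/delp_qa/flink_hs
#   DST=<some new, empty HDFS directory>
#   hdfs dfs -mkdir -p "$DST"
#   hdfs dfs -ls "$SRC" | pick_archives 100 | xargs -I{} hdfs dfs -cp {} "$DST"/
```

Repeating this with growing values of N (100, 1000, 10000, ...) should show roughly where startup time stops being tolerable.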

 

(Note that by configuring "historyserver.web.tmpdir" to some permanent directory, subsequent (re)starts of the HistoryServer can re-use this directory, so you only have to download things once.)

 

On 29/05/2020 00:43, Hailu, Andreas wrote:

May I also ask what version of flink-hadoop you’re using and the number of jobs you’re storing the history for? As of writing we have roughly 101,000 application history files. I’m curious to know if we’re encountering some kind of resource problem.

 

// ah

 

From: Hailu, Andreas [Engineering]
Sent: Thursday, May 28, 2020 12:18 PM
To: 'Chesnay Schepler' [hidden email]; [hidden email]
Subject: RE: History Server Not Showing Any Jobs - File Not Found?

 

Okay, I will look further to see if we’re mistakenly using a version that’s pre-2.6.0. However, I don’t see flink-shaded-hadoop in my /lib directory for flink-1.9.1.

 

flink-dist_2.11-1.9.1.jar

flink-table-blink_2.11-1.9.1.jar

flink-table_2.11-1.9.1.jar

log4j-1.2.17.jar

slf4j-log4j12-1.7.15.jar

 

Those are the files within /lib.

 

// ah

 

From: Chesnay Schepler <[hidden email]>
Sent: Thursday, May 28, 2020 11:00 AM
To: Hailu, Andreas [Engineering] <[hidden email]>; [hidden email]
Subject: Re: History Server Not Showing Any Jobs - File Not Found?

 

Looks like it is indeed stuck on downloading the archive.

 

I searched a bit in the Hadoop JIRA and found several similar instances:

 

It is supposed to be fixed in 2.6.0 though :/

 

If hadoop is available from the HADOOP_CLASSPATH and flink-shaded-hadoop in /lib then you basically don't know what Hadoop version is actually being used, which could lead to incompatibilities and dependency clashes.

If flink-shaded-hadoop 2.4/2.5 is on the classpath, maybe that is being used and runs into HDFS-7005.

 

On 28/05/2020 16:27, Hailu, Andreas wrote:

Just created a dump, here’s what I see:

 

"Flink-HistoryServer-ArchiveFetcher-thread-1" #19 daemon prio=5 os_prio=0 tid=0x00007f93a5a2c000 nid=0x5692 runnable [0x00007f934a0d3000]

   java.lang.Thread.State: RUNNABLE

        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)

        at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)

        at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)

        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)

        - locked <0x00000005df986960> (a sun.nio.ch.Util$2)

        - locked <0x00000005df986948> (a java.util.Collections$UnmodifiableSet)

        - locked <0x00000005df928390> (a sun.nio.ch.EPollSelectorImpl)

        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)

        at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:335)

        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)

        at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)

        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.readChannelFully(PacketReceiver.java:258)

        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:209)

        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:171)

        at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:102)

        at org.apache.hadoop.hdfs.RemoteBlockReader2.readNextPacket(RemoteBlockReader2.java:201)

        at org.apache.hadoop.hdfs.RemoteBlockReader2.read(RemoteBlockReader2.java:152)

        - locked <0x00000005ceade5e0> (a org.apache.hadoop.hdfs.RemoteBlockReader2)

        at org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:781)

        at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:837)

        - eliminated <0x00000005cead3688> (a org.apache.hadoop.hdfs.DFSInputStream)

        at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:897)

        - locked <0x00000005cead3688> (a org.apache.hadoop.hdfs.DFSInputStream)

        at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:945)

        - locked <0x00000005cead3688> (a org.apache.hadoop.hdfs.DFSInputStream)

        at java.io.DataInputStream.read(DataInputStream.java:149)

        at org.apache.flink.runtime.fs.hdfs.HadoopDataInputStream.read(HadoopDataInputStream.java:94)

        at java.io.InputStream.read(InputStream.java:101)

        at org.apache.flink.util.IOUtils.copyBytes(IOUtils.java:69)

        at org.apache.flink.util.IOUtils.copyBytes(IOUtils.java:91)

        at org.apache.flink.runtime.history.FsJobArchivist.getArchivedJsons(FsJobArchivist.java:110)

        at org.apache.flink.runtime.webmonitor.history.HistoryServerArchiveFetcher$JobArchiveFetcherTask.run(HistoryServerArchiveFetcher.java:169)

        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)

        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)

        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)

        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)

        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

        at java.lang.Thread.run(Thread.java:745)

 

What problems could the flink-shaded-hadoop jar being included introduce?

 

// ah

 
