Hi,

I'm facing an issue similar to https://issues.apache.org/jira/browse/FLINK-14074. The job starts, and then the YARN logs report "Could not resolve ResourceManager address akka.tcp://flink". A fragment of the YARN logs looks like this:

LazyFromSourcesSchedulingStrategy]
16:54:21,279 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - Job Flink Java Job at Thu Mar 26 16:54:09 CET 2020 (9817283f911d83a6d278cc39d17d6b11) switched from state CREATED to RUNNING.
16:54:21,287 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - CHAIN DataSource (MailEvent; EMC; 2019-12-01 - 2020-01-01; null - 1578182400000) -> FlatMap (SplitDuplicate) -> FlatMap (Create MailEvent) -> Filter (EventDateTimeRangeFilter) -> Filter (TrackingStatusesFilter) -> FlatMap (Get mail item by EMC event) -> Map (Map IntraregionalVolumeItem data set from EMC events) (1/3) (5482b0e6ae1d64d9b0918ec15599211f) switched from CREATED to SCHEDULED.
16:54:21,287 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - CHAIN DataSource (MailEvent; EMC; 2019-12-01 - 2020-01-01; null - 1578182400000) -> FlatMap (SplitDuplicate) -> FlatMap (Create MailEvent) -> Filter (EventDateTimeRangeFilter) -> Filter (TrackingStatusesFilter) -> FlatMap (Get mail item by EMC event) -> Map (Map IntraregionalVolumeItem data set from EMC events) (2/3) (5c993710423eea47ae66f833b2999530) switched from CREATED to SCHEDULED.
16:54:21,287 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - CHAIN DataSource (MailEvent; EMC; 2019-12-01 - 2020-01-01; null - 1578182400000) -> FlatMap (SplitDuplicate) -> FlatMap (Create MailEvent) -> Filter (EventDateTimeRangeFilter) -> Filter (TrackingStatusesFilter) -> FlatMap (Get mail item by EMC event) -> Map (Map IntraregionalVolumeItem data set from EMC events) (3/3) (23cfa30fba857b2c75ba76a21c7d4972) switched from CREATED to SCHEDULED.
16:54:21,287 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - CHAIN DataSource (MailEvent; EMD; 2019-12-01 - 2020-01-01; null - 1578182400000) -> FlatMap (SplitDuplicate) -> FlatMap (Create MailEvent) -> Filter (EventDateTimeRangeFilter) -> Filter (TrackingStatusesFilter) -> FlatMap (Get mail item by EMD event) -> Map (Map IntraregionalVolumeItem data set from EMD events) (1/3) (7cc8a395b87e82000184724eb1698ace) switched from CREATED to SCHEDULED.
16:54:21,288 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - CHAIN DataSource (MailEvent; EMD; 2019-12-01 - 2020-01-01; null - 1578182400000) -> FlatMap (SplitDuplicate) -> FlatMap (Create MailEvent) -> Filter (EventDateTimeRangeFilter) -> Filter (TrackingStatusesFilter) -> FlatMap (Get mail item by EMD event) -> Map (Map IntraregionalVolumeItem data set from EMD events) (2/3) (5edfe3d1f509856d17fa0da078cb3f7e) switched from CREATED to SCHEDULED.
16:54:21,288 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - CHAIN DataSource (MailEvent; EMD; 2019-12-01 - 2020-01-01; null - 1578182400000) -> FlatMap (SplitDuplicate) -> FlatMap (Create MailEvent) -> Filter (EventDateTimeRangeFilter) -> Filter (TrackingStatusesFilter) -> FlatMap (Get mail item by EMD event) -> Map (Map IntraregionalVolumeItem data set from EMD events) (3/3) (dd3397f889a3fad1acf4c59f59a93d92) switched from CREATED to SCHEDULED.
16:54:21,297 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl - Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{b4c6e7357e4620bf2e997c46d7723eb1}]
16:54:21,301 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl - Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{841bbb79b01b5e0d9ae749a03f65c303}]
16:54:21,301 INFO org.apache.flink.runtime.jobmaster.slotpool.SlotPoolImpl - Cannot serve slot request, no ResourceManager connected. Adding as pending request [SlotRequestId{496120465d541ea9fd2ffcec89e2ac3b}]
16:54:21,304 INFO org.apache.flink.runtime.jobmaster.JobMaster - Connecting to ResourceManager akka.tcp://flink@...:43757/user/resourcemanager(00000000000000000000000000000000)
16:54:21,307 INFO org.apache.flink.runtime.jobmaster.JobMaster - Could not resolve ResourceManager address akka.tcp://flink@prod-bigd-dn11:43757/user/resourcemanager, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@...:43757/user/resourcemanager..
16:54:31,322 INFO org.apache.flink.runtime.jobmaster.JobMaster - Could not resolve ResourceManager address akka.tcp://flink@prod-bigd-dn11:43757/user/resourcemanager, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@prod-bigd-dn11:43757/user/resourcemanager..

What could cause the following problems?

Cannot serve slot request, no ResourceManager connected
Could not resolve ResourceManager address akka.tcp://flink@prod-bigd-dn11:43757

Regards,
Vitaliy
Hi Vitaliy,

>> Cannot serve slot request, no ResourceManager connected

This is not a problem in itself. It only means that the JobManager (JM) must be connected to the ResourceManager (RM) before it can send slot requests, so the requests are kept as pending until then.

>> Could not resolve ResourceManager address akka.tcp://flink@prod-bigd-dn11:43757/user/resourcemanager

This is likely the root cause. Could you check whether the hostname prod-bigd-dn11 can be resolved, and whether port 43757 on that machine is reachable?

Thanks,
Zhu Zhu

Vitaliy Semochkin <[hidden email]> wrote on Fri, Mar 27, 2020 at 1:54 AM:
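The two checks suggested above (hostname resolution and port reachability) can be run with a small standalone Java program such as the following. This is only a sketch: the class and method names are hypothetical, and the default host and port are copied from the log above. The port was assigned for that particular run and may differ on the next submission, so take it from the current JobMaster log and run the check from the node that has to reach the ResourceManager.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.UnknownHostException;

// Hypothetical helper, not part of Flink: checks DNS resolution and TCP reachability.
public class RmReachabilityCheck {

    /** Returns the resolved address, or null if the hostname cannot be resolved. */
    static InetAddress resolve(String host) {
        try {
            return InetAddress.getByName(host);
        } catch (UnknownHostException e) {
            return null;
        }
    }

    /** Attempts a plain TCP connect to host:port; true if it succeeds within timeoutMs. */
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Covers unresolved hosts, refused connections and timeouts.
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults taken from the log above; pass other values as arguments.
        String host = args.length > 0 ? args[0] : "prod-bigd-dn11";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 43757;

        InetAddress addr = resolve(host);
        System.out.println(host + " resolves to: "
                + (addr == null ? "UNRESOLVABLE" : addr.getHostAddress()));
        System.out.println("TCP connect to " + host + ":" + port + ": "
                + (canConnect(host, port, 3000) ? "OK" : "FAILED"));
    }
}
```

A successful TCP connect only shows that something is listening on that port. It does not prove that the Akka endpoint there accepts messages addressed to that hostname.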
Hello Zhu,

The host can be resolved, and there are no firewalls in the cluster, so all ports are open.

Regards,
Vitaliy

On Fri, Mar 27, 2020 at 8:32 AM Zhu Zhu <[hidden email]> wrote:
Could you also check in the JobManager logs whether Flink's Akka actor system is bound to and listening on the hostname "prod-bigd-dn11"? Otherwise, all messages from the TaskManager will be discarded.

Best,
Yang

Vitaliy Semochkin <[hidden email]> wrote on Fri, Mar 27, 2020 at 3:35 PM:
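Alongside reading the JobManager logs, a quick sanity check is whether "prod-bigd-dn11" actually maps to a non-loopback address of the machine running the JobManager container. If it resolves only to a loopback address (for example through an /etc/hosts entry), a listener bound to that name would not be reachable from other nodes. That is an assumption about one possible cause, not something established in this thread. The sketch below uses only standard java.net calls; the class and method names are hypothetical.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.net.UnknownHostException;

// Hypothetical helper, not part of Flink: run it on the node hosting the JobManager.
public class BindAddressCheck {

    /**
     * True if at least one address the hostname resolves to is a non-loopback
     * address assigned to a network interface of the machine running this check.
     */
    static boolean isNonLoopbackLocalAddress(String host) throws UnknownHostException, SocketException {
        for (InetAddress addr : InetAddress.getAllByName(host)) {
            if (!addr.isLoopbackAddress() && NetworkInterface.getByInetAddress(addr) != null) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        // Hostname from the JobMaster log line; pass another one as the first argument.
        String host = args.length > 0 ? args[0] : "prod-bigd-dn11";
        for (InetAddress addr : InetAddress.getAllByName(host)) {
            System.out.println(host + " -> " + addr.getHostAddress()
                    + (addr.isLoopbackAddress() ? " (loopback)" : ""));
        }
        System.out.println("Non-loopback address of this machine: " + isNonLoopbackLocalAddress(host));
    }
}
```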