Thanks for sharing the thread dump.
It shows that the source thread is indeed back-pressured:
the checkpoint lock is held by the legacy source thread, which is
trying to emit a record but cannot acquire a free network buffer.
There is one lock per task, so a TaskManager (TM) can have several of
them. That is why you see threads blocked on different objects.
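To make the mechanism concrete, here is a simplified model in plain Java. It is not Flink code: the `checkpointLock` object is a hypothetical stand-in for the per-task checkpoint lock, and a `Semaphore` with zero permits is a hypothetical stand-in for a buffer pool with no free buffers (Flink's `LocalBufferPool` actually waits on a `CompletableFuture`, as the dump shows). The legacy source thread parks while holding the lock, so the task's mailbox thread ends up BLOCKED on it, like threads Id=200 and Id=202 in the dump.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Semaphore;

/** Simplified model (not Flink code) of a back-pressured legacy source holding the per-task lock. */
public class CheckpointLockBackpressureDemo {

    /** Returns {legacy source state, mailbox state} while both are stuck, then lets them finish. */
    public static Thread.State[] observeStates() throws InterruptedException {
        // Hypothetical stand-in for the per-task checkpoint lock.
        final Object checkpointLock = new Object();
        // Hypothetical stand-in for the buffer pool: 0 permits = no free buffers (back-pressure).
        final Semaphore freeBuffers = new Semaphore(0);
        final CountDownLatch lockTaken = new CountDownLatch(1);

        Thread legacySource = new Thread(() -> {
            synchronized (checkpointLock) {
                lockTaken.countDown();
                try {
                    // "Emitting" needs a buffer; the thread parks here while still holding the lock.
                    freeBuffers.acquire();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "Legacy Source Thread");

        Thread mailbox = new Thread(() -> {
            synchronized (checkpointLock) {
                // A mail (e.g. a checkpoint action) would run here once the lock is free.
            }
        }, "Task mailbox thread");

        legacySource.start();
        lockTaken.await();
        mailbox.start();

        awaitState(legacySource, Thread.State.WAITING);
        awaitState(mailbox, Thread.State.BLOCKED);
        Thread.State[] states = {legacySource.getState(), mailbox.getState()};

        // A buffer becoming available releases the source, which then releases the lock.
        freeBuffers.release();
        legacySource.join();
        mailbox.join();
        return states;
    }

    // Polls until the thread reaches the given state.
    private static void awaitState(Thread t, Thread.State state) throws InterruptedException {
        while (t.getState() != state) {
            Thread.sleep(10);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread.State[] states = observeStates();
        System.out.println("Legacy Source Thread: " + states[0]); // WAITING (parked, holding the lock)
        System.out.println("Task mailbox thread: " + states[1]);  // BLOCKED on the lock
    }
}
```

Releasing a single permit unblocks both threads. Because every task has its own lock object, several such pairs can exist in one TM at the same time, each blocked on a different object.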
@ChangZhuo Chen (陳昌倬), the thread you mentioned is most likely
the same issue (though I can't tell for sure without a full thread dump).
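A full dump is usually taken with `jstack <pid>` on the TaskManager. As a minimal sketch, assuming you can run code inside the JVM in question, the hypothetical helper below (not part of Flink) uses the standard `java.lang.management.ThreadMXBean` API to list every BLOCKED thread together with the lock it waits for and the state of that lock's owner. A back-pressured legacy source would show up as a task thread BLOCKED on a `java.lang.Object` lock whose owner is the matching "Legacy Source Thread" in state WAITING.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Flink): summarizes BLOCKED threads and who owns their lock. */
public class BlockedThreadReport {

    public static List<String> blockedThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> lines = new ArrayList<>();
        // Same information as a jstack dump, including locked monitors and synchronizers.
        for (ThreadInfo info : mx.dumpAllThreads(true, true)) {
            if (info == null || info.getThreadState() != Thread.State.BLOCKED) {
                continue;
            }
            long ownerId = info.getLockOwnerId(); // -1 if the owner is unknown
            ThreadInfo owner = ownerId >= 0 ? mx.getThreadInfo(ownerId) : null;
            String ownerState = owner != null ? owner.getThreadState().toString() : "unknown";
            lines.add(info.getThreadName() + " -> " + info.getLockName()
                    + " owned by " + info.getLockOwnerName() + " (" + ownerState + ")");
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : blockedThreads()) {
            System.out.println(line);
        }
    }
}
```

One line per BLOCKED task thread whose owner is WAITING in buffer allocation points to one back-pressured source task.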
Regards,
Roman
On Tue, Mar 16, 2021 at 3:00 PM ChangZhuo Chen (陳昌倬) wrote:
>
> On Tue, Mar 16, 2021 at 02:32:54AM +0000, Alexey Trenikhun wrote:
> > Hi Roman,
> > I took a thread dump:
> > "Source: digital-itx-eastus2 -> Filter (6/6)#0" Id=200 BLOCKED on java.lang.Object@5366a0e2 owned by "Legacy Source Thread - Source: digital-itx-eastus2 -> Filter (6/6)#0" Id=202
> > at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$SynchronizedStreamTaskActionExecutor.runThrowing(StreamTaskActionExecutor.java:92)
> > - blocked on java.lang.Object@5366a0e2
> > at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:90)
> > at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:317)
> > at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:189)
> > at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:617)
> > at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:581)
> > at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:755)
> > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:570)
> >
> > "Legacy Source Thread - Source: digital-itx-eastus2 -> Filter (6/6)#0" Id=202 WAITING on java.util.concurrent.CompletableFuture$Signaller@6915c7ef
> > at sun.misc.Unsafe.park(Native Method)
> > - waiting on java.util.concurrent.CompletableFuture$Signaller@6915c7ef
> > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> > at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1707)
> > at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
> > at java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742)
> > at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
> > at org.apache.flink.runtime.io.network.buffer.LocalBufferPool.requestMemorySegmentBlocking(LocalBufferPool.java:319)
> > at org.apache.flink.runtime.io.network.buffer.LocalBufferPool.requestBufferBuilderBlocking(LocalBufferPool.java:291)
> >
> > Is it the checkpoint lock? Is the checkpoint lock per task or per TM? I see multiple threads in SynchronizedStreamTaskActionExecutor.runThrowing blocked on different objects.
>
> Hi,
>
> This call stack is similar to our case described in [0]. Could it be
> the same issue?
>
> [0] http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/How-to-debug-checkpoint-savepoint-stuck-in-Flink-1-12-2-td42103.html
>
>
> --
> ChangZhuo Chen (陳昌倬) czchen@{czchen,debian}.org
> http://czchen.info/
> Key fingerprint = BA04 346D C2E1 FE63 C790 8793 CC65 B0CD EC27 5D5B