How do Flink's async I/O API parameters work?

Diwakar Jha

Hello,

I'm trying to solve a problem with the async I/O API. I'm running a Flink job that has a sleep time of 120 seconds during restart. After every deployment I see around 10K file-load timeouts. I suspect this is because the timeout parameter is set to 30 seconds. I experimented with that value and found that 150 seconds gives zero file-read timeouts. I also tried changing the capacity (from 100 to 1000), but that made no difference. That surprises me: I thought that when the Flink job restarts (from a savepoint) it has a backlog (120 seconds' worth) of file-read requests, and that increasing the capacity would make the file reads faster, but that is not the case.


I have a question:
a) For the async I/O API, why does changing the timeout help while changing the capacity does not?

Here is the configuration of AsyncDataStream.

Before
DataStream<RawData> fileStream = AsyncDataStream
    .unorderedWait(datasetStream,
        new AsyncS3Load(),
        30,                // timeout
        TimeUnit.SECONDS,
        100);              // capacity

After
DataStream<RawData> fileStream = AsyncDataStream
    .unorderedWait(datasetStream,
        new AsyncS3Load(),
        150,               // timeout
        TimeUnit.SECONDS,
        100);              // capacity
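
One possible way to reason about the two parameters, as a rough sketch in plain Java with no Flink dependency. As far as I understand the documentation, the timeout is measured per request from the moment it is handed to the async function, and the capacity only limits how many requests may be in flight at once. The sketch assumes, hypothetically, that the client behind AsyncS3Load serves only a fixed number of requests concurrently (workers) and that each request takes a fixed time (latencySec). Both numbers, and the class and method names, are made up for illustration and are not measurements from my job.

```java
public class AsyncTimeoutModel {

    // Counts how many requests of one admitted burst finish after the timeout.
    // capacity   = number of requests admitted at once (the capacity parameter)
    // workers    = hypothetical number of requests the client really serves in parallel
    // latencySec = hypothetical time one request takes once a worker picks it up
    // timeoutSec = the timeout parameter, counted from admission
    static int timedOut(int capacity, int workers, long latencySec, long timeoutSec) {
        int count = 0;
        for (int i = 0; i < capacity; i++) {
            // Request i waits for (i / workers) earlier rounds, then runs itself.
            long completesAfterSec = (i / workers + 1) * latencySec;
            if (completesAfterSec > timeoutSec) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // timeout 30 s, capacity 100
        System.out.println(timedOut(100, 10, 5, 30));
        // timeout 30 s, capacity 1000: more requests wait behind the same workers
        System.out.println(timedOut(1000, 10, 5, 30));
        // timeout 150 s, capacity 100
        System.out.println(timedOut(100, 10, 5, 150));
    }
}
```

In this toy model a larger capacity only lets more requests queue behind the same workers, so it cannot reduce timeouts, while a longer timeout does. Whether this matches what actually happens inside my job is exactly what I am unsure about.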

Thanks!