SourceFunction cannot run in Batch Mode

oscar.chen
Hi,

We want to use batch execution mode [0] to consume historical data and rebuild the state of our streaming application.
The Flink app will run on demand and shut down after all files have been processed.
We implemented a SourceFunction [1] to consume bounded Parquet files from GCS. However, the function is detected as Batch Mode.

Our question is: how can we implement a SourceFunction so that it produces a bounded DataStream?

Thanks! 
Oscar




Re: SourceFunction cannot run in Batch Mode

oscar.chen
Sorry, my previous message contained a typo that may be misleading.

Correction: the SourceFunction is detected as Streaming Mode, not Batch Mode.

(In reply to the message from 陳樺威 <[hidden email]>, Thursday, June 3, 2021, 1:29 PM.)




Re: SourceFunction cannot run in Batch Mode

Ingo Bürk
Hi Oscar,

I think you'll find your answers in [1]; have a look at Yun's response a couple of emails down. Basically, SourceFunction belongs to the legacy source stack. Ideally you would instead implement your source using the FLIP-27 stack [2], where you can define the boundedness directly, but he also mentions a workaround.
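
To make the boundedness point concrete, here is a minimal, self-contained Java sketch. It deliberately does not use Flink's classes: `ModelSource`, `LegacySource`, `DeclaredSource`, `RuntimeMode` and `resolveMode` are hypothetical stand-ins, and the rule "batch only if every source is bounded" is a simplified assumption about how the mode is decided. It only shows why a source that cannot declare itself bounded ends up in Streaming Mode.

```java
import java.util.Arrays;
import java.util.List;

// Simplified model only: none of these types are Flink classes.
final class BoundednessSketch {

    // Stand-in for the two boundedness values a source can declare.
    enum Boundedness { BOUNDED, CONTINUOUS_UNBOUNDED }

    // Stand-in for the execution mode that ends up being chosen.
    enum RuntimeMode { BATCH, STREAMING }

    // Hypothetical source abstraction: every source reports its boundedness.
    interface ModelSource {
        Boundedness getBoundedness();
    }

    // Models a legacy SourceFunction-style source. Assumption: it has no way
    // to declare boundedness, so the model always treats it as unbounded.
    static final class LegacySource implements ModelSource {
        @Override
        public Boundedness getBoundedness() {
            return Boundedness.CONTINUOUS_UNBOUNDED;
        }
    }

    // Models a FLIP-27-style source: the implementer declares the boundedness.
    static final class DeclaredSource implements ModelSource {
        private final Boundedness boundedness;

        DeclaredSource(Boundedness boundedness) {
            this.boundedness = boundedness;
        }

        @Override
        public Boundedness getBoundedness() {
            return boundedness;
        }
    }

    // Simplified rule: batch is chosen only when every source is bounded.
    static RuntimeMode resolveMode(List<ModelSource> sources) {
        for (ModelSource source : sources) {
            if (source.getBoundedness() != Boundedness.BOUNDED) {
                return RuntimeMode.STREAMING;
            }
        }
        return RuntimeMode.BATCH;
    }

    public static void main(String[] args) {
        List<ModelSource> legacyJob = Arrays.asList(new LegacySource());
        List<ModelSource> declaredJob =
                Arrays.asList(new DeclaredSource(Boundedness.BOUNDED));
        System.out.println("legacy source   -> " + resolveMode(legacyJob));
        System.out.println("declared source -> " + resolveMode(declaredJob));
    }
}
```

In Flink itself, the counterpart of `DeclaredSource` would be a FLIP-27 `Source` whose `getBoundedness()` returns `Boundedness.BOUNDED`, attached to the job with `StreamExecutionEnvironment#fromSource`; the details are in [2].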


Regards
Ingo


(In reply to the message from 陳樺威 <[hidden email]>, Thu, Jun 3, 2021 at 7:29 AM.)