(DEPRECATED) Apache Flink User Mailing List archive.

Re: Colocating Compute

Posted by Satyam Shekhar on
URL: http://deprecated-apache-flink-user-mailing-list-archive.369.s1.nabble.com/Colocating-Compute-tp37038p37096.html

Hi Dawid,

I am currently on Flink v1.10. Do streaming pipelines support unbounded InputFormat in v1.10? My current setup uses SourceFunction for streaming pipeline and InputFormat for batch queries.

I see the documentation for Flink v1.11 describe concepts for Split, SourceReader, and SplitEnumerator to enable streaming queries on unbounded splits. Is that the direction you were pointing to?

Regards,

Satyam

On Thu, Jul 30, 2020 at 6:03 AM Dawid Wysakowicz <[hidden email]> wrote:

Hi Satyam,

I think you can use the InputSplitAssigner also for streaming pipelines
through an InputFormat. You can use
StreamExecutionEnvironment#createInput or for SQL you can write your
source according to the documentation here:
https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/sourceSinks.html#dynamic-table-source

If you do not want to use an InputFormat I think there is no easy way to
do it now.

Best,

Dawid

On 29/07/2020 13:53, Satyam Shekhar wrote:
> Hello,
>
> I am using Flink v1.10 in a distributed environment to run SQL queries
> on batch and streaming data.
>
> In my setup, data is sharded and distributed across the cluster. Each
> shard receives streaming updates from some external source. I wish to
> minimize data movement during query evaluation for performance
> reasons. For that, I need some construct to advise Flink planner to
> bind splits (shard) to the host where it is located.
>
> I have come across InputSplitAssigner which gives me levers to
> influence compute colocation for batch queries. Is there a way to do
> the same for streaming queries as well?
>
> Regards,
> Satyam