Supporting Multi-tenancy in Flink

Aparup Banerjee (apbanerj)
We are building a stream processing system using Apache Beam on top of Flink, via the Flink Runner. Our pipelines take Kafka streams as sources and can write to multiple sinks. The system needs to be tenant-aware: tenants can share the same Kafka topic, and tenants can write their own pipelines. We are providing a small framework for writing pipelines (on top of Beam), so we have control over what data stream is available to pipeline developers. I am looking for strategies for the following:

  1. How can I partition/group the data so that pipeline developers don't need to care about tenancy, but data integrity is maintained? (A rough sketch of what I have in mind follows this list.)
  2. Ways in which I can assign compute (e.g., worker nodes) to different jobs based on tenant configuration.
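
To make question 1 concrete, here is a minimal sketch of the kind of wrapper our framework could provide; Event and getTenantId() are placeholders for our framework types, not real Beam APIs. The framework owns the keying step and prefixes every key with the tenant ID, so the developer's grouping can never mix tenants:

import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.SerializableFunction;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;

// Sketch only: the framework prepends the tenant ID to every key, so a
// downstream GroupByKey is always scoped to a single tenant.
public final class TenantScopedKeys {

  public static PCollection<KV<String, Event>> keyByTenant(
      PCollection<Event> events,
      final SerializableFunction<Event, String> userKeyFn) {
    return events.apply("KeyByTenantAndUserKey",
        ParDo.of(new DoFn<Event, KV<String, Event>>() {
          @ProcessElement
          public void processElement(ProcessContext c) {
            Event e = c.element();
            // Composite key: tenant first, then the developer's own key.
            c.output(KV.of(e.getTenantId() + "|" + userKeyFn.apply(e), e));
          }
        }));
  }
}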

Thanks,
Aparup

Re: Supporting Multi-tenancy in Flink

Ufuk Celebi

There is no built-in support for this in Flink, but King.com worked on
something similar using custom operators. You can check out the blog
post here: https://techblog.king.com/rbea-scalable-real-time-analytics-king/

I'm pulling in Gyula (cc'd) who worked on the implementation at King...
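
For the second question, one common workaround is to isolate tenants at the cluster level and size each job from the tenant's configuration. A minimal sketch using Beam's FlinkPipelineOptions; TenantConfig and its accessors are hypothetical stand-ins for your own configuration lookup:

import org.apache.beam.runners.flink.FlinkPipelineOptions;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

// Hypothetical per-tenant submission: each tenant gets its own Flink
// cluster (hard isolation) and a parallelism taken from tenant config.
TenantConfig tenantConfig = TenantConfig.load("tenant-a");

FlinkPipelineOptions options =
    PipelineOptionsFactory.as(FlinkPipelineOptions.class);
options.setFlinkMaster(tenantConfig.getJobManagerAddress());
options.setParallelism(tenantConfig.getParallelism());

Pipeline pipeline = Pipeline.create(options);
// ... build the tenant's pipeline and run it on its dedicated cluster
pipeline.run();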

Re: Supporting Multi-tenancy in Flink

Aparup Banerjee (apbanerj)
Thanks.

Hi Gyula, is there anything you can share on this?

Aparup

Re: Supporting Multi-tenancy in Flink

Gyula Fóra
Hi,

Well, what we are doing at King is solving a similar problem. It would be great if you could read the blog post, because it goes into detail about the actual implementation, but let me recap quickly:

We are building a stream processing system that data scientists and other developers at King share, in a way that lets them use it through a simple web interface without knowing any operational details. The stream processing system itself is one complex Flink job that receives both the events and the user scripts/jobs, which are written in a higher-level DSL.

The DSL is designed so that we can execute the operations in a fixed streaming topology instead of having to dynamically deploy a new job for every new script. Both scripts and events are sent through Kafka, so this makes our backend Flink job naturally multi-tenant. This is of course not always appropriate, as there is no resource isolation between individual scripts, but we can work around that by dedicating backend jobs to different teams.
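
Schematically, the core of the pattern looks something like this. This is a simplified sketch, not our actual code; Event, Script, and Output stand in for our internal types. Scripts are broadcast to every parallel instance of a single co-flatmap operator, which applies all currently registered scripts to each incoming event:

import org.apache.flink.streaming.api.functions.co.CoFlatMapFunction;
import org.apache.flink.util.Collector;

import java.util.HashMap;
import java.util.Map;

// Fixed topology: one operator handles both the event stream and the
// (broadcast) script stream, so new scripts never require a new job.
public class ScriptExecutor
    implements CoFlatMapFunction<Event, Script, Output> {

  private final Map<String, Script> scripts = new HashMap<>();

  @Override
  public void flatMap1(Event event, Collector<Output> out) {
    // Event path: run every currently registered script on the event.
    for (Script script : scripts.values()) {
      script.process(event, out);
    }
  }

  @Override
  public void flatMap2(Script script, Collector<Output> out) {
    // Control path: register or update a script at runtime.
    scripts.put(script.getId(), script);
  }
}

// Wiring: events.connect(scriptStream.broadcast())
//               .flatMap(new ScriptExecutor());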

Let me know if this helps!
Gyula
