Security in Flink

Security in Flink

Sourav Mazumder
Hi,

Can anyone point me to any documentation on security support in Flink?

The kind of information I'm looking for:

1. How do I do user-level authentication to ensure that a job is submitted/deleted/modified by the right user? Is it possible through the web client?
2. Authentication across multiple slave nodes (where the task managers are running) and the driver program so that they can communicate with each other
3. Support for SSL/encryption for data exchanged across the slave nodes
4. Support for pluggable authentication with an existing solution like LDAP

If these are not there today, is there a roadmap for these security features?

Regards,
Sourav

Re: Security in Flink

Stephan Ewen
Hi Sourav!

There is user-authentication support in Flink via the Hadoop / Kerberos infrastructure. If you run Flink on YARN, it should work seamlessly: Flink acquires the Kerberos tokens of the user who submits programs and uses them to authenticate itself at YARN, HDFS, and HBase.

If you run Flink standalone, Flink can still authenticate at HDFS/HBase via Kerberos, with a bit of manual help by the user (running kinit on the workers).

With Kafka 0.9 and Flink's upcoming connector (https://github.com/apache/flink/pull/1489), streaming programs can authenticate themselves to the Kafka brokers via SSL (and read via encrypted connections).
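
For illustration, SSL on a Kafka 0.9 client comes down to a handful of client properties. The following is a minimal sketch, assuming the property keys from Kafka's client configuration (`security.protocol`, `ssl.truststore.*`, `ssl.keystore.*`). The class name, broker address, file paths, and passwords are placeholders, and how the connector in the pull request consumes these properties is not shown here.

```java
import java.util.Properties;

final class KafkaSslPropsSketch {

    /** Collects the SSL-related settings a Kafka 0.9 client would be given. */
    static Properties sslClientProps(String brokers,
                                     String trustStorePath, String trustStorePassword,
                                     String keyStorePath, String keyStorePassword) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", brokers);
        // Use SSL instead of the default PLAINTEXT protocol.
        props.setProperty("security.protocol", "SSL");
        // Trust store: lets the client verify the brokers' certificates.
        props.setProperty("ssl.truststore.location", trustStorePath);
        props.setProperty("ssl.truststore.password", trustStorePassword);
        // Key store: lets the client present its own certificate to the brokers.
        props.setProperty("ssl.keystore.location", keyStorePath);
        props.setProperty("ssl.keystore.password", keyStorePassword);
        return props;
    }

    public static void main(String[] args) {
        Properties props = sslClientProps(
                "broker1:9093", // placeholder broker address
                "/path/to/client.truststore.jks", "truststore-secret",
                "/path/to/client.keystore.jks", "keystore-secret");
        // Print only the keys so that no password ends up in a log.
        System.out.println(props.stringPropertyNames());
    }
}
```

The key store entries are only needed when the brokers require client certificates; for encryption alone, the trust store settings should be enough.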


What we have on the roadmap for the coming months is the following:
  - Encrypt in-flight data streams that are exchanged between worker nodes (TaskManagers).
  - Encrypt the coordination messages between client/master/workers.
Note that these refer to encryption between Flink's own components only, which would use transient keys generated just for a specific job or session (hence would not need any user involvement).
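
To make the idea of transient per-session keys concrete, here is a hedged sketch using only the JDK: a fresh AES key is generated for one session and used to encrypt and decrypt a record with AES-GCM. This is not Flink's implementation (the feature is only on the roadmap at this point); the class and method names, the 128-bit key size, and the choice of GCM are assumptions made for illustration.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;

final class SessionKeySketch {
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Generates a fresh 128-bit AES key, meant to live only for one job or session. */
    static SecretKey newSessionKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(128, RANDOM);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns a random 12-byte nonce; a nonce must never be reused with the same key. */
    static byte[] newNonce() {
        byte[] nonce = new byte[12];
        RANDOM.nextBytes(nonce);
        return nonce;
    }

    /** Encrypts or decrypts with AES-GCM, depending on the Cipher mode constant passed in. */
    static byte[] crypt(int mode, SecretKey key, byte[] nonce, byte[] input) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(mode, key, new GCMParameterSpec(128, nonce));
            return cipher.doFinal(input);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SecretKey key = newSessionKey(); // never written to disk, no user involvement
        byte[] nonce = newNonce();
        byte[] record = "in-flight record".getBytes(StandardCharsets.UTF_8);
        byte[] encrypted = crypt(Cipher.ENCRYPT_MODE, key, nonce, record);
        byte[] decrypted = crypt(Cipher.DECRYPT_MODE, key, nonce, encrypted);
        System.out.println(new String(decrypted, StandardCharsets.UTF_8));
    }
}
```

The sketch leaves out the hard part, which is distributing the session key to the workers over a channel that is itself trusted.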


Let us know if that answers your questions, and if that meets your requirements.

Greetings,
Stephan


In reply to the post by Sourav Mazumder of Fri, Jan 8, 2016 at 3:23 PM

Re: Security in Flink

Sourav Mazumder
Thanks Stephan for your detailed response. Things are clearer to me now.

A follow-up question:
It looks like most of the security support depends on Hadoop. What happens if someone wants to use Flink without Hadoop (in a cluster where Hadoop is not there)?

Regards,
Sourav

In reply to the post by Stephan Ewen of Sun, Jan 10, 2016 at 12:41 PM

Re: Security in Flink

tambunanw
Hi Stephan,

Do you have any plans for which encryption method and mechanism will be used in Flink? Could you share some details on this?

We have a very strict requirement from our client that every communication needs to be encrypted. So any detail would be really appreciated for answering their security concerns.


Cheers

In reply to the post by Sourav Mazumder of Mon, Jan 11, 2016 at 9:46 PM

--
Welly Tambunan
Triplelands

http://weltam.wordpress.com
http://www.triplelands.com

Re: Security in Flink

Ufuk Celebi
Hey Welly!

I’m not aware of any concrete plans, but is it possible that you share your requirements on a high level?

– Ufuk

In reply to the post by Welly Tambunan of 12 Jan 2016, 08:24

Re: Security in Flink

Stephan Ewen
Hi Welly!

In the end, all remote communication in Flink will go through Netty (Flink direct shuffles do, and Akka uses Netty as well).

Netty authenticates connections and encrypts data via SSL, building on Java's SSLContext.

As far as I know, the available algorithms for encryption / signatures are the usual ones in the TLS standard (can also be configured via JVM arguments).
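
A minimal sketch of what this looks like at the JDK level: obtain an `SSLContext`, create an `SSLEngine` from it (Netty's `SslHandler` wraps such an engine), and inspect which protocol versions and cipher suites it would offer. The class name is hypothetical, and the JVM's default context is used only to keep the example self-contained. A real deployment would presumably initialize the context from its own key and trust stores.

```java
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

final class TlsEngineSketch {

    /** Creates a server-side engine that also demands a certificate from the client. */
    static SSLEngine newServerEngine() {
        try {
            // The default context honours JVM arguments such as
            // -Djavax.net.ssl.keyStore=... and -Djavax.net.ssl.trustStore=...
            SSLContext context = SSLContext.getDefault();
            SSLEngine engine = context.createSSLEngine();
            engine.setUseClientMode(false);  // act as the accepting side
            engine.setNeedClientAuth(true);  // mutual authentication
            return engine;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SSLEngine engine = newServerEngine();
        // The concrete lists depend on the JVM version and its security settings.
        System.out.println("Protocols: " + Arrays.toString(engine.getEnabledProtocols()));
        System.out.println("Cipher suites: " + Arrays.toString(engine.getEnabledCipherSuites()));
    }
}
```

Restricting the negotiated algorithms is then a matter of calling `setEnabledProtocols` and `setEnabledCipherSuites` on the engine with a subset of the supported values.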

Greetings,
Stephan


In reply to the post by Ufuk Celebi of Tue, Jan 12, 2016 at 9:38 AM

Re: Security in Flink

Stephan Ewen
In reply to this post by Sourav Mazumder
Hi Sourav!

If you want to use Flink in a cluster where neither Hadoop/YARN nor (soon) Mesos is available, then I assume you have already installed Flink in standalone mode on the cluster.

There is currently no support in Flink for managing user authentication. A few thoughts on how that may evolve:

1) It should not be too hard to add authentication to the web dashboard. That way, if the cluster is otherwise blocked off (the master's RPC ports are firewalled), job submission would be restricted.

2) We plan to add authenticated / encrypted connections soon. With that, the client that submits the program would need to have access to the keystore or key and the corresponding password to connect.
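
As a rough illustration of point 2, the sketch below shows how a password gates access to a Java keystore: a client that lacks the correct password cannot load the key material it would need to connect. It uses only the JDK's `KeyStore` API with an empty in-memory PKCS12 store; the class and method names and the passwords are hypothetical, and nothing here reflects how Flink will eventually wire this up.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.security.GeneralSecurityException;
import java.security.KeyStore;

final class KeystoreAccessSketch {

    /** Serializes an empty PKCS12 keystore protected by the given password. */
    static byte[] newEmptyKeystore(char[] password) {
        try {
            KeyStore store = KeyStore.getInstance("PKCS12");
            store.load(null, null); // initialize an empty in-memory store
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            store.store(out, password);
            return out.toByteArray();
        } catch (IOException | GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns true if the keystore bytes can be loaded with the given password. */
    static boolean canOpen(byte[] keystoreBytes, char[] password) {
        try {
            KeyStore store = KeyStore.getInstance("PKCS12");
            store.load(new ByteArrayInputStream(keystoreBytes), password);
            return true;
        } catch (IOException | GeneralSecurityException e) {
            // A failed integrity check surfaces here as an exception.
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] keystore = newEmptyKeystore("cluster-secret".toCharArray());
        System.out.println("correct password: " + canOpen(keystore, "cluster-secret".toCharArray()));
        System.out.println("other password:   " + canOpen(keystore, "guess".toCharArray()));
    }
}
```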

Greetings,
Stephan




Re: Security in Flink

tambunanw
Hi Stephan, 

Thanks a lot for the explanation. 

Is there any timeline for when this will be released? I guess this will be important for us if we want to deploy Flink in production.

Cheers

In reply to the post by Stephan Ewen of Tue, Jan 12, 2016 at 6:19 PM

Re: Security in Flink

Maximilian Michels
Hi Welly,

There is no fixed timeline yet but we plan to make progress in terms
of authentication and encryption after the 1.0.0 release.

Cheers,
Max

In reply to the post by Welly Tambunan of Wed, Jan 13, 2016 at 8:34 AM