[ANNOUNCE] Weekly Community Update 2019/24

Konstantin Knauf-2
Dear community,

Last year, Till did a great job of summarizing recent developments in the Flink community in a "Weekly community update" thread. I found this very helpful and would like to revive the tradition, with a focus on topics and threads that are particularly relevant to the wider community of Flink users.

As we haven't had such an update for some time (since December 2018), it is impossible to cover everything that is currently going on in one email. I'll try to include most ongoing discussions and FLIPs over the next few weeks to catch up. Afterwards, I will go back to focusing only on news since the last update.

You are welcome to share any additional news and updates with the community in this thread.

Flink Development
===============

* [releases] The community is currently working on a Flink 1.8.1 release [1]. The first release candidate should be ready soon (one critical bug to fix as of writing, FLINK-12863).
* [releases] Kurt and Gordon stepped up as release managers for Flink 1.9 and started a thread [2] to sync on the status of various development threads targeted for Flink 1.9. Check it out to see if the feature you are waiting for is likely to make it or not.
* [savepoints] Gordon, Kostas and Congxian have recently started a discussion [3] on unifying the savepoint format across StateBackends, which will enable users to switch between StateBackends when recovering from a Savepoint. The discussion on introducing Stop-With-Checkpoint [4], initiated by Yu Li, is closely related and worth a read to understand the long-term vision.
* [savepoints] Seth and Gordon have started a discussion [5] on adding a State Processing API ("Savepoint Connector"), which will allow reading and modifying existing Savepoints as well as creating new Savepoints from scratch with the DataSet API. The feature is targeted for Flink 1.9.0 as a new *library*.
* [python-support] Back in April we had a discussion on the mailing list about adding Python support to the Table API [6]. This support will likely be available in Flink 1.9 (initially without UDFs, with UDF support to follow later). Therefore, Stephan has started a discussion [7] on deprecating the current Python API in Flink 1.9. This has gotten a lot of positive feedback, and the only open question as of writing is whether to merely deprecate it or to remove it directly.

Notable Bugs
===========

In this section I am going to list some recently discovered bugs, which might be relevant to a larger audience. I'll try to explain them to the best of my knowledge, but no guarantees.

* [FLINK-12296] [1.6.4] [1.7.2] [1.8.0] State can be silently lost when recovering a job with two stateful operators within the same operator chain. This can only happen when using reinterpretAsKeyedStream, and the bug only affects the RocksDBStateBackend with incremental checkpointing. Fixed in 1.7.3, 1.8.1 and 1.9.0. [8]
* [FLINK-12688] [1.6.4] [1.7.2] [1.8.0] A race condition while initializing the TypeSerializer within a StateDescriptor could lead to rare NullPointerExceptions when a StateDescriptor is shared between threads. Fixed in 1.7.3, 1.8.1 and 1.9.0. [9]
* [FLINK-12653] [1.6.4] [1.7.2] [1.8.0] After rescaling a job, recovery might fail if some state was only registered in a subset of all sub-tasks. This only affects the file system state backend (FsStateBackend). Unresolved. [10]
* [FLINK-11820] [1.7.2] [1.8.0] The SimpleStringSchema of the FlinkKafkaConsumer fails on "null" records. Unresolved, but PR available. [11]
* [FLINK-11162] [1.6.4] [1.7.2] [1.8.0] Due to the way the CheckpointCoordinator cleans up the checkpoint directory, tasks might fail during materialization of a checkpoint if another task has already declined the same checkpoint. The resolution is part of a larger rework of how checkpoint failures are managed and seems to be targeted for Flink 1.9.0. [12]
* [FLINK-10317] [1.6.4] [1.7.2] [1.8.0] Admittedly not a new bug, but still unresolved and under discussion: limiting the Java Metaspace size for Flink processes by default. It is not clear right now whether limiting the Metaspace size is a good idea. If you run into "OutOfMemoryError: Metaspace", this ticket is a good starting point for finding related tickets about classloader leaks. [13]
* [FLINK-11107] [1.7.2] [1.8.0] When using the MemoryStateBackend with HA, Flink creates many random checkpoint directories under the high-availability directory. They are useless, since checkpoints are not externalized, and might eventually render the cluster unusable. Fixed in 1.8.1 and 1.9.0. [14]
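
To make the FLINK-11820 item above a bit more concrete: a record whose value is null (for example a Kafka tombstone) has no bytes to decode, so a schema that decodes unconditionally fails on it. Below is a minimal, language-neutral sketch in Python of the null-tolerant behaviour one might want instead. `decode_record` is a hypothetical helper for illustration only; it is not Flink API and not necessarily how the open PR resolves the issue.

```python
from typing import Optional


def decode_record(payload: Optional[bytes], charset: str = "utf-8") -> Optional[str]:
    """Decode a record value to a string.

    Hypothetical helper: a missing value (None) is passed through
    unchanged instead of raising, so the caller can decide whether
    to drop the record or handle it as a deletion marker.
    """
    if payload is None:
        return None
    return payload.decode(charset)


if __name__ == "__main__":
    for value in (b"flink", None):
        print(repr(decode_record(value)))
```

Whether a null record should be dropped, forwarded or treated as an error is a design choice that depends on the job; the sketch only shows that the decision has to be made before decoding.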


Events, Blog Posts, Misc
====================

* Nico has recently published the first part [15] of a series of blog posts on Flink's network stack.
* There are a couple of meetups coming up in the next few weeks:
    * 2019/06/24: Cloud Native Meetup in Aarhus with a Flink talk by Lasse Nedergard (TrackUnit) [16]
    * 2019/06/25: Bay Area Apache Flink Meetup with talks by Zendesk, Parag Kesar and Ben Liu (Pinterest) and Ken Krugler (Scale Unlimited) [17]
    * 2019/07/01: Paris Apache Beam Meetup with a Flink talk by myself (Ververica) [18]
    * 2019/07/05: Apache Flink Meetup Munich with talks by Steffen Hausmann (AWS) and Michel David (Ryte) [19]

--

Konstantin Knauf | Solutions Architect

+49 160 91394525


--

Data Artisans GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--

Data Artisans GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Dr. Kostas Tzoumas, Dr. Stephan Ewen   

Re: [ANNOUNCE] Weekly Community Update 2019/24

tison
Hi Konstantin and all,

Thanks a lot, Konstantin, for reviving this tradition! It reminds
me of the joyful time when I could easily catch up on interesting
ongoing threads. Thanks for Till's work, too.

Besides the exciting updates and news above, I'd like to pick up
some other threads you may be interested in.

* xiaogang has recently started a discussion [1] on allowing
at-most-once delivery in case of failures, which adapts Flink
to more scenarios.

* vino has raised a discussion [2] on supporting local aggregation
in Flink, which has received a lot of positive feedback, and
there is now an ongoing FLIP-44 thread [3].

Re: [ANNOUNCE] Weekly Community Update 2019/24

Konstantin Knauf-2
Hi Zili,

Thank you for adding these threads :) I would otherwise have picked them up next week; I just couldn't fit everything into one email.

Cheers,

Konstantin
