[ANNOUNCE] Weekly community update #12

Till Rohrmann
Dear community,

I've noticed that Flink has grown quite a bit in the past. As a consequence, it can be quite challenging to stay up to date, especially for community members who don't follow Flink's mailing lists on a daily basis.

In order to keep a bigger part of the community in the loop, I want to try out a weekly update letter in which I tell the community what happened from my perspective. Since I don't know everything either, I want to encourage others to post updates about things they deem important and relevant for the community to this thread.

# Weekly update #12:

## Flink 1.5 release:
- The Flink community is still working on the Flink 1.5 release. Hopefully Flink 1.5 can be released in the coming weeks.
- Last week's work concentrated on stabilizing Flip-6 and adding more automated tests [1]. The Flink community appreciates every helping hand with adding more end-to-end tests.
- Consequently, the committed changes mainly consisted of bug fixes and test hardening.
- By the end of this week, we hope to have an RC ready which can be used for easier release testing. Given the big changes (network stack and Flip-6), the RC will most likely still contain some rough edges. In order to smooth them out, it would be good if we ran Flink 1.5 in as many different scenarios as possible.

## Flink 1.3.3 has been released
- Flink 1.3.3, which contains an important fix for properly handling checkpoints in case of a DFS problem, has been released. We highly recommend that all users running Flink 1.3.2 upgrade swiftly to Flink 1.3.3.

## Misc:
- Shuyi opened a discussion about improving Flink's security [2]. If you are interested and want to help with the next steps, please join the discussion.

PS: Don't worry, you haven't missed the first 11 weekly community updates. The 12 is just this week's number.


Cheers,
Till

Re: [ANNOUNCE] Weekly community update #12

Stephan Ewen
Great initiative, highly appreciated, Till!


On Mon, Mar 19, 2018 at 7:06 PM, Till Rohrmann <[hidden email]> wrote:
> [...]


Re: [ANNOUNCE] Weekly community update #12

Till Rohrmann
Eron pointed out to me that Flink Improvement Proposal 6 (Flip-6 for short) deserves some more comments, since not everyone will be aware of what it actually means. I totally agree and will try to give a bit more context for everyone interested.

Flip-6 is intended to solve some of Flink's shortcomings with respect to resource management and to improve its deployment flexibility. We have seen in the past that Flink's legacy abstraction was not well suited to support an ever-increasing set of different deployments.

Flink started with the standalone mode, which runs Flink on a bare-metal cluster. Soon it became evident that people would like to run Flink on top of cluster resource managers such as Yarn or Mesos. Consequently, support for Yarn was added. Until then, there was only a single execution mode, which is now called the session mode. The session mode allows you to run multiple jobs on the same Flink cluster at the cost of having no resource isolation. With Yarn we added a new per-job mode, which starts a Flink cluster for each job and gives you resource isolation. The next step was the integration with Mesos, and now many people want to run Flink in a containerized environment (Docker and Kubernetes).

On top of that, a feature that has been much sought after for quite some time is that Flink should be able to dynamically allocate more resources in order to scale jobs up, and to release resources if they are not used to capacity. That way one won't waste resources or under-provision the Flink cluster when facing changing workloads.

Since Flink has grown over time and some of the requirements weren't clear from the very beginning, it seemed quite difficult to make Flink work in all the different settings with support for dynamic scaling. So, in order to make Flink future-proof deployment-wise and to add support for full resource elasticity, the community started the Flip-6 effort.

Flip-6 splits the existing architecture up into four components: JobMaster, TaskExecutor, ResourceManager and Dispatcher. The JobMaster is now responsible for running a single job. The TaskExecutor remains more or less the same and is responsible for executing the tasks which the JobMaster deploys to it. The ResourceManager is the integration component with an external system like Yarn or Mesos. Its task is to allocate new containers/tasks in order to spawn new TaskExecutors if need be. The Dispatcher is the component responsible for receiving new jobs and spawning a new JobMaster to execute each of them.
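
To make the division of responsibilities a bit more tangible, here is a minimal toy sketch in Java. It is not Flink's actual code: only the four component names come from the description above, while all method names, fields and the "one TaskExecutor per task" simplification are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the four Flip-6 components. Hypothetical API, not Flink's classes.
public class Flip6Sketch {

    /** Executes the tasks which a JobMaster deploys to it. */
    static class TaskExecutor {
        final List<String> runningTasks = new ArrayList<>();

        void submitTask(String task) {
            runningTasks.add(task);
        }
    }

    /** Stand-in for the integration with Yarn or Mesos: starts TaskExecutors on demand. */
    static class ResourceManager {
        final List<TaskExecutor> executors = new ArrayList<>();

        TaskExecutor requestTaskExecutor() {
            // A real ResourceManager would ask the external system for a container here.
            TaskExecutor executor = new TaskExecutor();
            executors.add(executor);
            return executor;
        }
    }

    /** Responsible for running exactly one job. */
    static class JobMaster {
        final String jobName;

        JobMaster(String jobName) {
            this.jobName = jobName;
        }

        void run(List<String> tasks, ResourceManager resourceManager) {
            for (String task : tasks) {
                // Simplification (assumption): every task gets its own TaskExecutor.
                resourceManager.requestTaskExecutor().submitTask(jobName + "/" + task);
            }
        }
    }

    /** Receives new jobs and spawns one JobMaster per job. */
    static class Dispatcher {
        final ResourceManager resourceManager;
        final Map<String, JobMaster> jobMasters = new HashMap<>();

        Dispatcher(ResourceManager resourceManager) {
            this.resourceManager = resourceManager;
        }

        JobMaster submitJob(String jobName, List<String> tasks) {
            JobMaster jobMaster = new JobMaster(jobName);
            jobMasters.put(jobName, jobMaster);
            jobMaster.run(tasks, resourceManager);
            return jobMaster;
        }
    }

    public static void main(String[] args) {
        ResourceManager resourceManager = new ResourceManager();
        Dispatcher dispatcher = new Dispatcher(resourceManager);

        dispatcher.submitJob("wordcount", Arrays.asList("source", "count", "sink"));
        dispatcher.submitJob("etl", Arrays.asList("read", "write"));

        System.out.println("JobMasters: " + dispatcher.jobMasters.size());
        System.out.println("TaskExecutors: " + resourceManager.executors.size());
    }
}
```

In this sketch a long-lived Dispatcher that accepts several jobs roughly corresponds to the session mode, whereas starting the whole set of components for a single job would correspond to the per-job mode.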

The idea now is to use these building blocks to implement the session as well as the per-job mode in the different deployment scenarios.

Flink 1.5 will run by default on the new Flip-6 architecture and supports Yarn, Mesos as well as the standalone mode. Thus, it supports the same deployment modes that Flink 1.4 supported. Additionally, it should now be easier to run Flink in a containerized environment, since the client now communicates with the Flink cluster via REST calls. On Yarn and Mesos it will also be possible to dynamically allocate and free resources, which enables rescaling of jobs.
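
As a rough illustration of what talking to the cluster over REST can look like, here is a small Java sketch (Java 11 or newer) that builds and sends a plain HTTP GET request. The host, the port 8081 and the `/jobs/overview` path are assumptions for this example and may differ in your setup; the class and method names are made up for illustration and are not part of Flink's client.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper, not part of Flink: queries a cluster over plain HTTP.
public class FlinkRestSketch {

    /** Builds a GET request for the (assumed) job overview endpoint of a Flink cluster. */
    static HttpRequest jobsOverviewRequest(String host, int port) {
        URI uri = URI.create("http://" + host + ":" + port + "/jobs/overview");
        return HttpRequest.newBuilder(uri).GET().build();
    }

    public static void main(String[] args) {
        // Assumption: a cluster is reachable on localhost:8081.
        HttpRequest request = jobsOverviewRequest("localhost", 8081);
        System.out.println(request.method() + " " + request.uri());

        try {
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println("Status: " + response.statusCode());
            System.out.println(response.body());
        } catch (Exception e) {
            // No cluster reachable at the assumed address, or the request was interrupted.
            System.out.println("Request failed: " + e);
        }
    }
}
```

Since this is ordinary HTTP, the same request should also work from inside a container or through a proxy, which is the reason the REST-based client is friendlier to containerized deployments.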

The next logical step would be to provide a better Kubernetes (K8s) integration which allows Kubernetes to add and remove pods that are then automatically used by Flink.

For more information, you can take a look at [1], which gives an overview of the architecture, or simply reach out to me.


On Tue, Mar 20, 2018 at 9:35 PM, Stephan Ewen <[hidden email]> wrote:
Great initiative, highly appreciated, Till!

[1] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/ANNOUNCE-Flink-1-5-release-testing-effort-td21646.html
[2] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-security-improvements-td21068.html