[SURVEY] Remove Mesos support


Re: [BULK]Re: [SURVEY] Remove Mesos support

rmetzger0
+1



On Mon, Mar 29, 2021 at 5:44 AM Yangze Guo <[hidden email]> wrote:
+1

Best,
Yangze Guo

On Mon, Mar 29, 2021 at 11:31 AM Xintong Song <[hidden email]> wrote:
>
> +1
> It has been the case for a while that we no longer port new features to the Mesos deployment.
>
> Thank you~
>
> Xintong Song
>
>
>
> On Fri, Mar 26, 2021 at 10:37 PM Till Rohrmann <[hidden email]> wrote:
>>
>> +1 for officially deprecating this component for the 1.13 release.
>>
>> Cheers,
>> Till
>>
>> On Thu, Mar 25, 2021 at 1:49 PM Konstantin Knauf <[hidden email]> wrote:
>>>
>>> Hi Matthias,
>>>
>>> Thank you for following up on this. +1 to officially deprecate Mesos in the code and documentation, too. It will be confusing for users if this diverges from the roadmap.
>>>
>>> Cheers,
>>>
>>> Konstantin
>>>
>>> On Thu, Mar 25, 2021 at 12:23 PM Matthias Pohl <[hidden email]> wrote:
>>>>
>>>> Hi everyone,
>>>> considering the upcoming release of Flink 1.13, I wanted to revive the
>>>> discussion about the Mesos support once more. Mesos is also already listed
>>>> as deprecated in Flink's overall roadmap [1]. Maybe, it's time to align the
>>>> documentation accordingly to make it more explicit?
>>>>
>>>> What do you think?
>>>>
>>>> Best,
>>>> Matthias
>>>>
>>>> [1] https://flink.apache.org/roadmap.html#feature-radar
>>>>
>>>> On Wed, Oct 28, 2020 at 9:40 AM Till Rohrmann <[hidden email]> wrote:
>>>>
>>>> > Hi Oleksandr,
>>>> >
>>>> > yes, you are right. The biggest problem at the moment is the lack of test
>>>> > coverage and thereby confidence to make changes. We have some e2e tests
>>>> > which you can find here [1]. These tests are, however, quite coarse-grained
>>>> > and are missing a lot of cases. One idea would be to add a Mesos e2e test
>>>> > based on Flink's end-to-end test framework [2]. I think what needs to be
>>>> > done there is to add a Mesos resource and a way to submit jobs to a Mesos
>>>> > cluster to write e2e tests.
>>>> >
>>>> > [1] https://github.com/apache/flink/tree/master/flink-jepsen
>>>> > [2]
>>>> > https://github.com/apache/flink/tree/master/flink-end-to-end-tests/flink-end-to-end-tests-common
>>>> >
>>>> > Cheers,
>>>> > Till
>>>> >
>>>> > On Tue, Oct 27, 2020 at 12:29 PM Oleksandr Nitavskyi <
>>>> > [hidden email]> wrote:
>>>> >
>>>> >> Hello Xintong,
>>>> >>
>>>> >> Thanks for the insights and support.
>>>> >>
>>>> >> I browsed the Mesos backlog and didn't identify anything critical
>>>> >> left there.
>>>> >>
>>>> >> I see that there were quite a lot of contributions to Flink Mesos
>>>> >> in the recent version:
>>>> >> https://github.com/apache/flink/commits/master/flink-mesos.
>>>> >> We plan to validate the current Flink master (or release 1.12 branch) on our
>>>> >> Mesos setup. In case of any issues, we will try to propose changes.
>>>> >> My feeling is that our test results shouldn't affect the Flink 1.12
>>>> >> release cycle. And if any potential commits land in 1.12.1, that
>>>> >> should be totally fine.
>>>> >>
>>>> >> In the future, we would be glad to help you guys with any
>>>> >> maintenance-related questions. One of the highest priorities around this
>>>> >> component seems to be the development of the full e2e test.
>>>> >>
>>>> >> Kind Regards
>>>> >> Oleksandr Nitavskyi
>>>> >> ________________________________
>>>> >> From: Xintong Song <[hidden email]>
>>>> >> Sent: Tuesday, October 27, 2020 7:14 AM
>>>> >> To: dev <[hidden email]>; user <[hidden email]>
>>>> >> Cc: Piyush Narang <[hidden email]>
>>>> >> Subject: [BULK]Re: [SURVEY] Remove Mesos support
>>>> >>
>>>> >> Hi Piyush,
>>>> >>
>>>> >> Thanks a lot for sharing the information. It is a great relief that
>>>> >> you are good with Flink on Mesos as is.
>>>> >>
>>>> >> As for the jira issues, I believe the most essential ones should have
>>>> >> already been resolved. You may find some remaining open issues here [1],
>>>> >> but not all of them are necessary if we decide to keep Flink on Mesos as is.
>>>> >>
>>>> >> At the moment and in the near future, I think help is mostly needed with
>>>> >> testing the upcoming release 1.12 with Mesos use cases. The community is
>>>> >> currently actively preparing the new release, and hopefully we could come
>>>> >> up with a release candidate early next month. It would be greatly
>>>> >> appreciated if you folks, as experienced Flink on Mesos users, can help with
>>>> >> verifying the release candidates.
>>>> >>
>>>> >>
>>>> >> Thank you~
>>>> >>
>>>> >> Xintong Song
>>>> >>
>>>> >> [1]
>>>> >> https://issues.apache.org/jira/browse/FLINK-17402?jql=project%20%3D%20FLINK%20AND%20component%20%3D%20%22Deployment%20%2F%20Mesos%22%20AND%20status%20%3D%20Open
>>>> >>
>>>> >> On Tue, Oct 27, 2020 at 2:58 AM Piyush Narang <[hidden email]
>>>> >> <mailto:[hidden email]>> wrote:
>>>> >>
>>>> >> Hi Xintong,
>>>> >>
>>>> >>
>>>> >>
>>>> >> Do you have any jiras that cover any of the items on 1 or 2? I can reach
>>>> >> out to folks internally and see if I can get some folks to commit to
>>>> >> helping out.
>>>> >>
>>>> >>
>>>> >>
>>>> >> To cover the other qs:
>>>> >>
>>>> >>   *   Yes, we’ve not got a plan at the moment to get off Mesos. We use
>>>> >> Yarn for some of our Flink workloads when we can. Mesos is only used when we
>>>> >> need streaming capabilities in our WW dcs (as our Yarn is centralized in
>>>> >> one DC)
>>>> >>   *   We’re currently on Flink 1.9 (old planner). We have a plan to bump
>>>> >> to 1.11 / 1.12 this quarter.
>>>> >>   *   We typically upgrade once every 6 months to a year (not every
>>>> >> release). We’d like to speed up the cadence but we’re not there yet.
>>>> >>   *   We’d largely be good with keeping Flink on Mesos as-is and
>>>> >> functional while missing out on some of the newer features. We understand
>>>> >> the pain on the community's side and we can take on the work if we see some
>>>> >> fancy improvement in Flink on Yarn / K8s that we want in Mesos to put in
>>>> >> the request to port it over.
>>>> >>
>>>> >>
>>>> >>
>>>> >> Thanks,
>>>> >>
>>>> >>
>>>> >>
>>>> >> -- Piyush
>>>> >>
>>>> >>
>>>> >>
>>>> >>
>>>> >>
>>>> >> From: Xintong Song <[hidden email]<mailto:[hidden email]>>
>>>> >> Date: Sunday, October 25, 2020 at 10:57 PM
>>>> >> To: dev <[hidden email]<mailto:[hidden email]>>, user <
>>>> >> [hidden email]<mailto:[hidden email]>>
>>>> >> Cc: Lasse Nedergaard <[hidden email]<mailto:
>>>> >> [hidden email]>>, <[hidden email]<mailto:
>>>> >> [hidden email]>>
>>>> >> Subject: Re: [SURVEY] Remove Mesos support
>>>> >>
>>>> >>
>>>> >>
>>>> >> Thanks for sharing the information with us, Piyush and Lasse.
>>>> >>
>>>> >>
>>>> >>
>>>> >> @Piyush
>>>> >>
>>>> >>
>>>> >>
>>>> >> Thanks for offering the help. IMO, there are currently several problems
>>>> >> that make supporting Flink on Mesos challenging for us.
>>>> >>
>>>> >>   1.  Lack of Mesos experts. AFAIK, there are very few people (if not
>>>> >> none) among the active contributors in this community that are familiar
>>>> >> with Mesos and can help with development on this component.
>>>> >>   2.  Absence of tests. Mesos does not provide a testing cluster, like
>>>> >> `MiniYARNCluster`, making it hard to test interactions between Flink and
>>>> >> Mesos. We have only a few very simple e2e tests running on Mesos deployed
>>>> >> in a docker, covering the most fundamental workflows. We are not sure how
>>>> >> well those tests work, especially against some potential corner cases.
>>>> >>   3.  Divergence from other deployments. Because of 1 and 2, the new
>>>> >> efforts (features, maintenance, refactors) tend to exclude Mesos if
>>>> >> possible. When the new efforts have to touch the Mesos related components
>>>> >> (e.g., changes to the common resource manager interfaces), we have to be
>>>> >> very careful and make as few changes as possible, to avoid accidentally
>>>> >> breaking anything that we are not familiar with. As a result, the component
>>>> >> diverges a lot from other deployment components (K8s/Yarn), which makes it
>>>> >> harder to maintain.
>>>> >>
>>>> >> It would be greatly appreciated if you can help with any of the above
>>>> >> issues.
>>>> >>
>>>> >>
>>>> >>
>>>> >> Additionally, I have a few questions concerning your use cases at Criteo.
>>>> >> IIUC, you are going to stay on Mesos in the foreseeable future, while
>>>> >> keeping the Flink version up-to-date? What Flink version are you currently
>>>> >> using? How often do you upgrade (e.g., every release)? Would you be good
>>>> >> with keeping the Flink on Mesos component as it is (meaning that deployment
>>>> >> and resource management improvements may not be ported to Mesos), while
>>>> >> keeping other components up-to-date (e.g., improvements from programming
>>>> >> APIs, operators, state backends, etc.)?
>>>> >>
>>>> >>
>>>> >>
>>>> >> Thank you~
>>>> >>
>>>> >> Xintong Song
>>>> >>
>>>> >>
>>>> >>
>>>> >>
>>>> >>
>>>> >> On Sat, Oct 24, 2020 at 2:48 AM Lasse Nedergaard <
>>>> >> [hidden email]<mailto:[hidden email]>>
>>>> >> wrote:
>>>> >>
>>>> >> Hi
>>>> >>
>>>> >>
>>>> >>
>>>> >> At Trackunit we have been using Mesos for a long time but have now moved to
>>>> >> k8s.
>>>> >>
>>>> >> Best regards
>>>> >>
>>>> >> Lasse Nedergaard
>>>> >>
>>>> >>
>>>> >>
>>>> >>
>>>> >>
>>>> >> On Oct 23, 2020, at 17:01, Robert Metzger <[hidden email]
>>>> >> <mailto:[hidden email]>> wrote:
>>>> >>
>>>> >> Hey Piyush,
>>>> >>
>>>> >> thanks a lot for raising this concern. I believe we should keep Mesos in
>>>> >> Flink then in the foreseeable future.
>>>> >>
>>>> >> Your offer to help is much appreciated. We'll let you know once there is
>>>> >> something.
>>>> >>
>>>> >>
>>>> >>
>>>> >> On Fri, Oct 23, 2020 at 4:28 PM Piyush Narang <[hidden email]
>>>> >> <mailto:[hidden email]>> wrote:
>>>> >>
>>>> >> Thanks Kostas. If there are items we can help with, I'm sure we'd be able
>>>> >> to find folks who would be excited to contribute / help in any way.
>>>> >>
>>>> >> -- Piyush
>>>> >>
>>>> >>
>>>> >> On 10/23/20, 10:25 AM, "Kostas Kloudas" <[hidden email]<mailto:
>>>> >> [hidden email]>> wrote:
>>>> >>
>>>> >>     Thanks Piyush for the message.
>>>> >>     After this, I revoke my +1. I agree with the previous opinions that we
>>>> >>     cannot drop code that is actively used by users, especially if it is
>>>> >>     something as deep in the stack as support for a cluster management
>>>> >>     framework.
>>>> >>
>>>> >>     Cheers,
>>>> >>     Kostas
>>>> >>
>>>> >>     On Fri, Oct 23, 2020 at 4:15 PM Piyush Narang <[hidden email]
>>>> >> <mailto:[hidden email]>> wrote:
>>>> >>     >
>>>> >>     > Hi folks,
>>>> >>     >
>>>> >>     >
>>>> >>     >
>>>> >>     > We at Criteo are active users of the Flink on Mesos resource
>>>> >> management component. We are pretty heavy users of Mesos for scheduling
>>>> >> workloads on our edge datacenters and we do want to continue to be able to
>>>> >> run some of our Flink topologies (to compute machine learning short term
>>>> >> features) on those DCs. If possible our vote would be not to drop Mesos
>>>> >> support as that will tie us to an old release / have to maintain a fork as
>>>> >> we’re not planning to migrate off Mesos anytime soon. Is the burden
>>>> >> something that can be helped with by the community? (Or are you referring
>>>> >> to having to ensure PRs handle the Mesos piece as well when they touch the
>>>> >> resource managers?)
>>>> >>     >
>>>> >>     >
>>>> >>     >
>>>> >>     > Thanks,
>>>> >>     >
>>>> >>     >
>>>> >>     >
>>>> >>     > -- Piyush
>>>> >>     >
>>>> >>     >
>>>> >>     >
>>>> >>     >
>>>> >>     >
>>>> >>     > From: Till Rohrmann <[hidden email]<mailto:
>>>> >> [hidden email]>>
>>>> >>     > Date: Friday, October 23, 2020 at 8:19 AM
>>>> >>     > To: Xintong Song <[hidden email]<mailto:
>>>> >> [hidden email]>>
>>>> >>     > Cc: dev <[hidden email]<mailto:[hidden email]>>, user <
>>>> >> [hidden email]<mailto:[hidden email]>>
>>>> >>     > Subject: Re: [SURVEY] Remove Mesos support
>>>> >>     >
>>>> >>     >
>>>> >>     >
>>>> >>     > Thanks for starting this survey Robert! I second Konstantin and
>>>> >> Xintong in the sense that our Mesos users' opinions should matter most
>>>> >> here. If our community is no longer using the Mesos integration, then I
>>>> >> would be +1 for removing it in order to decrease the maintenance burden.
>>>> >>     >
>>>> >>     >
>>>> >>     >
>>>> >>     > Cheers,
>>>> >>     >
>>>> >>     > Till
>>>> >>     >
>>>> >>     >
>>>> >>     >
>>>> >>     > On Fri, Oct 23, 2020 at 2:03 PM Xintong Song <[hidden email]
>>>> >> <mailto:[hidden email]>> wrote:
>>>> >>     >
>>>> >>     > +1 for adding a warning in 1.12 about planning to remove Mesos
>>>> >> support.
>>>> >>     >
>>>> >>     >
>>>> >>     >
>>>> >>     > With my developer hat on, removing the Mesos support would
>>>> >> definitely reduce the maintenance overhead for the deployment and resource
>>>> >> management related components. On the other hand, the Flink on Mesos users'
>>>> >> voices definitely matter a lot for this community. Either way, it would be
>>>> >> good to draw users' attention to this discussion early.
>>>> >>     >
>>>> >>     >
>>>> >>     >
>>>> >>     > Thank you~
>>>> >>     >
>>>> >>     > Xintong Song
>>>> >>     >
>>>> >>     >
>>>> >>     >
>>>> >>     >
>>>> >>     >
>>>> >>     > On Fri, Oct 23, 2020 at 7:53 PM Konstantin Knauf <[hidden email]
>>>> >> <mailto:[hidden email]>> wrote:
>>>> >>     >
>>>> >>     > Hi Robert,
>>>> >>     >
>>>> >>     > +1 to the plan you outlined. If we were to drop support in Flink
>>>> >> 1.13+, we
>>>> >>     > would still support it in Flink 1.12- with bug fixes for some time
>>>> >> so that
>>>> >>     > users have time to move on.
>>>> >>     >
>>>> >>     > It would certainly be very interesting to hear from current Flink
>>>> >> on Mesos
>>>> >>     > users, on how they see the evolution of this part of the ecosystem.
>>>> >>     >
>>>> >>     > Best,
>>>> >>     >
>>>> >>     > Konstantin
>>>> >>
>>>> >
>>>
>>>
>>>
>>> --
>>>
>>> Konstantin Knauf
>>>
>>> https://twitter.com/snntrable
>>>
>>> https://github.com/knaufk

Re: [BULK]Re: [SURVEY] Remove Mesos support

Matthias
Thanks for everyone's feedback. I'm gonna initiate a vote in a separate thread.

On Mon, Mar 29, 2021 at 9:18 AM Robert Metzger <[hidden email]> wrote:
+1


