Thanks Aljoscha for starting this discussion. The described problem brings us indeed a bit into a pickle. Even with option 1) I think it is somewhat API breaking because everyone who used lambdas without types needs to add them now. Consequently, I only see two real options out of the ones you've proposed: 1) Disambiguate the API (either by removing reduceGroup(GroupReduceFunction) or by renaming it to reduceGroupJ) 2) Maintain a 2.11 and 2.12 master branch until we phase 2.11 completely out Removing the reduceGroup(GroupReduceFunction) in option 1 is a bit problematic because then all Scala API users who have implemented a GroupReduceFunction need to convert it into a Scala lambda. Moreover, I think it will be problematic with RichGroupReduceFunction which you need to get access to the RuntimeContext. Maintaining two master branches puts a lot of burden onto the developers to always keep the two branches in sync. Ideally I would like to avoid this. I also played a little bit around with implicit conversions to add the lambda methods in Scala 2.11 on demand, but I was not able to get it work smoothly. Cheers, Till On Thu, Oct 4, 2018 at 7:36 PM Elias Levy <[hidden email]> wrote: The second alternative, with the addition of methods that take functions |
I'd rather not maintain 2 master branches. Beyond the maintenance
overhead I'm wondering about the benefit, as the API break still has to happen at some point. @Aljoscha how much work for supporting scala 2.12 can be merged without breaking the API? If this is the only blocker I suggest to make the breaking change in 1.8. On 05.10.2018 10:31, Till Rohrmann wrote: > Thanks Aljoscha for starting this discussion. The described problem brings > us indeed a bit into a pickle. Even with option 1) I think it is somewhat > API breaking because everyone who used lambdas without types needs to add > them now. Consequently, I only see two real options out of the ones you've > proposed: > > 1) Disambiguate the API (either by removing > reduceGroup(GroupReduceFunction) or by renaming it to reduceGroupJ) > 2) Maintain a 2.11 and 2.12 master branch until we phase 2.11 completely out > > Removing the reduceGroup(GroupReduceFunction) in option 1 is a bit > problematic because then all Scala API users who have implemented a > GroupReduceFunction need to convert it into a Scala lambda. Moreover, I > think it will be problematic with RichGroupReduceFunction which you need to > get access to the RuntimeContext. > > Maintaining two master branches puts a lot of burden onto the developers to > always keep the two branches in sync. Ideally I would like to avoid this. > > I also played a little bit around with implicit conversions to add the > lambda methods in Scala 2.11 on demand, but I was not able to get it work > smoothly. > > I'm cross posting this thread to user as well to get some more user > feedback. > > Cheers, > Till > > On Thu, Oct 4, 2018 at 7:36 PM Elias Levy <[hidden email]> > wrote: > >> The second alternative, with the addition of methods that take functions >> with Scala types, seems the most sensible. I wonder if there is a need >> then to maintain the *J Java parameter methods, or whether users could just >> access the functionality by converting the Scala DataStreams to Java via >> .javaStream and whatever the equivalent is for DataSets. >> >> On Thu, Oct 4, 2018 at 8:10 AM Aljoscha Krettek <[hidden email]> >> wrote: >> >>> Hi, >>> >>> I'm currently working on >> https://issues.apache.org/jira/browse/FLINK-7811, >>> with the goal of adding support for Scala 2.12. There is a bit of a >> hurdle >>> and I have to explain some context first. >>> >>> With Scala 2.12, lambdas are implemented using the lambda mechanism of >>> Java 8, i.e. Scala lambdas are now SAMs (Single Abstract Method). This >>> means that the following two method definitions can both take a lambda: >>> >>> def map[R](mapper: MapFunction[T, R]): DataSet[R] >>> def map[R](fun: T => R): DataSet[R] >>> >>> The Scala compiler gives precedence to the lambda version when you call >>> map() with a lambda in simple cases, so it works here. You could still >> call >>> map() with a lambda if the lambda version of the method weren't here >>> because they are now considered the same. For Scala 2.11 we need both >>> signatures, though, to allow calling with a lambda and with a >> MapFunction. >>> The problem is with more complicated method signatures, like: >>> >>> def reduceGroup[R](fun: (scala.Iterator[T], Collector[R]) => Unit): >>> DataSet[R] >>> >>> def reduceGroup[R](reducer: GroupReduceFunction[T, R]): DataSet[R] >>> >>> (for reference, GroupReduceFunction is a SAM with void >>> reduce(java.lang.Iterable<T> values, Collector<O> out)) >>> >>> These two signatures are not the same but similar enough for the Scala >>> 2.12 compiler to "get confused". In Scala 2.11, I could call >> reduceGroup() >>> with a lambda that doesn't have parameter type definitions and things >> would >>> be fine. With Scala 2.12 I can't do that because the compiler can't >> figure >>> out which method to call and requires explicit type definitions on the >>> lambda parameters. >>> >>> I see some solutions for this: >>> >>> 1. Keep the methods as is, this would force people to always explicitly >>> specify parameter types on their lambdas. >>> >>> 2. Rename the second method to reduceGroupJ() to signal that it takes a >>> user function that takes Java-style interfaces (the first parameter is >>> java.lang.Iterable while the Scala lambda takes a scala.Iterator). This >>> disambiguates the code, users can use lambdas without specifying explicit >>> parameter types but breaks the API. >>> >>> One effect of 2. would be that we can add a reduceGroup() method that >>> takes a api.scala.GroupReduceFunction that takes proper Scala types, thus >>> it would allow people to implement user functions without having to cast >>> the various Iterator/Iterable parameters. >>> >>> Either way, people would have to adapt their code when moving to Scala >>> 2.12 in some way, depending on what style of methods they use. >>> >>> There is also solution 2.5: >>> >>> 2.5 Rename the methods only in the Scala 2.12 build of Flink and keep the >>> old method names for Scala 2.11. This would require some infrastructure >> and >>> I don't yet know how it can be done in a sane way. >>> >>> What do you think? I personally would be in favour of 2. but it breaks >> the >>> existing API. >>> >>> Best, >>> Aljoscha >>> >>> >>> >>> |
I have an open PR that does everything we can do for preparing the code base for Scala 2.12 without breaking the API: https://github.com/apache/flink/pull/6784
> On 8. Oct 2018, at 09:56, Chesnay Schepler <[hidden email]> wrote: > > I'd rather not maintain 2 master branches. Beyond the maintenance overhead I'm > wondering about the benefit, as the API break still has to happen at some point. > > @Aljoscha how much work for supporting scala 2.12 can be merged without breaking the API? > If this is the only blocker I suggest to make the breaking change in 1.8. > > On 05.10.2018 10:31, Till Rohrmann wrote: >> Thanks Aljoscha for starting this discussion. The described problem brings >> us indeed a bit into a pickle. Even with option 1) I think it is somewhat >> API breaking because everyone who used lambdas without types needs to add >> them now. Consequently, I only see two real options out of the ones you've >> proposed: >> >> 1) Disambiguate the API (either by removing >> reduceGroup(GroupReduceFunction) or by renaming it to reduceGroupJ) >> 2) Maintain a 2.11 and 2.12 master branch until we phase 2.11 completely out >> >> Removing the reduceGroup(GroupReduceFunction) in option 1 is a bit >> problematic because then all Scala API users who have implemented a >> GroupReduceFunction need to convert it into a Scala lambda. Moreover, I >> think it will be problematic with RichGroupReduceFunction which you need to >> get access to the RuntimeContext. >> >> Maintaining two master branches puts a lot of burden onto the developers to >> always keep the two branches in sync. Ideally I would like to avoid this. >> >> I also played a little bit around with implicit conversions to add the >> lambda methods in Scala 2.11 on demand, but I was not able to get it work >> smoothly. >> >> I'm cross posting this thread to user as well to get some more user >> feedback. >> >> Cheers, >> Till >> >> On Thu, Oct 4, 2018 at 7:36 PM Elias Levy <[hidden email]> >> wrote: >> >>> The second alternative, with the addition of methods that take functions >>> with Scala types, seems the most sensible. I wonder if there is a need >>> then to maintain the *J Java parameter methods, or whether users could just >>> access the functionality by converting the Scala DataStreams to Java via >>> .javaStream and whatever the equivalent is for DataSets. >>> >>> On Thu, Oct 4, 2018 at 8:10 AM Aljoscha Krettek <[hidden email]> >>> wrote: >>> >>>> Hi, >>>> >>>> I'm currently working on >>> https://issues.apache.org/jira/browse/FLINK-7811, >>>> with the goal of adding support for Scala 2.12. There is a bit of a >>> hurdle >>>> and I have to explain some context first. >>>> >>>> With Scala 2.12, lambdas are implemented using the lambda mechanism of >>>> Java 8, i.e. Scala lambdas are now SAMs (Single Abstract Method). This >>>> means that the following two method definitions can both take a lambda: >>>> >>>> def map[R](mapper: MapFunction[T, R]): DataSet[R] >>>> def map[R](fun: T => R): DataSet[R] >>>> >>>> The Scala compiler gives precedence to the lambda version when you call >>>> map() with a lambda in simple cases, so it works here. You could still >>> call >>>> map() with a lambda if the lambda version of the method weren't here >>>> because they are now considered the same. For Scala 2.11 we need both >>>> signatures, though, to allow calling with a lambda and with a >>> MapFunction. >>>> The problem is with more complicated method signatures, like: >>>> >>>> def reduceGroup[R](fun: (scala.Iterator[T], Collector[R]) => Unit): >>>> DataSet[R] >>>> >>>> def reduceGroup[R](reducer: GroupReduceFunction[T, R]): DataSet[R] >>>> >>>> (for reference, GroupReduceFunction is a SAM with void >>>> reduce(java.lang.Iterable<T> values, Collector<O> out)) >>>> >>>> These two signatures are not the same but similar enough for the Scala >>>> 2.12 compiler to "get confused". In Scala 2.11, I could call >>> reduceGroup() >>>> with a lambda that doesn't have parameter type definitions and things >>> would >>>> be fine. With Scala 2.12 I can't do that because the compiler can't >>> figure >>>> out which method to call and requires explicit type definitions on the >>>> lambda parameters. >>>> >>>> I see some solutions for this: >>>> >>>> 1. Keep the methods as is, this would force people to always explicitly >>>> specify parameter types on their lambdas. >>>> >>>> 2. Rename the second method to reduceGroupJ() to signal that it takes a >>>> user function that takes Java-style interfaces (the first parameter is >>>> java.lang.Iterable while the Scala lambda takes a scala.Iterator). This >>>> disambiguates the code, users can use lambdas without specifying explicit >>>> parameter types but breaks the API. >>>> >>>> One effect of 2. would be that we can add a reduceGroup() method that >>>> takes a api.scala.GroupReduceFunction that takes proper Scala types, thus >>>> it would allow people to implement user functions without having to cast >>>> the various Iterator/Iterable parameters. >>>> >>>> Either way, people would have to adapt their code when moving to Scala >>>> 2.12 in some way, depending on what style of methods they use. >>>> >>>> There is also solution 2.5: >>>> >>>> 2.5 Rename the methods only in the Scala 2.12 build of Flink and keep the >>>> old method names for Scala 2.11. This would require some infrastructure >>> and >>>> I don't yet know how it can be done in a sane way. >>>> >>>> What do you think? I personally would be in favour of 2. but it breaks >>> the >>>> existing API. >>>> >>>> Best, >>>> Aljoscha >>>> >>>> >>>> >>>> > |
And the remaining parts would only be about breaking the API?
On 08.10.2018 12:24, Aljoscha Krettek wrote: > I have an open PR that does everything we can do for preparing the code base for Scala 2.12 without breaking the API: https://github.com/apache/flink/pull/6784 > >> On 8. Oct 2018, at 09:56, Chesnay Schepler <[hidden email]> wrote: >> >> I'd rather not maintain 2 master branches. Beyond the maintenance overhead I'm >> wondering about the benefit, as the API break still has to happen at some point. >> >> @Aljoscha how much work for supporting scala 2.12 can be merged without breaking the API? >> If this is the only blocker I suggest to make the breaking change in 1.8. >> >> On 05.10.2018 10:31, Till Rohrmann wrote: >>> Thanks Aljoscha for starting this discussion. The described problem brings >>> us indeed a bit into a pickle. Even with option 1) I think it is somewhat >>> API breaking because everyone who used lambdas without types needs to add >>> them now. Consequently, I only see two real options out of the ones you've >>> proposed: >>> >>> 1) Disambiguate the API (either by removing >>> reduceGroup(GroupReduceFunction) or by renaming it to reduceGroupJ) >>> 2) Maintain a 2.11 and 2.12 master branch until we phase 2.11 completely out >>> >>> Removing the reduceGroup(GroupReduceFunction) in option 1 is a bit >>> problematic because then all Scala API users who have implemented a >>> GroupReduceFunction need to convert it into a Scala lambda. Moreover, I >>> think it will be problematic with RichGroupReduceFunction which you need to >>> get access to the RuntimeContext. >>> >>> Maintaining two master branches puts a lot of burden onto the developers to >>> always keep the two branches in sync. Ideally I would like to avoid this. >>> >>> I also played a little bit around with implicit conversions to add the >>> lambda methods in Scala 2.11 on demand, but I was not able to get it work >>> smoothly. >>> >>> I'm cross posting this thread to user as well to get some more user >>> feedback. >>> >>> Cheers, >>> Till >>> >>> On Thu, Oct 4, 2018 at 7:36 PM Elias Levy <[hidden email]> >>> wrote: >>> >>>> The second alternative, with the addition of methods that take functions >>>> with Scala types, seems the most sensible. I wonder if there is a need >>>> then to maintain the *J Java parameter methods, or whether users could just >>>> access the functionality by converting the Scala DataStreams to Java via >>>> .javaStream and whatever the equivalent is for DataSets. >>>> >>>> On Thu, Oct 4, 2018 at 8:10 AM Aljoscha Krettek <[hidden email]> >>>> wrote: >>>> >>>>> Hi, >>>>> >>>>> I'm currently working on >>>> https://issues.apache.org/jira/browse/FLINK-7811, >>>>> with the goal of adding support for Scala 2.12. There is a bit of a >>>> hurdle >>>>> and I have to explain some context first. >>>>> >>>>> With Scala 2.12, lambdas are implemented using the lambda mechanism of >>>>> Java 8, i.e. Scala lambdas are now SAMs (Single Abstract Method). This >>>>> means that the following two method definitions can both take a lambda: >>>>> >>>>> def map[R](mapper: MapFunction[T, R]): DataSet[R] >>>>> def map[R](fun: T => R): DataSet[R] >>>>> >>>>> The Scala compiler gives precedence to the lambda version when you call >>>>> map() with a lambda in simple cases, so it works here. You could still >>>> call >>>>> map() with a lambda if the lambda version of the method weren't here >>>>> because they are now considered the same. For Scala 2.11 we need both >>>>> signatures, though, to allow calling with a lambda and with a >>>> MapFunction. >>>>> The problem is with more complicated method signatures, like: >>>>> >>>>> def reduceGroup[R](fun: (scala.Iterator[T], Collector[R]) => Unit): >>>>> DataSet[R] >>>>> >>>>> def reduceGroup[R](reducer: GroupReduceFunction[T, R]): DataSet[R] >>>>> >>>>> (for reference, GroupReduceFunction is a SAM with void >>>>> reduce(java.lang.Iterable<T> values, Collector<O> out)) >>>>> >>>>> These two signatures are not the same but similar enough for the Scala >>>>> 2.12 compiler to "get confused". In Scala 2.11, I could call >>>> reduceGroup() >>>>> with a lambda that doesn't have parameter type definitions and things >>>> would >>>>> be fine. With Scala 2.12 I can't do that because the compiler can't >>>> figure >>>>> out which method to call and requires explicit type definitions on the >>>>> lambda parameters. >>>>> >>>>> I see some solutions for this: >>>>> >>>>> 1. Keep the methods as is, this would force people to always explicitly >>>>> specify parameter types on their lambdas. >>>>> >>>>> 2. Rename the second method to reduceGroupJ() to signal that it takes a >>>>> user function that takes Java-style interfaces (the first parameter is >>>>> java.lang.Iterable while the Scala lambda takes a scala.Iterator). This >>>>> disambiguates the code, users can use lambdas without specifying explicit >>>>> parameter types but breaks the API. >>>>> >>>>> One effect of 2. would be that we can add a reduceGroup() method that >>>>> takes a api.scala.GroupReduceFunction that takes proper Scala types, thus >>>>> it would allow people to implement user functions without having to cast >>>>> the various Iterator/Iterable parameters. >>>>> >>>>> Either way, people would have to adapt their code when moving to Scala >>>>> 2.12 in some way, depending on what style of methods they use. >>>>> >>>>> There is also solution 2.5: >>>>> >>>>> 2.5 Rename the methods only in the Scala 2.12 build of Flink and keep the >>>>> old method names for Scala 2.11. This would require some infrastructure >>>> and >>>>> I don't yet know how it can be done in a sane way. >>>>> >>>>> What do you think? I personally would be in favour of 2. but it breaks >>>> the >>>>> existing API. >>>>> >>>>> Best, >>>>> Aljoscha >>>>> >>>>> >>>>> >>>>> > |
Breaking the API (or not breaking it but requiring explicit types when using Scala 2.12) and the Maven infrastructure to actually build a 2.12 release.
> On 8. Oct 2018, at 13:00, Chesnay Schepler <[hidden email]> wrote: > > And the remaining parts would only be about breaking the API? > > On 08.10.2018 12:24, Aljoscha Krettek wrote: >> I have an open PR that does everything we can do for preparing the code base for Scala 2.12 without breaking the API: https://github.com/apache/flink/pull/6784 >> >>> On 8. Oct 2018, at 09:56, Chesnay Schepler <[hidden email]> wrote: >>> >>> I'd rather not maintain 2 master branches. Beyond the maintenance overhead I'm >>> wondering about the benefit, as the API break still has to happen at some point. >>> >>> @Aljoscha how much work for supporting scala 2.12 can be merged without breaking the API? >>> If this is the only blocker I suggest to make the breaking change in 1.8. >>> >>> On 05.10.2018 10:31, Till Rohrmann wrote: >>>> Thanks Aljoscha for starting this discussion. The described problem brings >>>> us indeed a bit into a pickle. Even with option 1) I think it is somewhat >>>> API breaking because everyone who used lambdas without types needs to add >>>> them now. Consequently, I only see two real options out of the ones you've >>>> proposed: >>>> >>>> 1) Disambiguate the API (either by removing >>>> reduceGroup(GroupReduceFunction) or by renaming it to reduceGroupJ) >>>> 2) Maintain a 2.11 and 2.12 master branch until we phase 2.11 completely out >>>> >>>> Removing the reduceGroup(GroupReduceFunction) in option 1 is a bit >>>> problematic because then all Scala API users who have implemented a >>>> GroupReduceFunction need to convert it into a Scala lambda. Moreover, I >>>> think it will be problematic with RichGroupReduceFunction which you need to >>>> get access to the RuntimeContext. >>>> >>>> Maintaining two master branches puts a lot of burden onto the developers to >>>> always keep the two branches in sync. Ideally I would like to avoid this. >>>> >>>> I also played a little bit around with implicit conversions to add the >>>> lambda methods in Scala 2.11 on demand, but I was not able to get it work >>>> smoothly. >>>> >>>> I'm cross posting this thread to user as well to get some more user >>>> feedback. >>>> >>>> Cheers, >>>> Till >>>> >>>> On Thu, Oct 4, 2018 at 7:36 PM Elias Levy <[hidden email]> >>>> wrote: >>>> >>>>> The second alternative, with the addition of methods that take functions >>>>> with Scala types, seems the most sensible. I wonder if there is a need >>>>> then to maintain the *J Java parameter methods, or whether users could just >>>>> access the functionality by converting the Scala DataStreams to Java via >>>>> .javaStream and whatever the equivalent is for DataSets. >>>>> >>>>> On Thu, Oct 4, 2018 at 8:10 AM Aljoscha Krettek <[hidden email]> >>>>> wrote: >>>>> >>>>>> Hi, >>>>>> >>>>>> I'm currently working on >>>>> https://issues.apache.org/jira/browse/FLINK-7811, >>>>>> with the goal of adding support for Scala 2.12. There is a bit of a >>>>> hurdle >>>>>> and I have to explain some context first. >>>>>> >>>>>> With Scala 2.12, lambdas are implemented using the lambda mechanism of >>>>>> Java 8, i.e. Scala lambdas are now SAMs (Single Abstract Method). This >>>>>> means that the following two method definitions can both take a lambda: >>>>>> >>>>>> def map[R](mapper: MapFunction[T, R]): DataSet[R] >>>>>> def map[R](fun: T => R): DataSet[R] >>>>>> >>>>>> The Scala compiler gives precedence to the lambda version when you call >>>>>> map() with a lambda in simple cases, so it works here. You could still >>>>> call >>>>>> map() with a lambda if the lambda version of the method weren't here >>>>>> because they are now considered the same. For Scala 2.11 we need both >>>>>> signatures, though, to allow calling with a lambda and with a >>>>> MapFunction. >>>>>> The problem is with more complicated method signatures, like: >>>>>> >>>>>> def reduceGroup[R](fun: (scala.Iterator[T], Collector[R]) => Unit): >>>>>> DataSet[R] >>>>>> >>>>>> def reduceGroup[R](reducer: GroupReduceFunction[T, R]): DataSet[R] >>>>>> >>>>>> (for reference, GroupReduceFunction is a SAM with void >>>>>> reduce(java.lang.Iterable<T> values, Collector<O> out)) >>>>>> >>>>>> These two signatures are not the same but similar enough for the Scala >>>>>> 2.12 compiler to "get confused". In Scala 2.11, I could call >>>>> reduceGroup() >>>>>> with a lambda that doesn't have parameter type definitions and things >>>>> would >>>>>> be fine. With Scala 2.12 I can't do that because the compiler can't >>>>> figure >>>>>> out which method to call and requires explicit type definitions on the >>>>>> lambda parameters. >>>>>> >>>>>> I see some solutions for this: >>>>>> >>>>>> 1. Keep the methods as is, this would force people to always explicitly >>>>>> specify parameter types on their lambdas. >>>>>> >>>>>> 2. Rename the second method to reduceGroupJ() to signal that it takes a >>>>>> user function that takes Java-style interfaces (the first parameter is >>>>>> java.lang.Iterable while the Scala lambda takes a scala.Iterator). This >>>>>> disambiguates the code, users can use lambdas without specifying explicit >>>>>> parameter types but breaks the API. >>>>>> >>>>>> One effect of 2. would be that we can add a reduceGroup() method that >>>>>> takes a api.scala.GroupReduceFunction that takes proper Scala types, thus >>>>>> it would allow people to implement user functions without having to cast >>>>>> the various Iterator/Iterable parameters. >>>>>> >>>>>> Either way, people would have to adapt their code when moving to Scala >>>>>> 2.12 in some way, depending on what style of methods they use. >>>>>> >>>>>> There is also solution 2.5: >>>>>> >>>>>> 2.5 Rename the methods only in the Scala 2.12 build of Flink and keep the >>>>>> old method names for Scala 2.11. This would require some infrastructure >>>>> and >>>>>> I don't yet know how it can be done in a sane way. >>>>>> >>>>>> What do you think? I personally would be in favour of 2. but it breaks >>>>> the >>>>>> existing API. >>>>>> >>>>>> Best, >>>>>> Aljoscha >>>>>> >>>>>> >>>>>> >>>>>> >> > |
The infrastructure would only be required if we opt for releasing 2.11
and 2.12 builds simultaneously, correct? On 08.10.2018 15:04, Aljoscha Krettek wrote: > Breaking the API (or not breaking it but requiring explicit types when using Scala 2.12) and the Maven infrastructure to actually build a 2.12 release. > >> On 8. Oct 2018, at 13:00, Chesnay Schepler <[hidden email]> wrote: >> >> And the remaining parts would only be about breaking the API? >> >> On 08.10.2018 12:24, Aljoscha Krettek wrote: >>> I have an open PR that does everything we can do for preparing the code base for Scala 2.12 without breaking the API: https://github.com/apache/flink/pull/6784 >>> >>>> On 8. Oct 2018, at 09:56, Chesnay Schepler <[hidden email]> wrote: >>>> >>>> I'd rather not maintain 2 master branches. Beyond the maintenance overhead I'm >>>> wondering about the benefit, as the API break still has to happen at some point. >>>> >>>> @Aljoscha how much work for supporting scala 2.12 can be merged without breaking the API? >>>> If this is the only blocker I suggest to make the breaking change in 1.8. >>>> >>>> On 05.10.2018 10:31, Till Rohrmann wrote: >>>>> Thanks Aljoscha for starting this discussion. The described problem brings >>>>> us indeed a bit into a pickle. Even with option 1) I think it is somewhat >>>>> API breaking because everyone who used lambdas without types needs to add >>>>> them now. Consequently, I only see two real options out of the ones you've >>>>> proposed: >>>>> >>>>> 1) Disambiguate the API (either by removing >>>>> reduceGroup(GroupReduceFunction) or by renaming it to reduceGroupJ) >>>>> 2) Maintain a 2.11 and 2.12 master branch until we phase 2.11 completely out >>>>> >>>>> Removing the reduceGroup(GroupReduceFunction) in option 1 is a bit >>>>> problematic because then all Scala API users who have implemented a >>>>> GroupReduceFunction need to convert it into a Scala lambda. Moreover, I >>>>> think it will be problematic with RichGroupReduceFunction which you need to >>>>> get access to the RuntimeContext. >>>>> >>>>> Maintaining two master branches puts a lot of burden onto the developers to >>>>> always keep the two branches in sync. Ideally I would like to avoid this. >>>>> >>>>> I also played a little bit around with implicit conversions to add the >>>>> lambda methods in Scala 2.11 on demand, but I was not able to get it work >>>>> smoothly. >>>>> >>>>> I'm cross posting this thread to user as well to get some more user >>>>> feedback. >>>>> >>>>> Cheers, >>>>> Till >>>>> >>>>> On Thu, Oct 4, 2018 at 7:36 PM Elias Levy <[hidden email]> >>>>> wrote: >>>>> >>>>>> The second alternative, with the addition of methods that take functions >>>>>> with Scala types, seems the most sensible. I wonder if there is a need >>>>>> then to maintain the *J Java parameter methods, or whether users could just >>>>>> access the functionality by converting the Scala DataStreams to Java via >>>>>> .javaStream and whatever the equivalent is for DataSets. >>>>>> >>>>>> On Thu, Oct 4, 2018 at 8:10 AM Aljoscha Krettek <[hidden email]> >>>>>> wrote: >>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> I'm currently working on >>>>>> https://issues.apache.org/jira/browse/FLINK-7811, >>>>>>> with the goal of adding support for Scala 2.12. There is a bit of a >>>>>> hurdle >>>>>>> and I have to explain some context first. >>>>>>> >>>>>>> With Scala 2.12, lambdas are implemented using the lambda mechanism of >>>>>>> Java 8, i.e. Scala lambdas are now SAMs (Single Abstract Method). This >>>>>>> means that the following two method definitions can both take a lambda: >>>>>>> >>>>>>> def map[R](mapper: MapFunction[T, R]): DataSet[R] >>>>>>> def map[R](fun: T => R): DataSet[R] >>>>>>> >>>>>>> The Scala compiler gives precedence to the lambda version when you call >>>>>>> map() with a lambda in simple cases, so it works here. You could still >>>>>> call >>>>>>> map() with a lambda if the lambda version of the method weren't here >>>>>>> because they are now considered the same. For Scala 2.11 we need both >>>>>>> signatures, though, to allow calling with a lambda and with a >>>>>> MapFunction. >>>>>>> The problem is with more complicated method signatures, like: >>>>>>> >>>>>>> def reduceGroup[R](fun: (scala.Iterator[T], Collector[R]) => Unit): >>>>>>> DataSet[R] >>>>>>> >>>>>>> def reduceGroup[R](reducer: GroupReduceFunction[T, R]): DataSet[R] >>>>>>> >>>>>>> (for reference, GroupReduceFunction is a SAM with void >>>>>>> reduce(java.lang.Iterable<T> values, Collector<O> out)) >>>>>>> >>>>>>> These two signatures are not the same but similar enough for the Scala >>>>>>> 2.12 compiler to "get confused". In Scala 2.11, I could call >>>>>> reduceGroup() >>>>>>> with a lambda that doesn't have parameter type definitions and things >>>>>> would >>>>>>> be fine. With Scala 2.12 I can't do that because the compiler can't >>>>>> figure >>>>>>> out which method to call and requires explicit type definitions on the >>>>>>> lambda parameters. >>>>>>> >>>>>>> I see some solutions for this: >>>>>>> >>>>>>> 1. Keep the methods as is, this would force people to always explicitly >>>>>>> specify parameter types on their lambdas. >>>>>>> >>>>>>> 2. Rename the second method to reduceGroupJ() to signal that it takes a >>>>>>> user function that takes Java-style interfaces (the first parameter is >>>>>>> java.lang.Iterable while the Scala lambda takes a scala.Iterator). This >>>>>>> disambiguates the code, users can use lambdas without specifying explicit >>>>>>> parameter types but breaks the API. >>>>>>> >>>>>>> One effect of 2. would be that we can add a reduceGroup() method that >>>>>>> takes a api.scala.GroupReduceFunction that takes proper Scala types, thus >>>>>>> it would allow people to implement user functions without having to cast >>>>>>> the various Iterator/Iterable parameters. >>>>>>> >>>>>>> Either way, people would have to adapt their code when moving to Scala >>>>>>> 2.12 in some way, depending on what style of methods they use. >>>>>>> >>>>>>> There is also solution 2.5: >>>>>>> >>>>>>> 2.5 Rename the methods only in the Scala 2.12 build of Flink and keep the >>>>>>> old method names for Scala 2.11. This would require some infrastructure >>>>>> and >>>>>>> I don't yet know how it can be done in a sane way. >>>>>>> >>>>>>> What do you think? I personally would be in favour of 2. but it breaks >>>>>> the >>>>>>> existing API. >>>>>>> >>>>>>> Best, >>>>>>> Aljoscha >>>>>>> >>>>>>> >>>>>>> >>>>>>> > |
Yes, but I think we would pretty much have to do that. I don't think we can stop doing 2.11 releases.
> On 8. Oct 2018, at 15:37, Chesnay Schepler <[hidden email]> wrote: > > The infrastructure would only be required if we opt for releasing 2.11 and 2.12 builds simultaneously, correct? > > On 08.10.2018 15:04, Aljoscha Krettek wrote: >> Breaking the API (or not breaking it but requiring explicit types when using Scala 2.12) and the Maven infrastructure to actually build a 2.12 release. >> >>> On 8. Oct 2018, at 13:00, Chesnay Schepler <[hidden email]> wrote: >>> >>> And the remaining parts would only be about breaking the API? >>> >>> On 08.10.2018 12:24, Aljoscha Krettek wrote: >>>> I have an open PR that does everything we can do for preparing the code base for Scala 2.12 without breaking the API: https://github.com/apache/flink/pull/6784 >>>> >>>>> On 8. Oct 2018, at 09:56, Chesnay Schepler <[hidden email]> wrote: >>>>> >>>>> I'd rather not maintain 2 master branches. Beyond the maintenance overhead I'm >>>>> wondering about the benefit, as the API break still has to happen at some point. >>>>> >>>>> @Aljoscha how much work for supporting scala 2.12 can be merged without breaking the API? >>>>> If this is the only blocker I suggest to make the breaking change in 1.8. >>>>> >>>>> On 05.10.2018 10:31, Till Rohrmann wrote: >>>>>> Thanks Aljoscha for starting this discussion. The described problem brings >>>>>> us indeed a bit into a pickle. Even with option 1) I think it is somewhat >>>>>> API breaking because everyone who used lambdas without types needs to add >>>>>> them now. Consequently, I only see two real options out of the ones you've >>>>>> proposed: >>>>>> >>>>>> 1) Disambiguate the API (either by removing >>>>>> reduceGroup(GroupReduceFunction) or by renaming it to reduceGroupJ) >>>>>> 2) Maintain a 2.11 and 2.12 master branch until we phase 2.11 completely out >>>>>> >>>>>> Removing the reduceGroup(GroupReduceFunction) in option 1 is a bit >>>>>> problematic because then all Scala API users who have implemented a >>>>>> GroupReduceFunction need to convert it into a Scala lambda. Moreover, I >>>>>> think it will be problematic with RichGroupReduceFunction which you need to >>>>>> get access to the RuntimeContext. >>>>>> >>>>>> Maintaining two master branches puts a lot of burden onto the developers to >>>>>> always keep the two branches in sync. Ideally I would like to avoid this. >>>>>> >>>>>> I also played a little bit around with implicit conversions to add the >>>>>> lambda methods in Scala 2.11 on demand, but I was not able to get it work >>>>>> smoothly. >>>>>> >>>>>> I'm cross posting this thread to user as well to get some more user >>>>>> feedback. >>>>>> >>>>>> Cheers, >>>>>> Till >>>>>> >>>>>> On Thu, Oct 4, 2018 at 7:36 PM Elias Levy <[hidden email]> >>>>>> wrote: >>>>>> >>>>>>> The second alternative, with the addition of methods that take functions >>>>>>> with Scala types, seems the most sensible. I wonder if there is a need >>>>>>> then to maintain the *J Java parameter methods, or whether users could just >>>>>>> access the functionality by converting the Scala DataStreams to Java via >>>>>>> .javaStream and whatever the equivalent is for DataSets. >>>>>>> >>>>>>> On Thu, Oct 4, 2018 at 8:10 AM Aljoscha Krettek <[hidden email]> >>>>>>> wrote: >>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> I'm currently working on >>>>>>> https://issues.apache.org/jira/browse/FLINK-7811, >>>>>>>> with the goal of adding support for Scala 2.12. There is a bit of a >>>>>>> hurdle >>>>>>>> and I have to explain some context first. >>>>>>>> >>>>>>>> With Scala 2.12, lambdas are implemented using the lambda mechanism of >>>>>>>> Java 8, i.e. Scala lambdas are now SAMs (Single Abstract Method). This >>>>>>>> means that the following two method definitions can both take a lambda: >>>>>>>> >>>>>>>> def map[R](mapper: MapFunction[T, R]): DataSet[R] >>>>>>>> def map[R](fun: T => R): DataSet[R] >>>>>>>> >>>>>>>> The Scala compiler gives precedence to the lambda version when you call >>>>>>>> map() with a lambda in simple cases, so it works here. You could still >>>>>>> call >>>>>>>> map() with a lambda if the lambda version of the method weren't here >>>>>>>> because they are now considered the same. For Scala 2.11 we need both >>>>>>>> signatures, though, to allow calling with a lambda and with a >>>>>>> MapFunction. >>>>>>>> The problem is with more complicated method signatures, like: >>>>>>>> >>>>>>>> def reduceGroup[R](fun: (scala.Iterator[T], Collector[R]) => Unit): >>>>>>>> DataSet[R] >>>>>>>> >>>>>>>> def reduceGroup[R](reducer: GroupReduceFunction[T, R]): DataSet[R] >>>>>>>> >>>>>>>> (for reference, GroupReduceFunction is a SAM with void >>>>>>>> reduce(java.lang.Iterable<T> values, Collector<O> out)) >>>>>>>> >>>>>>>> These two signatures are not the same but similar enough for the Scala >>>>>>>> 2.12 compiler to "get confused". In Scala 2.11, I could call >>>>>>> reduceGroup() >>>>>>>> with a lambda that doesn't have parameter type definitions and things >>>>>>> would >>>>>>>> be fine. With Scala 2.12 I can't do that because the compiler can't >>>>>>> figure >>>>>>>> out which method to call and requires explicit type definitions on the >>>>>>>> lambda parameters. >>>>>>>> >>>>>>>> I see some solutions for this: >>>>>>>> >>>>>>>> 1. Keep the methods as is, this would force people to always explicitly >>>>>>>> specify parameter types on their lambdas. >>>>>>>> >>>>>>>> 2. Rename the second method to reduceGroupJ() to signal that it takes a >>>>>>>> user function that takes Java-style interfaces (the first parameter is >>>>>>>> java.lang.Iterable while the Scala lambda takes a scala.Iterator). This >>>>>>>> disambiguates the code, users can use lambdas without specifying explicit >>>>>>>> parameter types but breaks the API. >>>>>>>> >>>>>>>> One effect of 2. would be that we can add a reduceGroup() method that >>>>>>>> takes a api.scala.GroupReduceFunction that takes proper Scala types, thus >>>>>>>> it would allow people to implement user functions without having to cast >>>>>>>> the various Iterator/Iterable parameters. >>>>>>>> >>>>>>>> Either way, people would have to adapt their code when moving to Scala >>>>>>>> 2.12 in some way, depending on what style of methods they use. >>>>>>>> >>>>>>>> There is also solution 2.5: >>>>>>>> >>>>>>>> 2.5 Rename the methods only in the Scala 2.12 build of Flink and keep the >>>>>>>> old method names for Scala 2.11. This would require some infrastructure >>>>>>> and >>>>>>>> I don't yet know how it can be done in a sane way. >>>>>>>> >>>>>>>> What do you think? I personally would be in favour of 2. but it breaks >>>>>>> the >>>>>>>> existing API. >>>>>>>> >>>>>>>> Best, >>>>>>>> Aljoscha >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >> > |
Free forum by Nabble | Edit this page |