Hi Flink Community! The release of Apache Flink 1.5 has happened (yay!) - so it is a good time to start talking about what to do for release 1.6. == Suggested release timeline == I would propose to release around end of July (that is 8-9 weeks from now). The rational behind that: There was a lot of effort in release testing automation (end-to-end tests, scripted stress tests) as part of release 1.5. You may have noticed the big set of new modules under "flink-end-to-end-tests" in the Flink repository. It delayed the 1.5 release a bit, and needs to continue as part of the coming release cycle, but should help make releasing more lightweight from now on. (Side note: There are also some nightly stress tests that we created and run at data Artisans, and where we are looking whether and in which way it would make sense to contribute them to Flink.) == Features and focus areas == We had a lot of big and heavy features in Flink 1.5, with FLIP-6, the new network stack, recovery, SQL joins and client, ... Following something like a "tick-tock-model", I would suggest to focus the next release more on integrations, tooling, and reducing user friction. Of course, this does not mean that no other pull request gets reviewed, an no other topic will be examined - it is simply meant as a help to understand where to expect more activity during the next release cycle. Note that these are really the coarse focus areas - don't read this as a comprehensive list. This list is my first suggestion, based on discussions with committers, users, and mailing list questions. - Support Java 9 and Scala 2.12 - Smoothen the integration in Container environment, like "Flink as a Library", and easier integration with Kubernetes services and other proxies. - Polish the remaing parts of the FLIP-6 rewrite - Improve state backends with asynchronous timer snapshots, efficient timer deletes, state TTL, and broadcast state support in RocksDB. - Extends Streaming Sinks: - Bucketing Sink should support S3 properly (compensate for eventual consistency), work with Flink's shaded S3 file systems, and efficiently support formats that compress/index arcoss individual rows (Parquet, ORC, ...) - Support ElasticSearch's new REST API - Smoothen State Evolution to support type conversion on snapshot restore - Enhance Stream SQL and CEP - Add support for "update by key" Table Sources - Add more table sources and sinks (Kafka, Kinesis, Files, K/V stores) - Expand SQL client - Integrate CEP and SQL, through MATCH_RECOGNIZE clause - Improve CEP Performance of SharedBuffer on RocksDB |
Will we remove the legacy mode for 1.6?
I can see value in keeping it for now so that legacy issues are still visible on master, but at the same time removing this code would reduce a lot of complexity and ambiguity in the codebase... On 04.06.2018 11:21, Stephan Ewen wrote:
|
In reply to this post by Stephan Ewen
Hi Stephan, could you please also consider the "Elastic Filter " feature discussioned in thread http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/PROPOSAL-Introduce-Elastic-Bloom-Filter-For-Flink-td22430.html ? Best, Sihua
On 06/4/2018 17:21,[hidden email] wrote:
Hi Flink Community! |
In reply to this post by Stephan Ewen
Hi Stephen,
Is it planned to consider this ticket https://issues.apache.org/jira/browse/FLINK-7883 about an atomic cancel-with-savepoint ? It is my main concern about Flink and I have to maintain a fork myself as we can't afford dupplicate events due to reprocess of messages between a savepoint and real job stop each time we deploy a new version of our job. Antoine Le lun. 4 juin 2018 à 11:21, Stephan Ewen <[hidden email]> a écrit :
|
Before removing the legacy code, I would still wait a bit and see what the user feedback is. The legacy mode is a good safety net against severe deployment regressions. Thus, it should be a very conscious decision to remove the code. As far as I know, there is currently nobody actively working on FLINK-7883. Fixing it properly requires a bit of work (e.g. redesigning the source interface) and will need a committer to act as a shepherd. But it is definitely a feature which is quite important to have. Cheers, Till On Mon, Jun 4, 2018 at 11:50 AM Antoine Philippot <[hidden email]> wrote:
|
In reply to this post by Stephan Ewen
Hi Stephan, Hi Flink Community! The release of Apache Flink 1.5 has happened (yay!) - so it is a good time to start talking about what to do for release 1.6. == Suggested release timeline == I would propose to release around end of July (that is 8-9 weeks from now). The rational behind that: There was a lot of effort in release testing automation (end-to-end tests, scripted stress tests) as part of release 1.5. You may have noticed the big set of new modules under "flink-end-to-end-tests" in the Flink repository. It delayed the 1.5 release a bit, and needs to continue as part of the coming release cycle, but should help make releasing more lightweight from now on. (Side note: There are also some nightly stress tests that we created and run at data Artisans, and where we are looking whether and in which way it would make sense to contribute them to Flink.) == Features and focus areas == We had a lot of big and heavy features in Flink 1.5, with FLIP-6, the new network stack, recovery, SQL joins and client, ... Following something like a "tick-tock-model", I would suggest to focus the next release more on integrations, tooling, and reducing user friction. Of course, this does not mean that no other pull request gets reviewed, an no other topic will be examined - it is simply meant as a help to understand where to expect more activity during the next release cycle. Note that these are really the coarse focus areas - don't read this as a comprehensive list. This list is my first suggestion, based on discussions with committers, users, and mailing list questions. - Support Java 9 and Scala 2.12 - Smoothen the integration in Container environment, like "Flink as a Library", and easier integration with Kubernetes services and other proxies. - Polish the remaing parts of the FLIP-6 rewrite - Improve state backends with asynchronous timer snapshots, efficient timer deletes, state TTL, and broadcast state support in RocksDB. - Extends Streaming Sinks: - Bucketing Sink should support S3 properly (compensate for eventual consistency), work with Flink's shaded S3 file systems, and efficiently support formats that compress/index arcoss individual rows (Parquet, ORC, ...) - Support ElasticSearch's new REST API - Smoothen State Evolution to support type conversion on snapshot restore - Enhance Stream SQL and CEP - Add support for "update by key" Table Sources - Add more table sources and sinks (Kafka, Kinesis, Files, K/V stores) - Expand SQL client - Integrate CEP and SQL, through MATCH_RECOGNIZE clause - Improve CEP Performance of SharedBuffer on RocksDB |
Hi Stephan, Will [ https://issues.apache.org/jira/browse/FLINK-5479 ] (Per-partition watermarks in FlinkKafkaConsumer should consider idle partitions) be included in 1.6? As we are seeing more users with this issue on the mailing lists. Thanks. Ben 2018-06-05 5:29 GMT+08:00 Che Lui Shum <[hidden email]>:
|
adding my vote to K8S Job mode, maybe it is this? > Smoothen the integration in Container environment, like "Flink as a Library", and easier integration with Kubernetes services and other proxies. On Mon, Jun 4, 2018 at 11:01 PM Ben Yan <[hidden email]> wrote:
|
+1 on K8s integration
|
+1 on https://issues.apache.org/jira/browse/FLINK-5479
From: Rico Bergmann <[hidden email]>
Sent: Tuesday, June 5, 2018 9:12:00 PM To: Hao Sun Cc: [hidden email]; user Subject: Re: [DISCUSS] Flink 1.6 features +1 on K8s integration
|
Hi all!
Thanks for the discussion and good input. Many suggestions fit well with the proposal above. Please bear in mind that with a time-based release model, we would release whatever is mature by end of July. The good thing is we could schedule the next release not too far after that, so that the features that did not quite make it will not be delayed too long. In some sense, you could read this as as "what to do first" list, rather than "this goes in, other things stay out". Some thoughts on some of the suggestions
Kubernetes integration: An opaque integration with Kubernetes should be supported through the "as a library" mode. For a deeper integration, I know that some committers have experimented with some PoC code. I would let Till add some thoughts, he has worked the most on the deployment parts recently. Per partition watermarks with idleness: Good point, could one implement that on the current interface, with a periodic watermark extractor? Atomic cancel-with-savepoint: Agreed, this is important. Making this work with all sources needs a bit more work. We should have this in the roadmap. Elastic Bloomfilters: This seems like an interesting new feature - the above suggested feature set was more about addressing some longer standing issues/requests. However, nothing should prevent contributors to work on that. Best, Stephan On Wed, Jun 6, 2018 at 6:23 AM, Yan Zhou [FDS Science] <[hidden email]> wrote:
|
Since wishes are free: - Standalone cluster job isolation: https://issues.apache.org/jira/browse/FLINK-8886 - Proper sliding window joins (not overlapping hoping window joins): https://issues.apache.org/jira/browse/FLINK-6243 - Sharing state across operators: https://issues.apache.org/jira/browse/FLINK-6239 - Synchronizing streams: https://issues.apache.org/jira/browse/FLINK-4558 Seconded: - Atomic cancel-with-savepoint: https://issues.apache.org/jira/browse/FLINK-7634 - Support dynamically changing CEP patterns : https://issues.apache.org/jira/browse/FLINK-7129 On Fri, Jun 8, 2018 at 1:31 PM, Stephan Ewen <[hidden email]> wrote:
|
In reply to this post by Stephan Ewen
Hi Stephan, Thanks very much for your response! That gave me the confidence to continue to work on the Elastic Filter. But even though we have implemented it(based on 1.3.2) and used it on production for a several months, If there's one commiter is willing to guide me(since it's not a very trivial work, and IMO our current implementation base on 1.3.2 is a bit hacker, a design reviews would be really helpful) to bring it into flink, I will be very grateful.
Best, Sihua
On 06/9/2018 04:31,[hidden email] wrote:
|
In reply to this post by Elias Levy
One more, since it we have to deal with it often: - Idling sources (Kafka in particular) and proper watermark propagation: FLINK-5018 / FLINK-5479 On Fri, Jun 8, 2018 at 2:58 PM, Elias Levy <[hidden email]> wrote:
|
We are eagerly waiting for - Extends Streaming Sinks: - Bucketing Sink should support S3 properly (compensate for eventual consistency), work with Flink's shaded S3 file systems, and efficiently support formats that compress/index arcoss individual rows (Parquet, ORC, ...) Especially for ORC and Parquet sinks. Since, We are planning to use Kafka-jdbc to move data from rdbms to hdfs. Thanks, On Sat, Jun 16, 2018 at 5:08 PM Elias Levy <[hidden email]> wrote:
-- Cheers,
Sagar |
Hi, Sagar There already has relative JIRAs for ORC and Parquet, you can take a look here: https://issues.apache.org/jira/browse/FLINK-9407 and https://issues.apache.org/jira/browse/FLINK-9411 For ORC format, Currently only support basic data types, such as Long, Boolean, Short, Integer, Float, Double, String. Best Zhangminglei
|
Actually, I have been an idea, how about support hive on flink ? Since lots of business are written by hive sql. And users wants to transform map reduce to fink without changing the sql.
Zhangminglei > 在 2018年6月17日,下午8:11,zhangminglei <[hidden email]> 写道: > > Hi, Sagar > > There already has relative JIRAs for ORC and Parquet, you can take a look here: > > https://issues.apache.org/jira/browse/FLINK-9407 <https://issues.apache.org/jira/browse/FLINK-9407> and https://issues.apache.org/jira/browse/FLINK-9411 <https://issues.apache.org/jira/browse/FLINK-9411> > > For ORC format, Currently only support basic data types, such as Long, Boolean, Short, Integer, Float, Double, String. > > Best > Zhangminglei > > > >> 在 2018年6月17日,上午11:11,sagar loke <[hidden email]> 写道: >> >> We are eagerly waiting for >> >> - Extends Streaming Sinks: >> - Bucketing Sink should support S3 properly (compensate for eventual consistency), work with Flink's shaded S3 file systems, and efficiently support formats that compress/index arcoss individual rows (Parquet, ORC, ...) >> >> Especially for ORC and Parquet sinks. Since, We are planning to use Kafka-jdbc to move data from rdbms to hdfs. >> >> Thanks, >> >> On Sat, Jun 16, 2018 at 5:08 PM Elias Levy <[hidden email] <mailto:[hidden email]>> wrote: >> One more, since it we have to deal with it often: >> >> - Idling sources (Kafka in particular) and proper watermark propagation: FLINK-5018 / FLINK-5479 >> >> On Fri, Jun 8, 2018 at 2:58 PM, Elias Levy <[hidden email] <mailto:[hidden email]>> wrote: >> Since wishes are free: >> >> - Standalone cluster job isolation: https://issues.apache.org/jira/browse/FLINK-8886 <https://issues.apache.org/jira/browse/FLINK-8886> >> - Proper sliding window joins (not overlapping hoping window joins): https://issues.apache.org/jira/browse/FLINK-6243 <https://issues.apache.org/jira/browse/FLINK-6243> >> - Sharing state across operators: https://issues.apache.org/jira/browse/FLINK-6239 <https://issues.apache.org/jira/browse/FLINK-6239> >> - Synchronizing streams: https://issues.apache.org/jira/browse/FLINK-4558 <https://issues.apache.org/jira/browse/FLINK-4558> >> >> Seconded: >> - Atomic cancel-with-savepoint: https://issues.apache.org/jira/browse/FLINK-7634 <https://issues.apache.org/jira/browse/FLINK-7634> >> - Support dynamically changing CEP patterns : https://issues.apache.org/jira/browse/FLINK-7129 <https://issues.apache.org/jira/browse/FLINK-7129> >> >> >> On Fri, Jun 8, 2018 at 1:31 PM, Stephan Ewen <[hidden email] <mailto:[hidden email]>> wrote: >> Hi all! >> >> Thanks for the discussion and good input. Many suggestions fit well with the proposal above. >> >> Please bear in mind that with a time-based release model, we would release whatever is mature by end of July. >> The good thing is we could schedule the next release not too far after that, so that the features that did not quite make it will not be delayed too long. >> In some sense, you could read this as as "what to do first" list, rather than "this goes in, other things stay out". >> >> Some thoughts on some of the suggestions >> >> Kubernetes integration: An opaque integration with Kubernetes should be supported through the "as a library" mode. For a deeper integration, I know that some committers have experimented with some PoC code. I would let Till add some thoughts, he has worked the most on the deployment parts recently. >> >> Per partition watermarks with idleness: Good point, could one implement that on the current interface, with a periodic watermark extractor? >> >> Atomic cancel-with-savepoint: Agreed, this is important. Making this work with all sources needs a bit more work. We should have this in the roadmap. >> >> Elastic Bloomfilters: This seems like an interesting new feature - the above suggested feature set was more about addressing some longer standing issues/requests. However, nothing should prevent contributors to work on that. >> >> Best, >> Stephan >> >> >> On Wed, Jun 6, 2018 at 6:23 AM, Yan Zhou [FDS Science] <[hidden email] <mailto:[hidden email]>> wrote: >> +1 on https://issues.apache.org/jira/browse/FLINK-5479 <https://issues.apache.org/jira/browse/FLINK-5479> >> [FLINK-5479] Per-partition watermarks in ... <https://issues.apache.org/jira/browse/FLINK-5479> >> issues.apache.org <http://issues.apache.org/> >> Reported in ML: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Kafka-topic-partition-skewness-causes-watermark-not-being-emitted-td11008.html <http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Kafka-topic-partition-skewness-causes-watermark-not-being-emitted-td11008.html> It's normally not a common case to have Kafka partitions not producing any data, but it'll probably be good to handle this as well. I ... >> >> From: Rico Bergmann <[hidden email] <mailto:[hidden email]>> >> Sent: Tuesday, June 5, 2018 9:12:00 PM >> To: Hao Sun >> Cc: [hidden email] <mailto:[hidden email]>; user >> Subject: Re: [DISCUSS] Flink 1.6 features >> >> +1 on K8s integration >> >> >> >> Am 06.06.2018 um 00:01 schrieb Hao Sun <[hidden email] <mailto:[hidden email]>>: >> >>> adding my vote to K8S Job mode, maybe it is this? >>>> Smoothen the integration in Container environment, like "Flink as a Library", and easier integration with Kubernetes services and other proxies. >>> >>> >>> >>> On Mon, Jun 4, 2018 at 11:01 PM Ben Yan <[hidden email] <mailto:[hidden email]>> wrote: >>> Hi Stephan, >>> >>> Will [ https://issues.apache.org/jira/browse/FLINK-5479 <https://issues.apache.org/jira/browse/FLINK-5479> ] (Per-partition watermarks in FlinkKafkaConsumer should consider idle partitions) be included in 1.6? As we are seeing more users with this issue on the mailing lists. >>> >>> Thanks. >>> Ben >>> >>> 2018-06-05 5:29 GMT+08:00 Che Lui Shum <[hidden email] <mailto:[hidden email]>>: >>> Hi Stephan, >>> >>> Will FLINK-7129 (Support dynamically changing CEP patterns) be included in 1.6? There were discussions about possibly including it in 1.6: >>> http://mail-archives.apache.org/mod_mbox/flink-user/201803.mbox/%3cCAMq=OU7gru2O9JtoWXn1Lc1F7NKcxAyN6A3e58kxctb4b508RQ@...%3e <http://mail-archives.apache.org/mod_mbox/flink-user/201803.mbox/%3cCAMq=OU7gru2O9JtoWXn1Lc1F7NKcxAyN6A3e58kxctb4b508RQ@...%3e> >>> >>> Thanks, >>> Shirley Shum >>> >>> Stephan Ewen ---06/04/2018 02:21:47 AM---Hi Flink Community! The release of Apache Flink 1.5 has happened (yay!) - so it is a good time >>> >>> From: Stephan Ewen <[hidden email] <mailto:[hidden email]>> >>> To: [hidden email] <mailto:[hidden email]>, user <[hidden email] <mailto:[hidden email]>> >>> Date: 06/04/2018 02:21 AM >>> Subject: [DISCUSS] Flink 1.6 features >>> >>> >>> >>> Hi Flink Community! >>> >>> The release of Apache Flink 1.5 has happened (yay!) - so it is a good time to start talking about what to do for release 1.6. >>> >>> == Suggested release timeline == >>> >>> I would propose to release around end of July (that is 8-9 weeks from now). >>> >>> The rational behind that: There was a lot of effort in release testing automation (end-to-end tests, scripted stress tests) as part of release 1.5. You may have noticed the big set of new modules under "flink-end-to-end-tests" in the Flink repository. It delayed the 1.5 release a bit, and needs to continue as part of the coming release cycle, but should help make releasing more lightweight from now on. >>> >>> (Side note: There are also some nightly stress tests that we created and run at data Artisans, and where we are looking whether and in which way it would make sense to contribute them to Flink.) >>> >>> == Features and focus areas == >>> >>> We had a lot of big and heavy features in Flink 1.5, with FLIP-6, the new network stack, recovery, SQL joins and client, ... Following something like a "tick-tock-model", I would suggest to focus the next release more on integrations, tooling, and reducing user friction. >>> >>> Of course, this does not mean that no other pull request gets reviewed, an no other topic will be examined - it is simply meant as a help to understand where to expect more activity during the next release cycle. Note that these are really the coarse focus areas - don't read this as a comprehensive list. >>> >>> This list is my first suggestion, based on discussions with committers, users, and mailing list questions. >>> >>> - Support Java 9 and Scala 2.12 >>> >>> - Smoothen the integration in Container environment, like "Flink as a Library", and easier integration with Kubernetes services and other proxies. >>> >>> - Polish the remaing parts of the FLIP-6 rewrite >>> >>> - Improve state backends with asynchronous timer snapshots, efficient timer deletes, state TTL, and broadcast state support in RocksDB. >>> >>> - Extends Streaming Sinks: >>> - Bucketing Sink should support S3 properly (compensate for eventual consistency), work with Flink's shaded S3 file systems, and efficiently support formats that compress/index arcoss individual rows (Parquet, ORC, ...) >>> - Support ElasticSearch's new REST API >>> >>> - Smoothen State Evolution to support type conversion on snapshot restore >>> >>> - Enhance Stream SQL and CEP >>> - Add support for "update by key" Table Sources >>> - Add more table sources and sinks (Kafka, Kinesis, Files, K/V stores) >>> - Expand SQL client >>> - Integrate CEP and SQL, through MATCH_RECOGNIZE clause >>> - Improve CEP Performance of SharedBuffer on RocksDB >>> >>> >>> >>> >>> >> >> >> >> -- >> Cheers, >> Sagar > |
Agree, two missing pieces I think could make Flink more competitive against Spark SQL/Stream and Kafka Stream
1. Flink over Hive or Flink SQL hive table source and sink 2. Flink ML on stream > On Jun 17, 2018, at 8:34 AM, zhangminglei <[hidden email]> wrote: > > Actually, I have been an idea, how about support hive on flink ? Since lots of business are written by hive sql. And users wants to transform map reduce to fink without changing the sql. > > Zhangminglei > > > >> 在 2018年6月17日,下午8:11,zhangminglei <[hidden email]> 写道: >> >> Hi, Sagar >> >> There already has relative JIRAs for ORC and Parquet, you can take a look here: >> >> https://issues.apache.org/jira/browse/FLINK-9407 <https://issues.apache.org/jira/browse/FLINK-9407> and https://issues.apache.org/jira/browse/FLINK-9411 <https://issues.apache.org/jira/browse/FLINK-9411> >> >> For ORC format, Currently only support basic data types, such as Long, Boolean, Short, Integer, Float, Double, String. >> >> Best >> Zhangminglei >> >> >> >>> 在 2018年6月17日,上午11:11,sagar loke <[hidden email]> 写道: >>> >>> We are eagerly waiting for >>> >>> - Extends Streaming Sinks: >>> - Bucketing Sink should support S3 properly (compensate for eventual consistency), work with Flink's shaded S3 file systems, and efficiently support formats that compress/index arcoss individual rows (Parquet, ORC, ...) >>> >>> Especially for ORC and Parquet sinks. Since, We are planning to use Kafka-jdbc to move data from rdbms to hdfs. >>> >>> Thanks, >>> >>> On Sat, Jun 16, 2018 at 5:08 PM Elias Levy <[hidden email] <mailto:[hidden email]>> wrote: >>> One more, since it we have to deal with it often: >>> >>> - Idling sources (Kafka in particular) and proper watermark propagation: FLINK-5018 / FLINK-5479 >>> >>> On Fri, Jun 8, 2018 at 2:58 PM, Elias Levy <[hidden email] <mailto:[hidden email]>> wrote: >>> Since wishes are free: >>> >>> - Standalone cluster job isolation: https://issues.apache.org/jira/browse/FLINK-8886 <https://issues.apache.org/jira/browse/FLINK-8886> >>> - Proper sliding window joins (not overlapping hoping window joins): https://issues.apache.org/jira/browse/FLINK-6243 <https://issues.apache.org/jira/browse/FLINK-6243> >>> - Sharing state across operators: https://issues.apache.org/jira/browse/FLINK-6239 <https://issues.apache.org/jira/browse/FLINK-6239> >>> - Synchronizing streams: https://issues.apache.org/jira/browse/FLINK-4558 <https://issues.apache.org/jira/browse/FLINK-4558> >>> >>> Seconded: >>> - Atomic cancel-with-savepoint: https://issues.apache.org/jira/browse/FLINK-7634 <https://issues.apache.org/jira/browse/FLINK-7634> >>> - Support dynamically changing CEP patterns : https://issues.apache.org/jira/browse/FLINK-7129 <https://issues.apache.org/jira/browse/FLINK-7129> >>> >>> >>> On Fri, Jun 8, 2018 at 1:31 PM, Stephan Ewen <[hidden email] <mailto:[hidden email]>> wrote: >>> Hi all! >>> >>> Thanks for the discussion and good input. Many suggestions fit well with the proposal above. >>> >>> Please bear in mind that with a time-based release model, we would release whatever is mature by end of July. >>> The good thing is we could schedule the next release not too far after that, so that the features that did not quite make it will not be delayed too long. >>> In some sense, you could read this as as "what to do first" list, rather than "this goes in, other things stay out". >>> >>> Some thoughts on some of the suggestions >>> >>> Kubernetes integration: An opaque integration with Kubernetes should be supported through the "as a library" mode. For a deeper integration, I know that some committers have experimented with some PoC code. I would let Till add some thoughts, he has worked the most on the deployment parts recently. >>> >>> Per partition watermarks with idleness: Good point, could one implement that on the current interface, with a periodic watermark extractor? >>> >>> Atomic cancel-with-savepoint: Agreed, this is important. Making this work with all sources needs a bit more work. We should have this in the roadmap. >>> >>> Elastic Bloomfilters: This seems like an interesting new feature - the above suggested feature set was more about addressing some longer standing issues/requests. However, nothing should prevent contributors to work on that. >>> >>> Best, >>> Stephan >>> >>> >>> On Wed, Jun 6, 2018 at 6:23 AM, Yan Zhou [FDS Science] <[hidden email] <mailto:[hidden email]>> wrote: >>> +1 on https://issues.apache.org/jira/browse/FLINK-5479 <https://issues.apache.org/jira/browse/FLINK-5479> >>> [FLINK-5479] Per-partition watermarks in ... <https://issues.apache.org/jira/browse/FLINK-5479> >>> issues.apache.org <http://issues.apache.org/> >>> Reported in ML: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Kafka-topic-partition-skewness-causes-watermark-not-being-emitted-td11008.html <http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Kafka-topic-partition-skewness-causes-watermark-not-being-emitted-td11008.html> It's normally not a common case to have Kafka partitions not producing any data, but it'll probably be good to handle this as well. I ... >>> >>> From: Rico Bergmann <[hidden email] <mailto:[hidden email]>> >>> Sent: Tuesday, June 5, 2018 9:12:00 PM >>> To: Hao Sun >>> Cc: [hidden email] <mailto:[hidden email]>; user >>> Subject: Re: [DISCUSS] Flink 1.6 features >>> >>> +1 on K8s integration >>> >>> >>> >>> Am 06.06.2018 um 00:01 schrieb Hao Sun <[hidden email] <mailto:[hidden email]>>: >>> >>>> adding my vote to K8S Job mode, maybe it is this? >>>>> Smoothen the integration in Container environment, like "Flink as a Library", and easier integration with Kubernetes services and other proxies. >>>> >>>> >>>> >>>> On Mon, Jun 4, 2018 at 11:01 PM Ben Yan <[hidden email] <mailto:[hidden email]>> wrote: >>>> Hi Stephan, >>>> >>>> Will [ https://issues.apache.org/jira/browse/FLINK-5479 <https://issues.apache.org/jira/browse/FLINK-5479> ] (Per-partition watermarks in FlinkKafkaConsumer should consider idle partitions) be included in 1.6? As we are seeing more users with this issue on the mailing lists. >>>> >>>> Thanks. >>>> Ben >>>> >>>> 2018-06-05 5:29 GMT+08:00 Che Lui Shum <[hidden email] <mailto:[hidden email]>>: >>>> Hi Stephan, >>>> >>>> Will FLINK-7129 (Support dynamically changing CEP patterns) be included in 1.6? There were discussions about possibly including it in 1.6: >>>> http://mail-archives.apache.org/mod_mbox/flink-user/201803.mbox/%3cCAMq=OU7gru2O9JtoWXn1Lc1F7NKcxAyN6A3e58kxctb4b508RQ@...%3e <http://mail-archives.apache.org/mod_mbox/flink-user/201803.mbox/%3cCAMq=OU7gru2O9JtoWXn1Lc1F7NKcxAyN6A3e58kxctb4b508RQ@...%3e> >>>> >>>> Thanks, >>>> Shirley Shum >>>> >>>> Stephan Ewen ---06/04/2018 02:21:47 AM---Hi Flink Community! The release of Apache Flink 1.5 has happened (yay!) - so it is a good time >>>> >>>> From: Stephan Ewen <[hidden email] <mailto:[hidden email]>> >>>> To: [hidden email] <mailto:[hidden email]>, user <[hidden email] <mailto:[hidden email]>> >>>> Date: 06/04/2018 02:21 AM >>>> Subject: [DISCUSS] Flink 1.6 features >>>> >>>> >>>> >>>> Hi Flink Community! >>>> >>>> The release of Apache Flink 1.5 has happened (yay!) - so it is a good time to start talking about what to do for release 1.6. >>>> >>>> == Suggested release timeline == >>>> >>>> I would propose to release around end of July (that is 8-9 weeks from now). >>>> >>>> The rational behind that: There was a lot of effort in release testing automation (end-to-end tests, scripted stress tests) as part of release 1.5. You may have noticed the big set of new modules under "flink-end-to-end-tests" in the Flink repository. It delayed the 1.5 release a bit, and needs to continue as part of the coming release cycle, but should help make releasing more lightweight from now on. >>>> >>>> (Side note: There are also some nightly stress tests that we created and run at data Artisans, and where we are looking whether and in which way it would make sense to contribute them to Flink.) >>>> >>>> == Features and focus areas == >>>> >>>> We had a lot of big and heavy features in Flink 1.5, with FLIP-6, the new network stack, recovery, SQL joins and client, ... Following something like a "tick-tock-model", I would suggest to focus the next release more on integrations, tooling, and reducing user friction. >>>> >>>> Of course, this does not mean that no other pull request gets reviewed, an no other topic will be examined - it is simply meant as a help to understand where to expect more activity during the next release cycle. Note that these are really the coarse focus areas - don't read this as a comprehensive list. >>>> >>>> This list is my first suggestion, based on discussions with committers, users, and mailing list questions. >>>> >>>> - Support Java 9 and Scala 2.12 >>>> >>>> - Smoothen the integration in Container environment, like "Flink as a Library", and easier integration with Kubernetes services and other proxies. >>>> >>>> - Polish the remaing parts of the FLIP-6 rewrite >>>> >>>> - Improve state backends with asynchronous timer snapshots, efficient timer deletes, state TTL, and broadcast state support in RocksDB. >>>> >>>> - Extends Streaming Sinks: >>>> - Bucketing Sink should support S3 properly (compensate for eventual consistency), work with Flink's shaded S3 file systems, and efficiently support formats that compress/index arcoss individual rows (Parquet, ORC, ...) >>>> - Support ElasticSearch's new REST API >>>> >>>> - Smoothen State Evolution to support type conversion on snapshot restore >>>> >>>> - Enhance Stream SQL and CEP >>>> - Add support for "update by key" Table Sources >>>> - Add more table sources and sinks (Kafka, Kinesis, Files, K/V stores) >>>> - Expand SQL client >>>> - Integrate CEP and SQL, through MATCH_RECOGNIZE clause >>>> - Improve CEP Performance of SharedBuffer on RocksDB >>>> >>>> >>>> >>>> >>>> >>> >>> >>> >>> -- >>> Cheers, >>> Sagar >> > > |
But if we do hive on flink , I think it should be a very big project.
> 在 2018年6月17日,下午9:36,Will Du <[hidden email]> 写道: > > Agree, two missing pieces I think could make Flink more competitive against Spark SQL/Stream and Kafka Stream > 1. Flink over Hive or Flink SQL hive table source and sink > 2. Flink ML on stream > > >> On Jun 17, 2018, at 8:34 AM, zhangminglei <[hidden email]> wrote: >> >> Actually, I have been an idea, how about support hive on flink ? Since lots of business are written by hive sql. And users wants to transform map reduce to fink without changing the sql. >> >> Zhangminglei >> >> >> >>> 在 2018年6月17日,下午8:11,zhangminglei <[hidden email]> 写道: >>> >>> Hi, Sagar >>> >>> There already has relative JIRAs for ORC and Parquet, you can take a look here: >>> >>> https://issues.apache.org/jira/browse/FLINK-9407 <https://issues.apache.org/jira/browse/FLINK-9407> and https://issues.apache.org/jira/browse/FLINK-9411 <https://issues.apache.org/jira/browse/FLINK-9411> >>> >>> For ORC format, Currently only support basic data types, such as Long, Boolean, Short, Integer, Float, Double, String. >>> >>> Best >>> Zhangminglei >>> >>> >>> >>>> 在 2018年6月17日,上午11:11,sagar loke <[hidden email]> 写道: >>>> >>>> We are eagerly waiting for >>>> >>>> - Extends Streaming Sinks: >>>> - Bucketing Sink should support S3 properly (compensate for eventual consistency), work with Flink's shaded S3 file systems, and efficiently support formats that compress/index arcoss individual rows (Parquet, ORC, ...) >>>> >>>> Especially for ORC and Parquet sinks. Since, We are planning to use Kafka-jdbc to move data from rdbms to hdfs. >>>> >>>> Thanks, >>>> >>>> On Sat, Jun 16, 2018 at 5:08 PM Elias Levy <[hidden email] <mailto:[hidden email]>> wrote: >>>> One more, since it we have to deal with it often: >>>> >>>> - Idling sources (Kafka in particular) and proper watermark propagation: FLINK-5018 / FLINK-5479 >>>> >>>> On Fri, Jun 8, 2018 at 2:58 PM, Elias Levy <[hidden email] <mailto:[hidden email]>> wrote: >>>> Since wishes are free: >>>> >>>> - Standalone cluster job isolation: https://issues.apache.org/jira/browse/FLINK-8886 <https://issues.apache.org/jira/browse/FLINK-8886> >>>> - Proper sliding window joins (not overlapping hoping window joins): https://issues.apache.org/jira/browse/FLINK-6243 <https://issues.apache.org/jira/browse/FLINK-6243> >>>> - Sharing state across operators: https://issues.apache.org/jira/browse/FLINK-6239 <https://issues.apache.org/jira/browse/FLINK-6239> >>>> - Synchronizing streams: https://issues.apache.org/jira/browse/FLINK-4558 <https://issues.apache.org/jira/browse/FLINK-4558> >>>> >>>> Seconded: >>>> - Atomic cancel-with-savepoint: https://issues.apache.org/jira/browse/FLINK-7634 <https://issues.apache.org/jira/browse/FLINK-7634> >>>> - Support dynamically changing CEP patterns : https://issues.apache.org/jira/browse/FLINK-7129 <https://issues.apache.org/jira/browse/FLINK-7129> >>>> >>>> >>>> On Fri, Jun 8, 2018 at 1:31 PM, Stephan Ewen <[hidden email] <mailto:[hidden email]>> wrote: >>>> Hi all! >>>> >>>> Thanks for the discussion and good input. Many suggestions fit well with the proposal above. >>>> >>>> Please bear in mind that with a time-based release model, we would release whatever is mature by end of July. >>>> The good thing is we could schedule the next release not too far after that, so that the features that did not quite make it will not be delayed too long. >>>> In some sense, you could read this as as "what to do first" list, rather than "this goes in, other things stay out". >>>> >>>> Some thoughts on some of the suggestions >>>> >>>> Kubernetes integration: An opaque integration with Kubernetes should be supported through the "as a library" mode. For a deeper integration, I know that some committers have experimented with some PoC code. I would let Till add some thoughts, he has worked the most on the deployment parts recently. >>>> >>>> Per partition watermarks with idleness: Good point, could one implement that on the current interface, with a periodic watermark extractor? >>>> >>>> Atomic cancel-with-savepoint: Agreed, this is important. Making this work with all sources needs a bit more work. We should have this in the roadmap. >>>> >>>> Elastic Bloomfilters: This seems like an interesting new feature - the above suggested feature set was more about addressing some longer standing issues/requests. However, nothing should prevent contributors to work on that. >>>> >>>> Best, >>>> Stephan >>>> >>>> >>>> On Wed, Jun 6, 2018 at 6:23 AM, Yan Zhou [FDS Science] <[hidden email] <mailto:[hidden email]>> wrote: >>>> +1 on https://issues.apache.org/jira/browse/FLINK-5479 <https://issues.apache.org/jira/browse/FLINK-5479> >>>> [FLINK-5479] Per-partition watermarks in ... <https://issues.apache.org/jira/browse/FLINK-5479> >>>> issues.apache.org <http://issues.apache.org/> >>>> Reported in ML: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Kafka-topic-partition-skewness-causes-watermark-not-being-emitted-td11008.html <http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Kafka-topic-partition-skewness-causes-watermark-not-being-emitted-td11008.html> It's normally not a common case to have Kafka partitions not producing any data, but it'll probably be good to handle this as well. I ... >>>> >>>> From: Rico Bergmann <[hidden email] <mailto:[hidden email]>> >>>> Sent: Tuesday, June 5, 2018 9:12:00 PM >>>> To: Hao Sun >>>> Cc: [hidden email] <mailto:[hidden email]>; user >>>> Subject: Re: [DISCUSS] Flink 1.6 features >>>> >>>> +1 on K8s integration >>>> >>>> >>>> >>>> Am 06.06.2018 um 00:01 schrieb Hao Sun <[hidden email] <mailto:[hidden email]>>: >>>> >>>>> adding my vote to K8S Job mode, maybe it is this? >>>>>> Smoothen the integration in Container environment, like "Flink as a Library", and easier integration with Kubernetes services and other proxies. >>>>> >>>>> >>>>> >>>>> On Mon, Jun 4, 2018 at 11:01 PM Ben Yan <[hidden email] <mailto:[hidden email]>> wrote: >>>>> Hi Stephan, >>>>> >>>>> Will [ https://issues.apache.org/jira/browse/FLINK-5479 <https://issues.apache.org/jira/browse/FLINK-5479> ] (Per-partition watermarks in FlinkKafkaConsumer should consider idle partitions) be included in 1.6? As we are seeing more users with this issue on the mailing lists. >>>>> >>>>> Thanks. >>>>> Ben >>>>> >>>>> 2018-06-05 5:29 GMT+08:00 Che Lui Shum <[hidden email] <mailto:[hidden email]>>: >>>>> Hi Stephan, >>>>> >>>>> Will FLINK-7129 (Support dynamically changing CEP patterns) be included in 1.6? There were discussions about possibly including it in 1.6: >>>>> http://mail-archives.apache.org/mod_mbox/flink-user/201803.mbox/%3cCAMq=OU7gru2O9JtoWXn1Lc1F7NKcxAyN6A3e58kxctb4b508RQ@...%3e <http://mail-archives.apache.org/mod_mbox/flink-user/201803.mbox/%3cCAMq=OU7gru2O9JtoWXn1Lc1F7NKcxAyN6A3e58kxctb4b508RQ@...%3e> >>>>> >>>>> Thanks, >>>>> Shirley Shum >>>>> >>>>> Stephan Ewen ---06/04/2018 02:21:47 AM---Hi Flink Community! The release of Apache Flink 1.5 has happened (yay!) - so it is a good time >>>>> >>>>> From: Stephan Ewen <[hidden email] <mailto:[hidden email]>> >>>>> To: [hidden email] <mailto:[hidden email]>, user <[hidden email] <mailto:[hidden email]>> >>>>> Date: 06/04/2018 02:21 AM >>>>> Subject: [DISCUSS] Flink 1.6 features >>>>> >>>>> >>>>> >>>>> Hi Flink Community! >>>>> >>>>> The release of Apache Flink 1.5 has happened (yay!) - so it is a good time to start talking about what to do for release 1.6. >>>>> >>>>> == Suggested release timeline == >>>>> >>>>> I would propose to release around end of July (that is 8-9 weeks from now). >>>>> >>>>> The rational behind that: There was a lot of effort in release testing automation (end-to-end tests, scripted stress tests) as part of release 1.5. You may have noticed the big set of new modules under "flink-end-to-end-tests" in the Flink repository. It delayed the 1.5 release a bit, and needs to continue as part of the coming release cycle, but should help make releasing more lightweight from now on. >>>>> >>>>> (Side note: There are also some nightly stress tests that we created and run at data Artisans, and where we are looking whether and in which way it would make sense to contribute them to Flink.) >>>>> >>>>> == Features and focus areas == >>>>> >>>>> We had a lot of big and heavy features in Flink 1.5, with FLIP-6, the new network stack, recovery, SQL joins and client, ... Following something like a "tick-tock-model", I would suggest to focus the next release more on integrations, tooling, and reducing user friction. >>>>> >>>>> Of course, this does not mean that no other pull request gets reviewed, an no other topic will be examined - it is simply meant as a help to understand where to expect more activity during the next release cycle. Note that these are really the coarse focus areas - don't read this as a comprehensive list. >>>>> >>>>> This list is my first suggestion, based on discussions with committers, users, and mailing list questions. >>>>> >>>>> - Support Java 9 and Scala 2.12 >>>>> >>>>> - Smoothen the integration in Container environment, like "Flink as a Library", and easier integration with Kubernetes services and other proxies. >>>>> >>>>> - Polish the remaing parts of the FLIP-6 rewrite >>>>> >>>>> - Improve state backends with asynchronous timer snapshots, efficient timer deletes, state TTL, and broadcast state support in RocksDB. >>>>> >>>>> - Extends Streaming Sinks: >>>>> - Bucketing Sink should support S3 properly (compensate for eventual consistency), work with Flink's shaded S3 file systems, and efficiently support formats that compress/index arcoss individual rows (Parquet, ORC, ...) >>>>> - Support ElasticSearch's new REST API >>>>> >>>>> - Smoothen State Evolution to support type conversion on snapshot restore >>>>> >>>>> - Enhance Stream SQL and CEP >>>>> - Add support for "update by key" Table Sources >>>>> - Add more table sources and sinks (Kafka, Kinesis, Files, K/V stores) >>>>> - Expand SQL client >>>>> - Integrate CEP and SQL, through MATCH_RECOGNIZE clause >>>>> - Improve CEP Performance of SharedBuffer on RocksDB >>>>> >>>>> >>>>> >>>>> >>>>> >>>> >>>> >>>> >>>> -- >>>> Cheers, >>>> Sagar >>> >> >> |
Thanks @zhangminglei for replying. I agree, hive on Flink would be a big project. By the way, i looked at the Jira ticket related to ORC format which you shared. Couple of comments/requests about the pull request in th ticket: 1. Sorry for nitpicking but meatSchema is mispelled. I think it should be metaSchema. 2. Will you be able to add more unit tests in the commit ? Eg. Writing some example data with simple schema which will initialize OrcWriter object and sinking it to local hdfs node ? 3. Are there plans to add support for other data types ? Thanks, Sagar On Sun, Jun 17, 2018 at 6:45 AM zhangminglei <[hidden email]> wrote: But if we do hive on flink , I think it should be a very big project. -- Cheers,
Sagar |
Free forum by Nabble | Edit this page |