Hi,
Good to see another interesting Apache incubator project, but I am just curious: what is Flink trying to solve that Spark is not currently addressing? I am sure you get this question a lot. :)

Thanks,
Mohit

"When you want success as badly as you want the air, then you will get it. There is no other secret of success." -Socrates
In fact, the use cases of Spark and Flink overlap a bit. However, the technology used under the hood is quite different.

Flink shares a lot of similarities with relational DBMSs. Data is serialized into byte buffers and largely processed in binary representation. This also allows for fine-grained memory control. Flink uses a pipelined processing model, and it has a cost-based optimizer that selects execution strategies and avoids expensive partitioning and sorting steps. Moreover, Flink features a special kind of iteration (delta iterations) that can significantly reduce the amount of computation as iterations go on (the vertex-centric computing model of Pregel / Giraph is a special case of that).

Btw., the academic projects from which Spark and Flink originated started at about the same time ;-)

Best, Fabian

2014-10-19 20:33 GMT+02:00 Mohit Singh <[hidden email]>:
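To make the delta-iteration idea concrete, here is a minimal conceptual sketch in plain Java (not the actual Flink API; the graph data is made up): connected components where each superstep only revisits vertices whose value changed in the previous step, so the workset typically shrinks as the computation converges.

```java
import java.util.*;

// Conceptual sketch of a delta iteration (not Flink's API): only vertices
// whose value changed re-enter the workset, so later supersteps touch
// fewer and fewer elements. Example data is made up.
public class DeltaIterationSketch {

    // adj: adjacency list; result: smallest reachable vertex id per vertex
    public static Map<Integer, Integer> connectedComponents(Map<Integer, List<Integer>> adj) {
        Map<Integer, Integer> comp = new HashMap<>();
        for (Integer v : adj.keySet()) comp.put(v, v);      // start with own id
        Set<Integer> workset = new HashSet<>(adj.keySet()); // initially: everyone
        while (!workset.isEmpty()) {
            Set<Integer> next = new HashSet<>();
            for (Integer v : workset) {
                int id = comp.get(v);
                for (Integer n : adj.get(v)) {
                    if (id < comp.get(n)) { // neighbor learns a smaller id
                        comp.put(n, id);
                        next.add(n);        // only changed vertices re-enter
                    }
                }
            }
            workset = next; // shrinks as vertices converge
        }
        return comp;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> adj = new HashMap<>();
        adj.put(1, List.of(2));
        adj.put(2, List.of(1, 3));
        adj.put(3, List.of(2));
        adj.put(4, List.of(5));
        adj.put(5, List.of(4));
        System.out.println(connectedComponents(adj));
    }
}
```

In a bulk iteration, every superstep would recompute all vertices; here, a vertex that has already converged costs nothing in later supersteps, which is the saving Fabian describes.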
Cool... thanks for the update.

What would be a good way to start contributing to Flink? I am comfortable with Java but not so much on the Scala end, though I would love to pick it up as I go. Basically, is there a good place to start pitching in and contributing?

On Sun, Oct 19, 2014 at 12:50 PM, Fabian Hueske <[hidden email]> wrote:
Hey Mohit,
On 21 Oct 2014, at 02:11, Mohit Singh <[hidden email]> wrote:
> What would be a good way to start contributing to Flink?

Great to hear that you are interested in contributing. :) It is not necessary to know Scala, because the core runtime is written in Java for the most part.

As first steps I would suggest the following:
- Have a look at the contribution guide here [1].
- We try to assign all issues to system components [2]. This might help you get an overview of which parts of the system are interesting to you.

Does this help as a starting point? We can then discuss specific issues here on the mailing list or in the respective JIRA issue.

– Ufuk

[1] http://flink.incubator.apache.org/how-to-contribute.html
[2] https://issues.apache.org/jira/browse/FLINK?selectedTab=com.atlassian.jira.jira-projects-plugin:components-panel
Maybe I can start here: https://issues.apache.org/jira/browse/FLINK-1168?

On Tue, Oct 21, 2014 at 2:26 AM, Ufuk Celebi <[hidden email]> wrote:
Hi Mohit,

that would be a good issue to start with. Unfortunately, I assigned the issue to myself and started working on it. However, I am not done yet. If you like, you can pick up the issue and either continue with what I did so far [1] or start over. Just let me know...

Another option would be to extend the Hadoop Compatibility Layer. Right now, we have wrappers for Hadoop's mapred-API functions (Mapper, Reducer), but not for the mapreduce-API functions [2]. Having wrappers for the mapreduce-API functions would also be cool. There is no JIRA for this issue yet.

And then there are of course plenty of other issues ;-)

Cheers, Fabian

2014-10-21 23:56 GMT+02:00 Mohit Singh <[hidden email]>:
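The wrapper idea can be sketched with an adapter in plain Java. The interfaces below are made up for illustration (they are not the real Hadoop or Flink types): a function written against a "legacy" collector-style mapper interface is wrapped so an engine expecting a different flat-map interface can call it.

```java
import java.util.*;
import java.util.function.BiConsumer;

// Conceptual sketch of a compatibility wrapper (interfaces are hypothetical,
// not the actual Hadoop mapred/mapreduce or Flink APIs).
public class WrapperSketch {

    // stand-in for a Hadoop-style Mapper: emits (key, value) pairs via a collector
    interface LegacyMapper<I, K, V> {
        void map(I input, BiConsumer<K, V> collector);
    }

    // stand-in for the engine's native flat-map function
    interface FlatMapFunction<I, O> {
        void flatMap(I input, List<O> out);
    }

    // adapt a LegacyMapper so the engine can invoke it as a FlatMapFunction:
    // the legacy collector calls are redirected into the engine's output list
    static <I, K, V> FlatMapFunction<I, Map.Entry<K, V>> wrap(LegacyMapper<I, K, V> mapper) {
        return (input, out) -> mapper.map(input, (k, v) -> out.add(Map.entry(k, v)));
    }

    public static void main(String[] args) {
        // a "legacy" word-count mapper: emit (word, 1) for each token
        LegacyMapper<String, String, Integer> wordCount = (line, collect) -> {
            for (String w : line.split("\\s+")) collect.accept(w, 1);
        };
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        wrap(wordCount).flatMap("to be or not to be", out);
        System.out.println(out);
    }
}
```

The real compatibility layer has to forward more than the emitted pairs (configuration, counters, and so on), but the core of the task is this adapter pattern.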
Hi Mohit,

I saw you created a JIRA issue for the Hadoop mapreduce function wrappers. Shall I assign the issue to you, so everybody knows you are working on it?

Best, Fabian

2014-10-22 0:14 GMT+02:00 Fabian Hueske <[hidden email]>:
Hi Fabian,

Yeah, that would be great :) Thanks

On Wed, Oct 22, 2014 at 12:06 AM, Fabian Hueske <[hidden email]> wrote:
Done! I'm very happy that you're joining the community :-) Welcome!

Fabian

2014-10-22 9:11 GMT+02:00 Mohit Singh <[hidden email]>:
Great.
And so am I. :) Looking forward to learning things and contributing back.

On Wed, Oct 22, 2014 at 12:22 AM, Fabian Hueske <[hidden email]> wrote:
Sorry for the intrusion, but you mentioned the differences between Spark and Flink. What is not clear to me is whether Apache Drill is a stripped-down version of Flink or something very similar. Could someone give me a clarification on this?
Best,
Flavio

On Wed, Oct 22, 2014 at 9:26 AM, Mohit Singh <[hidden email]> wrote:
Sure, no problem :-)

Drill is a SQL engine and therefore in the same league as Apache Hive, Apache Tajo, or Cloudera's Impala. Flink (and Spark) focus on use cases that go beyond pure SQL (plus a few UDFs), such as graph processing, machine learning, and very custom data flows.

Best, Fabian

2014-10-22 10:58 GMT+02:00 Flavio Pompermaier <[hidden email]>:
I should add that Flink also works well for relational use cases (see the examples). However, it does not offer a SQL interface, which means that a query must be implemented as a "handcrafted" data flow consisting of operators such as filter, join, project, and group.

Adding a SQL interface (as Spark did) should be possible without changing a lot of the internals. However, nobody in the community is currently putting a focus on that, AFAIK.

2014-10-22 11:14 GMT+02:00 Fabian Hueske <[hidden email]>:
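To illustrate what a "handcrafted" data flow looks like, here is a small sketch using plain Java streams (not Flink's operators; the employee data is invented): the query `SELECT dept, COUNT(*) FROM employees WHERE salary > 50000 GROUP BY dept` written as an explicit chain of filter and group operators.

```java
import java.util.*;
import java.util.stream.Collectors;

// Conceptual sketch (plain Java streams, not the Flink API): a SQL-style
// query expressed as an explicit operator chain. Example data is made up.
public class HandcraftedQuery {

    record Employee(String name, String dept, int salary) {}

    // SELECT dept, COUNT(*) FROM employees WHERE salary > 50000 GROUP BY dept
    static Map<String, Long> highEarnersPerDept(List<Employee> employees) {
        return employees.stream()
                .filter(e -> e.salary() > 50000)               // WHERE salary > 50000
                .collect(Collectors.groupingBy(Employee::dept, // GROUP BY dept
                        Collectors.counting()));               // COUNT(*)
    }

    public static void main(String[] args) {
        List<Employee> emps = List.of(
                new Employee("ann", "eng", 70000),
                new Employee("bob", "eng", 40000),
                new Employee("cat", "sales", 60000));
        System.out.println(highEarnersPerDept(emps));
    }
}
```

A SQL interface would generate such an operator chain from the query text automatically; without one, the user writes the chain by hand, which is the trade-off described above.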