Hi,
I browsed the Flink documentation, but I couldn't find an in-depth comparison of Flink's features in standalone, YARN, and Mesos deployments, only technical guides for setting them up. I'm a newbie in cluster computing, so I have never used YARN or Mesos; I've only read a bit about what they do. I would like to understand how to choose which deployment mode to use. It seems that with YARN or Mesos I get automatic failure recovery, but if that is true, what are the reasons to set up Flink without them? I would like to build a cluster for stream processing, with Kafka as the source and Cassandra as the sink, so the cluster is just a real-time processor that doesn't store data: it just passes through. There is no pre-existing cluster, and my interest is research, so I would like to understand which setup best fits my purpose. Can you explain how to choose between these cluster setups, or point me to some links where I can learn about it? Thanks in advance, Andrea |
Hi Andrea,
If you are using Flink for research and/or testing purposes, standalone Flink is more or less sufficient. However, if you have a huge amount of data, processing it on a single node/machine may take forever, and that's where a cluster is needed. A YARN or Mesos cluster could provide high availability and fault tolerance, so that you don't lose your data if something happens to one of the nodes in your cluster. Also, AFAIK, Flink relies on a resource manager like YARN or Mesos to distribute tasks across multiple nodes, so that you don't have to worry about that distribution. For the rest, I would like the experts here to correct me and add more info. |
Hello,
just quickly chiming in for clarification/correction: Flink can work in multi-node environments without YARN/Mesos. If you are only starting out, or have short-lived workloads (i.e. jobs that do not have to run for days/weeks), I would recommend standalone mode for ease of entry. Also, Flink does not use resource managers to distribute tasks, but to manage JobManagers and TaskManagers, which in standalone mode you have to do yourself. (In reply to Biplob Biswas's message of 16.06.2017 15:37.) |
OK, I understand that standalone mode would be sufficient, but for my thesis I would like to set up a well-performing, ready-to-use infrastructure. My workload is not heavy, about 35 million messages a day (35 GB), but it should be easily expandable and run for many days. Because of this, I would like to set up Flink on top of a cluster manager.
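To put that workload in perspective, here is a rough back-of-the-envelope calculation of the average rates. It is only a sketch: it assumes the load is spread evenly over 24 hours and takes 1 GB as 10**9 bytes, neither of which is stated in the thread, and the function name is made up for illustration.

```python
# Back-of-the-envelope average rates for the workload described above.
# Assumptions (not from the thread): the load is spread evenly over the day,
# and 1 GB is taken as 10**9 bytes.
MESSAGES_PER_DAY = 35_000_000
BYTES_PER_DAY = 35 * 10**9
SECONDS_PER_DAY = 24 * 60 * 60


def average_rates(messages_per_day, bytes_per_day):
    """Return (messages per second, bytes per second, bytes per message)."""
    msgs_per_sec = messages_per_day / SECONDS_PER_DAY
    bytes_per_sec = bytes_per_day / SECONDS_PER_DAY
    bytes_per_msg = bytes_per_day / messages_per_day
    return msgs_per_sec, bytes_per_sec, bytes_per_msg


msgs_per_sec, bytes_per_sec, bytes_per_msg = average_rates(
    MESSAGES_PER_DAY, BYTES_PER_DAY
)
print(f"{msgs_per_sec:.0f} messages/s")
print(f"{bytes_per_sec / 1e6:.2f} MB/s")
print(f"{bytes_per_msg:.0f} bytes/message")
```

Under these assumptions the averages come out at roughly 405 messages per second and about 0.4 MB/s, with messages of about 1 KB each. Real traffic is usually bursty, so peak rates may be well above these averages.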
|