Thanks for your answers :)
Best regards/祝好,
Chang Liu 刘畅
Hi Chang,
You can find some of the answers inline:
Dear All,
I am helping my team set up a Flink cluster, and we would like it to be highly available and easy to scale.
We would like to set up a minimal cluster environment that can be easily scaled in the future. This is my simple proposal:
- 2 nodes
- each node runs a Flink instance, YARN, and HDFS
- Flink, YARN and HDFS all run in cluster mode.
<image.jpeg>
Based on it, my questions are:
- By using HDFS as the file system, we can achieve fault tolerance (by recovering the checkpointed state when a job fails). Question: so Flink by itself cannot keep and maintain distributed, persistent state using just the local Linux file system, right?
- Then, my follow-up is: if you have a Flink cluster (multiple nodes) and you use the local Linux file system to keep the state checkpoints, what will happen when a Flink job fails and Flink restarts the job and tries to recover the state from the checkpoints?
For both of the above: when a task fails, the whole job (all of its tasks) is restarted, and the tasks may be rescheduled on different machines. If you use a local FS and the state has to be fetched remotely upon recovery, how would the new nodes be able to locate the state on a remote machine?
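To make the point above concrete, here is a toy sketch in plain Python. It is not Flink code: the directory names, the `.chk` file naming, and the helper functions are all hypothetical, and a shared temporary directory merely stands in for HDFS. It only shows that a task rescheduled onto another machine cannot find a checkpoint written to the first machine's local disk, while it can find one written to storage that every node sees.

```python
import os
import tempfile


def write_checkpoint(storage_root, task_id, state):
    """Persist one task's state under the given storage root (hypothetical layout)."""
    os.makedirs(storage_root, exist_ok=True)
    path = os.path.join(storage_root, f"task-{task_id}.chk")
    with open(path, "w") as f:
        f.write(state)
    return path


def restore_checkpoint(storage_root, task_id):
    """Return the saved state, or None if this storage root does not hold it."""
    path = os.path.join(storage_root, f"task-{task_id}.chk")
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return f.read()


def simulate():
    with tempfile.TemporaryDirectory() as base:
        # Assumption: each "node" has a private directory; "shared-fs" plays the role of HDFS.
        node_a_local = os.path.join(base, "node-a-disk")
        node_b_local = os.path.join(base, "node-b-disk")
        shared_fs = os.path.join(base, "shared-fs")
        os.makedirs(node_b_local)

        # The task runs on node A and checkpoints to A's local disk and to the shared storage.
        write_checkpoint(node_a_local, 0, "count=42")
        write_checkpoint(shared_fs, 0, "count=42")

        # The job restarts and the task lands on node B, which can read
        # only its own local disk and the shared storage.
        from_local = restore_checkpoint(node_b_local, 0)
        from_shared = restore_checkpoint(shared_fs, 0)
        return from_local, from_shared


if __name__ == "__main__":
    print(simulate())
```

In a real deployment this usually amounts to pointing the checkpoint directory (the `state.checkpoints.dir` setting in `flink-conf.yaml`) at an `hdfs://` path rather than a `file://` path on a single machine; the exact configuration depends on your Flink version.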
This is why Flink uses a distributed file system.
- If Flink is deployed and managed on YARN, does that mean that if YARN is down, Flink is down?
Well, it depends on which component fails. I am not sure about all of them, but you could try it and see.
YARN can make sure that a new job master starts, but that master will have to fetch the state of the previous job master in order to know which jobs are running, their progress, etc.
- How do you think I should distribute the different components of the architecture across nodes (servers)? Do I keep an instance of Flink/YARN/HDFS on every single server, or do I put each of them on completely different servers? Some of my considerations:
- if we put them on different servers, there will be a lot of network latency between Flink <-> HDFS and YARN <-> HDFS
- but if I put all 3 components (Flink/YARN/HDFS) on every server, they can also compete with each other for resources, right?
You are right that you have to consider the above before deciding on your setup.
- Correct me if I am wrong: one thing is for sure, namely that on every node where a Flink instance is running, YARN should be running too, right?
Many thanks in advance!
Best regards/祝好,
Chang Liu 刘畅
I hope this helps,
Kostas