Hi,
I think
1. should not be a problem if the machine has enough capacity to run both.
2. is not truly harmful if you have more than one ZooKeeper node, but if the machine running your JM goes down, it also takes out one ZK node. That is no problem as long as the remaining ZK nodes can take over to recover your JobManager, but running the JM on a different node than the ZK nodes leaves you with one more surviving ZK node at exactly the moment the JM machine goes down and ZK availability really matters for recovery (a minimal config sketch follows below).
3. yes, that is why it is only called a „rule of thumb“. You can always tune the number of slots to the specifics of your job, one of which is whether it is I/O-heavy or compute-heavy (see the slot example below).
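
For point 2, here is a minimal sketch of such a layout in a standalone HA setup, assuming Flink 1.2+ key names (older versions use the recovery.* keys instead); the host names and storage path are made up for illustration:

  conf/masters (JobManager leader + standby, on their own machines):
    jm-host-1:8081
    jm-host-2:8081

  conf/flink-conf.yaml:
    high-availability: zookeeper
    # quorum on three dedicated hosts, separate from the JM machines,
    # so losing a JM machine does not also cost you a ZK node
    high-availability.zookeeper.quorum: zk-host-1:2181,zk-host-2:2181,zk-host-3:2181
    # shared storage for the JobManager metadata needed during recovery
    high-availability.storageDir: hdfs:///flink/ha/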
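For point 3, over-subscribing the cores is then just one line, e.g. on a 4-core TaskManager (the factor of 2 is only an illustration, not a recommendation):

  conf/flink-conf.yaml:
    # subtasks spend most of their time blocked on I/O,
    # so allow 2 slots per core to keep the CPUs busy
    taskmanager.numberOfTaskSlots: 8
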
Best,
Stefan
Questions about standalone cluster configuration:
- Is it considered bad practice to have standby JobManagers co-located on the same machines as TaskManagers?
- Is it considered bad practice to have ZooKeeper installed on the same machines as the JobManager leader and standby machines? (The docs say "In production setups, it is recommended to manage your own ZooKeeper installation.", but I'm assuming it's still okay to co-locate ZK with the JobManagers?)
- In another thread, I read that the rule of thumb is taskmanager.numberOfTaskSlots = number of cores. Doesn't this ignore cases where threads have a high proportion of idle time (i.e., waiting on an I/O call)? If the total number of task slots limits my degree of parallelism, but most parallel copies of a subtask are idle at any given time, it seems that I would want the number of task slots to be some multiple of the number of cores.
Thanks,
Edward