Hi,
The current start/stop scripts open a separate SSH connection to a worker node for each time it appears in the slaves file. When spawning multiple TMs per node (like 24), this is very inefficient. I've changed the scripts to do one SSH per node and spawn the given number of TMs from that single session; a sketch is below. I can make a pull request if this seems usable to others. For now, I assume the slaves file indicates the number of TMs per slave in "IP N" format.
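The core of the change looks roughly like this (a simplified sketch rather than the exact patch; it assumes the usual FLINK_BIN_DIR/FLINK_CONF_DIR/FLINK_SSH_OPTS variables from config.sh):

    # one SSH per host: read "host [N]" lines from the slaves file and
    # spawn N TaskManagers in one remote session instead of N SSH logins
    while read -r HOST N; do
        N=${N:-1}   # a plain "host" line keeps the old behavior: one TM
        ssh -n $FLINK_SSH_OPTS "$HOST" \
            "for i in \$(seq $N); do \"$FLINK_BIN_DIR/taskmanager.sh\" start; done" &
    done < "$FLINK_CONF_DIR/slaves"
    wait   # let all per-host sessions finish

(The -n keeps ssh from consuming the loop's stdin.)

Thank you,
Saliya

Saliya Ekanayake
Ph.D. Candidate | Research Assistant
School of Informatics and Computing | Digital Science Center
Indiana University, Bloomington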
Hi,
I think this would be a nice addition, especially for Flink clusters running on big machines, where you might want to run multiple task managers just to split the memory between multiple Java processes. In any case, the previous config format should also be supported as the default. I am curious what other developers/users think about this.

Cheers,
Gyula

Saliya Ekanayake <[hidden email]> wrote (on Sat, 9 Jul 2016 at 9:44):
Thank you. Yes, the previous format is still supported; this change only kicks in if a number is specified after the hostname.

On Sun, Jul 10, 2016 at 5:42 PM, Gyula Fóra <[hidden email]> wrote:
Hi Saliya,

Would you happen to have pdsh (parallel distributed shell) installed? If so, the TaskManager startup in start-cluster.sh will run in parallel.

As for running 24 TaskManagers together: are these spread across multiple NUMA nodes? I filed FLINK-3163 (https://issues.apache.org/jira/browse/FLINK-3163) last year, as I have seen that even with only two NUMA nodes performance improves by binding TaskManagers, both memory and CPU. I think we can improve the configuration of task slots the way we do with memory, where the latter can be a fixed measure or a fraction of total memory.
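In case it's useful, the binding can be sketched as a numactl wrapper around the TaskManager launch (illustrative only, assuming numactl is installed and one TaskManager per NUMA node; FLINK-3163 is about doing this properly in the scripts):

    # start one TaskManager per NUMA node, binding its CPUs and memory;
    # the policy is inherited by the JVM that taskmanager.sh forks
    NODES=$(numactl --hardware | awk '/^available:/ {print $2}')
    for ((n = 0; n < NODES; n++)); do
        numactl --cpunodebind=$n --membind=$n "$FLINK_BIN_DIR"/taskmanager.sh start
    done

Greg

On Sat, Jul 9, 2016 at 3:44 AM, Saliya Ekanayake <[hidden email]> wrote: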
pdsh is available only on the head node, and when I tried to run start-cluster from the head node (note the job manager node is not the head node) it didn't work, which is why I modified the scripts.

Yes, exactly, this is what I was trying to do. My research area has been on these NUMA-related issues, and binding a process to a socket (CPU) and then its threads to individual cores has shown great advantage. I actually have Java code that automatically (and user-configurably) binds processes and threads. For Flink, I've done this manually with a shell script that scans the TMs on a node and pins them appropriately. This approach is OK, but it would be better if the support were integrated into Flink.
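Roughly, the script amounts to the following (a simplified sketch of the idea rather than my exact script; it pins CPUs only, since already-allocated memory stays where it is unless you also run migratepages):

    # round-robin the running TaskManager JVMs over the NUMA nodes,
    # moving every thread of each JVM onto one node's CPU list
    NODES=(/sys/devices/system/node/node[0-9]*)
    i=0
    for PID in $(pgrep -f org.apache.flink.runtime.taskmanager.TaskManager); do
        node=${NODES[i % ${#NODES[@]}]}
        taskset -a -c -p "$(cat "$node/cpulist")" "$PID"
        i=$((i + 1))
    done

On Sun, Jul 10, 2016 at 8:33 PM, Greg Hogan <[hidden email]> wrote: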
I'd definitely be interested to hear any insight into what failed when starting the taskmanagers with pdsh. Did the command fail, fall back to standard ssh, or hit a parse error on the slaves file?

I'm also wondering if we need to escape the expansion, i.e. change

    PDSH_SSH_ARGS_APPEND=$FLINK_SSH_OPTS

to

    PDSH_SSH_ARGS_APPEND="${FLINK_SSH_OPTS}"
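If I'm reading the shell semantics right, the unquoted form should already survive spaces, since assignments prefixed to a command are expanded but not field-split; a quick check (illustrative):

    # does an unquoted prefix assignment survive embedded spaces?
    FLINK_SSH_OPTS="-o ConnectTimeout=5 -o StrictHostKeyChecking=no"
    PDSH_SSH_ARGS_APPEND=$FLINK_SSH_OPTS printenv PDSH_SSH_ARGS_APPEND
    # prints the full option string, so the quotes would be defensive
    # rather than a required fix

On Mon, Jul 11, 2016 at 12:02 AM, Saliya Ekanayake <[hidden email]> wrote: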
I am running some jobs now. I'll stop and restart using pdsh to see what the issue was.

On Mon, Jul 11, 2016 at 12:15 PM, Greg Hogan <[hidden email]> wrote:
I meant I'll check once the current jobs are done and will let you know.

On Mon, Jul 11, 2016 at 12:19 PM, Saliya Ekanayake <[hidden email]> wrote:
Looking at what happens with pdsh, two things go wrong:

1. pdsh is installed on a node other than the one where the job manager would run, so invoking start-cluster from there does not spawn a job manager; it is created only if I run start-cluster from the node I specify as the job manager's node.

2. If the slaves file has the same IP more than once, the following error happens when trying to move log files. For example, I had node j-020 specified twice in my slaves file:

    j-020: mv: cannot move `/N/u/sekanaya/sali/software/flink-1.0.3/log/flink-sekanaya-taskmanager-26-j-020.log' to `/N/u/sekanaya/sali/software/flink-1.0.3/log/flink-sekanaya-taskmanager-26-j-020.log.1': No such file or directory
    j-020: mv: cannot move `/N/u/sekanaya/sali/software/flink-1.0.3/log/flink-sekanaya-taskmanager-26-j-020.out' to `/N/u/sekanaya/sali/software/flink-1.0.3/log/flink-sekanaya-taskmanager-26-j-020.out.1': No such file or directory

On Mon, Jul 11, 2016 at 12:19 PM, Saliya Ekanayake <[hidden email]> wrote:
pdsh is only used for starting taskmanagers. How did you work around this? Are you able to passwordless-ssh to the jobmanager?

The error looks to be from config.sh:318 in rotateLogFile. The way we generate the taskmanager index assumes that taskmanagers are started sequentially (flink-daemon.sh:108).
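For reference, the shape of the race is roughly this (paraphrasing flink-daemon.sh, not the exact code):

    # the per-host TaskManager index is derived from the pid file length
    id=$([ -f "$pid" ] && wc -l < "$pid" || echo 0)
    log="$FLINK_LOG_DIR/flink-$USER-taskmanager-$id-$HOSTNAME.log"
    # two TaskManagers starting concurrently on one host can read the same
    # pid file length, compute the same id, and then collide in
    # rotateLogFile when both try to mv the same .log to .log.1

On Mon, Jul 11, 2016 at 2:59 PM, Saliya Ekanayake <[hidden email]> wrote: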
Yes, I have passwordless SSH to the job manager node.

On Mon, Jul 11, 2016 at 4:53 PM, Greg Hogan <[hidden email]> wrote: