R - Turn on all CPUs for all nodes on a cluster: snow/snowfall package
I am working on a cluster, using the snowfall package to establish a socket cluster on 5 nodes with 40 CPUs each, with the following command:
> sfInit(parallel=TRUE, cpus = 200, type="SOCK", socketHosts=c("host1", "host2", "host3", "host4", "host5"));
R Version:  R version 3.1.0 (2014-04-10)
snowfall 1.84-6 initialized (using snow 0.3-13): parallel execution on 5 CPUs.
I am seeing a much lower load on the slaves than expected when I check the cluster report, and I am disconcerted by the fact that it says "parallel execution on 5 CPUs" instead of "parallel execution on 200 CPUs". Is this merely an ambiguous reference to CPUs, or are the hosts only running one CPU each?
Edit: Here is an example of why this concerns me. If I use only the local machine and specify the maximum number of cores, I get:
> sfInit(parallel=TRUE, type="SOCK", cpus = 40);
snowfall 1.84-6 initialized (using snow 0.3-13): parallel execution on 40 CPUs.
I ran the identical job on the single-node, 40-CPU cluster and it took 1.4 minutes, while the 5-node, apparently 5-CPU cluster took 5.22 minutes. To me this confirms my suspicion that I am running in parallel across 5 nodes but only turning on 1 of the CPUs on each node.
My question is then: how do I turn on all CPUs for use across all available nodes?
Edit: @SimonG I used the underlying snow package's initialization, and we can clearly see that only 5 nodes are being turned on:
> cl <- makeSOCKcluster(names = c("host1", "host2", "host3", "host4", "host5"), count = 200)
> clusterCall(cl, runif, 3)
[[1]]
[1] 0.9854311 0.5737885 0.8495582

[[2]]
[1] 0.7272693 0.3157248 0.6341732

[[3]]
[1] 0.26411931 0.36189866 0.05373248

[[4]]
[1] 0.3400387 0.7014877 0.6894910

[[5]]
[1] 0.2922941 0.6772769 0.7429913

> stopCluster(cl)
> cl <- makeSOCKcluster(names = rep("localhost", 40), count = 40)
> clusterCall(cl, runif, 3)
[[1]]
[1] 0.6914666 0.7273244 0.8925275

[[2]]
[1] 0.3844729 0.7743824 0.5392220

[[3]]
[1] 0.2989990 0.7256851 0.6390770

[[4]]
[1] 0.07114831 0.74290601 0.57995908

[[5]]
[1] 0.4813375 0.2626619 0.5164171
.
.
.
[[39]]
[1] 0.7912749 0.8831164 0.1374560

[[40]]
[1] 0.2738782 0.4100779 0.0310864
I think this shows the problem pretty clearly. I tried this in desperation:
> cl <- makeSOCKcluster(names = rep(c("host1", "host2", "host3", "host4", "host5"), each = 40), count = 200)
and predictably got:
Error in socketConnection(port = port, server = TRUE, blocking = TRUE,  : all connections are in use
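A note on reading the two attempts above: as the output shows, each entry in names starts one worker, so five host names gave five workers regardless of count. A quick way to confirm where workers actually landed is to ask every worker for its host name and tabulate the answers. This is only a sketch: it assumes the parallel package that ships with R (rather than snow) and uses two workers on localhost so that it can run anywhere; on the real cluster the same clusterCall line would be applied to your own cl.

```r
library(parallel)

# Assumption: two local workers stand in for the real cluster.
cl <- makePSOCKcluster(rep("localhost", 2))

# Ask each worker which machine it is running on.
hosts <- unlist(clusterCall(cl, function() Sys.info()[["nodename"]]))

# One entry per host, with the number of workers started there.
workers_per_host <- table(hosts)
print(workers_per_host)

stopCluster(cl)
```

If the table shows one worker per host, only one CPU per node is being used, which would match the timings reported above.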
After thoroughly reading the snow documentation, I have come up with a (partial) solution.
I read that only 128 connections may be opened at once with the distributed R version, and I have found that to be true. I can open 25 CPUs on each node, but the cluster will not start if I try to start 26 on each. Here is the proper structure of the host list that needs to be passed to makeCluster:
> library(snow);
> unixhost13 <- list(host = "host1");
> unixhost14 <- list(host = "host2");
> unixhost19 <- list(host = "host3");
> unixhost29 <- list(host = "host4");
> unixhost30 <- list(host = "host5");
> kcpus <- 25;
> hostlist <- c(rep(list(unixhost13), kcpus), rep(list(unixhost14), kcpus), rep(list(unixhost19), kcpus), rep(list(unixhost29), kcpus), rep(list(unixhost30), kcpus));
> cl <- makeCluster(hostlist, type = "SOCK")
> clusterCall(cl, runif, 3)
[[1]]
[1] 0.08430941 0.64479036 0.90402362

[[2]]
[1] 0.1821656 0.7689981 0.2001639

[[3]]
[1] 0.5917363 0.4461787 0.8000013
.
.
.
[[123]]
[1] 0.6495153 0.6533647 0.2636664

[[124]]
[1] 0.75175580 0.09854553 0.66568129

[[125]]
[1] 0.79336203 0.61924813 0.09473841
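The 25-per-node ceiling is consistent with simple arithmetic: 5 x 25 = 125 worker connections fit, while 5 x 26 = 130 exceeds the 128-connection limit. The sketch below builds the same host list more compactly and checks that budget first. The helper name build_host_list and its arguments are hypothetical, and the assumption that three connections are already taken by stdin, stdout and stderr is mine, chosen because it matches the observed limit of 125 workers.

```r
# Hypothetical helper: build a snow-style host list with `per_host`
# workers on each host, refusing requests that exceed the connection budget.
build_host_list <- function(hosts, per_host,
                            max_connections = 128L,  # limit in a stock R build
                            reserved = 3L) {         # assumed: stdin, stdout, stderr
  needed <- length(hosts) * per_host
  available <- max_connections - reserved
  if (needed > available) {
    stop(sprintf("%d workers requested, but only %d connections are available",
                 needed, available))
  }
  # One list(host = ...) entry per worker, grouped by host.
  rep(lapply(hosts, function(h) list(host = h)), each = per_host)
}

hosts <- c("host1", "host2", "host3", "host4", "host5")
hostlist <- build_host_list(hosts, per_host = 25)
length(hostlist)

# On the real cluster the list would then be passed on as before:
# cl <- snow::makeCluster(hostlist, type = "SOCK")
```

Asking for 26 per host stops with an error before any sockets are opened, rather than failing partway through cluster start-up.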
I also found a reference saying that in order to have more connections, R needs to be rebuilt with NCONNECTIONS set higher (see here).