akka - Scala distributed execution of function objects


Given the following function objects,

val f  : Int => Double = (i: Int) => i + 0.1
val g1 : Double => Double = (x: Double) => x * 10
val g2 : Double => Double = (x: Double) => x / 10
val h  : (Double, Double) => Double = (x: Double, y: Double) => x + y

and, for instance, 3 remote servers or nodes (IP xxx.xxx.xxx.1, IP 2, IP 3), how would you distribute the execution of the program

val fx  = f(1)
val g1x = g1(fx)
val g2x = g2(fx)
val res = h(g1x, g2x)

so that

  • fx is computed on IP 1,
  • g1x is computed on IP 2,
  • g2x is computed on IP 3,
  • res is computed on IP 1.

Do Scala Akka or Apache Spark provide a simple approach to this?

Update

  • RPC (Remote Procedure Call) with Finagle, as suggested by @pkinsky, may be a feasible choice (see the sketch after this list).
  • Consider load-balancing policies as the mechanism for selecting the node for execution, at least an "any free available node" policy.
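For the dataflow itself, here is a minimal local sketch using plain Scala Futures, assuming the function definitions above are in scope. The node assignments in the comments are hypothetical: in a real deployment each Future(...) would be replaced by an RPC (e.g. via Finagle or an Akka remote actor) to the chosen node.

import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

// Each Future stands in for one remote call; the IP placements are hypothetical.
val res: Future[Double] =
  Future(f(1))                               // would run on IP 1
    .flatMap { fx =>
      Future(g1(fx)).zip(Future(g2(fx)))     // IP 2 and IP 3, in parallel
    }
    .map { case (g1x, g2x) => h(g1x, g2x) }  // back on IP 1

println(Await.result(res, 10.seconds))       // approximately 11.11

A load-balancing policy such as "any free available node" would then live in the RPC layer that decides which node serves each call.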

I can speak for Apache Spark. As you can see from the code below, it's not designed for this kind of parallel computation. It is designed for parallel computations where you have a large amount of data distributed across many machines. The solution therefore looks a bit silly here, as we distribute a single integer across a single machine, for example (for f(1)).

Also, Spark is designed to run the same computation on all the data. Running g1() and g2() in parallel goes a bit against that design. (It's possible, but not elegant, as you will see.)

// Distribute the input (1) across 1 machine.
val rdd1 = sc.parallelize(Seq(1), numSlices = 1)
// Run f() on the input, collect the results and take the first (and only) result.
val fx = rdd1.map(f(_)).collect.head
// The next stage's input, (1, fx) and (2, fx), is distributed across 2 machines.
val rdd2 = sc.parallelize(Seq((1, fx), (2, fx)), numSlices = 2)
// Run g1() on one machine, g2() on the other.
val gxs = rdd2.map {
  case (1, x) => g1(x)
  case (2, x) => g2(x)
}.collect
val g1x = gxs(0)
val g2x = gxs(1)
// Same deal for h() as for f(). The input, (g1x, g2x), is distributed to 1 machine.
val rdd3 = sc.parallelize(Seq((g1x, g2x)), numSlices = 1)
val res = rdd3.map { case (g1x, g2x) => h(g1x, g2x) }.collect.head

As you can see, Spark code is based around the concept of RDDs. An RDD is like an array, except it's partitioned across multiple machines. sc.parallelize() creates such a parallel collection from a local collection. For example, rdd2 in the above code is created from the local collection Seq((1, fx), (2, fx)) and split across two machines. One machine will have Seq((1, fx)), the other will have Seq((2, fx)).
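To make the partitioning concrete, here is a small sketch you can paste into spark-shell (where sc is predefined); the example values are made up:

val rdd = sc.parallelize(Seq(10, 20, 30, 40), numSlices = 2)
rdd.getNumPartitions  // 2: the collection was split into two partitions
rdd.glom().collect()  // Array(Array(10, 20), Array(30, 40)): one array per partition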

Next we perform a transformation on the RDD. map is a common transformation: it creates a new RDD of the same length by applying a function to each element. (Same as Scala's map.) The map we run on rdd2 will replace (1, x) with g1(x) and (2, x) with g2(x). So on one machine it will cause g1() to run, while on the other g2() will run.
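For instance, using g1 from the question, the distributed map mirrors the local one (a sketch, again assuming a spark-shell session):

Seq(1.0, 2.0).map(g1)                            // local: List(10.0, 20.0)
sc.parallelize(Seq(1.0, 2.0)).map(g1).collect()  // distributed: Array(10.0, 20.0)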

Transformations run lazily, only when you want to access the results. The methods that access the results are called actions. The most straightforward example is collect, which downloads the contents of the entire RDD from the cluster to the local machine. (It is exactly the opposite of sc.parallelize().)
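The laziness is easy to observe in a local spark-shell (a sketch; on a real cluster the println output appears in the executors' logs, not in the shell):

val lazyRdd = sc.parallelize(Seq(1, 2, 3)).map { x => println("computing " + x); x * 2 }
// Nothing has executed yet: map only records the transformation.
lazyRdd.collect()  // now "computing ..." is printed and Array(2, 4, 6) is returned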

You can try all this and see for yourself if you download Spark, start bin/spark-shell, and copy the function definitions and the above code into the shell.

