Hi everyone,
we have just published a new open source Truffle project, FastR-Flink. It is available at https://bitbucket.org/allr/fastr-flink

FastR is an implementation of the R language on top of Truffle and Graal [3], developed by Purdue University, Johannes Kepler University and Oracle Labs [1]. FastR-Flink is a fork of the FastR project that adds an Apache Flink [2] connection, making it possible to run distributed stream and batch data processing applications from the R programming language using FastR + Graal. This is a prototype we have been working on at Oracle Labs during my internship.

See README_FASTR_FLINK in https://bitbucket.org/allr/fastr-flink for more information about how to compile and start running FastR+Flink applications. I have also included a directory with a few examples (a minimal sketch is appended at the end of this message).

Juan

[1] FastR: https://bitbucket.org/allr/fastr
[2] Apache Flink: https://flink.apache.org/
[3] Graal: http://openjdk.java.net/projects/graal/
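P.S. A minimal local example, just to give a quick flavour of the programming model. Only flink.localApply and its arguments come from the project's list of supported operations; the data and the user function are made up for illustration:

  # Blocking sapply-style call that runs with the local Flink configuration
  input  <- 1:1000
  square <- function(x) x * x            # placeholder user function
  result <- flink.localApply(input, square)
  print(result[1:10])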
Hi Juan,

Thanks for sharing. This looks very promising. A great way for people who want to use R and Flink without compromising much on performance.

I would be curious about some cluster performance tests. Have you run any yet?

Best,
Max

https://bitbucket.org/allr/fastr-flink/src/71cf3f264a1faf98182781c9cdabef1b677a3bc6/README_FASTR_FLINK.md

Here are the supported operations from the Wiki:

  flink.localApply(input, func, scope=c(""))   - Blocking sapply function in R. It runs with the local Flink configuration.
  flink.remoteApply(input, func, scope=c(""))  - Blocking remote sapply (remote environment).
  flink.map(x, func, scope=c(""), local=FALSE) - Non-blocking sapply. It allows pattern composition. If local=FALSE, it uses the remote Flink configuration.
  flink.connectToJobManager(ip)                - Establishes the IP for remote execution.
  flink.setParallelism(nThreads)               - Sets the number of threads/workers.
  flink.readTextFile(path)                     - Reads an HDFS file.
  flink.writeAsText(path)                      - Writes a file into HDFS.
  flink.flatMap(x, func, scope=c(""))          - Non-blocking flatMap in Flink.
  flink.groupBy(object)                        - GroupBy on the previous operation (object).
  flink.sum(object)                            - Sum of the previous operation (object).
  flink.execute()                              - Consumes the operation pipeline defined so far.
  flink.collect(object)                        - Consumes the pipeline and returns the R result.
  flink.filter(object, func)                   - Filters the previous operation (only map supported) according to the function passed as argument. This operation is experimental.
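For anyone wondering how these compose, here is a rough sketch of chaining a few of them into a small text pipeline. The file paths and the splitting function are placeholders, and the assumption that flink.writeAsText attaches to the previously defined step is mine; only the operation names and argument order come from the table above:

  # Read lines from HDFS, split them into words, and write the result back
  lines <- flink.readTextFile("hdfs:///tmp/input.txt")                   # placeholder input path
  words <- flink.flatMap(lines, function(line) strsplit(line, " ")[[1]]) # placeholder split function
  flink.writeAsText("hdfs:///tmp/words.txt")                             # assumed to write the previous step; placeholder output path
  flink.execute()                                                        # consumes the pipeline defined above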
Hi Max,
yes, we started running some benchmarks, but this is still a very preliminary version. Regarding performance, what I can say is that we see very good speedups on shared memory compared to FastR. For cluster applications we do not have good speedups yet for big R applications; we are studying/investigating the problem.

Generally speaking, for each Flink thread we build the AST that represents the R code (the user function), which is then specialized (partial evaluation). After a while, the R code is compiled to optimized machine code via Truffle + Graal.

The link you posted lists the operations we support at the moment. We plan to support more operations. (A rough sketch of a remote/cluster setup is appended at the end of this message.)

Juan

On Mon, 2015-10-26 at 13:43 +0100, Maximilian Michels wrote:
> Hi Juan,
>
> Thanks for sharing. This looks very promising. A great way for people
> who want to use R and Flink without compromising much on performance.
>
> I would be curious about some cluster performance tests. Have you run
> any yet?
>
> Best,
> Max
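P.S. In case it helps anyone who wants to try a cluster run, here is a rough sketch of a remote configuration. The IP address, parallelism and user function are placeholders; the operations and the local=FALSE flag are taken from the list of supported operations quoted above:

  # Point FastR-Flink at a remote job manager and run a non-blocking map there
  flink.connectToJobManager("192.168.1.10")                     # placeholder job manager IP
  flink.setParallelism(16)                                      # placeholder number of workers
  doubled <- flink.map(1:1000, function(x) 2 * x, local=FALSE)  # local=FALSE uses the remote Flink configuration
  result  <- flink.collect(doubled)                             # consumes the pipeline and returns the R result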