FastR-Flink: a new open source Truffle project

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

FastR-Flink: a new open source Truffle project

Juan Fumero
Hi everyone,
   we have just published a new open source Truffle project,
FastR-Flink. It is available in https://bitbucket.org/allr/fastr-flink 

FastR is an implementation of the R language on top of Truffle and Graal
[3] developed by Purdue University, Johannes Kepler University and
Oracle Labs [1].

FastR-Flink is fork of FastR project which includes Apache Flink [2]
connection, allowing to execute distributed stream and batch data
processing applications from R programming language by using FastR
+Graal. This is a prototype we have been working on in Oracle Labs
during my internship.

See README_FASTR_FLINK in  https://bitbucket.org/allr/fastr-flink
for more information about how to compile and start running FastR+Flink
applications. I also included a directory with a few examples.

Juan

[1] FastR: https://bitbucket.org/allr/fastr
[2] Apache Flink: https://flink.apache.org/ 
[3] Graal: http://openjdk.java.net/projects/graal/ 

Reply | Threaded
Open this post in threaded view
|

Re: FastR-Flink: a new open source Truffle project

Maximilian Michels
Hi Juan,

Thanks for sharing. This looks very promising. A great way for people who want to use R and Flink without compromising much on performance.

I would be curious about some cluster performance tests. Have you run any yet?

Best,
Here are the supported operations from the Wiki:

Operations supported

FastR operation Description flink.localApply(input, func, scope=c("")) Blocking sapply function in R. It runs with local Flink configuration flink.remoteApply(input, func, scope=c("")) Blocking remote sapply. Remote enviroment flink.map(x, func, scope=c(""), local=FALSE) Non-blocking sapply. It allows to use pattern composition. If LOCAL=FALSE, it uses remote Flink configuration flink.connectToJobManager(ip) It establishes the IP for remote execution flink.setParallelism(nThreads) Set the number of threads/workers flink.readTextFile(path) Read HDFS file flink.writeAsText(path) Write file into HDFS flink.flatMap(x, func, scope=c("")) Non blocking flatMap in Flink flink.groupBy(object) GroupBy previous operation (object) flink.sum(object) Sum previous operation (object) flink.execute() It cosumes the operation pipeline defined flink.collect(object) It consumes the pipeline and return the R result flink.filter(object, func) Filter previous operation (only map supported) according to the function passed as argument. This operation is experimental.

On Mon, Oct 26, 2015 at 1:09 PM, Juan Fumero <[hidden email]> wrote:
Hi everyone,
   we have just published a new open source Truffle project,
FastR-Flink. It is available in https://bitbucket.org/allr/fastr-flink

FastR is an implementation of the R language on top of Truffle and Graal
[3] developed by Purdue University, Johannes Kepler University and
Oracle Labs [1].

FastR-Flink is fork of FastR project which includes Apache Flink [2]
connection, allowing to execute distributed stream and batch data
processing applications from R programming language by using FastR
+Graal. This is a prototype we have been working on in Oracle Labs
during my internship.

See README_FASTR_FLINK in  https://bitbucket.org/allr/fastr-flink
for more information about how to compile and start running FastR+Flink
applications. I also included a directory with a few examples.

Juan

[1] FastR: https://bitbucket.org/allr/fastr
[2] Apache Flink: https://flink.apache.org/
[3] Graal: http://openjdk.java.net/projects/graal/


Reply | Threaded
Open this post in threaded view
|

Re: FastR-Flink: a new open source Truffle project

Juan Fumero
Hi Max,
   yes, we started running some benchmarks, but still this is very
preliminary version. Concerning performance what I can tell is we have
very good speedups on shared memory compared to fastR. Concerning
cluster applications we do not have good speedups yet for big R
applications. We are studying/investigating the problem.

General speaking, for each Flink thread, we build the AST which
represents the R code (user function) and then it will be specialized
(partial evaluation). After a while, the R code will be compiled to an
optimize machine code via Truffle+Graal.

In the link you showed are the operations we support for the moment. We
plan to support more operations.

Juan

On Mon, 2015-10-26 at 13:43 +0100, Maximilian Michels wrote:

> Hi Juan,
>
>
> Thanks for sharing. This looks very promising. A great way for people
> who want to use R and Flink without compromising much on performance.
>
>
> I would be curious about some cluster performance tests. Have you run
> any yet?
>
>
> Best,
>
> max
>
> https://bitbucket.org/allr/fastr-flink/src/71cf3f264a1faf98182781c9cdabef1b677a3bc6/README_FASTR_FLINK.md
>
>
> Here are the supported operations from the Wiki:
>
> Operations supported
> FastR operation Description flink.localApply(input, func, scope=c(""))
> Blocking sapply function in R. It runs with local Flink configuration
> flink.remoteApply(input, func, scope=c("")) Blocking remote sapply.
> Remote enviroment flink.map(x, func, scope=c(""), local=FALSE)
> Non-blocking sapply. It allows to use pattern composition. If
> LOCAL=FALSE, it uses remote Flink configuration
> flink.connectToJobManager(ip) It establishes the IP for remote
> execution flink.setParallelism(nThreads) Set the number of
> threads/workers flink.readTextFile(path) Read HDFS file
> flink.writeAsText(path) Write file into HDFS flink.flatMap(x, func,
> scope=c("")) Non blocking flatMap in Flink flink.groupBy(object)
> GroupBy previous operation (object) flink.sum(object) Sum previous
> operation (object) flink.execute() It cosumes the operation pipeline
> defined flink.collect(object) It consumes the pipeline and return the
> R result flink.filter(object, func) Filter previous operation (only
> map supported) according to the function passed as argument. This
> operation is experimental.
>
>
> On Mon, Oct 26, 2015 at 1:09 PM, Juan Fumero
> <[hidden email]> wrote:
>         Hi everyone,
>            we have just published a new open source Truffle project,
>         FastR-Flink. It is available in
>         https://bitbucket.org/allr/fastr-flink
>        
>         FastR is an implementation of the R language on top of Truffle
>         and Graal
>         [3] developed by Purdue University, Johannes Kepler University
>         and
>         Oracle Labs [1].
>        
>         FastR-Flink is fork of FastR project which includes Apache
>         Flink [2]
>         connection, allowing to execute distributed stream and batch
>         data
>         processing applications from R programming language by using
>         FastR
>         +Graal. This is a prototype we have been working on in Oracle
>         Labs
>         during my internship.
>        
>         See README_FASTR_FLINK in
>         https://bitbucket.org/allr/fastr-flink
>         for more information about how to compile and start running
>         FastR+Flink
>         applications. I also included a directory with a few examples.
>        
>         Juan
>        
>         [1] FastR: https://bitbucket.org/allr/fastr
>         [2] Apache Flink: https://flink.apache.org/
>         [3] Graal: http://openjdk.java.net/projects/graal/
>        
>
>