1 R Java Development

1.1 Call R from Java

1.1.2 Rserve

An R server listens and processes incoming requests of clients. It passes commands to R which were sent by clients.

  • TCP/IP based, with no failure handling (e.g. if the R server fails, you need special Java code to handle it - reestablish a new connection to a new server, load your models, etc.
  • Not thread safe! Each thread needs its own connection
  • Stateful - each connection needs to be initialized before it is used (e.g. if you are predicting using a model, you will need to define/train your model for each connection before using it)

in R:

library(Rserve) # install.packages("Rserve")
Rserve()

in Java:

import org.rosuda.REngine.*;
import org.rosuda.REngine.Rserve.*;
...

try {
  RConnection c = new RConnection();// a local connection on default port (6311)
  double d[] = c.eval("rnorm(10)").asDoubles();
  System.out.println(d);
} catch (REngineException e) {}       

1.1.3 JRI

Java/R Interface

It use uses the JNI and is part of rJava. http://www.rforge.net/JRI/

Install rJava in R install.packages("rJava")

JRI can be found be the R library directory: C:\Program Files\R\R-2.15.0\library\rJava\jri

Run example in the Windows CMD. (i386 for 32bit)

set R_HOME= C:\Program Files\R\R-2.15.0
# The environment variable R_HOME has to be set globally for the system, not only for the current user (restart of the system!) 
set path= %PATH%;C:\Program Files\R\R-2.15.0\bin\i386;C:\Program Files\R\R-2.15.0\library\rJava\jri
cd C:\Program Files\R\R-2.15.0\library\rJava\jri\
java -cp JRI.jar;examples rtest

Development in Eclipse Java project

 build path > add external class folder : C:\Program Files\R\R-2.15.0\library\rJava\jri

In Linux,

# sudo R CMD javareconf

sudo apt-get install r-cran-rjava

export JAVA_HOME=/usr/lib/jvm/java-6-sun
export R_HOME=/usr/lib/R
# export PATH=$PATH:/usr/lib/R/site-library/rJava/jri

cd /usr/lib/R/site-library/rJava/jri
java -cp JRI.jar:examples rtest

# sudo cp /usr/lib/R/site-library/rJava/jri libjri.so  /usr/lib

JRI example:

import org.rosuda.JRI.REXP;
import org.rosuda.JRI.Rengine;
public class RTest {
 public static void main(String[] args) {
  Rengine engine = new Rengine(new String[]{"--no-save"}, false, null);
  engine.assign("a", new int[]{81});
  REXP result = engine.eval("sqrt(a)");
  System.out.println(result.asDouble());
  engine.end();
 }
}

1.1.4 Rsession

multiple R sessions are allowed for Rserve https://github.com/yannrichet/rsession

1.2 Call Java from R

2 Performance optimization

# ------- use compiler
# compilied R code can speed up functions by a factor or 3 or 4 in some cases. not sure if they are turned on by default since R 3.41 or higher
library(compiler)
enableJIT(3) # compile nested functions nested within the function
f <- function(n, x) for (i in 1:n) x = (1 + x)^(-1)
compiled_f <- cmpfun(f)

# ------- use fast_strptime
+ require(lubridate)
- data$datetime <- strptime(data$time, "%Y-%m-%dT%H:%M:%OSZ", tz="UTC")
+ data <- data %>% mutate(datetime = fast_strptime(time, "%Y-%m-%dT%H:%M:%OSZ", tz="UTC"))

# -------- filter for subset
df <- data[data$id>6, ]

library(dlpyr)
df <- data %>% filter(id>6)

#---------- vertorization 

http://www.noamross.net/blog/2014/4/16/vectorization-in-r--why.html

# ------ Pre-allocating memory, avoiding side effects.
# bad: 
j <- 1
for (i in 1:10) {
    j[i] = 10
}

# good:
j <- rep(NA, 10)
for (i in 1:10) {
    j[i] = 10
}

performance profile

funAgg = function(x) {
  res = NULL
  n = nrow(x)
  for (i in 1:n) {
    if (!any(is.na(x[i,]))) res = rbind(res, x[i,])
  }
  res
}

xx = matrix(rnorm(10000),10000,20)
xx[xx>2] = NA

# profiling of both time and memory
tmp <- tempfile()
Rprof(tmp, memory.profiling=T)
y <- funAgg(xx)
Rprof(NULL)
summaryRprof(tmp, memory="both")

http://www.noamross.net/blog/2013/4/25/faster-talk.html

3 Pass by values or references

http://stackoverflow.com/questions/2603184/r-pass-by-reference

4 Other

RPig: Concise Programming Framework by Integrating R with Pig for Big Data Analytics

Home Page