Freitag, 6. Dezember 2013

Wordcloud

This code yields interesting insights:

library(wordcloud)
txt <- c("hrcds_b","amb_spit_nvl_b","hpp_fg","spez_kons_n","treu_fg",
         "qz_n","notfall_n","stat_pv_bl_b","vp_at_p","hosp_n","weibl_p")
n <- c(0.247,0.167,0.084,0.04,
       -0.009,-0.012,-0.062,-0.14,-0.143,-0.229,-0.363)

txt <- Sort(data.frame(word=txt, freq=abs(n*1000),
                       col=PalHelsana()[c("rot", "hellblau")][(n>0)*1+1],
                       stringsAsFactors=FALSE),2, decr=TRUE)

wordcloud(txt$word, txt$freq, colors=txt$col, ordered.colors=TRUE)


Montag, 4. November 2013

How to operate in parallel

Doing parallel calculations will spare you much time... There are three points deserving our attention. %dopar% does parallel calculation, the combining function after having the work done can be defined, either c or rbind or whatever and - last but not least - the packages used within the loop must be defined in the specific parameter.

library(doParallel)
cl <- makeCluster(3)   # the number of cores to be used
registerDoParallel(cl)
getDoParWorkers()      # are they ready?


# remind defining packages if they're used within the loop
res <- foreach(i=1:5, .combine=c, .packages="DescTools") %dopar% {
  Primes(i)

res

stopCluster(cl)       # release your slaves again


Boot already has native support for parallel working:

library(boot)
slopeFun <- function(df, i) {
  #df must be a data frame. 

  #i is the vector of row indices that boot will pass 
  xResamp <- df[i, ]
  slope <- lm(hp ~ cyl + disp, data=xResamp)$coef[2]
}


ptime <- system.time({
  b <- boot(mtcars, slopeFun, R=50000, ncpus=6, parallel="snow")
})[3]

ptime

Dienstag, 27. August 2013

Tables in R

We have quite a few table functions in R, all of them solving a specific problem.

library(DescTools)
# Let's define a 3-dim table
# source: http://www.math.wpi.edu/saspdf/stat/chap28.pdf, page 1248
school <- as.table(array(c(35,29, 14,27, 32,10, 53,23), dim=c(2,2,2),
                   dimnames=list(enrollment=c("yes","no"),
                                 internship=c("yes","no"), gender=c("boys","girls"))))
school

# the percentages
prop.table(school)

# printed as flat table
ftable(school, row.vars=c(3,2))

# aggregate and layout
xtabs(Freq ~ enrollment + gender, data=school)

# and the inverse Operation
tab <- xtabs(Freq ~ enrollment + gender, data=school)
d.frm <- as.data.frame.table(tab)

# factor form with frequencies
data.frame(school)

# get the raw data back in a data.frame
head(Untable(school))

# describe with bells and whistles
Desc(xtabs(Freq ~ internship + enrollment, data=school) )
Assocs(xtabs(Freq ~ internship + enrollment, data=school) )

# how to make a data.frame out of a table
tab <- table(d.pizza$driver, d.pizza$area)
d.frm <- data.frame(unclass(tab))


# or even simpler with base function:
d.frm <- as.data.frame.matrix(tab)

# combine several arrays:
abs <- apply(Titanic, 1:length(dim(Titanic)), formatC, width=3)
rel <- apply(prop.table(Titanic), 1:length(dim(Titanic)), FormatFix, after=3)

ftable(abind(abs, rel))

Donnerstag, 13. Juni 2013

Plot intersecting areas

Ever had to plot intersecting areas?
From now on I would use rgeos to do similar things. The following does the trick:

library(rgeos)
library(DescTools)


Canvas(, xpd=TRUE)
p1 <- as(DrawCircle(0.5)[[1]], "gpc.poly")
p2 <- as(DrawCircle(-0.5,0.5)[[1]], "gpc.poly")
p3 <- as(DrawCircle(0,-0.5)[[1]], "gpc.poly")

plot(append.poly(append.poly(p1, p2), p3), axes=FALSE, frame.plot=FALSE, xlab="", ylab="")
plot(intersect(p1, p2), poly.args = list(col = "red"), add = TRUE)
plot(intersect(intersect(p1, p2), p3), poly.args = list(col = "blue"), add = TRUE)



Samstag, 20. April 2013

Second axis

Just a short snippet for a plot with 2 axes for demonstrating the technique.


b <- barplot(1:10, ylim=c(0,20))

par(new=TRUE)
plot(x=b, y=rep(60, 10), xlim=par("usr")[1:2], xaxs="i", yaxs="i"
     , frame.plot=FALSE, axes=FALSE, xlab="", ylab="", type="b", pch=15
     , ylim=c(0,80))

axis(side=4)



And the result: