Dienstag, 4. Dezember 2012

Barplot in Tibco Style

Tibco has a nice software which produces good looking plots:

tibco.colors <- apply( mcol <- matrix(c(
  0,91,0, 0,157,69, 253,1,97, 60,120,177,
  156,205,36, 244,198,7, 254,130,1,
  96,138,138, 178,113,60
), ncol=3, byrow=TRUE), 1, function(x) rgb(x[1], x[2], x[3], max=255))

x <- c(12,8,6,4,3,1,1)

par(mgp=c(2.2,0.7,0) )     # decrease distance label to axis

b <- barplot( x, col=rev(tibco.colors), main="Tibco style", border=NA, cex.axis=0.7, las=1, ylim=c(0,13), yaxt="n")

text( b, x+0.6, label=x, cex=0.7)

axis(side=1, at=b, labels=c("EX","VIT","EC","CE","LA","CO","PA"), cex.axis=0.7, las=2, col="grey", las=1, tck=0, xaxs="i")
abline(h=0, col="grey")    # just to ensure the axis going through (0,0)

# find the centers of the bars and the gaps
run.mean <- filter( b, filter=c(0.5,0.5))[-length(b)]
gapx <- c(run.mean[1]-diff(b)[1], run.mean, run.mean+diff(b) )
rug(gapx, -0.025, col="grey") 

axis(side=2, at=0:12, labels=0:12, cex.axis=0.7, las=2, col="grey", las=1, tck=-0.025)

legend(x="topright", legend=c("Marking","","CONMEDS","DEMOGRAPHICS","ECG","EXPOSURE","LAB_RESULT")
  , fill=c(tibco.colors[1],NA,tibco.colors[2:10]), cex=0.7, box.col=NA, border=c("black",NA,rep("black",10)))


Samstag, 17. November 2012

dots arguments

How can we parse the dots-arguments in a function?

dots <- function(...){
# was cex in the dots-args? parse dots.arguments
cex <- unlist(match.call(expand.dots=FALSE)$...["cex"])
if(is.null(cex)) print("No cex supplied")
else print(gettextf("cex was %s", cex))
}

dots (cex=0.9, another=4)
dots (another=5)

Donnerstag, 4. Oktober 2012

Get all loaded data.frames

Which data.frames have we loaded?
ls()[ lapply( lapply(ls(), function(x) gettextf("class(%s)", x)), function(x) eval(parse(text=x))) == "data.frame" ]

All available functions in a package:
ls.str(pos = 1, mode = "function")

All installed packages:
search()
(.packages())
(installed.packages())


All available function on search() path:
strsplit( lsf.str(pos = length(search())), " :")

Dienstag, 18. September 2012

Conditional Mosaics: Cotabplot

Fine implementation, thanks to Achim Zeileis.

library(vcd)
data("Punishment", package = "vcd")
pun <- xtabs(Freq ~ memory + attitude + age + education, data = Punishment)
cotabplot(~ memory + attitude | age + education, data = pun, panel = cotab_coindep,
  n = 5000, type = "mosaic", test = "maxchisq", interpolate = 1:2 )

Donnerstag, 13. September 2012

Conversions

How can we split a matrix columnwise to a list?
m <- matrix(rnorm(10 * 2), ncol = 2)

The first solution uses a double conversion via a data.frame:
as.list(as.data.frame(m))

The second uses split:
split(x=t(m), f=1:ncol(m))

The third solution works with lapply:
lapply( seq_len(ncol(m)), function(k) m[,k])

Dienstag, 4. September 2012

Flexible arguments for R functions

How can we remain flexible concerning arguments of an R function used within a user function. Say we want a histogram to be included in our plot and be able to define all of it's arguments.
I found a good solution in some R-Core-code (don't remember exactly where...).
It goes like that:

myFunction <- function( x, main = "", args.hist = NULL, ... ) {

  # define default arguments
  args.hist1 <- list(x = x, xlab = "", ylab = "", freq = FALSE,
      xaxt = "n", xlim = NULL, ylim = NULL, main = main, las = 1,
      col =  "white", cex.axis = 1.2)
  # override default arguments with user defined ones
  if (!is.null(args.hist)) {
    args.hist1[names(args.hist)] <- args.hist
  }
  # call function by means of do.call with the list of arguments
  # we can even filter the arguments by name here...
  res <- do.call("hist", c(args.hist1[names(args.hist1) %in%
      c("x", "breaks", "include.lowest", "right", "nclass")],
      plot = FALSE))
}

The function call would then be:
  myFunction( x=1:10, args.hist = list(right = TRUE) )

So we have a fine control over which arguments we want to be passed (include.lowest, right) and which ones not (ex. col, cex.axis).

Mittwoch, 16. Mai 2012

Find all subsets

How can we get all subsets of a vector's elements?

all.subsets <- function(x, min.n=1, max.n=length(x)){
  # return a list with all subsets of x
  # Caution: This can be heavy for moderate lengths of x11
  lst <- lapply( min.n:max.n, function(i) {
    m <- combn(x,i) # this is a matrix, split into it's columns
    lapply( seq_len(ncol(m)), function(k) m[,k])
  } )
  # Alternative:
  # lst <- lapply(min.n:max.n, function(i) lapply(apply(combn(x,i),2,list),unlist))

  # and now flatten the list of lists into one list
  lst <- split(unlist(lst), rep(1:length(idx <- rapply(lst, length)), idx))
  return(lst)
}

# example:
y <- letters[1:5]
all.subsets(y)

Dienstag, 8. Mai 2012

Run all examples of a package

Get all the base and recommended packages:
installed.packages(priority = c("base","recommended"))

A list of the functions in a package can easily be created by:
ls(pos = "package:MASS")

Run the examples for the first 10 functions of the package MASS:
for( x in paste("example(`", ls(pos="package:MASS")[1:10],"`)", sep="")){
  eval(parse(text=x))
}

Freitag, 23. März 2012

Running mean, median and others

The running mean (moving average) can be calculated in R by means of the function
x <- c(1,2,4,2,3,4,2,3)
filter(x, rep(1/3,3) )

and the running median by
runmed(x,3)

sequential differences by
diff(x)

for other statistics use the library zoo (and there maybe rollapply).
The time series functions can also be useful for similar problems (e.g. deltat, cycle).

package {caTools} has implemented some fast algorithms for similar purposes:
runmean(x), rumin(x), runmax(x), runquantile(x), runmad(x), runsd(x)

Donnerstag, 15. März 2012

regexpr examples (just to remember)

Find names ending in "_id" or in "_c" (this would be the same as x like any ('%_id', '%_c') in SQL flavour):

x <- c("vp_id","man_id","min_A_d","min_B_d","count_n",
       "type_c","birth_d","gender_c","age_n","hist_y")

x[grep("_(id)|c$", x)]
[1] "vp_id" "man_id" "type_c" "gender_c"


Get rid of everything but the digits:
x <- c("485.2.362.q", "222-445", "889 99 8")
gsub(pattern="[[:punct:]]|[[:alpha:]]|[[:blank:]]", replacement="", x)


gsub(pattern="[^0-9]", x=x, replacement="")
gsub(pattern="[^[:digit:]]", x=x, replacement="")


Extract uppercase words from the beginning of a string following the idea "delete everything which is not uppercase words":

x <- c("RONALD AYLMER Fisher", "CHARLES Pearson", "John Tukey")
sapply(x, function(x) StrTrim(sub(
       pattern=sub(pattern="^[A-ZÄÜÖ -]+\\b\\W+\\b", repl="", x=x)
       , repl="", x, fixed=TRUE)))



... and the fine link:
http://www.powerbasic.com/support/help/pbcc/regexpr_statement.htm

Donnerstag, 19. Januar 2012

Swap variables

... just because of it's elegance:
x <- 2; y <- 5
c(x, y)

y <- (x + y) - (x <- y)
c(x, y)