Friday, 23 March 2012

Running mean, median and others

The running mean (moving average) can be calculated in R with the function filter():
x <- c(1,2,4,2,3,4,2,3)
filter(x, rep(1/3,3) )

and the running median by
runmed(x,3)

sequential differences by
diff(x)
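
A minimal sketch of how the three calls behave at the ends of the series, using the same x. The explicit stats:: prefix and the sides argument are additions here; the prefix only guards against another attached package (dplyr, for instance) masking filter().

```r
x <- c(1, 2, 4, 2, 3, 4, 2, 3)

# Centred 3-point mean: the first and last positions lack a full window, so they are NA.
m_centred <- stats::filter(x, rep(1/3, 3), sides = 2)

# Trailing 3-point mean: each value averages the current and the two previous observations,
# so the first two positions are NA.
m_trailing <- stats::filter(x, rep(1/3, 3), sides = 1)

# Running median with window width 3; the result has the same length as x.
med3 <- runmed(x, 3)

# Sequential differences x[i + 1] - x[i]; the result is one element shorter than x.
d <- diff(x)
```

Both filter() calls return a time series object; wrap the result in as.numeric() if a plain vector is needed.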

For other statistics, use the zoo package (rollapply is probably the function to look at).
The time series functions can also be useful for similar problems (e.g. deltat, cycle).
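
To illustrate what a rolling-window function does for an arbitrary statistic, here is a base R sketch built on embed(). The helper roll_apply is a hypothetical name for this note, not part of zoo; with zoo installed, rollapply(x, 3, sd) should give the same values.

```r
# Hypothetical helper: apply FUN to every complete window of width k.
roll_apply <- function(x, k, FUN, ...) {
  # embed() returns one row per window, with the values in reverse time order;
  # the column index k:1 restores the original order within each window.
  windows <- embed(x, k)[, k:1, drop = FALSE]
  apply(windows, 1, FUN, ...)
}

x <- c(1, 2, 4, 2, 3, 4, 2, 3)
roll_sd  <- roll_apply(x, 3, sd)    # running standard deviation
roll_min <- roll_apply(x, 3, min)   # running minimum
roll_max <- roll_apply(x, 3, max)   # running maximum
```

The result has length(x) - k + 1 elements, because only complete windows are evaluated and nothing is padded.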

The package {caTools} implements some fast algorithms for similar purposes (k is the window width):
runmean(x, k), runmin(x, k), runmax(x, k), runquantile(x, k, probs), runmad(x, k), runsd(x, k)

Thursday, 15 March 2012

regexpr examples (just to remember)

Find names ending in "_id" or "_c" (roughly the same as x LIKE ANY ('%_id', '%_c') in SQL, except that "_" is itself a single-character wildcard in LIKE):

x <- c("vp_id","man_id","min_A_d","min_B_d","count_n",
       "type_c","birth_d","gender_c","age_n","hist_y")

x[grep("_(id|c)$", x)]
[1] "vp_id" "man_id" "type_c" "gender_c"
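
A short sketch of why the grouping in the pattern matters: alternation (|) binds more loosely than concatenation, so "_(id)|c$" means "contains _id anywhere, or ends in c", while "_(id|c)$" anchors both suffixes to the end. The vector y is made up for this illustration.

```r
# Illustrative names: two of them should NOT count as ending in "_id" or "_c".
y <- c("vp_id", "vp_id_old", "basic", "type_c")

# Loose pattern: "_id" anywhere in the string, or a final "c".
loose <- grep("_(id)|c$", y, value = TRUE)

# Anchored pattern: the suffix must be "_id" or "_c".
strict <- grep("_(id|c)$", y, value = TRUE)

# Same selection without regular expressions (endsWith() needs R >= 3.3.0).
plain <- y[endsWith(y, "_id") | endsWith(y, "_c")]
```

Here loose returns all four names, whereas strict and plain return only "vp_id" and "type_c".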


Get rid of everything but the digits:
x <- c("485.2.362.q", "222-445", "889 99 8")
gsub(pattern="[[:punct:]]|[[:alpha:]]|[[:blank:]]", replacement="", x)


Or, more directly, delete everything that is not a digit:
gsub(pattern="[^0-9]", x=x, replacement="")
gsub(pattern="[^[:digit:]]", x=x, replacement="")
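
A quick check, on the same x, that the negated character class gives the same result as listing the unwanted classes. The variable names are chosen only for this sketch.

```r
x <- c("485.2.362.q", "222-445", "889 99 8")

# Variant 1: remove punctuation, letters and blanks explicitly.
by_listing <- gsub("[[:punct:]]|[[:alpha:]]|[[:blank:]]", "", x)

# Variant 2: remove every character that is not a digit.
by_negation <- gsub("[^0-9]", "", x)

by_negation
# [1] "4852362" "222445"  "889998"
```

The negated class is the safer choice, since it also removes characters that fall into none of the three listed classes (a tab-free control character, for example).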


Extract the uppercase words at the beginning of a string, following the idea "delete everything that is not an uppercase word":

x <- c("RONALD AYLMER Fisher", "CHARLES Pearson", "John Tukey")
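library(DescTools)   # provides StrTrim(); base R's trimws() would work as well
sapply(x, function(s) {
       # inner sub(): strip the leading uppercase words, leaving the rest of the string
       rest <- sub(pattern="^[A-ZÄÜÖ -]+\\b\\W+\\b", replacement="", x=s)
       # outer sub(): delete that rest literally (fixed=TRUE), then trim whitespace
       StrTrim(sub(pattern=rest, replacement="", x=s, fixed=TRUE)) })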
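
An alternative sketch of the same task without nested regular expressions: split each string into words and keep the leading run of words that are entirely uppercase. The helper lead_upper is a hypothetical name, and it assumes that words are separated by spaces.

```r
# Hypothetical helper: return the leading all-uppercase words of one string.
lead_upper <- function(s) {
  words <- strsplit(s, " +")[[1]]
  # A word counts as uppercase if it contains a letter and equals its own toupper().
  is_up <- grepl("[[:alpha:]]", words) & words == toupper(words)
  # cumprod() drops to 0 at the first non-uppercase word and stays there,
  # so only the leading run of uppercase words is kept.
  paste(words[cumprod(is_up) == 1], collapse = " ")
}

x <- c("RONALD AYLMER Fisher", "CHARLES Pearson", "John Tukey")
res <- vapply(x, lead_upper, character(1), USE.NAMES = FALSE)
res
# [1] "RONALD AYLMER" "CHARLES"       ""
```

Relying on toupper() instead of a character class such as [A-ZÄÜÖ] leaves the handling of accented letters to the locale.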



... and a useful link:
http://www.powerbasic.com/support/help/pbcc/regexpr_statement.htm