Monday, 5 October 2009

Dummy coding

We want the dummy coding of a factor:

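d.frm <- data.frame(
    name = c("Max","Max","Max","Max","Max","Moritz","Moritz","Moritz"),
    typ  = c("rot","blau","grün","blau","grün","rot","rot","blau"),
    anz  = c(5,4,5,8,3,2,9,1),
    # since R 4.0.0 strings stay character by default; request factors explicitly
    stringsAsFactors = TRUE
)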

d.frm
    name  typ anz
1    Max  rot   5
2    Max blau   4
3    Max grün   5
4    Max blau   8
5    Max grün   3
6 Moritz  rot   2
7 Moritz  rot   9
8 Moritz blau   1

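The following gives the desired result. The first level, blau, serves as the reference category, and [,-1] drops the intercept column: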

model.matrix(~d.frm$typ)[,-1]
  d.frm$typgrün d.frm$typrot
1             0            1
2             0            0
3             1            0
4             0            0
5             1            0
6             0            1
7             0            1
8             0            0

or, alternatively, the function class.ind from the nnet package, which returns one indicator column per level:

library(nnet)
class.ind(d.frm$typ)
     blau grün rot
[1,]    0    0   1
[2,]    1    0    0
[3,]    0    1    0
[4,]    1    0    0
[5,]    0    1    0
[6,]    0    0    1
[7,]    0    0    1
[8,]    1    0    0
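
The two results above differ in shape: model.matrix omits the reference level, while class.ind keeps a column for every level. As a minimal sketch, assuming the same data as above, removing the intercept from the formula makes model.matrix return all three indicator columns as well. The object names typ.df, mm.ref and mm.full are only illustrative.

```r
# Same 'typ' values as in d.frm above, stored explicitly as a factor
typ.df <- data.frame(
    typ = factor(c("rot","blau","grün","blau","grün","rot","rot","blau"))
)

# Treatment coding with intercept: dropping column 1 leaves nlevels - 1 dummies
mm.ref <- model.matrix(~ typ, data = typ.df)[, -1]

# No intercept ("0 +"): one indicator column per level
mm.full <- model.matrix(~ 0 + typ, data = typ.df)

colnames(mm.full)   # "typblau" "typgrün" "typrot"
```

Passing the data frame through the data argument also yields shorter column names (typgrün instead of d.frm$typgrün).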

Yet another brilliant solution from Ripley, which indexes the rows of an identity matrix with the factor:

ff <- factor(sample(letters[1:5], 25, replace=TRUE))
diag(nlevels(ff))[ff,]
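
This works because indexing with a factor uses its integer codes, so each observation picks the identity-matrix row that belongs to its level. A small sketch applying the same idea to the typ values from above follows. The names typ and dummies are illustrative, and drop = FALSE is an addition here so that the result stays a matrix even if the factor has only one level.

```r
# The 'typ' column from the example above, as a factor
typ <- factor(c("rot","blau","grün","blau","grün","rot","rot","blau"))

# Row i of the identity matrix is the indicator vector for level i;
# indexing with the factor selects rows by its integer codes
dummies <- diag(nlevels(typ))[typ, , drop = FALSE]

# diag() carries no names, so label the columns with the levels
colnames(dummies) <- levels(typ)

dummies
```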