Friday 22 January 2010

Haplotype names in R

Emmanuel Paradis, the mastermind behind 'ape', has struck again. This time he brings us the 'pegas' package, the Population and Evolutionary Genetic Analysis System. Its haplotype function collapses a DNA alignment into its haplotypes (unique DNA sequences), which is extremely useful in various analyses and in the calculation of genetic diversity.

library(ape)
library(pegas)
data(woodmouse)

x<-woodmouse[sample(15, size=110, replace=TRUE), ]

h<-haplotype(x)

h

attr(h, "index")

Unfortunately, the haplotypes are rather opaquely labelled with Roman numerals, which makes it difficult to figure out where these samples came from. The attr() call above tells you which sequences in x make up which haplotypes in h, but reading it is a bit tedious, particularly when dealing with large data sets. To combat this, I've written a function that labels each haplotype with the name of the first sequence assigned to it in the original DNAbin object:


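haploName <- function(hap, dat) {
  dat <- as.matrix(dat)
  nam <- dimnames(dat)[[1]]  # sequence names in the original alignment
  # Rename each haplotype after the first sequence assigned to it
  for (i in seq_len(dim(hap)[1])) {
    attr(hap, "dimnames")[[1]][i] <- nam[attr(hap, "index")[[i]][1]]
  }
  hap
}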

haploName(h, x)

'hap' is the haplotype/DNAbin object returned by haplotype, while 'dat' is the original DNAbin object it was computed from.
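
Since haploName only keeps the name of the first sequence in each haplotype, it can also be handy to see every sequence that was collapsed into it. Here is a rough sketch of how that might look. The helper haploMembers is my own hypothetical name, not something pegas provides, and it assumes the haplotype object carries the same "index" attribute that haploName relies on. The toy objects and their sequence names (seqA and so on) are made up so the snippet runs without ape or pegas installed.

```r
# Hypothetical helper (not part of pegas): list the names of all sequences
# that were collapsed into each haplotype.
haploMembers <- function(hap, dat) {
  nam <- rownames(as.matrix(dat))
  # Assumes attr(hap, "index") is a list of row indices into dat
  members <- lapply(attr(hap, "index"), function(i) nam[i])
  names(members) <- rownames(hap)
  members
}

# Toy stand-ins for the real objects: four made-up "sequences",
# of which rows 1 and 3 are identical.
dat <- matrix(c("a", "c", "a", "g",
                "t", "t", "t", "t"), nrow = 4,
              dimnames = list(c("seqA", "seqB", "seqC", "seqD"), NULL))
hap <- dat[c(1, 2, 4), , drop = FALSE]
rownames(hap) <- c("I", "II", "III")
attr(hap, "index") <- list(c(1L, 3L), 2L, 4L)

res <- haploMembers(hap, dat)
res
```

With the real data the call would be haploMembers(h, x), and lengths(attr(h, "index")) should give the number of sequences in each haplotype.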

Let me know how it goes...
