r - Alternative for BAF calculations loop -


i working in loop 'b allele frequency' (baf) calculations on data frame of genetic markers. based on mathematical logic behind baf concept tried write code perform it, however, inefficient.

my input:

theta <- 'probe sample1 sample2 sample3 sample4 sample5   aam   abm   bbm          ax-1   0.674   0.756   0.694   0.671   0.754   0.167   0.281 0.671          ax-2   0.117   0.907   0.501   0.904   0.548   0.116   0.506 0.903          ax-3   0.068   0.075   0.071   0.208   0.038   0.06    0.445 0.846' theta <- read.table(text=theta, header=t) 

my script:

theta.split <- split(theta, 1:nrow(theta))  for(k in 1:length(theta.split)){   thetax <- as.data.frame(theta.split[[k]])   for(i in 2:(ncol(thetax)-3)){      if(as.numeric(as.character(thetax[1,i])) < as.numeric(as.character(thetax$aam))){     thetax[1,i] <- 0}      if(as.numeric(as.character(thetax[1,i])) >= as.numeric(as.character(thetax$aam)) && as.numeric(as.character(thetax[1,i])) < as.numeric(as.character(thetax$abm))){       thex <- as.numeric(as.character(thetax[1,i]))       theaa <- as.numeric(as.character(thetax$aam))       theab <- as.numeric(as.character(thetax$abm))       bafx <- ((0.5)*(thex - theaa))/(theab - theaa)       thetax[1,i] <- bafx}      if(as.numeric(as.character(thetax[1,i])) >= as.numeric(as.character(thetax$abm)) && as.numeric(as.character(thetax[1,i])) < as.numeric(as.character(thetax$bbm))){       thex <- as.numeric(as.character(thetax[1,i]))       theab <- as.numeric(as.character(thetax$abm))       thebb <- as.numeric(as.character(thetax$bbm))       bafx <- 0.5 + ((0.5)*(thex-theab)/(thebb-theab))       thetax[1,i] <- bafx}      if(as.numeric(as.character(thetax[1,i])) >= as.numeric(as.character(thetax$bbm))){       thetax[1,i] <- 1}    }   theta[k,] <- thetax } out <- theta 

my expected output:

out <- 'probe  sample1 sample2  sample3  sample4     sample5     aam       abm   bbm         ax-1    1.000   1.000   1.000   1.000   1.000  0.167  0.281 0.671         ax-2    0.001   1.000   0.493   1.000   0.552  0.116  0.506 0.903         ax-3    0.010   0.019   0.014   0.192   0.000  0.06   0.445 0.846' out <- read.table(text=out, header=t) 

i grateful ideas make code smarter.

you can take advantage of both apply , vectorised calculations avoid loops. following takes on third of time:

library(dplyr)  #take main code in loops out function #using vectorised logical calcs instead of if statements #samplevec vector , thetadf original theta dataframe bafxfn <- function(samplevec, thetadf) {    testaam <- samplevec < thetadf$aam   samplevec <- samplevec * (1 - testaam)    testaamabm <- (samplevec >= thetadf$aam) * (samplevec < thetadf$abm)   bafx <- ((0.5) * (samplevec - thetadf$aam)) / (thetadf$abm - thetadf$aam)   samplevec <- testaamabm * bafx + (1 - testaamabm) * samplevec    testabmbbm <- (samplevec >= thetadf$abm) * (samplevec < thetadf$bbm)   bafx <- 0.5 + ((0.5) * (samplevec - thetadf$abm)) / (thetadf$bbm - thetadf$abm)   samplevec <- testabmbbm * bafx + (1 - testabmbbm) * samplevec    testbbm <- samplevec >= thetadf$bbm   samplevec <- testbbm * 1 + (1 - testbbm) * samplevec    samplevec }  #subset original data frame leave sample columns (using dplyr's select function) sampledf <-    theta %>% select(-probe, -aam, -abm, -bbm)  #use apply loop through columns of remaining data #passing columns in vectors outsampledf <-    sampledf %>%   apply(2, bafxfn, thetadf = theta) %>%   as.data.frame()  #and bind results (using dplyr's bind_cols) outdf <-    bind_cols(     theta %>% select(probe),     outsampledf,     theta %>% select(aam, abm, bbm)   ) 

there may neater way deal of subsetting tried generalise in case you've more 5 sample columns.

outdf source: local data frame [3 x 9]     probe     sample1    sample2    sample3   sample4   sample5   aam   abm   bbm   (fctr)       (dbl)      (dbl)      (dbl)     (dbl)     (dbl) (dbl) (dbl) (dbl) 1   ax-1 1.000000000 1.00000000 1.00000000 1.0000000 1.0000000 0.167 0.281 0.671 2   ax-2 0.001282051 1.00000000 0.49358974 1.0000000 0.5528967 0.116 0.506 0.903 3   ax-3 0.010389610 0.01948052 0.01428571 0.1922078 0.0000000 0.060 0.445 0.846 

Comments

Popular posts from this blog

authentication - Mongodb revoke acccess to connect test database -

r - Update two sets of radiobuttons reactively - shiny -

ios - Realm over CoreData should I use NSFetchedResultController or a Dictionary? -