r - Summing the counts in a data frame using a sliding window


I am new to R. I have a data frame in R like the following:

    df <- data.frame(id=c(rep("a1",10),rep("a2",13),rep("a3",12)),
                     values=c(10,2,4,23,10,5,20,15,13,21,15,9,19,5,14,25,18,19,31,26,4,21,4,6,7,12,15,18,25,20,16,29,21,19,10))

For every id, I want to sum the counts in the column "values" in sliding windows of 3 positions. The following excerpt of df includes the records corresponding to a1:

    id values
    a1     10
    a1      2
    a1      4
    a1     23
    a1     10
    a1      5
    a1     20
    a1     15
    a1     13
    a1     21

I take 3 entries at a time, sum them, and move on to the next 3 entries. When a sliding window can't accommodate 3 entries, I skip those values.

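For example, window_1 starts at the first value (10), window_2 at the second value (2) and window_3 at the third value (4). From the third position only 8 values remain, so window_3 holds two complete groups and the trailing 13 and 21 are skipped: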

    window_1 = [10+2+4] + [23+10+5] + [20+15+13] = 102
    window_2 = [2+4+23] + [10+5+20] + [15+13+21] = 113
    window_3 = [4+23+10] + [5+20+15] = 77

I then want to report these sums in a data frame like this:

    id  window_1 window_2 window_3
    a1       102      113       77
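To make the window arithmetic concrete, here is a small base R sketch for the a1 values only. The helper name `bracket_sums` is hypothetical (it is not from the question); it lays each complete group of 3 values out as one matrix column, so `colSums()` returns the bracketed sums and their total is the window sum.

```r
a1 <- c(10, 2, 4, 23, 10, 5, 20, 15, 13, 21)

# Hypothetical helper: take the values from position `start` onwards,
# keep only as many as fill complete groups of 3, lay each group out
# as one matrix column and return the per-group sums
bracket_sums <- function(x, start) {
  n_groups <- (length(x) - start + 1) %/% 3
  if (n_groups < 1) return(numeric(0))  # no complete group fits
  colSums(matrix(x[start:(start + 3 * n_groups - 1)], nrow = 3))
}

bracket_sums(a1, 1)                                 # 16 38 48
sapply(1:3, function(s) sum(bracket_sums(a1, s)))   # 102 113 77
```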

Likewise, I want to sum the counts in the column "values" for every id in the data frame "df" and report them in a data.frame like this:

    id  window_1 window_2 window_3
    a1       102      113       77
    a2       206      195      161
    a3       198      163      175

I tried the following code:

    sum_win_3=0
    sum_win_2=0
    sum_win_1=0
    win_1_counts=0
    win_2_counts=0
    win_3_counts=0
    for (i in seq(1,length(df$values),3)) {
      if((i+i+1+i+2) %% 3 == 0)
      {
        win_1_counts=df$values[i]+df$values[i+1]+df$values[i+2]
        win_1_counts[is.na(win_1_counts)]=0
        #print(win_1_counts)
      }
      sum_win_1=sum_win_1+win_1_counts
    }
    #print(sum_win_1)
    for (j in seq(2,length(df$values),3)) {
      if((j+j+1+j+2) %% 3 == 0)
      {
        win_2_counts=df$values[j]+df$values[j+1]+df$values[j+2]
        win_2_counts[is.na(win_2_counts)]=0
        #print(win_2_counts)
      }
      sum_win_2=sum_win_2+win_2_counts
    }
    #print(sum_win_2)
    for (k in seq(3,length(df$values),3)) {
      if((k+k+1+k+2) %% 3 == 0)
      {
        win_3_counts=df$values[k]+df$values[k+1]+df$values[k+2]
        win_3_counts[is.na(win_3_counts)]=0
        #print(win_3_counts)
      }
      #sum_win_3=sum_win_3+win_3_counts
    }
    print(sum_win_3)
    output=data.frame(id=df[1],window_1=sum_win_1,window_2=sum_win_2,window_3=sum_win_3)

The above code sums the counts for window_1, window_2 and window_3 across all ids together, rather than working on every id separately.
Kindly guide me in getting the output in the desired format stated above. Thanks in advance.
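What the loops above lack is a grouping step, so their windows run across the boundaries between ids. As a hedged base R sketch (an alternative to the data.table approach used in the answer; the helper `window_sum` is hypothetical), `split()` can hand each id's values to a function that sums the complete 3-value windows starting at positions 1, 2 and 3:

```r
df <- data.frame(id = c(rep("a1", 10), rep("a2", 13), rep("a3", 12)),
                 values = c(10, 2, 4, 23, 10, 5, 20, 15, 13, 21,
                            15, 9, 19, 5, 14, 25, 18, 19, 31, 26, 4, 21, 4,
                            6, 7, 12, 15, 18, 25, 20, 16, 29, 21, 19, 10))

# Hypothetical helper: sum the complete 3-value windows starting at `start`,
# dropping trailing values that cannot fill a window
window_sum <- function(x, start) {
  k <- 3 * max((length(x) - start + 1) %/% 3, 0)  # number of values used
  sum(x[start - 1 + seq_len(k)])                  # empty index sums to 0
}

# split() gives one vector of values per id, so each id is summed separately
by_id <- split(df$values, df$id)
res <- t(sapply(by_id, function(x) sapply(1:3, function(s) window_sum(x, s))))
colnames(res) <- paste0("window_", 1:3)
output <- data.frame(id = rownames(res), res, row.names = NULL)
output
```

With the example df this should reproduce the desired table: 102/113/77 for a1, 206/195/161 for a2 and 198/163/175 for a3.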

Using the data.table package, an approach could be as follows:

    library(data.table)
    setDT(df)[, .(w1 = sum(values[1:(3*(.N%/%3))]),
                  w2 = sum(values[2:(3*((.N-1)%/%3)+1)]),
                  w3 = sum(values[3:(3*((.N-2)%/%3)+2)])), by = id]

which gives:

       id  w1  w2  w3
    1: a1 102 113  77
    2: a2 206 195 161
    3: a3 198 163 175
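All three end points in the answer follow one rule: a window starting at position s in a group of n rows uses 3 * ((n - s + 1) %/% 3) values, so its last index is s + 3 * ((n - s + 1) %/% 3) - 1. The short base R check below (the helper name `last_index` is hypothetical) computes that index for the group sizes in the example (10, 13 and 12 rows) and confirms it matches the w1, w2 and w3 expressions.

```r
# End index of the last complete 3-value window that starts at position s
# in a group of n rows
last_index <- function(n, s) s + 3 * ((n - s + 1) %/% 3) - 1

n <- c(a1 = 10, a2 = 13, a3 = 12)   # group sizes in the example df
ends <- sapply(n, function(k) sapply(1:3, function(s) last_index(k, s)))
ends   # row s = window start, column = id

# The same end points, written as in the w1, w2 and w3 expressions
stopifnot(all(ends[1, ] == 3 * (n %/% 3)),
          all(ends[2, ] == 3 * ((n - 1) %/% 3) + 1),
          all(ends[3, ] == 3 * ((n - 2) %/% 3) + 2))
```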

Or, to avoid the repetition (thanks @cath):

    setDT(df)[, lapply(1:3, function(i) {sum(values[i:(3*((.N-i+1)%/%3)+(i-1))])}), by = id]

which gives:

       id  V1  V2  V3
    1: a1 102 113  77
    2: a2 206 195 161
    3: a3 198 163 175
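One caveat, not raised in the answer: the index `i:(3*((.N-i+1)%/%3)+(i-1))` assumes every id has at least i + 2 rows. For a shorter group, `:` counts backwards, so stray values (or NA) get summed instead of 0. A minimal base R sketch with a hypothetical 2-row group shows the effect and a `seq_len()`-based guard:

```r
x <- c(5, 7)   # a hypothetical group with only 2 rows
n <- length(x)

# Unguarded index from the answer: with n < i + 2 the range i:(...)
# runs backwards, so stray values (or NA) are summed
unguarded <- sapply(1:3, function(i) sum(x[i:(3 * ((n - i + 1) %/% 3) + (i - 1))]))
unguarded   # 5 12 NA

# Guarded version: seq_len() yields an empty index when no complete
# window fits, and sum() of an empty vector is 0
guarded <- sapply(1:3, function(i) {
  k <- 3 * max((n - i + 1) %/% 3, 0)
  sum(x[i - 1 + seq_len(k)])
})
guarded     # 0 0 0
```

The same guard should drop into the data.table call, e.g. `setDT(df)[, lapply(1:3, function(i) {k <- 3 * max((.N - i + 1) %/% 3, 0); sum(values[i - 1 + seq_len(k)])}), by = id]`; for the example df, where every id has at least 10 rows, it gives the same sums as the original expression.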

If you want to rename the V1, V2 & V3 variables, you can do that afterwards, or you can do:

    cols <- c("w1","w2","w3")
    # := adds w1, w2 and w3 to df by reference, repeating each id's sums on every row of that id
    setDT(df)[, (cols) := lapply(1:3, function(i) {sum(values[i:(3*((.N-i+1)%/%3)+(i-1))])}), by = id]
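If the goal is the one-row-per-id table with the names w1, w2 and w3, another option (a sketch, not part of the original answer) is to name the list that `lapply()` returns with base R's `setNames()`; data.table uses the names of a list returned in j as column names. Unlike the `:=` form, this does not add columns to df.

```r
library(data.table)

df <- data.frame(id = c(rep("a1", 10), rep("a2", 13), rep("a3", 12)),
                 values = c(10, 2, 4, 23, 10, 5, 20, 15, 13, 21,
                            15, 9, 19, 5, 14, 25, 18, 19, 31, 26, 4, 21, 4,
                            6, 7, 12, 15, 18, 25, 20, 16, 29, 21, 19, 10))

cols <- c("w1", "w2", "w3")
# Naming the list returned by lapply() gives the result columns those names;
# j returns one row per id
res <- setDT(df)[, setNames(lapply(1:3, function(i) {
  sum(values[i:(3 * ((.N - i + 1) %/% 3) + (i - 1))])
}), cols), by = id]
res
```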
