R - Summing the counts in a data frame using a sliding window
I am new to R. I have a data frame in R as follows:
df <- data.frame(id=c(rep("a1",10),rep("a2",13),rep("a3",12)), values=c(10,2,4,23,10,5,20,15,13,21,15,9,19,5,14,25,18,19,31,26,4,21,4,6,7,12,15,18,25,20,16,29,21,19,10))
For every id, I would like to sum the counts in column "values" in sliding windows of 3 positions. The following excerpt of df includes the records corresponding to a1:
id values
a1 10
a1 2
a1 4
a1 23
a1 10
a1 5
a1 20
a1 15
a1 13
a1 21
I would like to take 3 entries at a time, sum them, and move on to the next 3 entries. When the sliding window can't accommodate 3 entries, I skip the remaining values.
For example, window_1 starts at the first value (10), window_2 starts at the second value (2), and window_3 starts at the third value (4).
window_1 = [10+2+4] + [23+10+5] + [20+15+13] = 102
window_2 = [2+4+23] + [10+5+20] + [15+13+21] = 113
window_3 = [4+23+10] + [5+20+15] = 77
I would like to report these in a data frame as follows:
id window_1 window_2 window_3
a1 102      113      77
Likewise, I would like to sum the counts in column "values" for every id in the data frame "df", and report them in a data.frame as follows:
id window_1 window_2 window_3
a1 102      113      77
a2 206      195      161
a3 198      163      175
I tried the following code:
sum_win_3=0
sum_win_2=0
sum_win_1=0
win_1_counts=0
win_2_counts=0
win_3_counts=0
for (i in seq(1,length(df$values),3)) {
  if((i+i+1+i+2) %% 3 == 0) {
    win_1_counts=df$values[i]+df$values[i+1]+df$values[i+2]
    win_1_counts[is.na(win_1_counts)]=0
    #print(win_1_counts)
  }
  sum_win_1=sum_win_1+win_1_counts
}
#print(sum_win_1)
for (j in seq(2,length(df$values),3)) {
  if((j+j+1+j+2) %% 3 == 0) {
    win_2_counts=df$values[j]+df$values[j+1]+df$values[j+2]
    win_2_counts[is.na(win_2_counts)]=0
    #print(win_2_counts)
  }
  sum_win_2=sum_win_2+win_2_counts
}
#print(sum_win_2)
for (k in seq(3,length(df$values),3)) {
  if((k+k+1+k+2) %% 3 == 0) {
    win_3_counts=df$values[k]+df$values[k+1]+df$values[k+2]
    win_3_counts[is.na(win_3_counts)]=0
    #print(win_3_counts)
  }
  #sum_win_3=sum_win_3+win_3_counts
}
print(sum_win_3)
output=data.frame(id=df[1],window_1=sum_win_1,window_2=sum_win_2,window_3=sum_win_3)
The above code sums the counts for window_1, window_2, and window_3 across all ids together, rather than working on every id separately.
Kindly guide me in getting the output in the desired format stated above. Thanks in advance.
Using the data.table package, I would approach it as follows:
library(data.table)
setDT(df)[, .(w1 = sum(values[1:(3*(.N%/%3))]),
              w2 = sum(values[2:(3*((.N-1)%/%3)+1)]),
              w3 = sum(values[3:(3*((.N-2)%/%3)+2)])),
          by = id]
which gives:
   id  w1  w2  w3
1: a1 102 113  77
2: a2 206 195 161
3: a3 198 163 175
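The index ranges above may be easier to follow when written out for a single vector. For a group of n values, a window that starts at position i can fit (n - i + 1) %/% 3 complete triples, so its last index is i - 1 plus three times that count. The following is a minimal base R sketch of that arithmetic; `window_sum` is a hypothetical helper name introduced here for illustration, and the guard for groups too short to hold a single triple is an added assumption (without it, a range such as 1:0 would count backwards in R).

```r
# Hypothetical helper (not part of data.table): sum of all complete
# windows of `width` values in x, the first one starting at `start`.
window_sum <- function(x, start, width = 3) {
  n <- length(x)
  k <- (n - start + 1) %/% width   # number of complete windows that fit
  if (k < 1) return(0)             # guard: start:(start - 1) would count backwards
  sum(x[start:(start - 1 + width * k)])
}

# The a1 values from the question.
a1 <- c(10, 2, 4, 23, 10, 5, 20, 15, 13, 21)
res_a1 <- sapply(1:3, function(i) window_sum(a1, i))
print(res_a1)  # 102, 113 and 77, as worked out by hand in the question
```

With start = 1, 2, 3 this covers indices 1:9, 2:10 and 3:8 of the ten a1 values, which are the same ranges the data.table expressions produce for .N = 10.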
Or, to avoid the repetition (thanx @cath):
setDT(df)[, lapply(1:3, function(i) {sum(values[i:(3*((.N-i+1)%/%3)+(i-1))])}), by = id]
which gives:
   id  V1  V2  V3
1: a1 102 113  77
2: a2 206 195 161
3: a3 198 163 175
If you want to rename the V1, V2 & V3 variables, you can do that afterwards, or you can do:
cols <- c("w1","w2","w3")
setDT(df)[, (cols) := lapply(1:3, function(i) {sum(values[i:(3*((.N-i+1)%/%3)+(i-1))])}), by = id]
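Note that `:=` adds w1, w2 and w3 to df by reference, so each id's sums are repeated on every row of that id rather than collapsed to one row per id. For comparison, a one-row-per-id table with named columns can also be sketched in base R, without data.table. This is only an illustration under stated assumptions: `win_sum` and `out` are hypothetical names, and the zero returned for groups too short to hold a triple is an added guard, not something the original code does.

```r
# Same data as in the question.
df <- data.frame(
  id = c(rep("a1", 10), rep("a2", 13), rep("a3", 12)),
  values = c(10, 2, 4, 23, 10, 5, 20, 15, 13, 21,
             15, 9, 19, 5, 14, 25, 18, 19, 31, 26, 4, 21, 4,
             6, 7, 12, 15, 18, 25, 20, 16, 29, 21, 19, 10)
)

# Hypothetical helper: sum of the complete triples starting at position i.
win_sum <- function(x, i) {
  k <- (length(x) - i + 1) %/% 3   # complete triples that fit from position i
  if (k < 1) return(0)             # assumed behaviour for very short groups
  sum(x[i:(i - 1 + 3 * k)])
}

# Split the values by id, compute the three window sums for each id,
# and transpose so that ids are rows and starting offsets are columns.
sums <- t(sapply(split(df$values, df$id),
                 function(v) sapply(1:3, function(i) win_sum(v, i))))
colnames(sums) <- c("w1", "w2", "w3")

out <- data.frame(id = rownames(sums), sums, row.names = NULL)
print(out)
```

For the data in the question this reproduces the w1, w2 and w3 values shown above for a1, a2 and a3.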