R - Loop through one table to get counts in another table
In R, I have a seed table that looks like this:
seed_table
|========|================|
| date   | classification |
|========|================|
| 201501 | a              |
| 201501 | a              |
| 201501 | a              |
| 201502 | b              |
| 201502 | b              |
| 201502 | b              |
| ...    | ...            |
and a data table that looks like this:
data:
|========|================|===========|================|
| id     | create_date    | end_date  | classification |
|========|================|===========|================|
| 1      | 201501         | 201601    | a              |
| 2      | 201501         | 201605    | b              |
| 3      | 201502         | 201601    | b              |
| 4      | 201412         | 201501    | a              |
| 5      | 201412         | 201502    | b              |
| 6      | 201502         | 201503    | a              |
| ...    | ...            | ...       | ...            |
I am writing the following code to get the number of "active observations" for each month and classification in the seed table. An active observation is an observation whose created_date <= the month of the row in the seed table and whose end_date >= the month of the row in the seed table:
n <- nrow(seed_table)
num_obs <- numeric(n)
for (row in 1:n) {
  num_obs[row] <- (sum(
    data$created_date >= seed_table[row, "date"] &
    data$end_date <= seed_table[row, "date"] &
    data$classification == seed_table[row, "classification"]))
  cat(n - row)
}
However, this code is extremely slow. I have 2054 rows in the seed table (~13 months, 158 classification levels per month).
Is there a way to make this more performant?
As @eric-fail suggested, you should use dput() to share your data. For example:
seed_table <- structure(list(
  date = c(201501L, 201501L, 201502L),
  classification = structure(
    c(1L, 1L, 2L), .Label = c("a", "b"), class = "factor")),
  .Names = c("date", "classification"),
  row.names = c(1L, 2L, 4L), class = "data.frame")

data <- structure(list(
  id = 1:6,
  create_date = c(201501L, 201501L, 201502L, 201412L, 201412L, 201502L),
  end_date = c(201601L, 201605L, 201601L, 201501L, 201502L, 201503L),
  classification = structure(c(1L, 2L, 2L, 1L, 2L, 1L),
    .Label = c("a", "b"), class = "factor")),
  .Names = c("id", "create_date", "end_date", "classification"),
  class = "data.frame", row.names = c(NA, -6L))
I did not do a speed comparison, but getting rid of the for() loop and using the outer() function instead might speed up the calculations. Give this a try:
m1 <- outer(seed_table$date, data$create_date, ">=")
m2 <- outer(seed_table$date, data$end_date, "<=")
m3 <- outer(seed_table$classification, data$classification, "==")
m <- m1 & m2 & m3
num_obs <- apply(m, 1, sum)
Note that you had some errors in your code. You referred to created_date instead of create_date, and (I believe) you had the inequalities (>= and <=) reversed.
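To check that the two approaches agree, here is a minimal sketch that runs a corrected version of the loop next to the outer() version on the sample data above. It rebuilds the sample with data.frame() and plain character classifications (stringsAsFactors = FALSE), which is my assumption to keep the == comparison simple; the names num_obs_loop and num_obs_outer are only for illustration, and rowSums() is swapped in for apply(m, 1, sum).

```r
# Sample data with the same values as the dput() example.
# Assumption: classification is kept as character rather than factor.
seed_table <- data.frame(
  date = c(201501L, 201501L, 201502L),
  classification = c("a", "a", "b"),
  stringsAsFactors = FALSE
)
data <- data.frame(
  id = 1:6,
  create_date = c(201501L, 201501L, 201502L, 201412L, 201412L, 201502L),
  end_date = c(201601L, 201605L, 201601L, 201501L, 201502L, 201503L),
  classification = c("a", "b", "b", "a", "b", "a"),
  stringsAsFactors = FALSE
)

# Corrected loop: create_date (not created_date), and the inequalities
# flipped so that create_date <= month and end_date >= month.
n <- nrow(seed_table)
num_obs_loop <- numeric(n)
for (row in seq_len(n)) {
  num_obs_loop[row] <- sum(
    data$create_date <= seed_table$date[row] &
      data$end_date >= seed_table$date[row] &
      data$classification == seed_table$classification[row]
  )
}

# outer() version: each matrix has one row per seed row and one column
# per data row; the elementwise & keeps only the active observations.
m <- outer(seed_table$date, data$create_date, ">=") &
  outer(seed_table$date, data$end_date, "<=") &
  outer(seed_table$classification, data$classification, "==")
num_obs_outer <- rowSums(m)

# Both approaches should give the same counts.
stopifnot(all(num_obs_loop == num_obs_outer))
print(num_obs_outer)
# [1] 2 2 3
```

One thing to keep in mind: outer() builds a full matrix of nrow(seed_table) by nrow(data) logical values for each condition, so this trades memory for speed and may not suit a very large data table.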