R - Loop through one table to get counts in another table


In R, I have a seed table that looks like this:

seed_table

|========|================|
| date   | classification |
|========|================|
| 201501 | a              |
| 201501 | a              |
| 201501 | a              |
| 201502 | b              |
| 201502 | b              |
| 201502 | b              |
| ...    | ...            |

And my data table looks like this:

data:

|========|================|===========|================|
| id     | create_date    | end_date  | classification |
|========|================|===========|================|
| 1      | 201501         | 201601    | a              |
| 2      | 201501         | 201605    | b              |
| 3      | 201502         | 201601    | b              |
| 4      | 201412         | 201501    | a              |
| 5      | 201412         | 201502    | b              |
| 6      | 201502         | 201503    | a              |
| ...    | ...            | ...       | ...            |

I am writing the following code to get the number of "active observations" for each month and classification in the seed table. An active observation is an observation whose created_date is <= the month of the row in the seed table and whose end_date is >= the month of the row in the seed table:

    n <- nrow(seed_table)
    num_obs <- numeric(n)
    for (row in 1:n) {
        num_obs[row] <- (sum(
            data$created_date >= seed_table[row, "date"] &
                data$end_date <= seed_table[row, "date"] &
                data$classification == seed_table[row, "classification"]))
        cat(n - row)
    }

However, this code is extremely slow. I have 2054 rows in the seed table (~13 months, 158 classification levels per month).

Is there a way to make this more performant?

As @eric-fail suggested, you should use dput() to share your data. For example:

    seed_table <- structure(list(
      date = c(201501L, 201501L, 201502L),
      classification = structure(
        c(1L, 1L, 2L), .Label = c("a", "b"), class = "factor")),
      .Names = c("date", "classification"),
      row.names = c(1L, 2L, 4L), class = "data.frame")
    data <- structure(list(
      id = 1:6,
      create_date = c(201501L, 201501L, 201502L, 201412L, 201412L, 201502L),
      end_date = c(201601L, 201605L, 201601L, 201501L, 201502L, 201503L),
      classification = structure(c(1L, 2L, 2L, 1L, 2L, 1L),
        .Label = c("a", "b"), class = "factor")),
      .Names = c("id", "create_date", "end_date", "classification"),
      class = "data.frame", row.names = c(NA, -6L))
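To generate text like the above from your own objects, call dput() on them (for a large data frame, a small subset is usually enough). The sketch below uses a small stand-in data frame built with data.frame(), which is an assumption and not the asker's real data, and checks that parsing the dput() text back recreates an identical object.

```r
# Small stand-in data frame (assumed for illustration, not the real seed table)
seed_table <- data.frame(
  date = c(201501L, 201501L, 201502L),
  classification = factor(c("a", "a", "b"), levels = c("a", "b"))
)

# dput() prints R code that recreates the object; this is the text to share
dput(seed_table)

# Round-trip check: evaluating that text gives back an identical object
txt <- capture.output(dput(seed_table))
recreated <- eval(parse(text = txt))
print(identical(seed_table, recreated))
```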

I did not do a speed comparison, but getting rid of the for() loop and using the outer() function instead might speed up the calculations. Give this a try:

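    # m1[i, j]: seed row i's month is on or after data row j's create_date
    m1 <- outer(seed_table$date, data$create_date, ">=")
    # m2[i, j]: seed row i's month is on or before data row j's end_date
    m2 <- outer(seed_table$date, data$end_date, "<=")
    # m3[i, j]: the classifications of seed row i and data row j match
    m3 <- outer(seed_table$classification, data$classification, "==")
    # Combine the conditions and count active observations per seed row
    m <- m1 & m2 & m3
    num_obs <- apply(m, 1, sum)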
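One possible concern with outer() is memory: m1, m2, m3 and m each have one cell per (seed row, data row) pair, so 2054 × nrow(data) cells. A different technique, sketched below, is to join the two tables on classification first with base R merge() and then filter on the dates. Only same-classification pairs are formed this way, so with roughly 13 seed months per classification the joined table has about 13 × nrow(data) rows. The helper name count_active_join() is hypothetical, and the sketch has not been benchmarked against outer().

```r
# Example data from the question
seed_table <- data.frame(
  date = c(201501L, 201501L, 201502L),
  classification = factor(c("a", "a", "b"), levels = c("a", "b"))
)
data <- data.frame(
  id = 1:6,
  create_date = c(201501L, 201501L, 201502L, 201412L, 201412L, 201502L),
  end_date = c(201601L, 201605L, 201601L, 201501L, 201502L, 201503L),
  classification = factor(c("a", "b", "b", "a", "b", "a"), levels = c("a", "b"))
)

# Hypothetical helper: join on classification, then filter on dates
count_active_join <- function(seed_table, data) {
  # Remember which seed row each joined record came from
  seed_table$seed_row <- seq_len(nrow(seed_table))
  # Equi-join: pair each seed row only with data rows of the same classification
  joined <- merge(seed_table, data, by = "classification")
  # Keep pairs where the observation is active in the seed row's month
  active <- joined$create_date <= joined$date & joined$end_date >= joined$date
  # Count active pairs per seed row; seed rows with no matches get 0
  tabulate(joined$seed_row[active], nbins = nrow(seed_table))
}

num_obs <- count_active_join(seed_table, data)
print(num_obs)  # [1] 2 2 3
```

Because seed_row is attached before merging, the counts come back in the original seed-table order even though merge() sorts its output by classification.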

Note that you had errors in your code. You referred to created_date instead of create_date, and (I believe) you had the inequalities (>= and <=) reversed.
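With those two fixes applied, the original loop would look roughly like the sketch below (count_active_loop() is a hypothetical name). It tests create_date <= date and end_date >= date, matching the question's definition of an active observation, and drops the per-iteration cat() call, since printing on every iteration adds some overhead. On the example data it gives the same counts as the outer() version.

```r
# Example data from the question
seed_table <- data.frame(
  date = c(201501L, 201501L, 201502L),
  classification = factor(c("a", "a", "b"), levels = c("a", "b"))
)
data <- data.frame(
  id = 1:6,
  create_date = c(201501L, 201501L, 201502L, 201412L, 201412L, 201502L),
  end_date = c(201601L, 201605L, 201601L, 201501L, 201502L, 201503L),
  classification = factor(c("a", "b", "b", "a", "b", "a"), levels = c("a", "b"))
)

# Corrected loop (hypothetical helper name)
count_active_loop <- function(seed_table, data) {
  n <- nrow(seed_table)
  num_obs <- integer(n)
  for (row in seq_len(n)) {
    d <- seed_table$date[row]
    cl <- seed_table$classification[row]
    # Active: created on or before month d, ended on or after month d, same class
    num_obs[row] <- sum(data$create_date <= d &
                        data$end_date >= d &
                        data$classification == cl)
  }
  num_obs
}

print(count_active_loop(seed_table, data))  # [1] 2 2 3
```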

