r - How to do a data.table rolling join? -
i have 2 data tables i'm trying merge. 1 data on company market values through time , other company dividend history through time. i'm trying find out how each company has paid each quarter , put value next market value data through time.
library(magrittr) library(data.table) library(zoo) library(lubridate) set.seed(1337) # data table of company market values companies <- data.table(companyid = 1:10, sedol = rep(c("91772e", "7a662b"), each = 5), date = (as.date("2005-04-01") + months(seq(0, 12, 3))) - days(1), mktcap = c(100 + cumsum(rnorm(5,5)), 50 + cumsum(rnorm(5,1,5)))) %>% setkey(sedol, date) # data table of dividends dividends <- data.table(divid = 1:7, sedol = c(rep('91772e', each = 4), rep('7a662b', each = 3)), date = as.date(c('2004-11-19', '2005-01-13', '2005-01-29', '2005-10-01', '2005-06-29', '2005-06-30', '2006-04-17')), divamnt = rnorm(7, .8, .3)) %>% setkey(sedol, date)
i believe situation use data.table rolling join, like:
dividends[companies, roll = "nearest"]
to try , dataset looks
divid sedol date divamnt companyid mktcap 1: na 7a662b <na> na 6 61.21061 2: 5 7a662b 2005-06-29 0.7772631 7 66.92951 3: 6 7a662b 2005-06-30 1.1815343 7 66.92951 4: na 7a662b <na> na 8 78.33914 5: na 7a662b <na> na 9 88.92473 6: na 7a662b <na> na 10 87.85067 7: 2 91772e 2005-01-13 0.2964291 1 105.19249 8: 3 91772e 2005-01-29 0.8472649 1 105.19249 9: na 91772e <na> na 2 108.74579 10: 4 91772e 2005-10-01 1.2467408 3 113.42261 11: na 91772e <na> na 4 120.04491 12: na 91772e <na> na 5 124.35588
(note i've matched dividends company market values exact quarter)
but i'm not sure how execute it. cran pdf rather vague number or should if roll
value (can pass dates? number quantify days forward carry? number of obersvations?) , changing rollends
around doesn't seem me want.
in end, ended mapping dividend dates quarter end , joining on that. solution, not useful if end needing know how perform rolling joins. in answer, describe situation rolling joins solution me understand how perform them?
instead of rolling join, may want use overlap join foverlaps
function of data.table:
# create interval in 'companies' datatable companies[, `:=` (start = compdate - days(90), end = compdate + days(15))] # create second date in 'dividends' datatable dividends[, date2 := divdate] # set keys 2 datatable setkey(companies, sedol, start, end) setkey(dividends, sedol, ddate, date2) # create vector of columnnames can removed afterwards deletecols <- c("date2","start","end") # perform overlap join , remove helper columns res <- foverlaps(companies, dividends)[, (deletecols) := null]
the result:
> res sedol divid divdate divamnt companyid compdate mktcap 1: 7a662b na <na> na 6 2005-03-31 61.21061 2: 7a662b 5 2005-06-29 0.7772631 7 2005-06-30 66.92951 3: 7a662b 6 2005-06-30 1.1815343 7 2005-06-30 66.92951 4: 7a662b na <na> na 8 2005-09-30 78.33914 5: 7a662b na <na> na 9 2005-12-31 88.92473 6: 7a662b na <na> na 10 2006-03-31 87.85067 7: 91772e 2 2005-01-13 0.2964291 1 2005-03-31 105.19249 8: 91772e 3 2005-01-29 0.8472649 1 2005-03-31 105.19249 9: 91772e na <na> na 2 2005-06-30 108.74579 10: 91772e 4 2005-10-01 1.2467408 3 2005-09-30 113.42261 11: 91772e na <na> na 4 2005-12-31 120.04491 12: 91772e na <na> na 5 2006-03-31 124.35588
used data (the same in question, without creation of keys):
set.seed(1337) companies <- data.table(companyid = 1:10, sedol = rep(c("91772e", "7a662b"), each = 5), compdate = (as.date("2005-04-01") + months(seq(0, 12, 3))) - days(1), mktcap = c(100 + cumsum(rnorm(5,5)), 50 + cumsum(rnorm(5,1,5)))) dividends <- data.table(divid = 1:7, sedol = c(rep('91772e', each = 4), rep('7a662b', each = 3)), divdate = as.date(c('2004-11-19','2005-01-13','2005-01-29','2005-10-01','2005-06-29','2005-06-30','2006-04-17')), divamnt = rnorm(7, .8, .3))
Comments
Post a Comment