python - How do I apply a value from a dataframe based on the value of a multi-index of another dataframe? -
i have following:
dataframe 1 (multi-index dataframe):
| assay_a | --------------------------------------------------- index_a | index_b | index_c | mean | std | count | --------------------------------------------------- 128 12345 aaa 123 2 4 dataframe 2:
index | col_a | col_b | col_c | mean ------------------------------------- 1 128 12345 aaa 456 where col_x = index_x a,b,c.
i have been spending morning trying following:
how pick correct mean in dataframe 2 (which has match on col abc) can mathematical operations on it. example, want take mean of dataframe 1 , divide correctly chosen mean of dataframe 2.
ideally, want store results of operation in new column. final output should this:
| assay_a | ------------------------------------------------------------ index_a | index_b | index_c | mean | std | count | result | ------------------------------------------------------------ 128 12345 aaa 123 2 4 0.26 perhaps there easier way open such suggestions well.
what suggest 1) rename columns of dataframe 2 respective names of index columns of dataframe 1, 2) reset index on dataframe 1, , 3) merge 2 tables based on matching column names. afterwards can compute whatever like. multiindex on columns of dataframe 2 adds bit of additional overhead.
explicitly:
import pandas pd # re-create table1 row_index = pd.multiindex.from_tuples([(128, 12345, 'aaa')]) row_index.names=['index_a', 'index_b', 'index_c'] table1 = pd.dataframe(data={'mean': 123, 'std': 2, 'count': 4}, index=row_index) table1.columns = pd.multiindex.from_tuples(zip(['assay a'] * 3, table1.columns)) print "*** table 1:" print table1 print "" # re-create table2 table2 = pd.dataframe([{'col_a': 128, 'col_b': 12345, 'col_c': 'aaa', 'mean': 456}], index=[1]) table2.index.name = 'index' print "*** table 2:" print table2 print "" # re-name columns of table2 match names of respective index columns in table1 table2 = table2.rename(columns={'col_a': 'index_a', 'col_b': 'index_b', 'col_c': 'index_c'}) # drop 'assay a' index level on columns of table1; # without doing that, following reset_index() produce column multi-index # index_a/b/c, column names not match simple column index of table2_renamed. # if need keep 'assay a' level here, need construct column # multi-index table2_renamed (with empty values second level). table1.columns = table1.columns.levels[1] # move index columns of table1 regular columns table1 = table1.reset_index() # merge 2 tables on common column names. 'mean' appears in both tables, # give column table2 suffix '_2'. joint = pd.merge(table1.reset_index(), table2, on=['index_a', 'index_b', 'index_c'], suffixes={'', '_2'}) print "*** joint, before re-setting index:" print joint print "" # restore index of joint table joint = joint.set_index(['index_a', 'index_b', 'index_c']) # compute 'result' joint['result'] = joint['mean'] / joint['mean_2'] # drop unused columns joint = joint.drop(['index', 'mean_2'], axis=1) # restore column index level joint.columns = pd.multiindex.from_tuples(zip(['assay a'] * 4, joint.columns)) print "*** final result:" print joint print "" the script output is:
*** table 1: assay count mean std index_a index_b index_c 128 12345 aaa 4 123 2 *** table 2: col_a col_b col_c mean index 1 128 12345 aaa 456 *** joint, before re-setting index: index index_a index_b index_c count mean std mean_2 0 0 128 12345 aaa 4 123 2 456 *** final result: assay count mean std result index_a index_b index_c 128 12345 aaa 4 123 2 0.269737 hope helps!
Comments
Post a Comment