python - How do I apply a value from a dataframe based on the value of a multi-index of another dataframe? -

January 15, 2010

i have following:

dataframe 1 (multi-index dataframe):

                             |      assay_a       | --------------------------------------------------- index_a | index_b | index_c  | mean | std | count | ---------------------------------------------------    128     12345      aaa      123    2     4

dataframe 2:

    index | col_a | col_b | col_c | mean     -------------------------------------       1      128    12345   aaa     456

where col_x = index_x a,b,c.

i have been spending morning trying following:

how pick correct mean in dataframe 2 (which has match on col abc) can mathematical operations on it. example, want take mean of dataframe 1 , divide correctly chosen mean of dataframe 2.

ideally, want store results of operation in new column. final output should this:

                             |            assay_a          | ------------------------------------------------------------ index_a | index_b | index_c  | mean | std | count | result | ------------------------------------------------------------    128     12345      aaa      123    2     4       0.26

perhaps there easier way open such suggestions well.

what suggest 1) rename columns of dataframe 2 respective names of index columns of dataframe 1, 2) reset index on dataframe 1, , 3) merge 2 tables based on matching column names. afterwards can compute whatever like. multiindex on columns of dataframe 2 adds bit of additional overhead.

explicitly:

import pandas pd  # re-create table1 row_index = pd.multiindex.from_tuples([(128, 12345, 'aaa')]) row_index.names=['index_a', 'index_b', 'index_c'] table1 = pd.dataframe(data={'mean': 123, 'std': 2, 'count': 4}, index=row_index) table1.columns = pd.multiindex.from_tuples(zip(['assay a'] * 3, table1.columns)) print "*** table 1:" print table1 print ""  # re-create table2 table2 = pd.dataframe([{'col_a': 128, 'col_b': 12345, 'col_c': 'aaa', 'mean': 456}], index=[1]) table2.index.name = 'index' print "*** table 2:" print table2 print ""  # re-name columns of table2 match names of respective index columns in table1 table2 = table2.rename(columns={'col_a': 'index_a', 'col_b': 'index_b', 'col_c': 'index_c'})  # drop 'assay a' index level on columns of table1; # without doing that, following reset_index() produce column multi-index # index_a/b/c, column names not match simple column index of table2_renamed. # if need keep 'assay a' level here, need construct column # multi-index table2_renamed (with empty values second level). table1.columns = table1.columns.levels[1]  # move index columns of table1 regular columns table1 = table1.reset_index()  # merge 2 tables on common column names. 'mean' appears in both tables, # give column table2 suffix '_2'. joint = pd.merge(table1.reset_index(), table2, on=['index_a', 'index_b', 'index_c'], suffixes={'', '_2'}) print "*** joint, before re-setting index:" print joint print ""  # restore index of joint table joint = joint.set_index(['index_a', 'index_b', 'index_c'])  # compute 'result' joint['result'] = joint['mean'] / joint['mean_2']  # drop unused columns joint = joint.drop(['index', 'mean_2'], axis=1)  # restore column index level joint.columns = pd.multiindex.from_tuples(zip(['assay a'] * 4, joint.columns))  print "*** final result:" print joint print ""

the script output is:

*** table 1:                         assay                                    count mean std index_a index_b index_c                  128     12345   aaa           4  123   2  *** table 2:        col_a  col_b col_c  mean index                           1        128  12345   aaa   456  *** joint, before re-setting index:    index  index_a  index_b index_c  count  mean  std  mean_2 0      0      128    12345     aaa      4   123    2     456  *** final result:                         assay                                              count mean std    result index_a index_b index_c                            128     12345   aaa           4  123   2  0.269737

hope helps!

Search This Blog

Live one

python - How do I apply a value from a dataframe based on the value of a multi-index of another dataframe? -

Comments

Post a Comment

Popular posts from this blog

authentication - Mongodb revoke acccess to connect test database -

python - GitPython: check if git is available -

c# - ASP.NET MVC Enumerable to delimited HTML in Razor -