python - Transform Pandas DataFrame with n-level hierarchical index into n-D NumPy array


question

Is there a good way to transform a DataFrame with an n-level index into an n-D NumPy array (a.k.a. n-tensor)?


example

Suppose I set up a DataFrame like

    from pandas import DataFrame, MultiIndex

    index = range(2), range(3)
    value = range(2 * 3)
    frame = DataFrame(value, columns=['value'],
                      index=MultiIndex.from_product(index)).drop((1, 0))
    print(frame)

which outputs

         value
    0 0      0
      1      1
      2      2
    1 1      4
      2      5

The index is a 2-level hierarchical index. I can extract a 2-D NumPy array from the data using

    print(frame.unstack().values)

which outputs

    [[  0.   1.   2.]
     [ nan   4.   5.]]

How does this generalize to an n-level index?

Playing with unstack(), it seems that it can only be used to massage the 2-D shape of the DataFrame, not to add an axis.

I cannot use e.g. frame.values.reshape(x, y, z), since this would require that the frame contains exactly x * y * z rows, which cannot be guaranteed. This is what I tried to demonstrate by drop()ing a row in the above example.

Any suggestions are highly appreciated.
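Both limitations can be reproduced in a few lines. The following is only a minimal sketch, assuming a 3-level version of the example frame and a recent pandas/NumPy; the names unstacked_shape and reshape_failed are introduced here purely for illustration.

```python
import numpy as np
import pandas as pd

# 3-level index with one row dropped, so only 7 of the 2 * 2 * 2 rows remain
index = pd.MultiIndex.from_product([range(2), range(2), range(2)])
frame = pd.DataFrame({"value": np.arange(8)}, index=index).drop((1, 0, 1))

# unstack() moves the innermost level into the columns; the result is still 2-D
unstacked_shape = frame.unstack().values.shape

# a direct reshape needs exactly 2 * 2 * 2 values, so it raises ValueError here
try:
    frame.values.reshape(2, 2, 2)
    reshape_failed = False
except ValueError:
    reshape_failed = True

print(unstacked_shape, reshape_failed)
```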

answer

Edit. This approach is much more elegant (and two orders of magnitude faster) than the one I gave below.

    import numpy as np

    # create an empty array of NaN of the right dimensions
    shape = tuple(map(len, frame.index.levels))
    arr = np.full(shape, np.nan)

    # fill it using NumPy's advanced indexing
    # (index.codes was called index.labels before pandas 0.24)
    arr[tuple(frame.index.codes)] = frame.values.flat
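The same idea can be wrapped in a small function that works for any number of levels. This is a sketch rather than a definitive implementation: the helper name to_ndarray is hypothetical, and it assumes a pandas version where MultiIndex exposes codes.

```python
import numpy as np
import pandas as pd


def to_ndarray(series):
    """Hypothetical helper: scatter a MultiIndex-ed Series into an n-D array."""
    index = series.index
    # one axis per index level, as long as that level has distinct labels
    shape = tuple(len(level) for level in index.levels)
    arr = np.full(shape, np.nan)
    # index.codes holds, per level, the integer position of each row's label
    arr[tuple(index.codes)] = series.to_numpy()
    return arr


index = pd.MultiIndex.from_product([range(2), range(2), range(2)])
frame = pd.DataFrame({"value": np.arange(8)}, index=index).drop((1, 0, 1))

arr = to_ndarray(frame["value"])
print(arr.shape)
```

Note that levels keep labels that are no longer used after rows are dropped, which is what keeps the shape at 2 x 2 x 2 here. If whole labels have been filtered away and the resulting all-NaN slices are unwanted, calling frame.index.remove_unused_levels() first may help.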

Original solution. Given a setup similar to the one above, but in 3-D,

    from pandas import DataFrame, MultiIndex
    from itertools import product

    index = range(2), range(2), range(2)
    value = range(2 * 2 * 2)
    frame = DataFrame(value, columns=['value'],
                      index=MultiIndex.from_product(index)).drop((1, 0, 1))
    print(frame)

we have

           value
    0 0 0      0
        1      1
      1 0      2
        1      3
    1 0 0      4
      1 0      6
        1      7

Now, we proceed using the reshape() route, but with some preprocessing to ensure that the length along each dimension is consistent.

First, reindex the data frame with the full Cartesian product of all dimensions. NaN values will be inserted as needed. This operation can be both slow and consume a lot of memory, depending on the number of dimensions and on the size of the data frame.

    levels = map(tuple, frame.index.levels)
    index = list(product(*levels))
    frame = frame.reindex(index)
    print(frame)

which outputs

           value
    0 0 0      0
        1      1
      1 0      2
        1      3
    1 0 0      4
        1    NaN
      1 0      6
        1      7
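As a possible variation on the reindexing step, the full product can also be built with MultiIndex.from_product applied to the index levels, which avoids materializing a list of tuples. This is a sketch under the same 3-D setup; full_index and full are names chosen here for illustration.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product([range(2), range(2), range(2)])
frame = pd.DataFrame({"value": np.arange(8)}, index=index).drop((1, 0, 1))

# build the full Cartesian product directly from the index levels
full_index = pd.MultiIndex.from_product(frame.index.levels)

# rows missing from the original frame are filled with NaN
full = frame.reindex(full_index)

print(len(full), int(full["value"].isna().sum()))
```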

Now, reshape() will work as intended.

    shape = tuple(map(len, frame.index.levels))
    print(frame.values.reshape(shape))

which outputs

    [[[  0.   1.]
      [  2.   3.]]

     [[  4.  nan]
      [  6.   7.]]]

The (rather ugly) one-liner is

    frame.reindex(list(product(*map(tuple, frame.index.levels)))).values\
         .reshape(tuple(map(len, frame.index.levels)))
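To check that the reindex-and-reshape route and the advanced-indexing route agree, the two results can be compared directly. This is a small sanity check, not a benchmark; via_reshape, via_codes and same are illustrative names, and it assumes a pandas version that provides MultiIndex.codes.

```python
from itertools import product

import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product([range(2), range(2), range(2)])
frame = pd.DataFrame({"value": np.arange(8)}, index=index).drop((1, 0, 1))
shape = tuple(len(level) for level in frame.index.levels)

# route 1: reindex to the full Cartesian product, then reshape
full = frame.reindex(list(product(*map(tuple, frame.index.levels))))
via_reshape = full.values.reshape(shape)

# route 2: scatter the values into a NaN-filled array using the index codes
via_codes = np.full(shape, np.nan)
via_codes[tuple(frame.index.codes)] = frame["value"].to_numpy()

# equal_nan=True treats the NaN placeholders in both arrays as equal
same = np.allclose(via_reshape, via_codes, equal_nan=True)
print(same)
```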
