python - Transform Pandas DataFrame with n-level hierarchical index into n-D Numpy array
Question
Is there a way to transform a DataFrame with an n-level index into an n-D NumPy array (a.k.a. n-tensor)?
Example
Suppose I set up a DataFrame like

    from pandas import DataFrame, MultiIndex

    index = range(2), range(3)
    value = range(2 * 3)
    frame = DataFrame(value, columns=['value'],
                      index=MultiIndex.from_product(index)).drop((1, 0))
    print(frame)

which outputs

         value
    0 0      0
      1      1
      2      2
    1 1      4
      2      5

The index is a 2-level hierarchical index. I can extract a 2-D NumPy array from the data using

    print(frame.unstack().values)

which outputs

    [[  0.   1.   2.]
     [ nan   4.   5.]]

How does this generalize to an n-level index?
Playing with unstack(), it seems that it can only be used to massage the 2-D shape of the DataFrame, not to add an axis.
I cannot use e.g. frame.values.reshape(x, y, z), since this would require that the frame contains exactly x * y * z rows, which cannot be guaranteed. This is what I tried to demonstrate by drop()ing a row in the above example.
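To make the reshape() problem concrete, here is a minimal sketch. It rebuilds the 2-level frame from the example so that it runs on its own; the name reshape_failed is only illustrative. With one row dropped, the frame holds 5 values, which cannot fill a 2 x 3 array.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame: a full 2 x 3 product index with one row dropped.
index = pd.MultiIndex.from_product([range(2), range(3)])
frame = pd.DataFrame({"value": np.arange(6)}, index=index).drop((1, 0))

try:
    # 5 remaining values cannot be arranged into 2 * 3 = 6 slots.
    frame.values.reshape(2, 3)
    reshape_failed = False
except ValueError as error:
    reshape_failed = True
    print(error)
```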
Any suggestions are highly appreciated.
Edit. This approach is much more elegant (and two orders of magnitude faster) than the one I gave below.

    import numpy as np

    # create an empty array of NaN of the right dimensions
    shape = tuple(map(len, frame.index.levels))
    arr = np.full(shape, np.nan)

    # fill it using NumPy's advanced indexing
    # (frame.index.codes was called frame.index.labels before pandas 0.24)
    arr[tuple(frame.index.codes)] = frame.values.flat

Original solution. Given a setup similar to the one above, but in 3-D,

    from itertools import product

    from pandas import DataFrame, MultiIndex

    index = range(2), range(2), range(2)
    value = range(2 * 2 * 2)
    frame = DataFrame(value, columns=['value'],
                      index=MultiIndex.from_product(index)).drop((1, 0, 1))
    print(frame)

we have

           value
    0 0 0      0
        1      1
      1 0      2
        1      3
    1 0 0      4
      1 0      6
        1      7

Now, we proceed using the reshape() route, but with some preprocessing to ensure that the length along each dimension is consistent.
First, reindex the data frame with the full Cartesian product of all dimensions. NaN values will be inserted as needed. This operation can be both slow and memory-hungry, depending on the number of dimensions and on the size of the data frame.

    levels = map(tuple, frame.index.levels)
    index = list(product(*levels))
    frame = frame.reindex(index)
    print(frame)

which outputs

           value
    0 0 0      0
        1      1
      1 0      2
        1      3
    1 0 0      4
        1    NaN
      1 0      6
        1      7

Now, reshape() will work as intended.

    shape = tuple(map(len, frame.index.levels))
    print(frame.values.reshape(shape))

which outputs

    [[[  0.   1.]
      [  2.   3.]]

     [[  4.  nan]
      [  6.   7.]]]

The (rather ugly) one-liner is

    frame.reindex(list(product(*map(tuple, frame.index.levels)))).values\
         .reshape(tuple(map(len, frame.index.levels)))
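For reuse, the advanced-indexing approach from the edit can be wrapped in a small function. The following is only a sketch under some assumptions: the helper name multiindex_to_ndarray and its fill_value parameter are made up for illustration, the index is assumed to contain no missing labels, and a pandas version that exposes MultiIndex.codes (0.24 or later) is assumed.

```python
import numpy as np
import pandas as pd


def multiindex_to_ndarray(series, fill_value=np.nan):
    """Scatter a MultiIndex-ed Series into a dense n-D array (illustrative helper)."""
    index = series.index
    # One axis per index level; its length is the number of values in that level.
    shape = tuple(len(level) for level in index.levels)
    arr = np.full(shape, fill_value, dtype=float)
    # index.codes holds, for each level, the integer position of every row's label.
    # Passing them as a tuple triggers NumPy's advanced (fancy) indexing.
    arr[tuple(index.codes)] = series.to_numpy()
    return arr


# Same 3-D setup as above: a 2 x 2 x 2 product index with row (1, 0, 1) dropped.
index = pd.MultiIndex.from_product([range(2), range(2), range(2)])
frame = pd.DataFrame({"value": np.arange(8)}, index=index).drop((1, 0, 1))

arr = multiindex_to_ndarray(frame["value"])
print(arr)
```

Note that the array's shape comes from index.levels, which can still list values that no remaining row uses. If that matters, calling frame.index.remove_unused_levels() first should shrink the levels accordingly.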