python - Rolling window or occurrences for 2D matrix in Numpy per row?
I am looking for the occurrences of a pattern in each row of a matrix, and I found that there is no clear solution in Python for a big matrix with good performance.
I have a matrix similar to:

    import numpy as np

    matrix = np.array([[0,1,1,0,1,0], [0,1,1,0,1,0]])
    print('matrix: ', matrix)

where I want to check the occurrences of the patterns [0,0], [0,1], [1,0], [1,1] in each row, counting overlapping matches. For the example given, both rows are equal, so the result is the same for each row (one value per row):
- pattern[0,0] = [0,0]
- pattern[0,1] = [2,2]
- pattern[1,0] = [2,2]
- pattern[1,1] = [1,1]
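As a sanity check on these numbers, the overlapping count can be written as a plain-Python loop over adjacent pairs. This is only a minimal reference sketch for small inputs; the helper name count_pairs_reference is made up for illustration and is not part of the original post.

```python
import numpy as np

def count_pairs_reference(matrix):
    """Count overlapping occurrences of [0,0], [0,1], [1,0], [1,1] per row.

    Hypothetical helper for checking results on small inputs; far too slow
    for large matrices.
    """
    patterns = [(0, 0), (0, 1), (1, 0), (1, 1)]
    counts = np.zeros((len(matrix), len(patterns)), dtype=int)
    for r, row in enumerate(matrix):
        # Each adjacent pair (row[k], row[k+1]) is one window; windows overlap.
        for left, right in zip(row[:-1], row[1:]):
            counts[r, patterns.index((int(left), int(right)))] += 1
    return counts

matrix = np.array([[0, 1, 1, 0, 1, 0], [0, 1, 1, 0, 1, 0]])
print(count_pairs_reference(matrix))
```

Each row of the printed array holds the four counts in the order [0,0], [0,1], [1,0], [1,1], so it can be compared directly with the list above.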
The matrix in the example is quite small, but I am looking for performance because I have a huge matrix. You can test with a matrix like

    matrix = np.random.randint(2, size=(100000,10))

or a bigger one, for example, to see the differences.
My first thought on a possible answer was converting the rows to strings and looking for occurrences based on this answer (string count overlapping occurrences):

    def string_occurrences(matrix):
        print('\n===== string count overlapping =====')
        numrow, numcol = np.shape(matrix)
        ocur = np.zeros((numrow, 4))
        for i in range(numrow):
            # Join the row into a string such as '011010'
            strlist = ''.join(map(str, matrix[i,:]))
            ocur[i,0] = occurrences(strlist, '00')
            ocur[i,1] = occurrences(strlist, '01')
            ocur[i,2] = occurrences(strlist, '10')
            ocur[i,3] = occurrences(strlist, '11')
        return ocur
using the function occurrences from that answer:

    def occurrences(string, sub):
        # Count overlapping occurrences of sub in string
        count = start = 0
        while True:
            start = string.find(sub, start) + 1
            if start > 0:
                count += 1
            else:
                return count
But considering that the real array is huge, this solution is very slow because it uses loops, strings, and so on. Looking for a NumPy solution, I used a trick: compare the values with a pattern and roll the matrix along axis=1 to check the occurrences. I call it a pseudo rolling window on 2D, since the window is not square and the way of calculating is different. There are two options, where the second (option 2) is faster because it avoids the calculation of numpy.roll:
    def pseudo_rolling_window_opt12(matrix):
        print('\n===== pseudo_rolling_window =====')
        numrow, numcol = np.shape(matrix)
        ocur = np.zeros((numrow, 4))
        index = 0
        for i in np.arange(2):
            for j in np.arange(2):
                # Pattern [i, j] padded with -9, a value that never appears in the matrix
                #pattern = -9*np.ones(numcol)     # option 1
                pattern = -9*np.ones(numcol+1)    # option 2
                pattern[0] = i
                pattern[1] = j
                for idcol in range(numcol-1):
                    #ocur[:,index] += np.sum(np.roll(matrix,-idcol, axis=1) == pattern, axis=1) == 2  # option 1: 219.398691893 seconds (for real matrix)
                    ocur[:,index] += np.sum(matrix[:,idcol:] == pattern[:-(idcol+1)], axis=1) == 2  # option 2: 80.929688930 seconds (for real matrix)
                index += 1
        return ocur
Searching for other possibilities, I found the "rolling window", which seemed to be a good answer for performance since it uses NumPy functions. Looking at this answer (rolling window 1d arrays in numpy?) and the links in it, I checked the following function. But really, I do not understand the output, since the window calculation itself seems to match what I was expecting.

    def rolling_window(a, size):
        # View of a with an extra last axis holding each window of length size
        shape = a.shape[:-1] + (a.shape[-1] - size + 1, size)
        strides = a.strides + (a.strides[-1],)
        return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
Used as:

    a = rolling_window(matrix, 2)
    print(a == np.array([0,1]))
    print(np.all(rolling_window(matrix, 2) == [0,1], axis=1))

Does anyone know what is wrong in the last case? Or is there any possibility with better performance?
You are using the wrong axis of the NumPy array. You should change the axis in np.all from 1 to 2. Using the following code:

    a = rolling_window(matrix, 2)
    print(np.all(rolling_window(matrix, 2) == [0,1], axis=2))
you get:
    >>>[[ True False False  True False]
     [ True False False  True False]]
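To see why the axis matters, it may help to look at the shapes involved. The sketch below repeats the rolling_window function from the question so that it runs on its own: for a 2D input the result is 3D, with axis 1 indexing the window position within a row and axis 2 indexing the elements inside each window. The variable names are only illustrative.

```python
import numpy as np

def rolling_window(a, size):
    # Same function as in the question, repeated so this snippet is self-contained.
    shape = a.shape[:-1] + (a.shape[-1] - size + 1, size)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

matrix = np.array([[0, 1, 1, 0, 1, 0], [0, 1, 1, 0, 1, 0]])
windows = rolling_window(matrix, 2)
print(windows.shape)  # (rows, windows per row, window size)

matches = windows == [0, 1]                # element-wise comparison, still 3D
per_window = np.all(matches, axis=2)       # True where both elements of a window match
across_windows = np.all(matches, axis=1)   # collapses the window positions instead
print(per_window.shape, across_windows.shape)
```

Reducing over axis=1 asks whether every window in a row matches at a given element position, which is not the question being asked; reducing over axis=2 asks whether a single window matches the whole pattern.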
So, in order to get the results you are looking for:

    print(np.sum(np.all(rolling_window(matrix, 2) == [0,1], axis=2), axis=1))
    >>>[2 2]
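For all four patterns at once, one possible alternative (not from the answer above, and not benchmarked here) is to skip the window view entirely. Assuming the matrix only holds 0s and 1s, each adjacent pair can be encoded as the number 2*left + right, so that [0,0], [0,1], [1,0], [1,1] become 0, 1, 2, 3. The function name count_pairs_encoded is hypothetical.

```python
import numpy as np

def count_pairs_encoded(matrix):
    """Per-row counts of [0,0], [0,1], [1,0], [1,1], assuming a 2D array of 0/1."""
    # Encode every adjacent pair as 2*left + right, giving values 0..3.
    codes = 2 * matrix[:, :-1] + matrix[:, 1:]
    # One column per pattern: count how often each code appears in each row.
    return np.stack([(codes == k).sum(axis=1) for k in range(4)], axis=1)

matrix = np.array([[0, 1, 1, 0, 1, 0], [0, 1, 1, 0, 1, 0]])
print(count_pairs_encoded(matrix))
```

Column k of the result corresponds to the k-th pattern in the order listed in the question, with one row of counts per matrix row.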