python - Rolling window or occurrences for 2D matrix in Numpy per row? -


I am looking for the occurrences of a pattern on each row of a matrix, and I have found that there is no clear solution in Python that performs well on a very big matrix.

I have a matrix similar to

    import numpy as np

    matrix = np.array([[0, 1, 1, 0, 1, 0],
                       [0, 1, 1, 0, 1, 0]])
    print('matrix: ', matrix)

where I want to check the occurrences of the patterns [0,0], [0,1], [1,0] and [1,1] on each row, considering overlapping. For the example given, where both rows are equal, the result is the same for each pattern:

  • pattern[0,0] = [0,0]
  • pattern[0,1] = [2,2]
  • pattern[1,0] = [2,2]
  • pattern[1,1] = [1,1]

The matrix in this example is quite small, but I am looking for performance, as I have a huge matrix. You can test with matrix = numpy.random.randint(2, size=(100000,10)) or bigger, for example, to see the differences.

My first thought on a possible answer was converting the rows to strings and looking for occurrences based on this answer (string count with overlapping occurrences):

    def string_occurrences(matrix):
        print('\n===== string count overlapping =====')
        numrow, numcol = np.shape(matrix)
        ocur = np.zeros((numrow, 4))
        for i in range(numrow):
            # Turn the row into a string such as '011010'
            strlist = ''.join(map(str, matrix[i, :]))
            ocur[i, 0] = occurrences(strlist, '00')
            ocur[i, 1] = occurrences(strlist, '01')
            ocur[i, 2] = occurrences(strlist, '10')
            ocur[i, 3] = occurrences(strlist, '11')
        return ocur

using the function occurrences from that answer

    def occurrences(string, sub):
        count = start = 0
        while True:
            # Resume the search one character after the previous match
            start = string.find(sub, start) + 1
            if start > 0:
                count += 1
            else:
                return count
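As a quick check of what "overlapping" means here, the sketch below repeats the function so that it runs on its own and compares it with the built-in str.count, which does not count overlapping matches. The test strings are illustrative only.

```python
def occurrences(string, sub):
    """Count occurrences of sub in string, allowing overlaps."""
    count = start = 0
    while True:
        # Resume the search one character after the previous match start
        start = string.find(sub, start) + 1
        if start > 0:
            count += 1
        else:
            return count

print(occurrences('111', '11'))     # 2: both overlapping matches are counted
print('111'.count('11'))            # 1: str.count skips overlapping matches
print(occurrences('011010', '01'))  # 2: the example row written as a string
```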

But considering that the real array is huge, this solution is very slow, as it uses for loops, strings, and so on. So, looking for a NumPy solution, I used a trick: compare the values with a pattern and roll the matrix on axis=1 to check all the occurrences. I call it a pseudo rolling window on 2D, as the window is not square and the way of calculation is different. There are 2 options, where the second (option 2) is faster because it avoids the extra calculation of numpy.roll.

    def pseudo_rolling_window_opt12(matrix):
        print('\n===== pseudo_rolling_window =====')
        numrow, numcol = np.shape(matrix)
        ocur = np.zeros((numrow, 4))
        index = 0
        for i in np.arange(2):
            for j in np.arange(2):
                #pattern = -9*np.ones(numcol)   # option 1
                pattern = -9*np.ones(numcol+1)  # option 2
                pattern[0] = i
                pattern[1] = j
                for idcol in range(numcol-1):
                    #ocur[:,index] += np.sum(np.roll(matrix,-idcol, axis=1) == pattern, axis=1) == 2    # option 1: 219.398691893 seconds (for real matrix)
                    ocur[:,index] += np.sum(matrix[:,idcol:] == pattern[:-(idcol+1)], axis=1) == 2      # option 2:  80.929688930 seconds (for real matrix)
                index += 1
        return ocur
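For patterns of length 2, the slice comparison of option 2 can also be written without the loop over columns, by comparing every column with its right-hand neighbour for the whole matrix at once. The following is only a minimal sketch of that idea: count_pairs_sliced is a hypothetical name, not part of the code above, and it has not been timed against the real matrix.

```python
import numpy as np

def count_pairs_sliced(matrix):
    """Count overlapping [0,0], [0,1], [1,0], [1,1] pairs on each row."""
    left = matrix[:, :-1]   # first element of every adjacent pair
    right = matrix[:, 1:]   # second element of every adjacent pair
    ocur = np.zeros((matrix.shape[0], 4), dtype=int)
    index = 0
    for i in range(2):
        for j in range(2):
            # A pair matches when both of its elements equal the pattern
            ocur[:, index] = np.sum((left == i) & (right == j), axis=1)
            index += 1
    return ocur

matrix = np.array([[0, 1, 1, 0, 1, 0],
                   [0, 1, 1, 0, 1, 0]])
print(count_pairs_sliced(matrix))
# [[0 2 2 1]
#  [0 2 2 1]]
```

The columns follow the same order as ocur above: [0,0], [0,1], [1,0], [1,1].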

Searching for other possibilities, I found the "rolling window", which seemed to be a good answer for performance, as it uses NumPy functions. Looking at this answer (rolling window for 1D arrays in numpy?) and the links in it, I checked the following function. But really, I do not understand the output, as the calculations of the window seem to match what I was expecting as a result.

    def rolling_window(a, size):
        shape = a.shape[:-1] + (a.shape[-1] - size + 1, size)
        strides = a.strides + (a.strides[-1],)
        return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

used as:

    a = rolling_window(matrix, 2)
    print(a == np.array([0, 1]))
    print(np.all(rolling_window(matrix, 2) == [0, 1], axis=1))

Does anyone know what is wrong in this last case? Or is there any possibility with better performance?

You are using the wrong axis of the NumPy array. You should change the axis in np.all from 1 to 2. Using the following code:

    a = rolling_window(matrix, 2)
    print(np.all(rolling_window(matrix, 2) == [0, 1], axis=2))

you get:

    >>> [[ True False False  True False]
         [ True False False  True False]]
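The reason axis=2 is the right choice can be seen from the shape of the rolling window result. The sketch below repeats the function from the question so that it runs on its own. For the example matrix the result has three axes: row, window position within the row, and position inside the window. A pattern match has to be reduced over the last one.

```python
import numpy as np

def rolling_window(a, size):
    shape = a.shape[:-1] + (a.shape[-1] - size + 1, size)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

matrix = np.array([[0, 1, 1, 0, 1, 0],
                   [0, 1, 1, 0, 1, 0]])
windows = rolling_window(matrix, 2)

print(windows.shape)  # (2, 5, 2): 2 rows, 5 windows per row, 2 values per window
print(windows[0])     # the 5 overlapping pairs of the first row

# axis=1 reduces over the window positions, axis=2 over the values in a window
print(np.all(windows == [0, 1], axis=1).shape)  # (2, 2)
print(np.all(windows == [0, 1], axis=2).shape)  # (2, 5)
```

With axis=1 the test asks whether every window of a row has a 0 (or a 1) at a given position, which is not the question being asked. With axis=2 it asks whether each individual window equals [0, 1].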

So, in order to get the results you are looking for:

    print(np.sum(np.all(rolling_window(matrix, 2) == [0, 1], axis=2), axis=1))
    >>> [2 2]
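The same expression can be repeated for each of the four patterns to build the full table of counts. The sketch below assumes the layout of the ocur array in the question (one column per pattern, in the order [0,0], [0,1], [1,0], [1,1]). count_patterns is a hypothetical name, not a function from the question or from NumPy.

```python
import numpy as np

def rolling_window(a, size):
    shape = a.shape[:-1] + (a.shape[-1] - size + 1, size)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

def count_patterns(matrix, patterns):
    """Return an array of shape (rows, len(patterns)) with per-row counts."""
    # All patterns are assumed to have the same length
    windows = rolling_window(matrix, len(patterns[0]))
    # For each pattern: mark the matching windows, then count them per row
    counts = [np.sum(np.all(windows == p, axis=2), axis=1) for p in patterns]
    return np.column_stack(counts)

matrix = np.array([[0, 1, 1, 0, 1, 0],
                   [0, 1, 1, 0, 1, 0]])
patterns = [[0, 0], [0, 1], [1, 0], [1, 1]]
print(count_patterns(matrix, patterns))
# [[0 2 2 1]
#  [0 2 2 1]]
```

The windows are a view on the original data, so they should only be read, not written to. On NumPy 1.20 or later, np.lib.stride_tricks.sliding_window_view(matrix, 2, axis=1) should give the same windows without building the strides by hand.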
