numpy - Python: load a large number of files
I'm trying to load a large number of files saved in EnSight Gold format into a numpy array. To do the reading I've written my own class, libvec, which reads the geometry file and then preallocates the arrays that Python will use to save the data, as shown in the code below.
n = len(file_list)

# create the class object and read the geometry file
gvec = vec.libvec(os.path.join(current_dir, casefile))
x, y, z = gvec.xyz()

# preallocate arrays
u_temp = np.zeros((len(y), len(x), n), dtype=np.dtype('f4'))
v_temp = np.zeros((len(y), len(x), n), dtype=np.dtype('f4'))
u_temp = np.zeros((len(x), len(x), n), dtype=np.dtype('f4'))
v_temp = np.zeros((len(x), len(y), n), dtype=np.dtype('f4'))

# read the individual files into the allocated arrays
for idx, current_file in enumerate(file_list):
    u, v = gvec.readvec(os.path.join(current_dir, current_file))
    u_temp[:, :, idx] = u
    v_temp[:, :, idx] = v
    del u, v
However, this takes seemingly forever, so I was wondering if you have any idea how to speed up the process? The code that reads the individual files into the array structure can be seen below:
def readvec(self, filename):
    # supposing for the moment that the naming scheme piv__vxy.case / piv__vxy.geo does not change;
    # should this not be the case, appropriate changes have to be made to the corresponding file
    data_temp = np.loadtxt(filename, dtype=np.dtype('f4'), delimiter=None, converters=None, skiprows=4)
    # u value
    for i in range(len(self.__y)):  # x value counter
        for j in range(len(self.__x)):  # y value counter
            self.__u[i, j] = data_temp[i*len(self.__x) + j]
    # v value
    for i in range(len(self.__y)):  # x value counter
        for j in range(len(self.__x)):  # y value counter
            self.__v[i, j] = data_temp[len(self.__x)*len(self.__y) + i*len(self.__x) + j]
    # w value
    if len(self.__z) > 1:
        for i in range(len(self.__y)):  # x value counter
            for j in range(len(self.__xd)):  # y value counter
                self.__w[i, j] = data_temp[2*len(self.__x)*len(self.__y) + i*len(self.__x) + j]
        return self.__u, self.__v, self.__w
    else:
        return self.__u, self.__v
Thanks a lot in advance and best regards,
j
It's a bit hard without test input/output to compare against, but I think this will give you the same u/v arrays as the nested loops in readvec. This method should be considerably faster than the for loops.
u = data[:size_x*size_y].reshape(size_x, size_y)
v = data[size_x*size_y:].reshape(size_x, size_y)
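One way to gain confidence in the slice-and-reshape approach is to compare it against the nested loops on a small synthetic array. The sketch below is only an illustration: the grid sizes nx and ny and the data array are made up, and data stands in for the flat array that np.loadtxt returns. Note that the loops in readvec index u[i, j] with i running over y and j over x, so the shape that reproduces them is (len(y), len(x)), and the v slice is bounded so that it would not pick up a trailing w block.

```python
import numpy as np

# Hypothetical grid sizes, chosen only for this check.
nx, ny = 4, 3

# Stand-in for the flat array from np.loadtxt: all u values, then all v values.
data = np.arange(2 * nx * ny, dtype=np.float32)

# Nested loops as in the question's readvec (i runs over y, j over x).
u_loop = np.empty((ny, nx), dtype=np.float32)
v_loop = np.empty((ny, nx), dtype=np.float32)
for i in range(ny):
    for j in range(nx):
        u_loop[i, j] = data[i * nx + j]
        v_loop[i, j] = data[nx * ny + i * nx + j]

# Vectorized equivalent: slice each block and reshape it to (ny, nx).
u_vec = data[:nx * ny].reshape(ny, nx)
v_vec = data[nx * ny:2 * nx * ny].reshape(ny, nx)

# True if both approaches produce identical arrays.
same = np.array_equal(u_loop, u_vec) and np.array_equal(v_loop, v_vec)
```

For a square grid the order of the two reshape arguments does not matter, which is why the difference is easy to miss.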
Returning these directly into u_temp, v_temp should also help. Right now you're doing 3(?) copies of the data to get them into u_temp, v_temp:
- from the file into data_temp
- from data_temp into self.__u/v
- from u/v into u/v_temp
Although my guess is that it's the 2 nested for loops, and accessing 1 element at a time, that are causing the slowness.
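Putting the two suggestions together, a minimal sketch might look like the following. The helper names split_uv and load_stack are hypothetical and not part of the asker's libvec class, and the synthetic fake_files list stands in for the flat arrays that np.loadtxt(filename, skiprows=4) would return for each file. It assumes each file holds all u values followed by all v values, as the index arithmetic in readvec suggests.

```python
import numpy as np

def split_uv(data, nx, ny):
    """Return (u, v) as (ny, nx) views into the flat array `data` (no copy)."""
    n = nx * ny
    return data[:n].reshape(ny, nx), data[n:2 * n].reshape(ny, nx)

def load_stack(flat_arrays, nx, ny):
    """Fill preallocated (ny, nx, N) arrays from a sequence of flat arrays."""
    flat_arrays = list(flat_arrays)
    n_files = len(flat_arrays)
    u_temp = np.zeros((ny, nx, n_files), dtype=np.float32)
    v_temp = np.zeros((ny, nx, n_files), dtype=np.float32)
    for idx, data in enumerate(flat_arrays):
        u, v = split_uv(data, nx, ny)
        # Single copy per field: from the flat array into the preallocated stack.
        u_temp[:, :, idx] = u
        v_temp[:, :, idx] = v
    return u_temp, v_temp

# Synthetic stand-ins for the per-file data (hypothetical sizes and values).
nx, ny = 4, 3
fake_files = [np.arange(2 * nx * ny, dtype=np.float32) + 100 * k for k in range(5)]
u_temp, v_temp = load_stack(fake_files, nx, ny)
```

Because split_uv returns views, the only copy after reading the file is the assignment into u_temp and v_temp. If loading is still slow after this change, the text parsing in np.loadtxt itself may be the remaining cost, which would be worth profiling separately.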