python - How to read filenames included in a gz file
I've tried to read a gz file:
    with open(os.path.join(storage_path, file), "rb") as gzipfile:
        with gzip.GzipFile(fileobj=gzipfile) as datafile:
            data = datafile.read()
It works, but I need the filenames and the size of every file included in my gz file. This code prints out the content of the file included in the archive.
How can I read the filenames included in this gz file?
The Python gzip module does not provide access to that information.
The source code skips over it without ever storing it:
    if flag & FNAME:
        # Read and discard a null-terminated string containing the filename
        while True:
            s = self.fileobj.read(1)
            if not s or s=='\000':
                break
The filename component is optional and not guaranteed to be present (the command-line gzip -d decompression option would use the archive's own filename sans .gz in that case, I think). The uncompressed file size is not stored in the header; you can find it in the last 4 bytes instead, where it is stored modulo 2^32.
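As a minimal sketch of the "last 4 bytes" point, assuming Python 3 and a single-member gzip stream held in memory, the trailer can be read as a little-endian unsigned 32-bit integer. The helper name gzip_trailer_size is made up for this illustration and is not part of the gzip module:

```python
import gzip
import struct


def gzip_trailer_size(raw: bytes) -> int:
    """Return ISIZE: the uncompressed length modulo 2**32 (hypothetical helper)."""
    # A gzip stream has a 10-byte fixed header and an 8-byte trailer (CRC32 + ISIZE).
    if len(raw) < 18 or raw[:2] != b"\x1f\x8b":
        raise ValueError("not a gzip stream")
    (isize,) = struct.unpack("<I", raw[-4:])
    return isize


payload = b"hello gzip" * 100
raw = gzip.compress(payload)
print(gzip_trailer_size(raw), len(payload))
```

Because the field is only 32 bits wide, the value wraps around for data of 4 GiB or more, and for files made of several concatenated members it describes only the last member.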
To read the filename from the header yourself, you'd need to recreate the file header reading code and retain the filename bytes instead. The following function returns that, plus the decompressed size:
    import struct
    from gzip import FEXTRA, FNAME

    def read_gzip_info(gzipfile):
        gf = gzipfile.fileobj
        pos = gf.tell()

        # Read the uncompressed size from the last 4 bytes of the archive
        gf.seek(-4, 2)
        size = struct.unpack('<I', gf.read(4))[0]

        gf.seek(0)
        magic = gf.read(2)
        # Bytes literals, since the underlying file is read in binary mode (Python 3)
        if magic != b'\037\213':
            raise IOError('Not a gzipped file')

        method, flag, mtime = struct.unpack("<BBIxx", gf.read(8))

        if not flag & FNAME:
            # Not stored in the header, use the filename sans .gz
            gf.seek(pos)
            fname = gzipfile.name
            if fname.endswith('.gz'):
                fname = fname[:-3]
            return fname, size

        if flag & FEXTRA:
            # Read & discard the extra field, if present
            gf.read(struct.unpack("<H", gf.read(2))[0])

        # Read a null-terminated string containing the filename
        fname = []
        while True:
            s = gf.read(1)
            if not s or s == b'\000':
                break
            fname.append(s)

        gf.seek(pos)
        return b''.join(fname).decode('latin-1'), size
Use the above function with an already-created gzip.GzipFile object:
    filename, size = read_gzip_info(gzipfileobj)
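For an end-to-end check, here is a self-contained sketch, assuming Python 3 and a single-member .gz file on disk. It parses the header straight from a path rather than from a GzipFile object; the name gzip_member_info is hypothetical, and the flag values are the ones defined in RFC 1952:

```python
import gzip
import os
import struct
import tempfile

FEXTRA, FNAME = 4, 8  # header flag bits as defined in RFC 1952


def gzip_member_info(path):
    """Return (stored_name_or_None, uncompressed_size) for a single-member .gz file."""
    with open(path, "rb") as f:
        header = f.read(10)  # magic(2) method(1) flags(1) mtime(4) xfl(1) os(1)
        if len(header) < 10 or header[:2] != b"\x1f\x8b":
            raise OSError("Not a gzipped file")
        flag = header[3]
        if flag & FEXTRA:
            # Skip the optional extra field: 2-byte length, then that many bytes
            (xlen,) = struct.unpack("<H", f.read(2))
            f.read(xlen)
        name = None
        if flag & FNAME:
            # The stored filename is a null-terminated Latin-1 string
            chars = bytearray()
            while True:
                c = f.read(1)
                if not c or c == b"\x00":
                    break
                chars += c
            name = chars.decode("latin-1")
        # The uncompressed size (modulo 2**32) sits in the last 4 bytes
        f.seek(-4, os.SEEK_END)
        (isize,) = struct.unpack("<I", f.read(4))
    return name, isize


# Demo: write an archive whose header stores the name "report.txt", then read it back
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "archive.gz")
    with open(path, "wb") as raw_file:
        with gzip.GzipFile(filename="report.txt", mode="wb", fileobj=raw_file) as gz:
            gz.write(b"some data")
    print(gzip_member_info(path))
```

Returning None when the name is absent, instead of falling back to the archive name, is a design choice of this sketch; the caller can then decide whether stripping .gz from the path is an acceptable substitute.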