python - Asyncio decode utf-8 with StreamReader
I'm getting used to asyncio and find the task handling quite nice, but it can be difficult to mix async libraries with traditional IO libraries. The problem I'm facing is how to decode an async StreamReader.
The simplest solution is to read() chunks of byte strings and decode each chunk; see the code below. (In my program, I wouldn't print each chunk, but decode it into a string and send it to a method for processing):
    import asyncio
    import aiohttp

    async def get_data(port):
        url = 'http://localhost:{}/'.format(port)
        r = await aiohttp.get(url)
        stream = r.content
        while not stream.at_eof():
            data = await stream.read(4)
            print(data.decode('utf-8'))
This works fine, until a utf-8 character is split between two chunks. For example, if the response is b'm\xc3\xa4dchen mit bi\xc3\x9f\n', then reading it in chunks of 3 will work, but chunks of 4 will not, as \xc3 and \x9f end up in different chunks, and decoding the chunk ending in \xc3 raises the following error:

    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc3 in position 3: unexpected end of data
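The failure can be reproduced without any network code at all. A minimal standalone sketch, manually chunking the same response bytes, shows why per-chunk .decode() only works for some chunk sizes:

```python
# The response bytes from the example above.
payload = b'm\xc3\xa4dchen mit bi\xc3\x9f\n'

def decode_in_chunks(data, size):
    # Decode each fixed-size chunk independently, as the loop above does.
    return ''.join(data[i:i + size].decode('utf-8')
                   for i in range(0, len(data), size))

print(decode_in_chunks(payload, 3))  # ok: no multi-byte character is split

try:
    decode_in_chunks(payload, 4)     # the chunk b' bi\xc3' ends mid-character
except UnicodeDecodeError as e:
    print(e)
```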
I looked at the proper solutions to this problem, and at least in the blocking world it seems to be either io.TextIOWrapper or codecs.StreamReaderWriter (the differences between them are discussed in PEP 0400). However, both of these rely on typical blocking streams.
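For comparison, here is a minimal blocking sketch (using io.BytesIO as a stand-in for a file or socket) showing that codecs.getreader handles the split transparently, because it buffers an incomplete trailing byte sequence until the continuation bytes arrive:

```python
import codecs
import io

# Blocking stand-in for the HTTP response body.
blocking = io.BytesIO(b'm\xc3\xa4dchen mit bi\xc3\x9f\n')
reader = codecs.getreader('utf-8')(blocking)

# read(4) pulls roughly 4 bytes at a time; a partial multi-byte
# sequence is kept in an internal buffer rather than raising.
text = ''
chunk = reader.read(4)
while chunk:
    text += chunk
    chunk = reader.read(4)
print(text, end='')
```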
I spent 30 minutes searching for asyncio examples and kept finding the decode() solution. Does anyone know of a better solution, or is this a missing feature in Python's asyncio?
For reference, here are the results of using the two "standard" decoders with async streams.
Using the codecs stream reader:

    r = await aiohttp.get(url)
    decoder = codecs.getreader('utf-8')
    stream = decoder(r.content)
Exception:

      File "echo_client.py", line 13, in get_data
        data = await stream.read(4)
      File "/usr/lib/python3.5/codecs.py", line 497, in read
        data = self.bytebuffer + newdata
    TypeError: can't concat bytes to generator
(It calls read() directly, rather than using yield from or await to wait for it.)
I also tried wrapping the stream in an io.TextIOWrapper:

    stream = TextIOWrapper(r.content)
But that leads to the following:

      File "echo_client.py", line 10, in get_data
        stream = TextIOWrapper(r.content)
    AttributeError: 'FlowControlStreamReader' object has no attribute 'readable'
P.S. If you want a sample test case for this, please look at this gist. You can run it with Python 3.5 to reproduce the error. If you change the chunk size from 4 to 3 (or 30), it will work correctly.
Edit
The accepted answer fixed this like a charm. Thanks! In case anyone else has this issue, here is a simple wrapper class I made to handle decoding on a StreamReader:
    import codecs

    class DecodingStreamReader:
        def __init__(self, stream, encoding='utf-8', errors='strict'):
            self.stream = stream
            self.decoder = codecs.getincrementaldecoder(encoding)(errors=errors)

        async def read(self, n=-1):
            data = await self.stream.read(n)
            if isinstance(data, (bytes, bytearray)):
                data = self.decoder.decode(data)
            return data

        def at_eof(self):
            return self.stream.at_eof()
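A quick way to exercise the wrapper without aiohttp is to feed an asyncio.StreamReader by hand. The class is repeated here so the sketch runs on its own; asyncio.run requires Python 3.7+:

```python
import asyncio
import codecs

# Repeated from above so this snippet is self-contained.
class DecodingStreamReader:
    def __init__(self, stream, encoding='utf-8', errors='strict'):
        self.stream = stream
        self.decoder = codecs.getincrementaldecoder(encoding)(errors=errors)

    async def read(self, n=-1):
        data = await self.stream.read(n)
        if isinstance(data, (bytes, bytearray)):
            data = self.decoder.decode(data)
        return data

    def at_eof(self):
        return self.stream.at_eof()

async def main():
    # Hand-fed StreamReader standing in for the aiohttp response body.
    raw = asyncio.StreamReader()
    raw.feed_data(b'm\xc3\xa4dchen mit bi\xc3\x9f\n')
    raw.feed_eof()

    stream = DecodingStreamReader(raw)
    text = ''
    while not stream.at_eof():
        text += await stream.read(4)  # size 4 splits b'\xc3\x9f' across reads
    return text

print(asyncio.run(main()), end='')
```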
You can use an IncrementalDecoder:

    utf8decoder = codecs.getincrementaldecoder('utf-8')
With your example:
    decoder = utf8decoder(errors='strict')
    while not stream.at_eof():
        data = await stream.read(4)
        print(decoder.decode(data), end='')
Output:
mädchen mit biß
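One detail worth knowing: an IncrementalDecoder silently buffers a trailing partial sequence, so with errors='strict' it only reports truncated input when a final decode call passes final=True. A loop like the one above could add that flush at EOF:

```python
import codecs

decoder = codecs.getincrementaldecoder('utf-8')(errors='strict')

# A split inside the two-byte sequence for 'ä' is handled transparently:
print(decoder.decode(b'm\xc3'))   # the lone b'\xc3' is buffered, 'm' is emitted
print(decoder.decode(b'\xa4'))    # continuation byte completes 'ä'

# At end of stream, flush with final=True to surface truncated input:
decoder.decode(b'', final=True)   # raises UnicodeDecodeError if bytes remain
```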