python - Asyncio decode utf-8 with StreamReader
I'm getting used to asyncio and find the task handling quite nice, but it can be difficult to mix async libraries with traditional IO libraries. The problem I'm facing is how to decode an async StreamReader.
The simplest solution is to read() chunks of byte strings and decode each chunk; see the code below. (In my program, I wouldn't print each chunk, but decode it into a string and send it to a method for processing):
    import asyncio
    import aiohttp

    async def get_data(port):
        url = 'http://localhost:{}/'.format(port)
        r = await aiohttp.get(url)
        stream = r.content
        while not stream.at_eof():
            data = await stream.read(4)
            print(data.decode('utf-8'))
This works fine, until a utf-8 character is split between two chunks. For example, if the response is b'm\xc3\xa4dchen mit bi\xc3\x9f\n', then reading it in chunks of 3 will work, but chunks of 4 will not, as \xc3 and \x9f end up in different chunks, and decoding the chunk ending in \xc3 raises the following error:

    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc3 in position 3: unexpected end of data
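The failure can be reproduced without any network code at all. A minimal standalone sketch, manually chunking the same response bytes, shows why per-chunk .decode() only works for some chunk sizes:

```python
# The response bytes from the example above.
payload = b'm\xc3\xa4dchen mit bi\xc3\x9f\n'

def decode_in_chunks(data, size):
    # Decode each fixed-size chunk independently, as the loop above does.
    return ''.join(data[i:i + size].decode('utf-8')
                   for i in range(0, len(data), size))

print(decode_in_chunks(payload, 3))  # ok: no multi-byte character is split

try:
    decode_in_chunks(payload, 4)     # the chunk b' bi\xc3' ends mid-character
except UnicodeDecodeError as e:
    print(e)
```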
I looked at the proper solutions to this problem, and at least in the blocking world it seems to be either io.TextIOWrapper or codecs.StreamReaderWriter (the differences between them are discussed in PEP 0400). However, both of these rely on typical blocking streams.
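For comparison, here is a minimal blocking sketch (using io.BytesIO as a stand-in for a file or socket) showing that codecs.getreader handles the split transparently, because it buffers an incomplete trailing byte sequence until the continuation bytes arrive:

```python
import codecs
import io

# Blocking stand-in for the HTTP response body.
blocking = io.BytesIO(b'm\xc3\xa4dchen mit bi\xc3\x9f\n')
reader = codecs.getreader('utf-8')(blocking)

# read(4) pulls roughly 4 bytes at a time; a partial multi-byte
# sequence is kept in an internal buffer rather than raising.
text = ''
chunk = reader.read(4)
while chunk:
    text += chunk
    chunk = reader.read(4)
print(text, end='')
```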
I spent 30 minutes searching for asyncio examples and kept finding the decode() solution. Does anyone know of a better solution, or is this a missing feature in Python's asyncio?
For reference, here are the results of using the two "standard" decoders with async streams.
Using the codecs stream reader:

    r = await aiohttp.get(url)
    decoder = codecs.getreader('utf-8')
    stream = decoder(r.content)
Exception:

      File "echo_client.py", line 13, in get_data
        data = await stream.read(4)
      File "/usr/lib/python3.5/codecs.py", line 497, in read
        data = self.bytebuffer + newdata
    TypeError: can't concat bytes to generator
(It calls read() directly, rather than using yield from or await to wait for it.)
I also tried wrapping the stream in an io.TextIOWrapper:

    stream = TextIOWrapper(r.content)
But that leads to the following:

      File "echo_client.py", line 10, in get_data
        stream = TextIOWrapper(r.content)
    AttributeError: 'FlowControlStreamReader' object has no attribute 'readable'
P.S. If you want a sample test case for this, please look at this gist. You can run it with Python 3.5 to reproduce the error. If you change the chunk size from 4 to 3 (or 30), it will work correctly.
Edit
The accepted answer fixed this like a charm. Thanks! In case anyone else has this issue, here is a simple wrapper class I made to handle decoding on a StreamReader:
    import codecs

    class DecodingStreamReader:
        def __init__(self, stream, encoding='utf-8', errors='strict'):
            self.stream = stream
            self.decoder = codecs.getincrementaldecoder(encoding)(errors=errors)

        async def read(self, n=-1):
            data = await self.stream.read(n)
            if isinstance(data, (bytes, bytearray)):
                data = self.decoder.decode(data)
            return data

        def at_eof(self):
            return self.stream.at_eof()
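A quick way to exercise the wrapper without aiohttp is to feed an asyncio.StreamReader by hand. The class is repeated here so the sketch runs on its own; asyncio.run requires Python 3.7+:

```python
import asyncio
import codecs

# Repeated from above so this snippet is self-contained.
class DecodingStreamReader:
    def __init__(self, stream, encoding='utf-8', errors='strict'):
        self.stream = stream
        self.decoder = codecs.getincrementaldecoder(encoding)(errors=errors)

    async def read(self, n=-1):
        data = await self.stream.read(n)
        if isinstance(data, (bytes, bytearray)):
            data = self.decoder.decode(data)
        return data

    def at_eof(self):
        return self.stream.at_eof()

async def main():
    # Hand-fed StreamReader standing in for the aiohttp response body.
    raw = asyncio.StreamReader()
    raw.feed_data(b'm\xc3\xa4dchen mit bi\xc3\x9f\n')
    raw.feed_eof()

    stream = DecodingStreamReader(raw)
    text = ''
    while not stream.at_eof():
        text += await stream.read(4)  # size 4 splits b'\xc3\x9f' across reads
    return text

print(asyncio.run(main()), end='')
```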
You can use an IncrementalDecoder:

    utf8decoder = codecs.getincrementaldecoder('utf-8')
With your example:
    decoder = utf8decoder(errors='strict')
    while not stream.at_eof():
        data = await stream.read(4)
        print(decoder.decode(data), end='')
Output:
mädchen mit biß
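One detail worth knowing: an IncrementalDecoder silently buffers a trailing partial sequence, so with errors='strict' it only reports truncated input when a final decode call passes final=True. A loop like the one above could add that flush at EOF:

```python
import codecs

decoder = codecs.getincrementaldecoder('utf-8')(errors='strict')

# A split inside the two-byte sequence for 'ä' is handled transparently:
print(decoder.decode(b'm\xc3'))   # the lone b'\xc3' is buffered, 'm' is emitted
print(decoder.decode(b'\xa4'))    # continuation byte completes 'ä'

# At end of stream, flush with final=True to surface truncated input:
decoder.decode(b'', final=True)   # raises UnicodeDecodeError if bytes remain
```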