Python 3 Unicode len() function for Tamil characters
Just when I believed Python 3 had got Unicode right, I was surprised by the following situation:
>>> amma = "அம்மா"
>>> amma
'அம்மா'
>>> len(amma)
5
The Tamil string "அம்மா" has 3 letters, so a return value of 5 from len("அம்மா") can in no way be accepted or appreciated.
How do other Dravidian or Brahmic scripts solve the problem of getting the right string length?
Edit #1: Considering the comment by @joey, the question can be rephrased as follows:
How do I calculate the grapheme length of a string in Python?
We know that Swift and Perl 6 do this by default:
2> let amma = "அம்மா".characters.count
amma: distance = 3
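Python's standard library has no built-in grapheme cluster segmentation comparable to Swift's `characters.count`. As a rough sketch, assuming it is enough to glue combining marks (general category `M*`) onto the preceding character, the hypothetical helper `rough_graphemes` below groups code points into clusters with `unicodedata.category`. This is a simplification of the extended grapheme cluster rules in Unicode UAX #29: it ignores cases such as CR LF, ZWJ emoji sequences, Hangul syllables and regional-indicator flags. The third-party `regex` module's `\X` pattern implements the full rules.

```python
import unicodedata
from typing import List


def rough_graphemes(text: str) -> List[str]:
    """Approximate grapheme clusters (hypothetical helper, not UAX #29).

    Each non-mark code point starts a new cluster; combining marks
    (general category Mn, Mc or Me) are appended to the previous cluster.
    """
    clusters: List[str] = []
    for ch in text:
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch  # combining mark: extend the current cluster
        else:
            clusters.append(ch)  # base character: start a new cluster
    return clusters


amma = "\u0b85\u0bae\u0bcd\u0bae\u0bbe"  # "அம்மா"
clusters = rough_graphemes(amma)
print(" | ".join(clusters))
print(len(clusters))  # 3
```

For "அம்மா" this yields the clusters அ, ம் and மா, which matches the count of 3 that Swift reports.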
It may have 3 letters, but it has 5 characters (code points):
$ charinfo 'அம்மா'
U+0B85 TAMIL LETTER A [Lo]
U+0BAE TAMIL LETTER MA [Lo]
U+0BCD TAMIL SIGN VIRAMA [Mn]
U+0BAE TAMIL LETTER MA [Lo]
U+0BBE TAMIL VOWEL SIGN AA [Mc]
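A similar per-code-point listing can be produced in Python with `unicodedata.name` and `unicodedata.category`. A minimal sketch; the helper name `describe` is hypothetical:

```python
import unicodedata
from typing import List


def describe(text: str) -> List[str]:
    """Return one line per code point: U+XXXX, Unicode name, [general category]."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')} [{unicodedata.category(ch)}]"
        for ch in text
    ]


amma = "\u0b85\u0bae\u0bcd\u0bae\u0bbe"  # "அம்மா"
for line in describe(amma):
    print(line)
```

The category column is what matters here: the two Mn/Mc entries are combining marks attached to a preceding letter, which is why len() reports 5 while a reader sees 3 letters.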
If you need something more specific, you need to count the number of characters in the Letter category.
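A minimal sketch of that Letter-category count, treating any general category starting with `L` (Lu, Ll, Lt, Lm, Lo) as a letter. The function name `letter_count` is hypothetical. It is only a heuristic: digits, spaces and punctuation are not counted at all, so it matches a "letter count" only for text made of letters and their combining marks.

```python
import unicodedata


def letter_count(text: str) -> int:
    """Count code points whose Unicode general category is a Letter (L*)."""
    return sum(1 for ch in text if unicodedata.category(ch).startswith("L"))


amma = "\u0b85\u0bae\u0bcd\u0bae\u0bbe"  # "அம்மா"
print(len(amma))           # 5 code points
print(letter_count(amma))  # 3 letters: அ, ம, ம
```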