Python 3 Unicode len() function for Tamil characters


I believed Python 3 had gotten Unicode right, so I was surprised when I ran into this situation.

    >>> amma = "அம்மா"
    >>> amma
    'அம்மா'
    >>> len(amma)
    5

The Tamil string "அம்மா" apparently has 3 letters, so a return value of 5 from len("அம்மா") is hard to accept.

How do other Dravidian or Brahmic scripts solve the issue of getting the right string length?

Edit #1: Considering the comment from @joey, the question can be rephrased as below.

How do I calculate the grapheme length in Python?

We know that Swift and Perl 6 do this by default:

      2> let amma = "அம்மா".characters.count
    amma: Distance = 3

It may have 3 letters, but it has 5 characters (code points):

    $ charinfo 'அம்மா'
    U+0B85 TAMIL LETTER A [Lo]
    U+0BAE TAMIL LETTER MA [Lo]
    U+0BCD TAMIL SIGN VIRAMA [Mn]
    U+0BAE TAMIL LETTER MA [Lo]
    U+0BBE TAMIL VOWEL SIGN AA [Mc]
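A similar breakdown can be produced from Python itself with the standard unicodedata module. This is only a minimal sketch; the helper name char_info is made up for illustration and is not part of any library.

```python
import unicodedata


def char_info(text):
    """Return (code point, Unicode name, general category) for each code point.

    char_info is a hypothetical helper name used only for this example.
    """
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"), unicodedata.category(ch))
        for ch in text
    ]


# Print one line per code point, roughly in the style of the listing above.
for code_point, name, category in char_info("அம்மா"):
    print(code_point, name, f"[{category}]")
```

len() counts exactly these entries, which is why it returns 5 for this string.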

If you need to be more specific, you need to count the number of characters in the letter category.
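One way to do that count in Python is sketched below, assuming that "letter category" means any Unicode general category starting with "L" (Lo, Ll, Lu and so on). The function name letter_count is hypothetical.

```python
import unicodedata


def letter_count(text):
    """Count code points whose general category is a letter (L*).

    Combining marks such as the virama (Mn) and vowel signs (Mc) are skipped.
    letter_count is a hypothetical name used only for this example.
    """
    return sum(1 for ch in text if unicodedata.category(ch).startswith("L"))


amma = "அம்மா"
print(len(amma))           # number of code points
print(letter_count(amma))  # number of letter-category code points
```

This counts letters, not grapheme clusters, so the two can differ for other strings. For real grapheme segmentation, the third-party regex module's \X pattern is one option; it is left out here to keep the sketch standard-library only.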

