regex - What is the equivalent of perluniprops in Python?


In Perl, there's the perluniprops index of Unicode 7 properties, http://perldoc.perl.org/perluniprops.html, and I can do the following to pad opening and closing punctuation:

s/(\p{Open_Punctuation})/ $1 /g;
s/(\p{Close_Punctuation})/ $1 /g;

What is the full list of opening/closing punctuation characters that get padded when using Perl? And what is the equivalent in Python?

Related question: padding multiple character space - python. That question is asked separately so that each answerer's votes stay separate.

Are you asking how to determine the corresponding closing punctuation for a given opening punctuation character? Unicode does not define this. In fact, there isn't even a 1:1 relationship:

$ unichars '\p{Open_Punctuation}' | wc -l
75

$ unichars '\p{Close_Punctuation}' | wc -l
73

However, it should be relatively easy to build your own mapping.
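One way such a mapping could be sketched in Python is to pair characters by name. This is only a heuristic and an assumption on my part, not something Unicode defines: the hypothetical helper `build_bracket_map` swaps LEFT for RIGHT in the name of each Ps character and keeps the pair only if the resulting name exists and belongs to Pe. Characters without a LEFT/RIGHT counterpart (the low-9 quotation marks, for instance) are simply left out.

```python
import sys
import unicodedata


def build_bracket_map():
    """Pair each Ps character with a Pe character by swapping LEFT/RIGHT in its name.

    Heuristic only: Unicode does not define open/close pairs via the
    General_Category, so unpaired characters are skipped.
    """
    pairs = {}
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        if unicodedata.category(ch) != "Ps":
            continue
        name = unicodedata.name(ch, "")
        if "LEFT" not in name:
            continue  # e.g. SINGLE LOW-9 QUOTATION MARK has no LEFT/RIGHT form
        try:
            partner = unicodedata.lookup(name.replace("LEFT", "RIGHT"))
        except KeyError:
            continue  # no character with the swapped name exists
        if unicodedata.category(partner) == "Pe":
            pairs[ch] = partner
    return pairs


bracket_map = build_bracket_map()
print(bracket_map["("], bracket_map["["], bracket_map["{"])
```

The result depends on the Unicode version bundled with your Python (see `unicodedata.unidata_version`), so it may not match Perl's Unicode 7 tables exactly.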

$ unichars '\p{Open_Punctuation}' | cat
 (  U+0028 LEFT PARENTHESIS
 [  U+005B LEFT SQUARE BRACKET
 {  U+007B LEFT CURLY BRACKET
 ༺  U+0F3A TIBETAN MARK GUG RTAGS GYON
 ༼  U+0F3C TIBETAN MARK ANG KHANG GYON
 ᚛  U+169B OGHAM FEATHER MARK
 ‚  U+201A SINGLE LOW-9 QUOTATION MARK
 „  U+201E DOUBLE LOW-9 QUOTATION MARK
 ⁅  U+2045 LEFT SQUARE BRACKET WITH QUILL
 ⁽  U+207D SUPERSCRIPT LEFT PARENTHESIS
 ₍  U+208D SUBSCRIPT LEFT PARENTHESIS
 ⌈  U+2308 LEFT CEILING
 ⌊  U+230A LEFT FLOOR
 〈  U+2329 LEFT-POINTING ANGLE BRACKET
 ❨  U+2768 MEDIUM LEFT PARENTHESIS ORNAMENT
 ❪  U+276A MEDIUM FLATTENED LEFT PARENTHESIS ORNAMENT
 ❬  U+276C MEDIUM LEFT-POINTING ANGLE BRACKET ORNAMENT
 ❮  U+276E HEAVY LEFT-POINTING ANGLE QUOTATION MARK ORNAMENT
 ❰  U+2770 HEAVY LEFT-POINTING ANGLE BRACKET ORNAMENT
 ❲  U+2772 LIGHT LEFT TORTOISE SHELL BRACKET ORNAMENT
 ❴  U+2774 MEDIUM LEFT CURLY BRACKET ORNAMENT
 ⟅  U+27C5 LEFT S-SHAPED BAG DELIMITER
 ⟦  U+27E6 MATHEMATICAL LEFT WHITE SQUARE BRACKET
 ⟨  U+27E8 MATHEMATICAL LEFT ANGLE BRACKET
 ⟪  U+27EA MATHEMATICAL LEFT DOUBLE ANGLE BRACKET
 ⟬  U+27EC MATHEMATICAL LEFT WHITE TORTOISE SHELL BRACKET
 ⟮  U+27EE MATHEMATICAL LEFT FLATTENED PARENTHESIS
 ⦃  U+2983 LEFT WHITE CURLY BRACKET
 ⦅  U+2985 LEFT WHITE PARENTHESIS
 ⦇  U+2987 Z NOTATION LEFT IMAGE BRACKET
 ⦉  U+2989 Z NOTATION LEFT BINDING BRACKET
 ⦋  U+298B LEFT SQUARE BRACKET WITH UNDERBAR
 ⦍  U+298D LEFT SQUARE BRACKET WITH TICK IN TOP CORNER
 ⦏  U+298F LEFT SQUARE BRACKET WITH TICK IN BOTTOM CORNER
 ⦑  U+2991 LEFT ANGLE BRACKET WITH DOT
 ⦓  U+2993 LEFT ARC LESS-THAN BRACKET
 ⦕  U+2995 DOUBLE LEFT ARC GREATER-THAN BRACKET
 ⦗  U+2997 LEFT BLACK TORTOISE SHELL BRACKET
 ⧘  U+29D8 LEFT WIGGLY FENCE
 ⧚  U+29DA LEFT DOUBLE WIGGLY FENCE
 ⧼  U+29FC LEFT-POINTING CURVED ANGLE BRACKET
 ⸢  U+2E22 TOP LEFT HALF BRACKET
 ⸤  U+2E24 BOTTOM LEFT HALF BRACKET
 ⸦  U+2E26 LEFT SIDEWAYS U BRACKET
 ⸨  U+2E28 LEFT DOUBLE PARENTHESIS
 ⹂  U+2E42 DOUBLE LOW-REVERSED-9 QUOTATION MARK
 〈  U+3008 LEFT ANGLE BRACKET
 《  U+300A LEFT DOUBLE ANGLE BRACKET
 「  U+300C LEFT CORNER BRACKET
 『  U+300E LEFT WHITE CORNER BRACKET
 【  U+3010 LEFT BLACK LENTICULAR BRACKET
 〔  U+3014 LEFT TORTOISE SHELL BRACKET
 〖  U+3016 LEFT WHITE LENTICULAR BRACKET
 〘  U+3018 LEFT WHITE TORTOISE SHELL BRACKET
 〚  U+301A LEFT WHITE SQUARE BRACKET
 〝  U+301D REVERSED DOUBLE PRIME QUOTATION MARK
 ﴿  U+FD3F ORNATE RIGHT PARENTHESIS
 ︗  U+FE17 PRESENTATION FORM FOR VERTICAL LEFT WHITE LENTICULAR BRACKET
 ︵  U+FE35 PRESENTATION FORM FOR VERTICAL LEFT PARENTHESIS
 ︷  U+FE37 PRESENTATION FORM FOR VERTICAL LEFT CURLY BRACKET
 ︹  U+FE39 PRESENTATION FORM FOR VERTICAL LEFT TORTOISE SHELL BRACKET
 ︻  U+FE3B PRESENTATION FORM FOR VERTICAL LEFT BLACK LENTICULAR BRACKET
 ︽  U+FE3D PRESENTATION FORM FOR VERTICAL LEFT DOUBLE ANGLE BRACKET
 ︿  U+FE3F PRESENTATION FORM FOR VERTICAL LEFT ANGLE BRACKET
 ﹁  U+FE41 PRESENTATION FORM FOR VERTICAL LEFT CORNER BRACKET
 ﹃  U+FE43 PRESENTATION FORM FOR VERTICAL LEFT WHITE CORNER BRACKET
 ﹇  U+FE47 PRESENTATION FORM FOR VERTICAL LEFT SQUARE BRACKET
 ﹙  U+FE59 SMALL LEFT PARENTHESIS
 ﹛  U+FE5B SMALL LEFT CURLY BRACKET
 ﹝  U+FE5D SMALL LEFT TORTOISE SHELL BRACKET
 （  U+FF08 FULLWIDTH LEFT PARENTHESIS
 ［  U+FF3B FULLWIDTH LEFT SQUARE BRACKET
 ｛  U+FF5B FULLWIDTH LEFT CURLY BRACKET
 ｟  U+FF5F FULLWIDTH LEFT WHITE PARENTHESIS
 ｢  U+FF62 HALFWIDTH LEFT CORNER BRACKET

$ unichars '\p{Close_Punctuation}' | cat
 )  U+0029 RIGHT PARENTHESIS
 ]  U+005D RIGHT SQUARE BRACKET
 }  U+007D RIGHT CURLY BRACKET
 ༻  U+0F3B TIBETAN MARK GUG RTAGS GYAS
 ༽  U+0F3D TIBETAN MARK ANG KHANG GYAS
 ᚜  U+169C OGHAM REVERSED FEATHER MARK
 ⁆  U+2046 RIGHT SQUARE BRACKET WITH QUILL
 ⁾  U+207E SUPERSCRIPT RIGHT PARENTHESIS
 ₎  U+208E SUBSCRIPT RIGHT PARENTHESIS
 ⌉  U+2309 RIGHT CEILING
 ⌋  U+230B RIGHT FLOOR
 〉  U+232A RIGHT-POINTING ANGLE BRACKET
 ❩  U+2769 MEDIUM RIGHT PARENTHESIS ORNAMENT
 ❫  U+276B MEDIUM FLATTENED RIGHT PARENTHESIS ORNAMENT
 ❭  U+276D MEDIUM RIGHT-POINTING ANGLE BRACKET ORNAMENT
 ❯  U+276F HEAVY RIGHT-POINTING ANGLE QUOTATION MARK ORNAMENT
 ❱  U+2771 HEAVY RIGHT-POINTING ANGLE BRACKET ORNAMENT
 ❳  U+2773 LIGHT RIGHT TORTOISE SHELL BRACKET ORNAMENT
 ❵  U+2775 MEDIUM RIGHT CURLY BRACKET ORNAMENT
 ⟆  U+27C6 RIGHT S-SHAPED BAG DELIMITER
 ⟧  U+27E7 MATHEMATICAL RIGHT WHITE SQUARE BRACKET
 ⟩  U+27E9 MATHEMATICAL RIGHT ANGLE BRACKET
 ⟫  U+27EB MATHEMATICAL RIGHT DOUBLE ANGLE BRACKET
 ⟭  U+27ED MATHEMATICAL RIGHT WHITE TORTOISE SHELL BRACKET
 ⟯  U+27EF MATHEMATICAL RIGHT FLATTENED PARENTHESIS
 ⦄  U+2984 RIGHT WHITE CURLY BRACKET
 ⦆  U+2986 RIGHT WHITE PARENTHESIS
 ⦈  U+2988 Z NOTATION RIGHT IMAGE BRACKET
 ⦊  U+298A Z NOTATION RIGHT BINDING BRACKET
 ⦌  U+298C RIGHT SQUARE BRACKET WITH UNDERBAR
 ⦎  U+298E RIGHT SQUARE BRACKET WITH TICK IN BOTTOM CORNER
 ⦐  U+2990 RIGHT SQUARE BRACKET WITH TICK IN TOP CORNER
 ⦒  U+2992 RIGHT ANGLE BRACKET WITH DOT
 ⦔  U+2994 RIGHT ARC GREATER-THAN BRACKET
 ⦖  U+2996 DOUBLE RIGHT ARC LESS-THAN BRACKET
 ⦘  U+2998 RIGHT BLACK TORTOISE SHELL BRACKET
 ⧙  U+29D9 RIGHT WIGGLY FENCE
 ⧛  U+29DB RIGHT DOUBLE WIGGLY FENCE
 ⧽  U+29FD RIGHT-POINTING CURVED ANGLE BRACKET
 ⸣  U+2E23 TOP RIGHT HALF BRACKET
 ⸥  U+2E25 BOTTOM RIGHT HALF BRACKET
 ⸧  U+2E27 RIGHT SIDEWAYS U BRACKET
 ⸩  U+2E29 RIGHT DOUBLE PARENTHESIS
 〉  U+3009 RIGHT ANGLE BRACKET
 》  U+300B RIGHT DOUBLE ANGLE BRACKET
 」  U+300D RIGHT CORNER BRACKET
 』  U+300F RIGHT WHITE CORNER BRACKET
 】  U+3011 RIGHT BLACK LENTICULAR BRACKET
 〕  U+3015 RIGHT TORTOISE SHELL BRACKET
 〗  U+3017 RIGHT WHITE LENTICULAR BRACKET
 〙  U+3019 RIGHT WHITE TORTOISE SHELL BRACKET
 〛  U+301B RIGHT WHITE SQUARE BRACKET
 〞  U+301E DOUBLE PRIME QUOTATION MARK
 〟  U+301F LOW DOUBLE PRIME QUOTATION MARK
 ﴾  U+FD3E ORNATE LEFT PARENTHESIS
 ︘  U+FE18 PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRACKET
 ︶  U+FE36 PRESENTATION FORM FOR VERTICAL RIGHT PARENTHESIS
 ︸  U+FE38 PRESENTATION FORM FOR VERTICAL RIGHT CURLY BRACKET
 ︺  U+FE3A PRESENTATION FORM FOR VERTICAL RIGHT TORTOISE SHELL BRACKET
 ︼  U+FE3C PRESENTATION FORM FOR VERTICAL RIGHT BLACK LENTICULAR BRACKET
 ︾  U+FE3E PRESENTATION FORM FOR VERTICAL RIGHT DOUBLE ANGLE BRACKET
 ﹀  U+FE40 PRESENTATION FORM FOR VERTICAL RIGHT ANGLE BRACKET
 ﹂  U+FE42 PRESENTATION FORM FOR VERTICAL RIGHT CORNER BRACKET
 ﹄  U+FE44 PRESENTATION FORM FOR VERTICAL RIGHT WHITE CORNER BRACKET
 ﹈  U+FE48 PRESENTATION FORM FOR VERTICAL RIGHT SQUARE BRACKET
 ﹚  U+FE5A SMALL RIGHT PARENTHESIS
 ﹜  U+FE5C SMALL RIGHT CURLY BRACKET
 ﹞  U+FE5E SMALL RIGHT TORTOISE SHELL BRACKET
 ）  U+FF09 FULLWIDTH RIGHT PARENTHESIS
 ］  U+FF3D FULLWIDTH RIGHT SQUARE BRACKET
 ｝  U+FF5D FULLWIDTH RIGHT CURLY BRACKET
 ｠  U+FF60 FULLWIDTH RIGHT WHITE PARENTHESIS
 ｣  U+FF63 HALFWIDTH RIGHT CORNER BRACKET
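Roughly the same lists can be produced from Python's standard library, without Perl. The sketch below assumes that the General_Category values Ps and Pe in `unicodedata` correspond to Perl's `\p{Open_Punctuation}` and `\p{Close_Punctuation}`. The helper name `chars_in_category` is made up for illustration.

```python
import sys
import unicodedata


def chars_in_category(category):
    """Return all characters whose General_Category equals `category`, in code point order."""
    return "".join(
        chr(cp)
        for cp in range(sys.maxunicode + 1)
        if unicodedata.category(chr(cp)) == category
    )


open_punct = chars_in_category("Ps")   # Perl: \p{Open_Punctuation}
close_punct = chars_in_category("Pe")  # Perl: \p{Close_Punctuation}

# Print the first few entries in a unichars-like layout.
for ch in open_punct[:3]:
    print(f" {ch}  U+{ord(ch):04X} {unicodedata.name(ch)}")

print(len(open_punct), len(close_punct), unicodedata.unidata_version)
```

The counts depend on the Unicode version bundled with your Python, so they may differ from the 75 and 73 that unichars reports above.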

After installing unichars from CPAN's Unicode::Tussle, in Python:

>>> import subprocess
>>> cmd = r"unichars '\p{Open_Punctuation}' | cut -f2 -d' ' | tr -d '\n'"
>>> open_punct = subprocess.check_output(cmd, shell=True).decode('utf8')
Smartmatch is experimental at /usr/local/bin/unichars line 546.
>>> print(open_punct)
([{༺༼᚛‚„⁅⁽₍〈❨❪❬❮❰❲❴⟅⟦⟨⟪⟬⟮⦃⦅⦇⦉⦋⦍⦏⦑⦓⦕⦗⧘⧚⧼⸢⸤⸦⸨〈《「『【〔〖〘〚〝﴾︗︵︷︹︻︽︿﹁﹃﹇﹙﹛﹝([{⦅「
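For the padding itself, here is a minimal pure-Python sketch that mirrors the two Perl substitutions from the question. It assumes the Ps and Pe categories from `unicodedata` stand in for the Perl properties. `category_class` and `pad_brackets` are illustrative names, not part of any library. Because the inserted spaces are never Ps or Pe characters, one pass with an alternation gives the same result as Perl's two sequential passes.

```python
import re
import sys
import unicodedata


def category_class(category):
    """Build a regex character class listing every character of one General_Category."""
    chars = (
        chr(cp)
        for cp in range(sys.maxunicode + 1)
        if unicodedata.category(chr(cp)) == category
    )
    return "[" + "".join(re.escape(ch) for ch in chars) + "]"


# Ps = opening punctuation, Pe = closing punctuation.
OPEN_OR_CLOSE = re.compile("(" + category_class("Ps") + "|" + category_class("Pe") + ")")


def pad_brackets(text):
    """Surround each opening or closing punctuation character with spaces."""
    return OPEN_OR_CLOSE.sub(r" \1 ", text)


print(repr(pad_brackets("f(x)")))  # 'f ( x ) '
```

If a third-party dependency is acceptable, the `regex` package on PyPI accepts `\p{Ps}` and `\p{Pe}` directly in patterns, which avoids building the character classes by hand.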
