regex - What is the equivalent of perluniprops in Python?
In Perl, there's perluniprops, an index of Unicode 7 properties (http://perldoc.perl.org/perluniprops.html), and I can do the following to pad opening and closing punctuation:
s/(\p{Open_Punctuation})/ $1 /g;
s/(\p{Close_Punctuation})/ $1 /g;
What is the full list of opening/closing punctuation that gets padded when using Perl? And what is the equivalent in Python?
Related question: padding multiple character space - python. That question was asked separately so that answers to each can be voted on separately.
Are you asking how to determine the corresponding closing punctuation for a given open punctuation? Unicode does not define this. In fact, there isn't a 1:1 relationship:
$ unichars '\p{Open_Punctuation}' | wc -l
75
$ unichars '\p{Close_Punctuation}' | wc -l
73
However, it should be relatively easy to build your own mapping.
$ unichars '\p{Open_Punctuation}' | cat
( U+0028 LEFT PARENTHESIS
[ U+005B LEFT SQUARE BRACKET
{ U+007B LEFT CURLY BRACKET
༺ U+0F3A TIBETAN MARK GUG RTAGS GYON
༼ U+0F3C TIBETAN MARK ANG KHANG GYON
᚛ U+169B OGHAM FEATHER MARK
‚ U+201A SINGLE LOW-9 QUOTATION MARK
„ U+201E DOUBLE LOW-9 QUOTATION MARK
⁅ U+2045 LEFT SQUARE BRACKET WITH QUILL
⁽ U+207D SUPERSCRIPT LEFT PARENTHESIS
₍ U+208D SUBSCRIPT LEFT PARENTHESIS
⌈ U+2308 LEFT CEILING
⌊ U+230A LEFT FLOOR
〈 U+2329 LEFT-POINTING ANGLE BRACKET
❨ U+2768 MEDIUM LEFT PARENTHESIS ORNAMENT
❪ U+276A MEDIUM FLATTENED LEFT PARENTHESIS ORNAMENT
❬ U+276C MEDIUM LEFT-POINTING ANGLE BRACKET ORNAMENT
❮ U+276E HEAVY LEFT-POINTING ANGLE QUOTATION MARK ORNAMENT
❰ U+2770 HEAVY LEFT-POINTING ANGLE BRACKET ORNAMENT
❲ U+2772 LIGHT LEFT TORTOISE SHELL BRACKET ORNAMENT
❴ U+2774 MEDIUM LEFT CURLY BRACKET ORNAMENT
⟅ U+27C5 LEFT S-SHAPED BAG DELIMITER
⟦ U+27E6 MATHEMATICAL LEFT WHITE SQUARE BRACKET
⟨ U+27E8 MATHEMATICAL LEFT ANGLE BRACKET
⟪ U+27EA MATHEMATICAL LEFT DOUBLE ANGLE BRACKET
⟬ U+27EC MATHEMATICAL LEFT WHITE TORTOISE SHELL BRACKET
⟮ U+27EE MATHEMATICAL LEFT FLATTENED PARENTHESIS
⦃ U+2983 LEFT WHITE CURLY BRACKET
⦅ U+2985 LEFT WHITE PARENTHESIS
⦇ U+2987 Z NOTATION LEFT IMAGE BRACKET
⦉ U+2989 Z NOTATION LEFT BINDING BRACKET
⦋ U+298B LEFT SQUARE BRACKET WITH UNDERBAR
⦍ U+298D LEFT SQUARE BRACKET WITH TICK IN TOP CORNER
⦏ U+298F LEFT SQUARE BRACKET WITH TICK IN BOTTOM CORNER
⦑ U+2991 LEFT ANGLE BRACKET WITH DOT
⦓ U+2993 LEFT ARC LESS-THAN BRACKET
⦕ U+2995 DOUBLE LEFT ARC GREATER-THAN BRACKET
⦗ U+2997 LEFT BLACK TORTOISE SHELL BRACKET
⧘ U+29D8 LEFT WIGGLY FENCE
⧚ U+29DA LEFT DOUBLE WIGGLY FENCE
⧼ U+29FC LEFT-POINTING CURVED ANGLE BRACKET
⸢ U+2E22 TOP LEFT HALF BRACKET
⸤ U+2E24 BOTTOM LEFT HALF BRACKET
⸦ U+2E26 LEFT SIDEWAYS U BRACKET
⸨ U+2E28 LEFT DOUBLE PARENTHESIS
⹂ U+2E42 DOUBLE LOW-REVERSED-9 QUOTATION MARK
〈 U+3008 LEFT ANGLE BRACKET
《 U+300A LEFT DOUBLE ANGLE BRACKET
「 U+300C LEFT CORNER BRACKET
『 U+300E LEFT WHITE CORNER BRACKET
【 U+3010 LEFT BLACK LENTICULAR BRACKET
〔 U+3014 LEFT TORTOISE SHELL BRACKET
〖 U+3016 LEFT WHITE LENTICULAR BRACKET
〘 U+3018 LEFT WHITE TORTOISE SHELL BRACKET
〚 U+301A LEFT WHITE SQUARE BRACKET
〝 U+301D REVERSED DOUBLE PRIME QUOTATION MARK
﴿ U+FD3F ORNATE RIGHT PARENTHESIS
︗ U+FE17 PRESENTATION FORM FOR VERTICAL LEFT WHITE LENTICULAR BRACKET
︵ U+FE35 PRESENTATION FORM FOR VERTICAL LEFT PARENTHESIS
︷ U+FE37 PRESENTATION FORM FOR VERTICAL LEFT CURLY BRACKET
︹ U+FE39 PRESENTATION FORM FOR VERTICAL LEFT TORTOISE SHELL BRACKET
︻ U+FE3B PRESENTATION FORM FOR VERTICAL LEFT BLACK LENTICULAR BRACKET
︽ U+FE3D PRESENTATION FORM FOR VERTICAL LEFT DOUBLE ANGLE BRACKET
︿ U+FE3F PRESENTATION FORM FOR VERTICAL LEFT ANGLE BRACKET
﹁ U+FE41 PRESENTATION FORM FOR VERTICAL LEFT CORNER BRACKET
﹃ U+FE43 PRESENTATION FORM FOR VERTICAL LEFT WHITE CORNER BRACKET
﹇ U+FE47 PRESENTATION FORM FOR VERTICAL LEFT SQUARE BRACKET
﹙ U+FE59 SMALL LEFT PARENTHESIS
﹛ U+FE5B SMALL LEFT CURLY BRACKET
﹝ U+FE5D SMALL LEFT TORTOISE SHELL BRACKET
（ U+FF08 FULLWIDTH LEFT PARENTHESIS
［ U+FF3B FULLWIDTH LEFT SQUARE BRACKET
｛ U+FF5B FULLWIDTH LEFT CURLY BRACKET
｟ U+FF5F FULLWIDTH LEFT WHITE PARENTHESIS
｢ U+FF62 HALFWIDTH LEFT CORNER BRACKET
$ unichars '\p{Close_Punctuation}' | cat
) U+0029 RIGHT PARENTHESIS
] U+005D RIGHT SQUARE BRACKET
} U+007D RIGHT CURLY BRACKET
༻ U+0F3B TIBETAN MARK GUG RTAGS GYAS
༽ U+0F3D TIBETAN MARK ANG KHANG GYAS
᚜ U+169C OGHAM REVERSED FEATHER MARK
⁆ U+2046 RIGHT SQUARE BRACKET WITH QUILL
⁾ U+207E SUPERSCRIPT RIGHT PARENTHESIS
₎ U+208E SUBSCRIPT RIGHT PARENTHESIS
⌉ U+2309 RIGHT CEILING
⌋ U+230B RIGHT FLOOR
〉 U+232A RIGHT-POINTING ANGLE BRACKET
❩ U+2769 MEDIUM RIGHT PARENTHESIS ORNAMENT
❫ U+276B MEDIUM FLATTENED RIGHT PARENTHESIS ORNAMENT
❭ U+276D MEDIUM RIGHT-POINTING ANGLE BRACKET ORNAMENT
❯ U+276F HEAVY RIGHT-POINTING ANGLE QUOTATION MARK ORNAMENT
❱ U+2771 HEAVY RIGHT-POINTING ANGLE BRACKET ORNAMENT
❳ U+2773 LIGHT RIGHT TORTOISE SHELL BRACKET ORNAMENT
❵ U+2775 MEDIUM RIGHT CURLY BRACKET ORNAMENT
⟆ U+27C6 RIGHT S-SHAPED BAG DELIMITER
⟧ U+27E7 MATHEMATICAL RIGHT WHITE SQUARE BRACKET
⟩ U+27E9 MATHEMATICAL RIGHT ANGLE BRACKET
⟫ U+27EB MATHEMATICAL RIGHT DOUBLE ANGLE BRACKET
⟭ U+27ED MATHEMATICAL RIGHT WHITE TORTOISE SHELL BRACKET
⟯ U+27EF MATHEMATICAL RIGHT FLATTENED PARENTHESIS
⦄ U+2984 RIGHT WHITE CURLY BRACKET
⦆ U+2986 RIGHT WHITE PARENTHESIS
⦈ U+2988 Z NOTATION RIGHT IMAGE BRACKET
⦊ U+298A Z NOTATION RIGHT BINDING BRACKET
⦌ U+298C RIGHT SQUARE BRACKET WITH UNDERBAR
⦎ U+298E RIGHT SQUARE BRACKET WITH TICK IN BOTTOM CORNER
⦐ U+2990 RIGHT SQUARE BRACKET WITH TICK IN TOP CORNER
⦒ U+2992 RIGHT ANGLE BRACKET WITH DOT
⦔ U+2994 RIGHT ARC GREATER-THAN BRACKET
⦖ U+2996 DOUBLE RIGHT ARC LESS-THAN BRACKET
⦘ U+2998 RIGHT BLACK TORTOISE SHELL BRACKET
⧙ U+29D9 RIGHT WIGGLY FENCE
⧛ U+29DB RIGHT DOUBLE WIGGLY FENCE
⧽ U+29FD RIGHT-POINTING CURVED ANGLE BRACKET
⸣ U+2E23 TOP RIGHT HALF BRACKET
⸥ U+2E25 BOTTOM RIGHT HALF BRACKET
⸧ U+2E27 RIGHT SIDEWAYS U BRACKET
⸩ U+2E29 RIGHT DOUBLE PARENTHESIS
〉 U+3009 RIGHT ANGLE BRACKET
》 U+300B RIGHT DOUBLE ANGLE BRACKET
」 U+300D RIGHT CORNER BRACKET
』 U+300F RIGHT WHITE CORNER BRACKET
】 U+3011 RIGHT BLACK LENTICULAR BRACKET
〕 U+3015 RIGHT TORTOISE SHELL BRACKET
〗 U+3017 RIGHT WHITE LENTICULAR BRACKET
〙 U+3019 RIGHT WHITE TORTOISE SHELL BRACKET
〛 U+301B RIGHT WHITE SQUARE BRACKET
〞 U+301E DOUBLE PRIME QUOTATION MARK
〟 U+301F LOW DOUBLE PRIME QUOTATION MARK
﴾ U+FD3E ORNATE LEFT PARENTHESIS
︘ U+FE18 PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRACKET
︶ U+FE36 PRESENTATION FORM FOR VERTICAL RIGHT PARENTHESIS
︸ U+FE38 PRESENTATION FORM FOR VERTICAL RIGHT CURLY BRACKET
︺ U+FE3A PRESENTATION FORM FOR VERTICAL RIGHT TORTOISE SHELL BRACKET
︼ U+FE3C PRESENTATION FORM FOR VERTICAL RIGHT BLACK LENTICULAR BRACKET
︾ U+FE3E PRESENTATION FORM FOR VERTICAL RIGHT DOUBLE ANGLE BRACKET
﹀ U+FE40 PRESENTATION FORM FOR VERTICAL RIGHT ANGLE BRACKET
﹂ U+FE42 PRESENTATION FORM FOR VERTICAL RIGHT CORNER BRACKET
﹄ U+FE44 PRESENTATION FORM FOR VERTICAL RIGHT WHITE CORNER BRACKET
﹈ U+FE48 PRESENTATION FORM FOR VERTICAL RIGHT SQUARE BRACKET
﹚ U+FE5A SMALL RIGHT PARENTHESIS
﹜ U+FE5C SMALL RIGHT CURLY BRACKET
﹞ U+FE5E SMALL RIGHT TORTOISE SHELL BRACKET
） U+FF09 FULLWIDTH RIGHT PARENTHESIS
］ U+FF3D FULLWIDTH RIGHT SQUARE BRACKET
｝ U+FF5D FULLWIDTH RIGHT CURLY BRACKET
｠ U+FF60 FULLWIDTH RIGHT WHITE PARENTHESIS
｣ U+FF63 HALFWIDTH RIGHT CORNER BRACKET
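One way to build such a mapping in Python is to pair characters by name. The following is only a rough sketch, not something from the answer above: it assumes that swapping LEFT for RIGHT in a character's Unicode name finds its partner, and the function name build_bracket_map is made up for illustration. Characters whose names do not follow that pattern (the quotation marks, for example) are simply left out, and a name-based pair is not always the visual mirror image (U+298D pairs with U+2990 this way). The result also depends on the Unicode version bundled with your Python.

```python
import sys
import unicodedata


def build_bracket_map():
    """Map Open_Punctuation (Ps) characters to Close_Punctuation (Pe) characters.

    Heuristic (an assumption, not a Unicode rule): replace LEFT with RIGHT
    in the character name and look the new name up.
    """
    mapping = {}
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        if unicodedata.category(ch) != "Ps":
            continue
        name = unicodedata.name(ch, "")
        if "LEFT" not in name:
            continue  # e.g. quotation marks: no obvious partner by name
        try:
            partner = unicodedata.lookup(name.replace("LEFT", "RIGHT"))
        except KeyError:
            continue  # no character carries the swapped name
        if unicodedata.category(partner) == "Pe":
            mapping[ch] = partner
    return mapping


bracket_map = build_bracket_map()
print(bracket_map["("], bracket_map["["], bracket_map["{"])
```

Anything the heuristic misses can be added to the dictionary by hand.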
After installing unichars:
cpan Unicode::Tussle
the list can be fetched in Python:
>>> import subprocess
>>> # Raw string so that \p and \n reach the shell unchanged
>>> cmd = r"unichars '\p{Open_Punctuation}' | cut -f2 -d' ' | tr -d '\n'"
>>> open_punct = subprocess.check_output(cmd, shell=True).decode('utf8')
Smartmatch is experimental at /usr/local/bin/unichars line 546.
>>> print(open_punct)
([{༺༼᚛‚„⁅⁽₍〈❨❪❬❮❰❲❴⟅⟦⟨⟪⟬⟮⦃⦅⦇⦉⦋⦍⦏⦑⦓⦕⦗⧘⧚⧼⸢⸤⸦⸨〈《「『【〔〖〘〚〝﴾︗︵︷︹︻︽︿﹁﹃﹇﹙﹛﹝（［｛｟｢
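If shelling out to Perl is not an option, something close can be done with the standard library alone. This is a minimal sketch under a few assumptions: Python's built-in re module has no \p{...} syntax, so the character class is built from unicodedata, where the general categories Ps and Pe correspond to Open_Punctuation and Close_Punctuation. The helper names chars_in_category and pad_brackets are made up here, and the exact character set depends on the Unicode version of your Python build, so it may not match the Perl lists character for character.

```python
import re
import sys
import unicodedata


def chars_in_category(category):
    """Return all characters whose Unicode general category is `category`."""
    return "".join(
        chr(cp)
        for cp in range(sys.maxunicode + 1)
        if unicodedata.category(chr(cp)) == category
    )


open_punct = chars_in_category("Ps")   # Open_Punctuation
close_punct = chars_in_category("Pe")  # Close_Punctuation

# One character class for both categories; re.escape protects ], \ and similar.
bracket_re = re.compile("([" + re.escape(open_punct + close_punct) + "])")


def pad_brackets(text):
    """Put a space on each side of every opening or closing punctuation mark."""
    return bracket_re.sub(r" \1 ", text)


print(len(open_punct), len(close_punct))
print(pad_brackets("f(x)"))
```

The third-party regex module accepts \p{Ps} and \p{Pe} directly, which would avoid building the class by hand; the version above sticks to the standard library.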