python - What does "(?u)" do in a regex?
I looked at how tokenization is implemented in scikit-learn and found this regex (source):
token_pattern = r"(?u)\b\w\w+\b"
The regex is pretty straightforward, but I have never seen the (?u) part before. Can someone explain to me what this part is doing?
It switches on the re.U (re.UNICODE) flag for this expression.
From the module documentation:
(?iLmsux)
(One or more letters from the set 'i', 'L', 'm', 's', 'u', 'x'.) The group matches the empty string; the letters set the corresponding flags: re.I (ignore case), re.L (locale dependent), re.M (multi-line), re.S (dot matches all), re.U (Unicode dependent), and re.X (verbose), for the entire regular expression. (The flags are described in Module Contents.) This is useful if you wish to include the flags as part of the regular expression, instead of passing a flag argument to the re.compile() function.
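To make the equivalence concrete, here is a minimal sketch that compiles the pattern once with the inline (?u) group and once with re.UNICODE passed as a flag argument, then compares the results. It assumes Python 3, where str patterns already match Unicode by default, so (?u) is effectively redundant there and mainly matters on Python 2. The sample text and variable names are made up for illustration, and the re.ASCII comparison is only there to show what \w does without Unicode matching.

```python
import re

# Hypothetical sample text; escapes are used so the accented
# characters are unambiguous: "café naïve a"
text = "caf\u00e9 na\u00efve a"

# Flag set inside the pattern via the (?u) group.
inline = re.compile(r"(?u)\b\w\w+\b")

# Same pattern with the flag passed to re.compile() instead.
explicit = re.compile(r"\b\w\w+\b", re.UNICODE)

# For contrast (Python 3 only): restrict \w and \b to ASCII.
ascii_only = re.compile(r"\b\w\w+\b", re.ASCII)

inline_tokens = inline.findall(text)
explicit_tokens = explicit.findall(text)
ascii_tokens = ascii_only.findall(text)

print(inline_tokens)    # ['café', 'naïve']
print(explicit_tokens)  # ['café', 'naïve']
print(ascii_tokens)     # ['caf', 'na', 've']
```

Both Unicode variants keep the accented words whole, while the ASCII variant splits them at the accented letters. The single-character token "a" is dropped in every case because the pattern requires at least two word characters.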