python - What does "(?u)" do in a regex?
I looked at how tokenization is implemented in scikit-learn and found this regex (source):
token_pattern = r"(?u)\b\w\w+\b"
The regex is pretty straightforward, but I have never seen the (?u) part before. Can someone explain to me what this part is doing?
It switches on the re.U (re.UNICODE) flag for this expression.
From the module documentation:
(?iLmsux)
(One or more letters from the set 'i', 'L', 'm', 's', 'u', 'x'.) The group matches the empty string; the letters set the corresponding flags: re.I (ignore case), re.L (locale dependent), re.M (multi-line), re.S (dot matches all), re.U (Unicode dependent), and re.X (verbose), for the entire regular expression. (The flags are described in Module Contents.) This is useful if you wish to include the flags as part of the regular expression, instead of passing a flag argument to the re.compile() function.
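To illustrate the point of the quoted documentation, here is a minimal sketch comparing the inline (?u) form with passing re.UNICODE to re.compile(). The sample string and variable names are made up for the example and are not taken from scikit-learn. On Python 3, str patterns are already Unicode-aware by default, so (?u) should make no visible difference there; the ASCII-only variant (?a) is included only for contrast.

```python
import re

# Inline flag, as in the scikit-learn token_pattern from the question.
inline = re.compile(r"(?u)\b\w\w+\b")

# Same pattern, with the flag passed as an argument instead.
explicit = re.compile(r"\b\w\w+\b", re.UNICODE)

# Hypothetical sample text containing non-ASCII letters.
text = "naïve café a b2"

inline_tokens = inline.findall(text)
explicit_tokens = explicit.findall(text)

# Both spellings set the same flag and tokenize identically.
print(inline_tokens)                    # ['naïve', 'café', 'b2']
print(inline_tokens == explicit_tokens) # True

# For contrast: (?a) restricts \w and \b to ASCII, so accented
# letters act as word boundaries and the tokens get split.
ascii_tokens = re.findall(r"(?a)\b\w\w+\b", text)
print(ascii_tokens)                     # ['na', 've', 'caf', 'b2']
```

The single-letter word "a" is dropped in every case because the pattern requires at least two word characters.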