python - What does "(?u)" do in a regex?


I looked at how tokenization is implemented in scikit-learn and found this regex (source):

token_pattern = r"(?u)\b\w\w+\b" 

The regex is pretty straightforward, but I have never seen the (?u) part before. Can someone explain to me what this part is doing?

It switches on the re.U (re.UNICODE) flag for this expression.
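To see the effect, here is a minimal sketch that runs the pattern from the question with Unicode matching and, for contrast, with ASCII-only matching. It assumes Python 3, where str patterns are already Unicode-aware by default, so (?u) is redundant there and mainly matters on Python 2. The (?a) variant and the sample text are illustrative additions, not part of scikit-learn.

```python
import re

# Pattern from the question: tokens of two or more word characters.
unicode_pattern = re.compile(r"(?u)\b\w\w+\b")

# (?a) is the inline form of re.ASCII; it is used here only for contrast.
ascii_pattern = re.compile(r"(?a)\b\w\w+\b")

# Illustrative sample text: "naive cafe" with an i-diaeresis and an e-acute.
text = "na\u00efve caf\u00e9"

# With Unicode matching, accented letters count as word characters,
# so each word is kept whole.
unicode_tokens = unicode_pattern.findall(text)

# With ASCII-only matching, accented letters act as word boundaries,
# so the words are split or truncated: ['na', 've', 'caf']
ascii_tokens = ascii_pattern.findall(text)
```

The ASCII result shows why a tokenizer meant for arbitrary text wants Unicode-aware \w and \b.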

From the module documentation:

(?iLmsux)

(One or more letters from the set 'i', 'L', 'm', 's', 'u', 'x'.) The group matches the empty string; the letters set the corresponding flags: re.I (ignore case), re.L (locale dependent), re.M (multi-line), re.S (dot matches all), re.U (Unicode dependent), and re.X (verbose), for the entire regular expression. (The flags are described in Module Contents.) This is useful if you wish to include the flags as part of the regular expression, instead of passing a flag argument to the re.compile() function.
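As a rough illustration of that last point, the sketch below compares an inline flag with the same flag passed to re.compile(). It uses (?i) rather than (?u) because the difference is easy to see in the output. The pattern and sample string are made up for the example, and Python 3 is assumed.

```python
import re

# Same flag, two spellings: inline in the pattern vs. as an argument.
inline = re.compile(r"(?i)scikit")
explicit = re.compile(r"scikit", re.IGNORECASE)

sample = "Scikit-learn and SCIKIT-image"

inline_matches = inline.findall(sample)
explicit_matches = explicit.findall(sample)

print(inline_matches)    # ['Scikit', 'SCIKIT']
print(explicit_matches)  # ['Scikit', 'SCIKIT']

# The compiled patterns report the same flag set either way.
print(inline.flags == explicit.flags)  # True
```

The inline form is handy when only the pattern string can be supplied, as with a token_pattern parameter. Note that recent Python versions (3.11 and later) accept a global inline flag group only at the start of the expression.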

