python - What does "(?u)" do in a regex?


I looked at how tokenization is implemented in scikit-learn and found this regex (source):

token_pattern = r"(?u)\b\w\w+\b" 

The regex is pretty straightforward, but I have never seen the (?u) part before. Can someone explain what this part is doing?

It switches on the re.U (re.UNICODE) flag for this expression.

From the re module documentation:

(?iLmsux)

(One or more letters from the set 'i', 'L', 'm', 's', 'u', 'x'.) The group matches the empty string; the letters set the corresponding flags: re.I (ignore case), re.L (locale dependent), re.M (multi-line), re.S (dot matches all), re.U (Unicode dependent), and re.X (verbose), for the entire regular expression. (The flags are described in Module Contents.) This is useful if you wish to include the flags as part of the regular expression, instead of passing a flag argument to the re.compile() function.
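As a rough illustration of the point above, the sketch below compiles the pattern once with the inline (?u) flag and once with re.UNICODE passed as an argument, then compares the tokens. The sample sentence and the variable names are made up for this example and do not come from scikit-learn. It assumes Python 3, where str patterns are already Unicode-aware by default, so (?u) is effectively a no-op there; the re.ASCII variant is included only to show what \w looks like without Unicode matching.

```python
import re

# Inline flag, as written in the question.
inline = re.compile(r"(?u)\b\w\w+\b")

# Same pattern with the flag passed to re.compile() instead.
explicit = re.compile(r"\b\w\w+\b", re.UNICODE)

# For contrast: restrict \w and \b to ASCII characters only.
ascii_only = re.compile(r"\b\w\w+\b", re.ASCII)

# Hypothetical sample text containing non-ASCII letters.
text = "naïve café à la carte"

tokens_inline = inline.findall(text)
tokens_explicit = explicit.findall(text)
tokens_ascii = ascii_only.findall(text)

print(tokens_inline)    # ['naïve', 'café', 'la', 'carte']
print(tokens_explicit)  # ['naïve', 'café', 'la', 'carte']
print(tokens_ascii)     # ['na', 've', 'caf', 'la', 'carte']
```

Both Unicode variants give the same tokens, and the single-letter word "à" is dropped because the pattern requires at least two word characters. With re.ASCII the accented letters no longer count as word characters, so words are split around them.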

