python - What does "(?u)" do in a regex?


I looked at how tokenization is implemented in scikit-learn and found this regex (source):

token_pattern = r"(?u)\b\w\w+\b" 

The regex is pretty straightforward, but I have never seen the (?u) part before. Can someone explain to me what this part is doing?

It switches on the re.U (re.UNICODE) flag for this expression.

From the module documentation:

(?iLmsux)

(One or more letters from the set 'i', 'L', 'm', 's', 'u', 'x'.) The group matches the empty string; the letters set the corresponding flags: re.I (ignore case), re.L (locale dependent), re.M (multi-line), re.S (dot matches all), re.U (Unicode dependent), and re.X (verbose), for the entire regular expression. (The flags are described in Module Contents.) This is useful if you wish to include the flags as part of the regular expression, instead of passing a flag argument to the re.compile() function.
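To make the effect concrete, here is a minimal sketch, assuming Python 3 and a made-up sample string. It compares the inline (?u) form with passing re.UNICODE explicitly, and contrasts both with ASCII-only matching via (?a). Note that in Python 3, str patterns are already Unicode-aware by default, so (?u) is effectively redundant there; it mattered on Python 2, where \w was ASCII-only unless the flag was set.

```python
import re

# Pattern quoted in the question: tokens of two or more word characters.
token_pattern = r"(?u)\b\w\w+\b"

# Hypothetical sample text, chosen only to include non-ASCII letters.
text = "naïve café x 42"

# Inline flag: Unicode-aware \w and \b.
inline = re.findall(token_pattern, text)

# Same pattern without the inline flag, passing re.UNICODE explicitly.
flagged = re.findall(r"\b\w\w+\b", text, flags=re.UNICODE)

# For contrast: (?a) restricts \w to [a-zA-Z0-9_], so accented
# letters act as word boundaries and split the tokens.
ascii_only = re.findall(r"(?a)\b\w\w+\b", text)

print(inline)      # ['naïve', 'café', '42']
print(flagged)     # ['naïve', 'café', '42']
print(ascii_only)  # ['na', 've', 'caf', '42']
```

The single-character token "x" is dropped in every case because the pattern requires at least two word characters.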

