Python regexp to match numbers with spaces


I'm a regexp newbie and need help. I have text that represents a table, and I know it consists of 7 columns:

| int | int mixed with - and . | unicode string | float | float | int | float |

The problem is that the floats are written with spaces between thousands (1234,43 => 1 234,43), and the string column can also contain spaces and end with numbers. I tried this (for each line stripped of its newline char):

regex = re.compile(r"(\d+) ([\d.-]+) (.*) ([\d+ ]?\d+,\d+) ([\d+ ]?\d+,\d+) (\d+) ([\d+ ]?\d+,\d+)$", re.UNICODE)
w = regex.findall(line)

Unfortunately it doesn't work in some cases. Test data:

49 602 dskod smcx 262,59 1 131,30 1 1 131,30
49 602 dskod smcx 3 5 262,59 1 131,30 1 1 131,30
50 61-201 łóćźż 1 2 669,50 334,75 1 334,75
51 1-214 aÓŻĆÓds" 70,35 350,18 3 105,53

The cases with thousands are problematic. I'm getting:

[]
[]
[(u'50', u'61-201', u'\u0142\xf3\u0107\u017a\u017c 1 2', u'669,50', u'334,75', u'1', u'334,75')]
[(u'51', u'1-214', u'a\xd3\u017b\u0106\xd3ds"', u'70,35', u'350,18', u'3', u'105,53')]

In the 3rd example, the 2 at the end of the string should be in the next column. Does anyone have a clue how to match this properly, on Python 2.7? I'll fight with the Unicode later.

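The problem is [\d+ ]?: it matches 0 or 1 of a digit, a plus sign or a space, so it can never cover a value like 1 131,30. The other problem is using spaces both as column separators and inside fields without any kind of quoting, but the following works with your data. I removed the . from the 2nd column's character class, made the 3rd column non-greedy (.*?) so it grabs as few characters as possible, and let each float column take 1-3 leading digits followed by any number of space-separated groups of 3 digits: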

#!python3
# coding: utf-8
import re

data = '''\
49 602 dskod smcx 262,59 1 131,30 1 1 131,30
49 602 dskod smcx 3 5 262,59 1 131,30 1 1 131,30
50 61-201 łóćźż 1 2 669,50 334,75 1 334,75
50 61-201 łóćźż 1 669,50 334,75 1 334,75
51 1-214 aÓŻĆÓds" 70,35 350,18 3 105,53
'''.splitlines()

# Float columns: optional 1-3 digits, then " ddd" thousands groups, then ",dd".
# The text column (.*?) is non-greedy, so it takes as little as possible.
regex = re.compile(r"(\d+) ([\d-]+) (.*?) ((?:\d{1,3})?(?:\ \d{3})*,\d{2}) ((?:\d{1,3})?(?:\ \d{3})*,\d{2}) (\d+) ((?:\d{1,3})?(?:\ \d{3})*,\d{2})", re.UNICODE)
for line in data:
    print(regex.match(line).groups())

Output:

('49', '602', 'dskod smcx', '262,59', '1 131,30', '1', '1 131,30')
('49', '602', 'dskod smcx 3', '5 262,59', '1 131,30', '1', '1 131,30')
('50', '61-201', 'łóćźż 1', '2 669,50', '334,75', '1', '334,75')
('50', '61-201', 'łóćźż', '1 669,50', '334,75', '1', '334,75')
('51', '1-214', 'aÓŻĆÓds"', '70,35', '350,18', '3', '105,53')
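The loop above assumes every line matches: regex.match returns None for a line that doesn't fit, and .groups() then raises AttributeError. The captured floats are also still strings. A minimal sketch that handles both, assuming the comma-decimal, space-grouped format from the question and Python 3.4+ (for re.fullmatch). NUM, ROW, to_float and parse_row are hypothetical names; ROW is the answer's regex with the repeated float sub-pattern factored out:

```python
import re

# One float field, as in the answer: optional 1-3 leading digits, any number
# of " ddd" thousands groups, then a comma and exactly two decimals.
NUM = r"(?:\d{1,3})?(?:\ \d{3})*,\d{2}"

# The answer's row pattern, with the three float columns built from NUM.
ROW = re.compile(r"(\d+) ([\d-]+) (.*?) ({0}) ({0}) (\d+) ({0})".format(NUM))


def to_float(text):
    """Convert e.g. '1 131,30' to 1131.3: drop the spaces, comma -> dot."""
    return float(text.replace(" ", "").replace(",", "."))


def parse_row(line):
    """Return a typed 7-tuple, or None when the whole line doesn't fit ROW."""
    m = ROW.fullmatch(line.strip())
    if m is None:
        return None
    c1, c2, text, f1, f2, n, f3 = m.groups()
    # Column 2 can contain '-', so it is kept as a string.
    return (int(c1), c2, text, to_float(f1), to_float(f2), int(n), to_float(f3))


for line in [
    "49 602 dskod smcx 262,59 1 131,30 1 1 131,30",
    "not a table row",
]:
    print(parse_row(line))
```

For the two sample lines this prints (49, '602', 'dskod smcx', 262.59, 1131.3, 1, 1131.3) and then None. Using fullmatch instead of match also rejects lines with trailing text that the unanchored pattern would otherwise accept. Note that NUM requires 3-digit groups after the first 1-3 digits, so an unspaced value like 1234,43 will not match.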
