python - Reading csv with pandas - dealing with imbalanced rows

June 15, 2013

I have more than 1 million rows, and there is a long text field that makes some of the rows imbalanced. This causes some rows to have more columns than the header. I fixed that with the following:

read_csv('filename.csv', error_bad_lines=False)

The problem now is that there appear to be rows with fewer columns than the header. This is a problem because some fields are shifted.

How can I fix this? Is there a way to make the long text field (which I blame for this) act as one field?

Edit after comment

The field delimiter is a comma. When I run df.dtypes, all fields but one seem to be object. I have int and datetime fields, but they are read as objects by pandas.

Edit after comment 2

Here is the header I have in the .csv: id(int),textfield(string),id2(char),score(int),type(string),length(int),name(string),datetime(datetime),size(int),email(string)

The main problem is the textfield area. The other fields cannot contain commas or foul characters that escape the CSV syntax. The textfield is created by users, so it can contain anything in Unicode: emojis, non-English characters, funny quotes, etc.

The textfield should be surrounded by double quotes, and any quote inside the field has to be escaped with another quote.
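As a rough illustration of that quoting rule, here is a minimal sketch using Python's standard csv module. The row contents and the three-column layout are made up for the example; the point is only that a field containing commas, quotes, or newlines gets wrapped in double quotes, with inner quotes doubled.

```python
import csv
import io

# Hypothetical row: the text field contains a comma, double quotes and a newline.
row = [1, 'He said "hi", then left\nsecond line', 'A']

buffer = io.StringIO()
# QUOTE_MINIMAL quotes only the fields that need it; inner quotes are doubled.
writer = csv.writer(buffer, quoting=csv.QUOTE_MINIMAL, lineterminator='\n')
writer.writerow(['id', 'textfield', 'id2'])
writer.writerow(row)

encoded = buffer.getvalue()
print(encoded)

# Reading it back recovers the original text as a single field.
decoded = list(csv.reader(io.StringIO(encoded, newline='')))
print(decoded[1][1])
```

If the file is produced by code you control, writing it this way (rather than joining strings with commas) should keep the user-supplied text inside a single field.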

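Since the field can contain any character, chances are that some of the fields are multiline, which would explain why some rows have fewer columns while the other data seems valid.

So make sure your parser supports multiline fields and is set up to use them. This will only work if the fields are quoted.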
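For pandas specifically, a small sketch of what this could look like is below. The sample data and the reduced set of columns are invented for the example, and it assumes the text field really is quoted in the file. With quoting in place, read_csv should treat an embedded newline as part of the field rather than as a row break, and the int and datetime columns should no longer fall back to object.

```python
import io

import pandas as pd

# Hypothetical sample: row 2 has a quoted text field with a comma,
# doubled quotes and an embedded newline.
sample = (
    'id,textfield,score,datetime\n'
    '1,"plain text",10,2013-06-01 12:00:00\n'
    '2,"text with, a comma and ""quotes""\nand a second line",20,2013-06-02 08:30:00\n'
)

df = pd.read_csv(
    io.StringIO(sample),        # stands in for 'filename.csv'
    quotechar='"',              # fields are wrapped in double quotes
    doublequote=True,           # "" inside a quoted field means one literal quote
    parse_dates=['datetime'],   # ask pandas to parse this column as datetime
)

print(df.shape)
print(df.dtypes)
print(repr(df.loc[1, 'textfield']))
```

If rows still come out with the wrong number of columns after this, the text field is probably not quoted consistently in the source file, and that would need to be fixed where the file is generated.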
