hadoop - Replace multiple spaces with a single space, remove " and leading 0s using Apache Pig


I need help with Pig.

a = LOAD 'input.txt' AS (line:chararray);
b = FOREACH a GENERATE FLATTEN(TOBAG(*));
c = FOREACH b GENERATE REPLACE($0, '\\s+', ' ');

On the last line I need to replace multiple spaces with a single space, remove " (quotes), and strip leading 0s, all using Apache Pig.

Note: the approach should not be field-specific, since there are more than 70 fields. Basically, I am expecting a REPLACE, SUBSTRING, or regex function that can perform the operations above on the whole line.

input.txt

00595, ab 000cdef      california "state,   00usa 00733, 0ds ds "arizona 00state, usa 

expected output

595, ab cdef california state, usa 733, ds ds arizona state, usa

You can use the REPLACE function in Pig for the cleaning, and loading a field as int removes the leading zeros from the number.

a = LOAD '/usr/pigfiles/pigo.txt' USING PigStorage(',') AS (value:int, state:chararray, country:chararray);
b = FOREACH a GENERATE value, REPLACE(REPLACE(state, '\\s+', ' '), '\\"', ''), country;
DUMP b;

output
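Since the question asks for something that is not field-specific, one possible variation is to load each record as a single chararray and chain REPLACE calls over the whole line. This is only a sketch under some assumptions: the data contains no tab characters (the default loader splits on tabs), every token-leading 0 should go, and the alias name cleaned is purely illustrative.

```pig
-- Load each record as one chararray field.
-- Assumption: no tabs in the data, because the default loader splits on tabs.
a = LOAD 'input.txt' AS (line:chararray);

-- REPLACE takes a Java regular expression as its second argument.
-- Innermost call: drop every double quote.
-- Middle call:    drop zeros at the start of each token (\b is a word boundary).
-- Outermost call: collapse each run of whitespace into a single space.
b = FOREACH a GENERATE
        REPLACE(REPLACE(REPLACE(line, '"', ''), '\\b0+', ''), '\\s+', ' ') AS cleaned;

DUMP b;
```

On the sample line this should give the cleaned text the question expects. Be aware that '\\b0+' also strips zeros that follow punctuation, so a value such as 1.005 would become 1.5 and a lone 0 would vanish; if the data holds such values, a stricter pattern would be needed.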

